This repository has been archived by the owner on Oct 13, 2021. It is now read-only.

Resource Deletion Concerns #101

Open
sebastianrosch opened this issue Mar 12, 2019 · 8 comments

Comments

@sebastianrosch
Contributor

We're concerned about resources being deleted in production environments. We consider the current Faros approach a risk and have therefore opened proposals to mitigate it.

This issue summarises those concerns and proposals. I'm looking forward to your take on them, and we're happy to work on the proposals ourselves.

  1. Deleting GitTrack resource deletes all resources managed by Faros: Prevent deleting all deployed resources when GitTrack resources deleted #98
  2. --cascade=false not working with Faros: Faros does not seem to respect --cascade=false when deleting GitTrack resource #99
  3. Potential misconfiguration leading to all resources managed by Faros being deleted: Explicit DeleteStrategy to prevent deleting resources #100

Our previous issue #92 already hinted at some of these concerns, but addressed the wrong problem.

@sebastianrosch
Contributor Author

As for the proposal in #92 to implement a webhook admission controller to prevent modification of the GitTrack resource, we believe it addresses only part of these concerns. Changes in the repository that lead to resources being deleted cannot be prevented this way, and it would also make intentional changes to the GitTrack harder.

@sebastianrosch
Contributor Author

Any other thoughts on this?

@JoelSpeed
Contributor

Hi @sebastianrosch, thanks for raising this issue.

This is something we have been thinking about internally for a few weeks since we had an incident in which the GitTrack managing our kube-system namespace on one of our test clusters was accidentally overwritten and, well... everything got deleted 🤦‍♂️

Forgive me if I have misunderstood, but #98 and #100 aim to solve the same problem as far as I can tell?

When designing Faros, we intended for Faros to be the owner of each of the resources, so that if the GitTrack were deleted, the child resources would also be deleted. There are, however, occasions where you may not want this to happen, as you have raised.

I like the idea of #100 personally and think this would probably be the best way to go about solving this issue.

We could add the DeleteStrategy (I think I would maybe rename this to DeletionStrategy as it fits with fields like DeletionTimestamp, WDYT?) field to the GitTracks as you've described.

There are three cases to handle then:

  • cleanup (default): No change to the existing behaviour; OwnerReferences and the GarbageCollector handle all deletion.
  • resource-strategy: The DeleteStrategy and a Finalizer would need to be propagated onto the GitTrackObjects and ClusterGitTrackObjects when they are created/updated. Then, when a GitTrack is deleted, the GarbageCollector will mark the GTOs and CGTOs for deletion; the GitTrackObjectController would check for this on each reconcile and handle the deletion logic, reading an annotation from the child resource's data. If the option is to not delete, the controller removes the OwnerReference from the child object and then removes its Finalizer, allowing the GTO/CGTO to be GC'ed.
  • never: Similar to resource-strategy, except we always clean up the OwnerReference and Finalizer.
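A minimal, dependency-free sketch of how that three-way decision might look inside the controller. The annotation name and the `shouldOrphan` helper are hypothetical illustrations, not part of the Faros API:

```go
package main

import "fmt"

// DeletionStrategy mirrors the proposed GitTrack field; the value
// names are the ones discussed above, but the API is still a proposal.
type DeletionStrategy string

const (
	StrategyCleanup          DeletionStrategy = "cleanup"
	StrategyResourceStrategy DeletionStrategy = "resource-strategy"
	StrategyNever            DeletionStrategy = "never"
)

// deleteAnnotation is a hypothetical per-resource annotation: under
// resource-strategy, only children carrying it are actually deleted.
const deleteAnnotation = "faros.pusher.com/delete-on-remove"

// shouldOrphan reports whether the controller should strip the
// OwnerReference from a child (keeping it) rather than letting the
// GarbageCollector delete it along with its GTO/CGTO.
func shouldOrphan(strategy DeletionStrategy, childAnnotations map[string]string) bool {
	switch strategy {
	case StrategyNever:
		// Always keep children: clean up OwnerReference and Finalizer.
		return true
	case StrategyResourceStrategy:
		// Keep children unless they explicitly opt in to deletion.
		return childAnnotations[deleteAnnotation] != "true"
	default:
		// cleanup: existing behaviour, the GC deletes everything.
		return false
	}
}

func main() {
	fmt.Println(shouldOrphan(StrategyCleanup, nil))  // false
	fmt.Println(shouldOrphan(StrategyNever, nil))    // true
	fmt.Println(shouldOrphan(StrategyResourceStrategy,
		map[string]string{deleteAnnotation: "true"})) // false
}
```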

What do you think of this plan? Does this make sense?

Unfortunately, there is a kind of prerequisite to this work being done. This would be a change in the API of the faros.pusher.com group, so we would need to create the v1beta1 group (something that has been a long time coming). Before we can do that we need to work out the migration path, I'm currently kind of waiting on kubernetes-sigs/controller-runtime#351 to make progress before tackling the problem of creating the new API group.

I'm semi-tempted to make a release branch that we can continue to release from while we work on the migration, allowing us to start on these breaking changes but leave the migration logic until later. Would that work, do you think?

@tshak
Contributor

tshak commented Mar 19, 2019

I like DeletionStrategy as the name too.

I think the idea of separating #98 and #100 is just to iterate and get the simplest change possible into master as quickly as possible. For example, if it's simple to implement a DeletionStrategy with just cleanup and never we could start there. We could maybe reword them a bit once we agree on an approach.

Maybe I'm missing something regarding kubernetes-sigs/controller-runtime#351, but could we just make DeletionStrategy an additive feature and therefore not require us to change the API version? I was under the (possibly wrong) impression that k8s API versions are kind of like the Major version in SemVer which only require changing when the change is breaking.

I'd like to avoid branching further. I'm a big fan of keeping one code line as much as possible. I realize that this is tricky with larger changes as well as the async nature of OSS. I still think we should try and break this down into smaller changes if possible and keep them in master if we can.

@JoelSpeed
Contributor

We could maybe reword them a bit once we agree on an approach.

Naming is incredibly hard! Do we have any alternate suggestions for these names? Do they need changing? Are cleanup and never obvious as to what they should be doing? I'm gonna have a think on this 🤔

Maybe I'm missing something regarding kubernetes-sigs/controller-runtime#351, but could we just make DeletionStrategy an additive feature and therefore not require us to change the API version? I was under the (possibly wrong) impression that k8s API versions are kind of like the Major version in SemVer which only require changing when the change is breaking.

Yeah, you're right here, I'm being a bit overly cautious! I was wondering what would happen if you start adding fields to objects when they aren't in the OpenAPI schema; it turns out nothing happens, the API server just accepts them anyway. So yeah, we can totally just add this field to the existing API group, ignore what I said earlier 🙈

I'd like to avoid branching further. I'm a big fan of keeping one code line as much as possible. I realize that this is tricky with larger changes as well as the async nature of OSS. I still think we should try and break this down into smaller changes if possible and keep them in master if we can.

Yep, I agree completely. I was just trying to think of a nice way to work around waiting for the API group bump, but since we don't need to, ignore that.

Thanks for the sanity check btw @tshak !

@sebastianrosch
Contributor Author

Thanks @JoelSpeed for following up on this.

#98 is concerned with the cascade-deletion of all resources when the GitTrack object is deleted while #100 is concerned with deleting (or not deleting) resources that are no longer in Git. If I understood correctly, the proposed DeletionStrategy does not prevent unintended cascade-delete of resources linked to GitTrackObjects and GitTrack via ownerReference.

We could add the DeleteStrategy (I think I would maybe rename this to DeletionStrategy as it fits with fields like DeletionTimestamp, WDYT?) field to the GitTracks as you've described.

I like the name DeletionStrategy and the other proposed names.

The DeleteStrategy and a Finalizer would need to be propagated onto the GitTrackObjects and ClusterGitTrackObjects when they are created/updated. Then, when a GitTrack is deleted, the GarbageCollector will mark the GTOs and CGTOs for deletion; the GitTrackObjectController would check for this on each reconcile and handle the deletion logic, reading an annotation from the child resource's data.

Maybe I'm not understanding correctly how Finalizers and GC work. The resource-strategy was intended as a way to be explicit about deletes. In cases where it's not acceptable that an empty folder or a misconfigured GitTrack deletes all resources, this strategy would only delete the GitTrackObjects of resources that have a specific annotation in Git. How does this relate to the GitTrack being deleted? If the GitTrack is gone, I assume Faros will stop creating, updating and deleting resources completely? Or does this handle a different case than I have in mind?

never: Similar to resource-strategy, except we always clean up the OwnerReference and Finalizer.

My impression is that this makes sense, except for cases where, as proposed, an ownerReference hasn't been created in the first place.

@JoelSpeed
Contributor

#98 is concerned with the cascade-deletion of all resources when the GitTrack object is deleted while #100 is concerned with deleting (or not deleting) resources that are no longer in Git.

So I think we can kill 2 birds with 1 stone here 😄

Maybe I'm not understanding correctly how Finalizers and GC works.

The reason I think we can kill two birds with one stone is the Finalizers! The theory here is that we build our own cascade out of the Finalizer and the DeletionStrategy: whether you pass --cascade=false or not, the Finalizer on the GTOs/CGTOs prevents them from being deleted until the deletion logic has been handled by Faros.

If cascade=true, then deleting the GitTrack causes the GC to attempt to delete all GTOs/CGTOs, but the Finalizer blocks the delete until we have removed the OwnerReferences from the child objects meant to be kept. Once those OwnerReferences have been removed, we remove the Finalizer, allowing the GTO/CGTO to be deleted. This in turn would cause any resource still owned by the GTO/CGTO to be deleted, but since we removed the ownership relation, the propagation of deletion is halted for the kept objects.
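To make that flow concrete, here is a dependency-free sketch using simplified stand-in types. The Finalizer name, field names, and `reconcileDeletion` helper are all hypothetical illustrations of the sequence described above, not the real Faros objects:

```go
package main

import "fmt"

// Child is a simplified stand-in for a resource owned by a GTO/CGTO.
type Child struct {
	Name           string
	OwnerReference string // "" means no owner, so the GC will not delete it
	KeepOnDelete   bool   // stand-in for the proposed per-resource choice
}

// GitTrackObject is a simplified stand-in for a GTO/CGTO.
type GitTrackObject struct {
	Name              string
	DeletionRequested bool // stand-in for a non-nil DeletionTimestamp
	Finalizers        []string
	Children          []*Child
}

const farosFinalizer = "faros.pusher.com/finalizer" // hypothetical name

// reconcileDeletion models what the GitTrackObjectController would do
// once the GC has marked a GTO for deletion: orphan the children that
// should be kept, then drop the Finalizer so the GTO itself can go.
func reconcileDeletion(gto *GitTrackObject) {
	if !gto.DeletionRequested {
		return
	}
	for _, c := range gto.Children {
		if c.KeepOnDelete {
			// Removing the OwnerReference halts deletion propagation.
			c.OwnerReference = ""
		}
	}
	// With the Finalizer gone, the GC is free to delete the GTO;
	// only children that still reference it are deleted with it.
	gto.Finalizers = nil
}

func main() {
	gto := &GitTrackObject{
		Name:              "example",
		DeletionRequested: true,
		Finalizers:        []string{farosFinalizer},
		Children: []*Child{
			{Name: "keep-me", OwnerReference: "example", KeepOnDelete: true},
			{Name: "delete-me", OwnerReference: "example"},
		},
	}
	reconcileDeletion(gto)
	// "keep-me" is orphaned and survives; "delete-me" still points at
	// the GTO and is collected along with it.
	fmt.Println(gto.Children[0].OwnerReference == "") // true
	fmt.Println(gto.Children[1].OwnerReference)       // example
	fmt.Println(len(gto.Finalizers))                  // 0
}
```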

Does that make any more sense?

My impression is that this makes sense, except for cases where, as proposed, an ownerReference hasn't been created in the first place.

Where is this proposed? I don't think we should be modifying the existing behaviour of adding OwnerReferences to objects meant to be managed. Faros's design relies heavily on the OwnerReference relations to make lots of bits work internally (all the resource watching relies on OwnerReferences, for example).

@sebastianrosch
Contributor Author

I finally found the time to process this, and I think it's a good proposal. This should address all our concerns. Thanks @JoelSpeed for the clarification.
