This repository has been archived by the owner on Oct 13, 2021. It is now read-only.

Resource Deletion Concerns #101

Open
sebastianrosch opened this issue Mar 12, 2019 · 8 comments

Comments

@sebastianrosch
Contributor

We're concerned about resources being deleted in production environments. We consider the current Faros approach a risk and have therefore opened proposals to mitigate it.

This issue summarises those concerns and proposals. I'm looking forward to your take on them, and we're happy to work on the proposals ourselves.

  1. Deleting GitTrack resource deletes all resources managed by Faros: Prevent deleting all deployed resources when GitTrack resources deleted #98
  2. --cascade=false not working with Faros: Faros does not seem to respect --cascade=false when deleting GitTrack resource #99
  3. Potential misconfiguration leading to all resources managed by Faros being deleted: Explicit DeleteStrategy to prevent deleting resources #100

Our previous issue #92 already hinted at some of these concerns, but addressed the wrong problem.

@sebastianrosch
Contributor Author

As for the proposal in #92 to implement a webhook admission controller to prevent modification of the GitTrack resource, we believe it addresses only part of these concerns. Changes in the repository that lead to resources being deleted cannot be prevented this way, and it would also make intentional changes to the GitTrack harder.

@sebastianrosch
Contributor Author

Any other thoughts on this?

@JoelSpeed
Contributor

Hi @sebastianrosch, thanks for raising this issue.

This is something we have been thinking about internally for a few weeks since we had an incident in which the GitTrack managing our kube-system namespace on one of our test clusters was accidentally overwritten and, well... everything got deleted 🤦‍♂️

Forgive me if I have misunderstood, but #98 and #100 aim to solve the same problem as far as I can tell?

When designing Faros, we intended for Faros to be the owner of each of the resources, so that if the GitTrack were deleted, the child resources would also be deleted. There are, however, occasions where you may not want this to happen, as you have raised.

I like the idea of #100 personally and think this would probably be the best way to go about solving this issue.

We could add the DeleteStrategy (I think I would maybe rename this to DeletionStrategy as it fits with fields like DeletionTimestamp, WDYT?) field to the GitTracks as you've described.

There are three cases to handle then:

  • cleanup (default): No change to the existing behaviour; OwnerReferences and the GarbageCollector handle all deletion.
  • resource-strategy: The DeleteStrategy and a Finalizer would need to be propagated onto the GitTrackObjects and ClusterGitTrackObjects when they are created/updated. Then, when a GitTrack is deleted, the GarbageCollector will mark the GTOs and CGTOs for deletion; the GitTrackObjectController would check for this on each reconcile and handle the deletion logic, reading an annotation from the child resource's data. If the option is to not delete, the controller removes the OwnerReference from the child object and then removes its Finalizer, allowing the GTO/CGTO to be GC'ed.
  • never: Similar to resource-strategy, except we always clean up the OwnerReference and Finalizer.
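A minimal, dependency-free sketch of how that three-way decision might look inside the controller. The annotation name and the `shouldOrphan` helper are hypothetical illustrations, not part of the Faros API:

```go
package main

import "fmt"

// DeletionStrategy mirrors the proposed GitTrack field; the value
// names are the ones discussed above, but the API is still a proposal.
type DeletionStrategy string

const (
	StrategyCleanup          DeletionStrategy = "cleanup"
	StrategyResourceStrategy DeletionStrategy = "resource-strategy"
	StrategyNever            DeletionStrategy = "never"
)

// deleteAnnotation is a hypothetical per-resource annotation: under
// resource-strategy, only children carrying it are actually deleted.
const deleteAnnotation = "faros.pusher.com/delete-on-remove"

// shouldOrphan reports whether the controller should strip the
// OwnerReference from a child (keeping it) rather than letting the
// GarbageCollector delete it along with its GTO/CGTO.
func shouldOrphan(strategy DeletionStrategy, childAnnotations map[string]string) bool {
	switch strategy {
	case StrategyNever:
		// Always keep children: clean up OwnerReference and Finalizer.
		return true
	case StrategyResourceStrategy:
		// Keep children unless they explicitly opt in to deletion.
		return childAnnotations[deleteAnnotation] != "true"
	default:
		// cleanup: existing behaviour, the GC deletes everything.
		return false
	}
}

func main() {
	fmt.Println(shouldOrphan(StrategyCleanup, nil))  // false
	fmt.Println(shouldOrphan(StrategyNever, nil))    // true
	fmt.Println(shouldOrphan(StrategyResourceStrategy,
		map[string]string{deleteAnnotation: "true"})) // false
}
```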

What do you think of this plan? Does this make sense?

Unfortunately, there is a kind of prerequisite to this work being done. This would be a change in the API of the faros.pusher.com group, so we would need to create the v1beta1 group (something that has been a long time coming). Before we can do that we need to work out the migration path, I'm currently kind of waiting on kubernetes-sigs/controller-runtime#351 to make progress before tackling the problem of creating the new API group.

I'm semi-tempted to make a release branch that we can continue to release from while we work on the migration, allowing us to start on these breaking changes but leave the migration logic until later. Would that work, do you think?

@tshak
Contributor

tshak commented Mar 19, 2019

I like DeletionStrategy as the name too.

I think the idea of separating #98 and #100 is just to iterate and get the simplest change possible into master as quickly as possible. For example, if it's simple to implement a DeletionStrategy with just cleanup and never we could start there. We could maybe reword them a bit once we agree on an approach.

Maybe I'm missing something regarding kubernetes-sigs/controller-runtime#351, but could we just make DeletionStrategy an additive feature and therefore not require us to change the API version? I was under the (possibly wrong) impression that k8s API versions are kind of like the Major version in SemVer which only require changing when the change is breaking.

I'd like to avoid branching further. I'm a big fan of keeping one code line as much as possible. I realize that this is tricky with larger changes as well as the async nature of OSS. I still think we should try and break this down into smaller changes if possible and keep them in master if we can.

@JoelSpeed
Contributor

We could maybe reword them a bit once we agree on an approach.

Naming is incredibly hard! Do we have any alternate suggestions for these names? Do they need changing? Are cleanup and never obvious as to what they should be doing? I'm gonna have a think on this 🤔

Maybe I'm missing something regarding kubernetes-sigs/controller-runtime#351, but could we just make DeletionStrategy an additive feature and therefore not require us to change the API version? I was under the (possibly wrong) impression that k8s API versions are kind of like the Major version in SemVer which only require changing when the change is breaking.

Yeah, you're right here, I'm being a bit overly cautious! I was wondering what would happen if you start adding fields to objects when they aren't in the OpenAPI schema; it turns out nothing happens, the API server just accepts them anyway. So yeah, we can totally just add this field to the existing API group, ignore what I said earlier 🙈

I'd like to avoid branching further. I'm a big fan of keeping one code line as much as possible. I realize that this is tricky with larger changes as well as the async nature of OSS. I still think we should try and break this down into smaller changes if possible and keep them in master if we can.

Yep, I agree completely. I was just trying to think of a nice way to work around waiting for the API group bump, but since we don't need to, ignore that.

Thanks for the sanity check btw @tshak !

@sebastianrosch
Contributor Author

Thanks @JoelSpeed for following up on this.

#98 is concerned with the cascade-deletion of all resources when the GitTrack object is deleted while #100 is concerned with deleting (or not deleting) resources that are no longer in Git. If I understood correctly, the proposed DeletionStrategy does not prevent unintended cascade-delete of resources linked to GitTrackObjects and GitTrack via ownerReference.

We could add the DeleteStrategy (I think I would maybe rename this to DeletionStrategy as it fits with fields like DeletionTimestamp, WDYT?) field to the GitTracks as you've described.

I like the name DeletionStrategy and the other proposed names.

The DeleteStrategy and a Finalizer would need to be propagated onto the GitTrackObjects and ClusterGitTrackObjects when they are created/updated. Then, when a GitTrack is deleted, the GarbageCollector will mark the GTOs and CGTOs for deletion; the GitTrackObjectController would check for this on each reconcile and handle the deletion logic, reading an annotation from the child resource's data.

Maybe I'm not understanding correctly how Finalizers and GC work. The resource-strategy was intended as a way to be explicit about deletes. In cases where it's not acceptable that an empty folder or a misconfigured GitTrack deletes all resources, this strategy would only delete the GitTrackObjects of resources that have a specific annotation in Git. How does this relate to the GitTrack being deleted? If the GitTrack is gone, I assume Faros will stop creating, updating and deleting resources completely? Or does this handle a different case than I have in mind?

never: Similar to resource-strategy, except we always clean up the OwnerReference and Finalizer.

My impression is that this makes sense, except for cases where, as proposed, an ownerReference hasn't been created in the first place.

@JoelSpeed
Contributor

#98 is concerned with the cascade-deletion of all resources when the GitTrack object is deleted while #100 is concerned with deleting (or not deleting) resources that are no longer in Git.

So I think we can kill 2 birds with 1 stone here 😄

Maybe I'm not understanding correctly how Finalizers and GC works.

The reason I think we can kill two birds with one stone is the Finalizers! The theory here is that we build our own cascade out of the Finalizer and the DeletionStrategy: whether you pass --cascade=false or not, the Finalizer on the GTOs/CGTOs prevents them from being deleted until the deletion logic has been handled by Faros.

If cascade=true, then deleting the GitTrack causes the GC to attempt to delete all GTOs/CGTOs, but the Finalizer blocks the delete until we have removed the OwnerReferences from the child objects meant to be kept. Once those OwnerReferences have been removed, we remove the Finalizer, allowing the GTO/CGTO to be deleted. This in turn would cause any resource still owned by the GTO/CGTO to be deleted, but since we removed the ownership relation, the propagation of deletion is halted for the kept objects.
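To make that flow concrete, here is a dependency-free sketch using simplified stand-in types. The Finalizer name, field names, and `reconcileDeletion` helper are all hypothetical illustrations of the sequence described above, not the real Faros objects:

```go
package main

import "fmt"

// Child is a simplified stand-in for a resource owned by a GTO/CGTO.
type Child struct {
	Name           string
	OwnerReference string // "" means no owner, so the GC will not delete it
	KeepOnDelete   bool   // stand-in for the proposed per-resource choice
}

// GitTrackObject is a simplified stand-in for a GTO/CGTO.
type GitTrackObject struct {
	Name              string
	DeletionRequested bool // stand-in for a non-nil DeletionTimestamp
	Finalizers        []string
	Children          []*Child
}

const farosFinalizer = "faros.pusher.com/finalizer" // hypothetical name

// reconcileDeletion models what the GitTrackObjectController would do
// once the GC has marked a GTO for deletion: orphan the children that
// should be kept, then drop the Finalizer so the GTO itself can go.
func reconcileDeletion(gto *GitTrackObject) {
	if !gto.DeletionRequested {
		return
	}
	for _, c := range gto.Children {
		if c.KeepOnDelete {
			// Removing the OwnerReference halts deletion propagation.
			c.OwnerReference = ""
		}
	}
	// With the Finalizer gone, the GC is free to delete the GTO;
	// only children that still reference it are deleted with it.
	gto.Finalizers = nil
}

func main() {
	gto := &GitTrackObject{
		Name:              "example",
		DeletionRequested: true,
		Finalizers:        []string{farosFinalizer},
		Children: []*Child{
			{Name: "keep-me", OwnerReference: "example", KeepOnDelete: true},
			{Name: "delete-me", OwnerReference: "example"},
		},
	}
	reconcileDeletion(gto)
	// "keep-me" is orphaned and survives; "delete-me" still points at
	// the GTO and is collected along with it.
	fmt.Println(gto.Children[0].OwnerReference == "") // true
	fmt.Println(gto.Children[1].OwnerReference)       // example
	fmt.Println(len(gto.Finalizers))                  // 0
}
```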

Does that make any more sense?

My impression is that this makes sense, except for cases where, as proposed, an ownerReference hasn't been created in the first place.

Where is this proposed? I don't think we should be modifying the existing behaviour of adding OwnerReferences to objects meant to be managed. Faros's design relies heavily on the OwnerReference relations to make lots of bits work internally (all the resource watching relies on OwnerReferences, for example).

@sebastianrosch
Contributor Author

I finally found the time to process this, and I think it's a good proposal. This should address all our concerns. Thanks @JoelSpeed for the clarification.
