
All resources are deleted on etcd leader loss #143

Open
wimdec opened this issue Jun 7, 2019 · 7 comments
Labels: bug

@wimdec
Contributor

wimdec commented Jun 7, 2019

We had a couple of incidents where, after the etcd leader node is restarted, the Kubernetes garbage collector deletes all Faros-managed resources.

After analysis, the root cause is probably the following:
https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/#owners-and-dependents

Note: Cross-namespace owner references are disallowed by design. This means:
1) Namespace-scoped dependents can only specify owners in the same namespace, and owners that are cluster-scoped. 
2) Cluster-scoped dependents can only specify cluster-scoped owners, but not namespace-scoped owners.

https://github.com/kubernetes/apimachinery/blob/master/pkg/apis/meta/v1/types.go#L311

Currently GitTrack is namespace-scoped. This means that every ClusterGitTrackObject, and every GitTrackObject in a namespace other than its GitTrack's, currently carries an illegal owner reference.
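
To make the illegal state concrete, here is a minimal sketch (illustrative Go, not actual Faros code; the names and UID are taken from the logs below):

```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

func main() {
	controller := true
	gto := &unstructured.Unstructured{Object: map[string]interface{}{}}
	gto.SetAPIVersion("faros.pusher.com/v1alpha1")
	gto.SetKind("GitTrackObject")
	gto.SetName("serviceaccount-kube-state-metrics")
	gto.SetNamespace("platform-system")
	// OwnerReference has no Namespace field, so the GC can only resolve
	// this owner inside platform-system, where no such GitTrack exists.
	gto.SetOwnerReferences([]metav1.OwnerReference{{
		APIVersion: "faros.pusher.com/v1alpha1",
		Kind:       "GitTrack",
		Name:       "platform-k8s-gitops-cluster", // actually lives in faros-system
		UID:        "998a3811-823d-11e9-bd80-12537198d31e",
		Controller: &controller,
	}})
	fmt.Printf("dependent %s/%s -> owner %s (no namespace field)\n",
		gto.GetNamespace(), gto.GetName(), gto.GetOwnerReferences()[0].Name)
}
```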

To solve this, GitTrack should become cluster-scoped.

Details:

  • Kubernetes version: 1.11.9
  • kops version: 1.11.1
  • etcd version: 3.3.10
  • HA cluster with 3 masters
  • single GitTrack in faros-system namespace
  • lots of resources in cluster scope and in different namespaces

Trigger:

  • terminate leader etcd VM in AWS console

After some time, the following logs appear in the kube-controller-manager:

I0607 09:17:18.175766       1 controller_utils.go:1032] Caches are synced for garbage collector controller
I0607 09:17:18.175785       1 garbagecollector.go:142] Garbage collector: all resource monitors have synced. Proceeding to collect garbage
I0607 09:17:18.188106       1 controller_utils.go:1032] Caches are synced for garbage collector controller
I0607 09:17:18.188124       1 garbagecollector.go:245] synced garbage collector
I0607 09:17:18.188147       1 garbagecollector.go:408] processing item [faros.pusher.com/v1alpha1/GitTrackObject, namespace: platform-system, name: serviceaccount-kube-state-metrics, uid: 68d5610b-8240-11e9-bd80-12537198d31e]
I0607 09:17:19.192572       1 garbagecollector.go:521] delete object [faros.pusher.com/v1alpha1/GitTrackObject, namespace: platform-system, name: serviceaccount-kube-state-metrics, uid: 68d5610b-8240-11e9-bd80-12537198d31e] with propagation policy Background
...

Only GitTrackObjects in the same namespace as the GitTrack are not deleted.

@wimdec
Contributor Author

wimdec commented Jun 7, 2019

I enabled more detailed logging on the kube-controller-manager.
In normal operation, the GC maps the owner by its UID if the owner is already in the GC graph. Since the GitTrack is created before all GitTrackObjects, it is present in the graph and the correct link is made.
On etcd leader loss, however, the garbage collector typically also moves to another node and starts from an empty graph. Starting from an empty graph, it handles the GitTrackObjects before the GitTrack, notices that the owner does not yet exist in the graph by UID, and also cannot find the owner via an API server call in the same namespace as the dependent. Hence the GitTrackObject can be deleted.

I0607 13:27:24.137199       1 request.go:530] Throttling request took 900.610093ms, request: GET:https://127.0.0.1/apis/faros.pusher.com/v1alpha1/namespaces/platform-system/gittracks/platform-k8s-gitops-cluster
I0607 13:27:24.140526       1 garbagecollector.go:354] object 8099b1cf-8927-11e9-b0d2-0acea4dc0106's owner faros.pusher.com/v1alpha1/GitTrack, platform-k8s-gitops-cluster is not found
I0607 13:27:26.190371       1 garbagecollector.go:333] according to the absentOwnerCache, object 844bc631-8927-11e9-b0d2-0acea4dc0106's owner faros.pusher.com/v1alpha1/GitTrack, platform-k8s-gitops-cluster does not exist
I0607 13:27:26.190402       1 garbagecollector.go:459] classify references of [faros.pusher.com/v1alpha1/GitTrackObject, namespace: platform-system, name: serviceaccount-kube-state-metrics, uid: 844bc631-8927-11e9-b0d2-0acea4dc0106].
solid: []v1.OwnerReference(nil)
dangling: []v1.OwnerReference{v1.OwnerReference{APIVersion:"faros.pusher.com/v1alpha1", Kind:"GitTrack", Name:"platform-k8s-gitops-cluster", UID:"998a3811-823d-11e9-bd80-12537198d31e", Controller:(*bool)(0xc425c67090), BlockOwnerDeletion:(*bool)(0xc425c67091)}}
waitingForDependentsDeletion: []v1.OwnerReference(nil)
I0607 13:27:26.190454       1 garbagecollector.go:521] delete object [faros.pusher.com/v1alpha1/GitTrackObject, namespace: platform-system, name: serviceaccount-kube-state-metrics, uid: 844bc631-8927-11e9-b0d2-0acea4dc0106] with propagation policy Background

(The GitTrack is in the faros-system namespace, while the GC searches in platform-system.)
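
A simplified sketch of that classification step (illustrative Go, not the real kube-controller-manager code) shows why the lookup can never succeed:

```go
package main

import "fmt"

type object struct {
	namespace, name, uid string
}

// lookup stands in for the GC's fallback GET against the API server; it is
// scoped to a single namespace, mirroring the real request in the logs.
func classify(dep object, ownerName, ownerUID string, graph map[string]object,
	lookup func(namespace, name string) (object, bool)) string {
	if _, ok := graph[ownerUID]; ok {
		return "solid" // owner already known in the graph by UID
	}
	// Fallback: resolve by name in the *dependent's* namespace.
	if owner, ok := lookup(dep.namespace, ownerName); ok && owner.uid == ownerUID {
		return "solid"
	}
	return "dangling" // dependent gets deleted
}

func main() {
	gitTrack := object{namespace: "faros-system", name: "platform-k8s-gitops-cluster", uid: "998a3811"}
	gto := object{namespace: "platform-system", name: "serviceaccount-kube-state-metrics", uid: "844bc631"}
	emptyGraph := map[string]object{} // state right after the leader change
	lookup := func(ns, name string) (object, bool) {
		if ns == gitTrack.namespace && name == gitTrack.name {
			return gitTrack, true
		}
		return object{}, false // searched platform-system: not found
	}
	fmt.Println(classify(gto, gitTrack.name, gitTrack.uid, emptyGraph, lookup)) // "dangling"
}
```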

@JoelSpeed
Contributor

This is a fairly major architectural problem and I'm sorry we haven't worked this out sooner.

There will need to be some fairly major changes to Faros if we want to fix this.

There are two approaches that I can see going forward, though I'm not sure which is likely to be best. I've been thinking about these for the last couple of days and I think they would both work, but please do pick holes.

Approach 1 - Namespaced and cluster separate

With this approach, we would create a new CRD called ClusterGitTrack. ClusterGitTracks would own cluster-scoped resources only; if the subPath points to any namespaced object, that file would be ignored.

We would then also change the GitTrack controller to ignore cluster-scoped resources.

This approach would then mean that you can point a GitTrack and a ClusterGitTrack at the same folder, and each would ignore the namespaced or non-namespaced resources respectively.

We would then also have to disable the cross-namespace mode in Faros and insist that you run one instance per namespace, plus one additional instance to manage cluster-scoped resources.
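
A rough sketch of the filtering this implies (illustrative Go; the helper and its scope check are assumptions, not existing Faros code):

```go
package main

import "fmt"

type resource struct {
	kind       string
	namespaced bool // in practice this would come from e.g. a RESTMapper
}

// shouldHandle decides whether this controller instance applies the file:
// GitTrack controllers take namespaced objects, ClusterGitTrack controllers
// take cluster-scoped ones, so both can point at the same folder.
func shouldHandle(isClusterGitTrack bool, r resource) bool {
	if isClusterGitTrack {
		return !r.namespaced
	}
	return r.namespaced
}

func main() {
	files := []resource{{"ClusterRole", false}, {"ServiceAccount", true}}
	for _, f := range files {
		fmt.Printf("%s: GitTrack=%v ClusterGitTrack=%v\n",
			f.kind, shouldHandle(false, f), shouldHandle(true, f))
	}
}
```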

Approach 2 - Proxies

This approach would use two similar CRDs, ClusterGitTrackOwner and GitTrackOwner, which would reference the GitTrack's name and the namespace it lives in.

GTO and CGTO owner references would point to the ClusterGitTrackOwner and GitTrackOwner CRDs rather than to the GitTrack. The GitTrack controller would then watch for children of the Owners to trigger reconcile events, but would not actually own the (Cluster)GitTrackOwners.

The GT controller would then need to add a finalizer to the GT so that, upon deletion, it can delete all of the (Cluster)GitTrackOwners, as the GC won't know there is a relationship between them.

This approach would allow us to retain the one-instance-per-cluster model.
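
A rough sketch of the indirection (illustrative Go; GitTrackOwner and ClusterGitTrackOwner are the proposed CRDs, not existing types):

```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
)

// proxyOwnerRef builds the owner reference a dependent would carry:
// instead of pointing at the GitTrack itself, it points at a proxy owner
// whose scope is legal for the dependent.
func proxyOwnerRef(dependentIsNamespaced bool, name string, uid types.UID) metav1.OwnerReference {
	kind := "ClusterGitTrackOwner" // cluster-scoped proxy owns CGTOs
	if dependentIsNamespaced {
		kind = "GitTrackOwner" // namespaced proxy lives alongside the GTO
	}
	controller := true
	return metav1.OwnerReference{
		APIVersion: "faros.pusher.com/v1alpha1",
		Kind:       kind,
		Name:       name,
		UID:        uid,
		Controller: &controller,
	}
}

func main() {
	ref := proxyOwnerRef(true, "platform-k8s-gitops-cluster", "example-uid")
	fmt.Println(ref.Kind) // GitTrackOwner
}
```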

@sebastianrosch
Contributor

sebastianrosch commented Jun 19, 2019

Thanks for looking into this.

I'll add my thoughts, @wimdec might have more.

Approach 1:

  • Currently, our resources in the Git repository are structured by namespace, with cluster-scoped resources being on the root level. This would lead to Faros also scanning all namespaced resources in subfolders every time and having to ignore them. We would have to reorganize this structure to prevent this.
  • With a lot of namespaces you would have to manage a lot of GitTracks while this could be automated in Approach 2.

Approach 2:

  • So far I like this approach better as it keeps the operational simplicity but is a little bit more complicated under the hood.

What are your thoughts on making GitTrack cluster-scoped, as suggested by @wimdec?

@wonderhoss
Contributor

One problem with simply making GitTrack cluster-scoped is that it eliminates the separation of concerns between namespace owners.

Currently, RBAC allows restricting users to creating and editing GitTracks in only one namespace (or a set of namespaces). They are then unable to interfere with GitTracks in other namespaces.

If they were cluster-scoped it would be quite easy to accidentally break payloads in namespaces owned by other teams simply by applying or editing an incorrect GitTrack and I would be somewhat reluctant to go down that route.
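
For illustration, a minimal sketch of the RBAC shape this relies on (the Role name and namespace are made up):

```go
package main

import (
	"fmt"

	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// A namespaced Role that lets a team manage GitTracks only inside
	// their own namespace.
	role := rbacv1.Role{
		ObjectMeta: metav1.ObjectMeta{Name: "gittrack-editor", Namespace: "team-a"},
		Rules: []rbacv1.PolicyRule{{
			APIGroups: []string{"faros.pusher.com"},
			Resources: []string{"gittracks"},
			Verbs:     []string{"get", "list", "watch", "create", "update", "patch", "delete"},
		}},
	}
	// A Role is namespaced, so it cannot grant access to GitTracks in any
	// other namespace; a cluster-scoped GitTrack would need a ClusterRole,
	// losing this per-namespace boundary.
	fmt.Println(role.Namespace, role.Rules[0].Resources)
}
```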

@JoelSpeed
Contributor

@sebastianrosch Just going to add my responses to your comments

> Currently, our resources in the Git repository are structured by namespace, with cluster-scoped resources being on the root level. This would lead to Faros also scanning all namespaced resources in subfolders every time and having to ignore them. We would have to reorganize this structure to prevent this.

We currently have a lot of cluster-scoped resources in folders with namespaced files too, so yeah, we'll probably also need a bit of a restructure, or just live with it ignoring those files.

> With a lot of namespaces you would have to manage a lot of GitTracks while this could be automated in Approach 2.

It was always our intention that a GitTrack should own resources in the same namespace as itself; so much so that I'd consider it a bug that I can create a GitTrack in kube-system that manages a resource in kube-public, for instance. I think for the short term we are going to fix that and make a GitTrack ignore a resource if it is not in the same namespace.

> What are your thoughts on making GitTrack cluster-scoped, as suggested by @wimdec?

I'm with @gargath on this; I think this really reduces the usability of Faros for self-service multi-tenant clusters.

@wimdec
Contributor Author

wimdec commented Jun 21, 2019

I would propose the following:

  • When running Faros in namespaced mode (the --namespace option), it uses the namespaced GitTrack object, and only namespaced resources in the same namespace are allowed. Cluster-scoped resources, and resources in the subPath that belong to a different namespace, are ignored. This is the best way to secure multi-tenant clusters, and it avoids the garbage collector issue.
  • When running Faros without the namespace option, it only watches for new ClusterGitTrack resources. In this mode, all resources are possible.

This allows running a global Faros instance for the cluster operator using ClusterGitTrack, while every team can run Faros in its own namespace without security risk. The global Faros instance will no longer watch the namespaced GitTrack resources, so that global and namespaced Faros instances do not conflict. Of course, it is also possible to add a flag to choose what to watch: only ClusterGitTrack, or both ClusterGitTrack and GitTrack. That way users can choose.

These changes are of course not backwards compatible, so thought needs to be given to the upgrade path. One option is to give the new restricted namespaced GitTrack a new apiVersion, v1alpha2; v1alpha2 then means that the GitTrack will ignore all resources not in its own namespace.
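
A rough sketch of the proposed mode selection (illustrative Go, not actual Faros code):

```go
package main

import "fmt"

// watchedKind picks what a Faros instance watches under the proposal:
// the --namespace flag is the existing Faros option mentioned above,
// ClusterGitTrack is the proposed CRD.
func watchedKind(namespace string) string {
	if namespace != "" {
		// Namespaced mode: only GitTracks in this namespace; cluster-scoped
		// resources and other namespaces are ignored.
		return "GitTrack"
	}
	// Cluster mode: only ClusterGitTracks, so a global instance never
	// conflicts with per-namespace instances.
	return "ClusterGitTrack"
}

func main() {
	fmt.Println(watchedKind("team-a"), watchedKind(""))
}
```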

@JoelSpeed
Contributor

We have a proposal for fixing this which the Pusher team is going to start implementing this week. The garbage collector has historically been broken in a way that allowed this problem to go largely unnoticed and only manifest in certain scenarios. I believe this has recently been fixed, and when trying to upgrade to 1.15.3 our test cluster kinda exploded, so this is now very, very urgent for us to fix.

https://paper.dropbox.com/doc/Introducing-ClusterGitTracks--AixMM37BMMhcugDznYFKsuD8AQ-XslbpUtYZeUTkigSSV4SS
