Delete metrics with policy label on policy delete #2256
Comments
Hello @lambdanis, we are talking about the policy_events_total metric, which does not have a delete handler like the pod delete handler registered in main. Such a handler would make sure that when a Tetragon policy is deleted, the corresponding metrics are also deleted. So, to resolve this, the following things need to be done:
If I got it right, what is the deadline for resolving this issue? I would like to work on it.
Hi @prateek041
Yes, maybe "policy_kind" and "policy_namespace" (empty for clusterwide) would be good.
Something similar. I think metrics deletion logic should be triggered from DeleteTracingPolicy in the sensors manager rather than from the k8s API watcher (WatchTracePolicy). That way stale metrics will be cleaned up also if policies are created via Tetragon API rather than via k8s CRD.
No deadline. I don't recall any reports of this being a big problem, so it's just a nice-to-have. Feel free to open a PR, or if anything is unclear, post here.
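To illustrate why the cleanup belongs in DeleteTracingPolicy rather than in the k8s watcher, here is a minimal sketch using only the Go standard library. The `policyManager` type, `policyID`, and `RegisterDeleteHook` are hypothetical stand-ins and not Tetragon's actual sensors manager API. The point is only that a hook run inside the delete path fires no matter which caller (Tetragon API or CRD watcher) requested the deletion.

```go
package main

import (
	"fmt"
	"sync"
)

// policyID identifies a policy. An empty Namespace means clusterwide.
// Hypothetical type, for illustration only.
type policyID struct {
	Name      string
	Namespace string
}

// policyManager is a hypothetical stand-in for the sensors manager.
type policyManager struct {
	mu       sync.Mutex
	policies map[policyID]struct{}
	onDelete []func(policyID) // cleanup hooks, e.g. metrics deletion
}

func newPolicyManager() *policyManager {
	return &policyManager{policies: make(map[policyID]struct{})}
}

// RegisterDeleteHook adds a callback that runs after a policy is deleted.
func (m *policyManager) RegisterDeleteHook(h func(policyID)) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.onDelete = append(m.onDelete, h)
}

// AddTracingPolicy records a policy as loaded.
func (m *policyManager) AddTracingPolicy(id policyID) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.policies[id] = struct{}{}
}

// DeleteTracingPolicy removes the policy and then runs every hook,
// regardless of which code path asked for the deletion.
func (m *policyManager) DeleteTracingPolicy(id policyID) error {
	m.mu.Lock()
	if _, ok := m.policies[id]; !ok {
		m.mu.Unlock()
		return fmt.Errorf("policy %q not found", id.Name)
	}
	delete(m.policies, id)
	// Copy the hooks so they run without holding the lock.
	hooks := append([]func(policyID){}, m.onDelete...)
	m.mu.Unlock()

	for _, h := range hooks {
		h(id)
	}
	return nil
}

func main() {
	m := newPolicyManager()
	m.RegisterDeleteHook(func(id policyID) {
		fmt.Printf("cleaning up metrics for policy=%s namespace=%q\n", id.Name, id.Namespace)
	})

	id := policyID{Name: "file-monitoring"} // clusterwide policy
	m.AddTracingPolicy(id)

	// The same call would be made by an API handler or by a CRD watcher.
	if err := m.DeleteTracingPolicy(id); err != nil {
		fmt.Println("error:", err)
	}
}
```

Running the hooks outside the lock is a design choice in this sketch, so that a slow cleanup does not block other manager operations.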
Sure, thanks for the pointers, I have started working on it. Please assign it to me.
The tetragon_policy_events_total metric has a policy label. When a TracingPolicy is deleted, the corresponding metrics are still exposed by Tetragon, leading to an overhead (usually not big, as policies are not churned frequently, but might be significant in some cases).
There was a similar problem with metrics that have a pod label. Now when a pod is deleted, Tetragon deletes its metrics from the registry (see https://github.com/cilium/tetragon/blob/main/pkg/metrics/metricwithpod.go).
We should add a similar handler for policy delete events. One small gotcha is that at the moment metrics don't distinguish between clusterwide and namespaced policies, so we should first add another label to distinguish between different policy kinds.
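The gotcha above can be made concrete with a small standard-library sketch. It is not Tetragon's metrics code: `policyCounter`, `seriesLabels`, and the `Hook` label are hypothetical, and the `policy_namespace` and `policy_kind` label names follow the suggestion made in the comments. With a real Prometheus registry, a partial-match delete on the metric vector would likely play the role of `DeletePolicy` here.

```go
package main

import (
	"fmt"
	"sync"
)

// seriesLabels holds the label values of one time series.
// PolicyNamespace is empty for clusterwide policies.
type seriesLabels struct {
	Policy          string
	PolicyNamespace string
	PolicyKind      string
	Hook            string // hypothetical extra label, for illustration
}

// policyCounter is a toy counter vector keyed by label values.
type policyCounter struct {
	mu     sync.Mutex
	series map[seriesLabels]uint64
}

func newPolicyCounter() *policyCounter {
	return &policyCounter{series: make(map[seriesLabels]uint64)}
}

// Inc increments the series identified by the given labels.
func (c *policyCounter) Inc(l seriesLabels) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.series[l]++
}

// DeletePolicy removes every series that belongs to the given policy,
// matching on name and namespace, and returns how many were removed.
func (c *policyCounter) DeletePolicy(name, namespace string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	removed := 0
	for l := range c.series {
		if l.Policy == name && l.PolicyNamespace == namespace {
			delete(c.series, l)
			removed++
		}
	}
	return removed
}

// Len returns the number of series currently exposed.
func (c *policyCounter) Len() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.series)
}

func main() {
	c := newPolicyCounter()

	// Two policies share a name: one clusterwide, one namespaced.
	c.Inc(seriesLabels{Policy: "audit", PolicyKind: "TracingPolicy", Hook: "kprobe"})
	c.Inc(seriesLabels{Policy: "audit", PolicyNamespace: "prod", PolicyKind: "TracingPolicyNamespaced", Hook: "kprobe"})
	c.Inc(seriesLabels{Policy: "audit", PolicyNamespace: "prod", PolicyKind: "TracingPolicyNamespaced", Hook: "tracepoint"})

	// Deleting the namespaced policy must leave the clusterwide series alone.
	removed := c.DeletePolicy("audit", "prod")
	fmt.Println("removed:", removed, "remaining:", c.Len())
}
```

Without the namespace label, the two "audit" policies would be indistinguishable, and deleting one would either remove the other's series or leave stale ones behind.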