Silencing alerts in alertmanager should be ignored in kured #499

Open
codestalkerr opened this issue Jan 18, 2022 · 18 comments
Labels: enhancement, good first issue, help wanted, keep (This won't be closed by the stale bot.)


@codestalkerr

It would be nice to have a setup where we can silence some alerts in Alertmanager and have kured ignore those alerts as well.
That would take effect instantly and help us handle one-off alerts, and we wouldn't have to wait for a configuration change to be deployed before kured can reboot.

@ckotzbauer
Member

The main challenge is that Prometheus is not aware of silences created in Alertmanager. To make this work, we would also have to integrate Alertmanager into kured's checks.

@github-actions

This issue was automatically considered stale due to lack of activity. Please update it and/or join our Slack channels to promote it before it automatically closes (in 7 days).

@justinrush

Re-opening this one; it would be helpful.

@ckotzbauer
Member

@codestalkerr @justinrush Can you give some more information about what would be needed here and how this should behave?
I think we need to integrate the Alertmanager API (https://github.com/prometheus/alertmanager/blob/main/api/v2/openapi.yaml).
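As a rough sketch of the decision such an integration would have to make, assuming the silences have already been fetched from that API: an alert counts as silenced when every matcher of some active silence accepts its labels. The Go types below mirror only a subset of the spec's matcher fields (`name`, `value`, `isRegex`, `isEqual`), regex matchers are anchored to the whole label value as Alertmanager does, and `isSilenced` is a hypothetical helper, not existing kured code.

```go
package main

import (
	"fmt"
	"regexp"
)

// Matcher mirrors a subset of the matcher object in the Alertmanager v2 spec.
// IsEqual is a pointer because the spec treats it as optional (default true).
type Matcher struct {
	Name    string `json:"name"`
	Value   string `json:"value"`
	IsRegex bool   `json:"isRegex"`
	IsEqual *bool  `json:"isEqual,omitempty"`
}

// Silence keeps only the fields this sketch needs.
type Silence struct {
	ID       string    `json:"id"`
	Matchers []Matcher `json:"matchers"`
	Status   struct {
		State string `json:"state"` // "active", "pending" or "expired"
	} `json:"status"`
}

// matches reports whether one matcher accepts the label set.
// A missing label is treated as the empty string.
func (m Matcher) matches(labels map[string]string) (bool, error) {
	v := labels[m.Name]
	var ok bool
	if m.IsRegex {
		// Anchor the pattern so it must match the whole label value.
		re, err := regexp.Compile("^(?:" + m.Value + ")$")
		if err != nil {
			return false, err
		}
		ok = re.MatchString(v)
	} else {
		ok = v == m.Value
	}
	if m.IsEqual != nil && !*m.IsEqual {
		ok = !ok // negative matcher (!= or !~)
	}
	return ok, nil
}

// isSilenced (hypothetical helper) reports whether any active silence has
// all of its matchers accepting the alert's labels.
func isSilenced(labels map[string]string, silences []Silence) (bool, error) {
	for _, s := range silences {
		if s.Status.State != "active" {
			continue // pending and expired silences do not mute anything
		}
		all := true
		for _, m := range s.Matchers {
			ok, err := m.matches(labels)
			if err != nil {
				return false, err
			}
			if !ok {
				all = false
				break
			}
		}
		if all {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	s := Silence{ID: "example", Matchers: []Matcher{{Name: "alertname", Value: "Disk.*", IsRegex: true}}}
	s.Status.State = "active"
	got, err := isSilenced(map[string]string{"alertname": "DiskPressure"}, []Silence{s})
	fmt.Println(got, err) // prints: true <nil>
}
```

kured could then drop any firing alert for which `isSilenced` returns true before deciding whether to block the reboot.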

@ckotzbauer reopened this Aug 13, 2022
@ckotzbauer added the enhancement, help wanted, good first issue, and keep (This won't be closed by the stale bot.) labels and removed the no-issue-activity label Aug 13, 2022
@justinrush

Thinking through this more, I think we want something like #385, but more generic. Ideally, we could provide an arbitrary PromQL query: if it returns data, kured holds off on the reboot; if the result is empty, the reboot can go ahead.
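The generic gate described above could look roughly like this minimal Go sketch, which runs an instant query against the Prometheus HTTP API (`/api/v1/query`) and holds the reboot whenever the result is non-empty. The Prometheus URL and the example query are placeholders, `queryPrometheus` and `holdReboot` are hypothetical helpers rather than kured code, and the sketch deliberately accepts only vector or matrix results, since a scalar has no meaningful "empty".

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

// queryResponse covers the envelope of a Prometheus /api/v1/query response.
// Each result entry is kept as raw JSON because only the count matters here.
type queryResponse struct {
	Status string `json:"status"`
	Error  string `json:"error"`
	Data   struct {
		ResultType string            `json:"resultType"`
		Result     []json.RawMessage `json:"result"`
	} `json:"data"`
}

// holdReboot reports whether the reboot should be held back:
// any returned series means "wait", an empty result means "go".
func holdReboot(body []byte) (bool, error) {
	var r queryResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return false, err
	}
	if r.Status != "success" {
		return false, fmt.Errorf("prometheus query failed: %s", r.Error)
	}
	switch r.Data.ResultType {
	case "vector", "matrix":
		return len(r.Data.Result) > 0, nil
	default:
		// Scalars and strings have no notion of "empty", so reject them.
		return false, fmt.Errorf("unsupported result type %q", r.Data.ResultType)
	}
}

// queryPrometheus runs an instant query and returns the raw response body.
func queryPrometheus(baseURL, promQL string) ([]byte, error) {
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Get(baseURL + "/api/v1/query?query=" + url.QueryEscape(promQL))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return io.ReadAll(resp.Body)
}

func main() {
	// Placeholder URL and query: hold while any alert is firing.
	body, err := queryPrometheus("http://localhost:9090", `ALERTS{alertstate="firing"}`)
	if err != nil {
		fmt.Println("query failed, holding reboot:", err)
		return
	}
	hold, err := holdReboot(body)
	if err != nil {
		fmt.Println("bad response, holding reboot:", err)
		return
	}
	fmt.Println("hold reboot:", hold)
}
```

Whether an error (Prometheus unreachable, invalid query) should also hold the reboot is a design choice; failing closed, as `main` does here, seems the safer default.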

I can create a new issue for this if it seems like something that would be acceptable to add.

@ckotzbauer
Member

Okay. Yes, please create a new issue for this 👍

@codestalkerr
Author

@ckotzbauer My thinking behind integrating with Alertmanager was this: there are times when we silence a few alerts in Alertmanager for various reasons, and it would be smooth if kured could ignore those alerts at the same time. Right now we have to open a PR that adds an alert so kured ignores it, and then another PR to remove it once we lift the silence.
By reusing Alertmanager silences, this would be quick, and we wouldn't need to edit and maintain a separate configuration in kured.

On a side note, is there a filter to specify the alerts to block on (the opposite of the ignore filter)? I'm asking because we have many alerts to ignore, and it would be nicer to list only the ones we want to block on :)

@ckotzbauer
Member

@codestalkerr That sounds pretty much like a negative lookahead in a regular expression. Would that be an option? Go's regexp package doesn't support lookaheads, but that would be a solvable problem.

@ckotzbauer
Member

@justinrush Would this also solve your use-case?

@justinrush

Maybe? We don't always silence in Alertmanager, though; sometimes we just change the label that routes the alert to /dev/null instead of to a person. But I guess if we could read that label from the alert in Alertmanager and apply a negative regex to it, that would work?

@codestalkerr
Author

I see the scenarios differ here. Reading the label and applying a negative regex could work, but I think it would again come down to modifying the YAML file and committing changes, which is what I was trying to avoid. We follow a GitOps approach, so if we modify things manually, the next deploy overrides them, and keeping track of that would be a nightmare. But maybe this scenario is specific to me?

@ckotzbauer
Member

Why is it not possible to consolidate the ways you remove or mark alerts? Are they too different to be caught by a single regex that doesn't have to change every time?

@codestalkerr
Author

Yes, we have many different alerts, and we have put them all into one regex: a huge one-liner of alternatives separated by ORs. So if we have a temporary issue that we expect to last a few hours or a day or two, we need to update that list and also silence the alert in Alertmanager. The good thing about Alertmanager is that we can silence an alert temporarily in the UI without any code changes, whereas for kured we have to commit a change that adds or removes the alert.
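For reference, a one-liner like that can be generated from a plain list instead of being edited by hand. A minimal Go sketch, assuming the filter is matched against alert names (the names below are only examples): each name is escaped with `regexp.QuoteMeta`, and the alternatives are joined with `|` and anchored. `alternation` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// alternation (hypothetical) builds one anchored regex that matches any of
// the given alert names exactly, escaping regex metacharacters in each name.
func alternation(names []string) (*regexp.Regexp, error) {
	quoted := make([]string, len(names))
	for i, n := range names {
		quoted[i] = regexp.QuoteMeta(n)
	}
	return regexp.Compile("^(" + strings.Join(quoted, "|") + ")$")
}

func main() {
	re, err := alternation([]string{"Watchdog", "KubePodCrashLooping", "NodeFilesystemSpaceFillingUp"})
	if err != nil {
		panic(err)
	}
	fmt.Println(re.String()) // prints: ^(Watchdog|KubePodCrashLooping|NodeFilesystemSpaceFillingUp)$
	fmt.Println(re.MatchString("Watchdog"), re.MatchString("WatchdogX")) // prints: true false
}
```

This only makes the list easier to maintain; each change still needs a commit, which is exactly the friction described above.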

@atighineanu
Contributor

Hi. I can have a look.
One question, though: as I understand it, kured would need to send requests to Alertmanager, correct? (Or is Prometheus aware of silences?)

@ckotzbauer
Copy link
Member

Hi @atighineanu,
thanks for your interest. Kured would need to query the Alertmanager API (https://github.com/prometheus/alertmanager/blob/main/api/v2/openapi.yaml) to get silences; Prometheus is not aware of them.
However, I think it might not be the best idea to use the prometheus/alertmanager project as a Go module here, as we would then pull in all of Alertmanager's dependencies as indirect dependencies. So maybe just make a plain HTTP call, or use another slim Alertmanager Go client if there is one out there.
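A minimal sketch of the "plain HTTP call" option using only Go's standard library: `GET /api/v2/silences` on the Alertmanager API, with optional repeated `filter` matcher parameters, keeping only silences whose `status.state` is `active`. The Alertmanager address is a placeholder, the struct covers only a subset of the spec's silence fields, and the function names are hypothetical, not part of kured.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

// silence keeps a subset of the silence fields from the v2 spec.
type silence struct {
	ID     string `json:"id"`
	Status struct {
		State string `json:"state"` // "active", "pending" or "expired"
	} `json:"status"`
	Matchers []struct {
		Name    string `json:"name"`
		Value   string `json:"value"`
		IsRegex bool   `json:"isRegex"`
	} `json:"matchers"`
}

// silencesURL builds the GET /api/v2/silences URL. Each filter is a matcher
// expression such as alertname="Watchdog", sent as a repeated "filter" parameter.
func silencesURL(baseURL string, filters ...string) string {
	u := baseURL + "/api/v2/silences"
	if len(filters) > 0 {
		u += "?" + url.Values{"filter": filters}.Encode()
	}
	return u
}

// activeOnly decodes a silences response body and drops pending/expired entries.
func activeOnly(body []byte) ([]silence, error) {
	var all []silence
	if err := json.Unmarshal(body, &all); err != nil {
		return nil, err
	}
	var active []silence
	for _, s := range all {
		if s.Status.State == "active" {
			active = append(active, s)
		}
	}
	return active, nil
}

// fetchActiveSilences does the plain HTTP call with the standard library only.
func fetchActiveSilences(baseURL string, filters ...string) ([]silence, error) {
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Get(silencesURL(baseURL, filters...))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("alertmanager returned %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return activeOnly(body)
}

func main() {
	// Placeholder address; point it at the real Alertmanager service.
	silences, err := fetchActiveSilences("http://localhost:9093")
	if err != nil {
		fmt.Println("could not fetch silences:", err)
		return
	}
	for _, s := range silences {
		fmt.Println(s.ID, s.Matchers)
	}
}
```

Keeping the client this small avoids importing the Alertmanager module and its dependency tree.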

@atighineanu
Contributor

I've created a draft, but I need more input from you regarding kured itself. Is it okay to create several more flags?
See #873 and the comment there.

@atighineanu
Contributor

@ckotzbauer, @dholbach any input?

@ckotzbauer
Member

I'll have a look in a few days or next week, @atighineanu.
