Refacto: write more accurate descriptions for faster troubleshooting #85

Open
samber opened this issue Mar 8, 2020 · 3 comments
Labels
enhancement New feature or request

Comments

@samber
Owner

samber commented Mar 8, 2020

Example:

From:

- name: Prometheus rule evaluation failures
  description: 'Prometheus encountered {{ $value }} rule evaluation failures.'
  query: 'increase(prometheus_rule_evaluation_failures_total[3m]) > 0'

To:

- name: Prometheus rule evaluation failures
  description: 'Prometheus encountered {{ $value }} rule evaluation failures, leading to potentially ignored alerts.'
  query: 'increase(prometheus_rule_evaluation_failures_total[3m]) > 0'

A dedicated effect field would let us improve the alert template.
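
A minimal sketch of how such a field could look in a rule entry. The `effect` key and its wording are assumptions for illustration only; they are not part of the current rule format.

```yaml
# Hypothetical: `effect` is a proposed key, not one the rule format defines today.
- name: Prometheus rule evaluation failures
  # Cause: what happened, kept short enough for a status board.
  description: 'Prometheus encountered {{ $value }} rule evaluation failures.'
  # Effect: what the operator risks if the alert is left unattended.
  effect: 'Alerts that depend on the failing rules may be silently ignored.'
  query: 'increase(prometheus_rule_evaluation_failures_total[3m]) > 0'
```

Keeping the effect separate from the description would let a template choose which of the two to render.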

@samber samber added the enhancement New feature or request label Mar 8, 2020
@robert-will-brown
Contributor

I would welcome an effect field. I've solved this locally by adding one, and it's very helpful: the description stays short where only that is needed (on a status board), while specific resolutions can still be included elsewhere, in Slack messages for example.

e.g.
[Screenshot 2020-04-30 at 14 22 15]

@samber
Owner Author

samber commented May 3, 2020

We can probably find a balance between:

  • Description/cause
  • Effects
  • Resolution guidelines

The GitLab infrastructure team adds a reference to a troubleshooting Markdown document.

See:

@paulfantom

Effects

This is what the alert name should be about, as it is the first thing an operator sees when receiving an alert. Additionally, this could be enhanced with a summary annotation field.

Description/cause

In the Prometheus community this is usually done with either a message field (for example in the kubernetes-monitoring/kubernetes-mixin project) or a description field (for example in the node-mixin project).

Resolution guidelines

This is basically a runbook/SOP. For example, the kubernetes-mixin project includes these as a runbook_url field in the alert annotations.

Such runbooks are located in one file, and links are made to specific anchors.

This field is usually the most problematic one, as writing a runbook requires deep knowledge of the system itself.
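
Put together, the three parts could map onto a standard Prometheus alerting rule roughly as follows. This is a sketch, not a rule taken from any of the projects mentioned: the alert name, the `severity` label, and the annotation texts are made up, and the `runbook_url` value is a placeholder pointing at an anchor in a single hypothetical runbook file. Annotation keys such as `summary`, `description`, and `runbook_url` are conventions; Prometheus itself treats annotations as free-form key/value pairs.

```yaml
groups:
  - name: prometheus
    rules:
      # Effect: the alert name is the first thing the operator sees.
      - alert: PrometheusRuleEvaluationFailures
        expr: increase(prometheus_rule_evaluation_failures_total[3m]) > 0
        labels:
          severity: critical  # assumed label, adjust to local routing
        annotations:
          # Effect, expanded in one line.
          summary: 'Rule evaluations are failing, alerts may be ignored.'
          # Description/cause.
          description: 'Prometheus encountered {{ $value }} rule evaluation failures.'
          # Resolution guidelines: placeholder URL with a per-alert anchor.
          runbook_url: 'https://example.com/runbook.md#prometheusruleevaluationfailures'
```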


Essentially, these are problems the Prometheus community has already solved.
