Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/Severity status #487

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

bestwebua
Copy link

@bestwebua bestwebua commented May 22, 2023

Summary

This PR has been inspired by: #227, #412

  • Added ConditionResult.Severity
  • Added Result.Severity
  • Added Result.SeverityStatus
  • Added Result#determineSeverityStatus, tests
  • Added SeverityStatus type
  • Added Condition#sanitizeSeverityCondition, tests
  • Updated Condition#evaluate, tests
  • Updated Endpoint#EvaluateHealth, tests
  • Updated AlertProvider#buildRequestBody, tests
  • Updated storage adapters
  • Updated Endpoint component (vue)
  • Recompiled js assets
  • Updated readme

Screenshorts

Screenshot 2023-05-22 at 12 13 42 PM Screenshot 2023-05-22 at 12 13 50 PM Screenshot 2023-05-22 at 12 13 56 PM

Checklist

  • Tested and/or added tests to validate that the changes work as intended, if applicable.
  • Updated documentation in README.md, if applicable.

README.md Outdated Show resolved Hide resolved
@bestwebua bestwebua force-pushed the feature/serverity-status branch 3 times, most recently from 8f3471c to 97e431f Compare June 13, 2023 20:40
@bestwebua bestwebua requested a review from TwiN June 13, 2023 20:40
@bestwebua bestwebua force-pushed the feature/serverity-status branch 4 times, most recently from 5fe383a to 161bf83 Compare June 22, 2023 06:21
README.md Show resolved Hide resolved
@bestwebua
Copy link
Author

This PR covers "Support severity for PagerDuty" feature: #412

@Exagone313
Copy link
Contributor

Exagone313 commented Jul 15, 2023

I am interested in this feature so I will make a review and propose some changes.

Please note that I am not a maintainer and my views may different than TwiN's.

Copy link
Contributor

@Exagone313 Exagone313 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't finished my review, I still have some things to try out.

Here are additional things I couldn't add to my review (if you know how to prepare comments in a review, let me know how, I'm more used to GitLab on that):


issue: Can you rebase you PR? Both commits require a frontend rebuild (those are the only conflicts).


issue: The GitLab alerting provider was added recently, is it possible to add support for sending the severity on this provider? Currently, there is a alerting.gitlab.severity parameter, but it should be removed and replaced by the severity from this feature. (Need more input from @TwiN about this)

can be one of critical, high, medium, low, info, unknown


suggestion: It could be possible, for each alerting provider, to disable alerts if severity doesn't reach a minimal threshold (by default, the least critical severity). This can be done in a future issue.

I think I will work on this.


question/suggestion: I don't understand why both severity and severity_status exist in results.

I think the endpoint_result_conditions table should have a new column severity with the level used when success is false. Then the endpoint_results table can have a single severity column.

The Severity struct can then be removed.

I will try to make this improvement.


thought: I'm wondering if the type for a condition could be something else than a string (while keeping it possibly a string), e.g.:

conditions:
  - condition: "[RESPONSE_TIME] < 1s"
    severity: high
  - "[CERTIFICATE_EXPIRATION] > 72h" # critical severity (the default)

I don't know if Go permits this, so I will have to try.


thought(non-blocking) : It doesn't seem to be possible to add multiple conditions that consume the same variables, with different severity. The second time a variable is used, it returns 0. This can be addressed in another issue.

(It was an issue on my end)

type SeverityStatus int

// Severity status enums
const (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The names for severity levels were suggested previously and seem to be what GitLab uses for alert severity.

suggestion: What about using levels that are more often used by other software, such as PagerDuty (as used here in PagerDuty alerting) or Python (with logging levels):

  • critical
  • error (instead of high)
  • warning (instead of medium)
  • info (instead of low)

(Note that Python also has a debug level but it doesn't have any meaning here.)

_, _ = s.db.Exec(`
ALTER TABLE endpoint_results
ADD IF NOT EXISTS domain_expiration BIGINT NOT NULL DEFAULT 0,
ADD IF NOT EXISTS severity_status INTEGER NOT NULL DEFAULT 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: I know this is also the case for domain_expiration, but with this command you end up with a schema that is different than from creating the table with the CREATE TABLE IF NOT EXISTS statement present above, since the rows don't have a DEFAULT attribute in that statement. It is also the case in sqlite. (cc @TwiN)


// Severity status enums
const (
None SeverityStatus = iota
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion(non-blocking): The default severity (used for database migrations) could be named unknown instead of none.

_, _ = s.db.Exec(`ALTER TABLE endpoint_results ADD domain_expiration INTEGER NOT NULL DEFAULT 0`)
_, _ = s.db.Exec(`
ALTER TABLE endpoint_results ADD domain_expiration INTEGER NOT NULL DEFAULT 0;
ALTER TABLE endpoint_results ADD severity_status INTEGER NOT NULL DEFAULT 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: If the first ADD statement fails (and it is likely to), the second will not be run (Parse error: duplicate column domain_expiration).

// Returns separated SeverityStatus and condition
func (c Condition) sanitizeSeverityCondition() (SeverityStatus, string) {
severityStatus, complexCondition := Critical, string(c)
regex, _ := regexp.Compile(`(Low|Medium|High|Critical) :: (.+)`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: The regexp could be case-insensitive.

Suggested change
regex, _ := regexp.Compile(`(Low|Medium|High|Critical) :: (.+)`)
regex, _ := regexp.Compile(`(?i)(Low|Medium|High|Critical) :: (.+)`)

@Exagone313
Copy link
Contributor

Exagone313 commented Jul 16, 2023

@bestwebua Can we work together? I didn't know you were active and I made most of changes from my review here: https://github.com/Exagone313/gatus/tree/severity-status

A notable change is that I simplified how the severity is computed from the results, there is a field severity on both condition results and results now.

I also made various improvements in the code (for example, the switch were needlessly making boolean comparisons).

I think commits could be squashed, as long as Co-Authored-By lines are added adequately.

@bestwebua
Copy link
Author

bestwebua commented Jul 16, 2023

@Exagone313 Of course I'm active. I have received your comments just today. Thanks, let it be. So what the next actions?

@Exagone313
Copy link
Contributor

So what the next actions?

Are you up for a pair programming session on Jitsi (or something like that)? I think it would be best to be able to share some code in real time.

@bestwebua
Copy link
Author

@Exagone313 Sorry, today I'm not able to pair session. But we can combine our commits in the different branch and make a review.

@Exagone313
Copy link
Contributor

In my branch there are still some issues or improvements to be made:

  • For now I only tested the volatile storage and database needs to be more thoroughly tested (on both SQLite and PostgreSQL).
  • I don't save severity for each condition results in the database (I think it is needed).
  • I haven't added yet support for alert severity level threshold.
  • I haven't added severity support for GitLab provider and other alerting providers that could support it.

* Added ConditionResult.Severity
* Added Result.Severity
* Added Result.SeverityStatus
* Added Result#determineSeverityStatus, tests
* Added SeverityStatus type
* Added Condition#sanitizeSeverityCondition, tests
* Updated Condition#evaluate, tests
* Updated Endpoint#EvaluateHealth, tests
* Updated storage adapters
* Updated Endpoint component (vue)
* Recompiled assets
* Updated readme
* Added critical severity, add result severity to pagerduty
@ser
Copy link

ser commented Apr 27, 2024

is there any chance of progress of this pull request or it's abandoned? This feature is essential for real world monitoring system.

@Exagone313
Copy link
Contributor

I was thinking about taking over the PR but I have been working on other things, I may tackle it again in a few weeks though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants