some easy heuristic to suppress anomaly bit for noisy metrics #14993
andrewm4894
started this conversation in
Ideas
Replies: 1 comment 1 reply
-
What is a "model" in this case? Is it the latest KMeans model, or the entire history of models we are maintaining for each dimension (i.e. are we talking about suppressing models or dimensions)? If it's the latter, should there be any case where we are reactivating the model? (Or should we drop such models entirely when they become the second most recent model of a dimension?)
-
Sometimes we see individual metrics that, for whatever reason, "get stuck" in a bad state, with a bad model that just stays anomalous too consistently:
It's obvious that if you observe a really persistently high anomaly rate over an extended period of time for a dimension, then 9 times out of 10 it's just a symptom of a bad or noisy model for that dim.
Without getting too fancy in terms of the ML (which would introduce more complexity), I'm wondering if there could be some rules or heuristics we could introduce as config options in the [ml] section of
netdata.conf
such that bad metrics like this would just have their anomaly bit suppressed and/or be ignored by prediction until they naturally get retrained next. I'm thinking there could be some sort of silencing layer in the ML that turns off anomaly detection until the next training, once we have observed enough to say that the metric is just subject to a poor model.
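To make the idea concrete, something like the fragment below is what I have in mind. These option names are entirely hypothetical, just a sketch of what such config could look like; they are not existing netdata.conf options:

```ini
[ml]
    # hypothetical options, not part of current netdata.conf:
    # window over which to compute the per-dimension anomaly rate,
    # and the rate above which the anomaly bit gets suppressed
    # until the dimension is next retrained
    anomaly rate silencing window = 30m
    anomaly rate silencing threshold = 50%
```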
For example, a very simple rule would be: if the anomaly rate in the last 30 minutes is above 50%, then silence:
The idea above is a simple rule to just turn off obviously noisy dimensions (until the next training, where hopefully a better model may be trained, e.g. based on more data).
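A minimal sketch of that rule, assuming a per-dimension tracker that sees each anomaly bit as it is produced. The window size, threshold, and class/method names are all illustrative, not agent code:

```python
from collections import deque

class AnomalySilencer:
    """Sketch: if the anomaly rate over a trailing window exceeds a
    threshold, suppress the anomaly bit until the next retraining.
    Defaults (30 min at 1s granularity, 50%) are illustrative only."""

    def __init__(self, window_points=1800, threshold=0.5):
        self.bits = deque(maxlen=window_points)
        self.threshold = threshold
        self.silenced = False

    def observe(self, anomaly_bit):
        """Record one raw anomaly bit; return the (possibly suppressed) bit."""
        self.bits.append(1 if anomaly_bit else 0)
        # only judge the model once a full window has been observed
        if not self.silenced and len(self.bits) == self.bits.maxlen:
            rate = sum(self.bits) / len(self.bits)
            if rate > self.threshold:
                self.silenced = True  # suppress until next training
        return 0 if self.silenced else anomaly_bit

    def on_retrain(self):
        """A fresh model gets a clean slate."""
        self.silenced = False
        self.bits.clear()
```

So a dimension whose model fires constantly gets muted after one full window, and unmutes only when a (hopefully better) model replaces it.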
@vkalintiris @ktsaou FYI, as I think it might make sense to think about this - should we build a process into the agent to just turn off obviously bad models based on observing the anomaly rates themselves?
Ideally I'd like to start with something as simple as possible, so that it's easy to reason about and easy enough to implement too.
note: probably what we want is some notion of "firing rate", e.g. to control for whether the anomaly bit is just consistently going on/off (bad) vs clumping together (which could be valid and actually something we definitely would not want to suppress) - but maybe the AR itself is a good proxy for this if we use a big enough window.
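One cheap way to capture that "firing rate" idea, as a sketch (this helper is hypothetical, not part of the agent): count how often consecutive bits flip. Rapid on/off flapping gives a high transition rate (likely a noisy model), while a sustained clump of anomalous bits gives a low one even at the same overall anomaly rate:

```python
def transition_rate(bits):
    """Fraction of consecutive pairs of anomaly bits that differ.
    High -> rapid on/off flapping (suspect model noise);
    low with a high anomaly rate -> bits clump together
    (could be a real sustained anomaly we should not suppress)."""
    if len(bits) < 2:
        return 0.0
    flips = sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    return flips / (len(bits) - 1)
```

E.g. `[0,1,0,1,...]` and `[0,...,0,1,...,1]` can have the same 50% anomaly rate but very different transition rates, which is exactly the flapping-vs-clumping distinction above.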