Experiment: Modify Drain algorithm for better patterns #12974

Draft: wants to merge 7 commits into base: main

benclive
Contributor

This PR is a draft: I'm not proposing to merge it unless we are happy with the approach.

What this PR does / why we need it:

  • This is an experimental change to the Drain algorithm that attempts to generate "better" patterns.
    • It seems to generate fewer overly generic patterns while also avoiding very specific ones, so the result is a nicer middle ground.
  • It uses what I describe as "lazy preprocessing": the existing pre-processing code is reused to categorise tokens on the fly.
  • It truncates all tokens to 50 characters to remove really long tokens (pod names etc.) and increase the chance of matching.
  • It generates a much better parse tree: we end up with branches for different types (<HEX>, <NUM>, <IP>) while still keeping the actual value when there is only one, which aids readability (e.g. if the logs only contain log.go:48, we retain that instead of log.go:<NUM>).
  • It makes use of multiple tokenization strategies: the "logfmt" tokenizer will attempt to parse logfmt logs, while "adaptive" will use the tokenizer already in the repo, which works better on unstructured/JSON logs.

Several caveats apply:

  • The code is not optimised and is generally more CPU intensive than before. The parse tree is flatter, which might compensate for this a bit.
  • Picking which tokenizer to use isn't great; I can clean this up if this direction is promising.
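
As one possible direction for the tokenizer selection, here is a rough heuristic sketch. It is an assumption for discussion and not what this PR currently does: `looksLikeLogfmt` and `pickTokenizer` are hypothetical names, and the "at least half the fields are key=value" threshold is arbitrary.

```go
package main

import "strings"

// looksLikeLogfmt guesses whether a line is logfmt by counting
// whitespace-separated fields that contain '=' after a non-empty key.
// Quoted values with spaces are split into several fields here, so this
// undercounts pairs slightly; that is acceptable for a heuristic.
func looksLikeLogfmt(line string) bool {
	fields := strings.Fields(line)
	if len(fields) == 0 {
		return false
	}
	pairs := 0
	for _, f := range fields {
		if strings.IndexByte(f, '=') > 0 {
			pairs++
		}
	}
	// Hypothetical threshold: at least half of the fields are key=value.
	return pairs*2 >= len(fields)
}

// pickTokenizer returns the name of the tokenization strategy to use,
// matching the two strategy names from the PR description.
func pickTokenizer(line string) string {
	if looksLikeLogfmt(line) {
		return "logfmt"
	}
	return "adaptive"
}
```

A check like this could run on a sample of lines per stream rather than on every line, which would keep the extra CPU cost small.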

Examples:
Old:

"<_> caller=http.go:194 level=debug <_> <_> msg=\"POST <_> <_> <_>"

New:

"ts=<_> caller=http.go:194 level=debug traceID=<_> orgID=<_> msg=\"POST /push.v1.PusherService/Push (200) <_>\"",
"ts=<_> caller=http.go:194 level=debug traceID=<_> orgID=<_> msg=\"POST /ingest?aggregationType=sum&from=17146522271076410<_> (200) <_>\"",
"ts=<_> caller=http.go:194 level=debug traceID=<_> orgID=<_> msg=\"POST /push.v1.PusherService/Push (<_>) <_>\"",
"ts=<_> caller=http.go:194 level=debug traceID=<_> orgID=<_> msg=\"POST /pyroscope/ingest?aggregationType=sum&from=1714652<_> (200) <_>\"",
"ts=<_> caller=http.go:194 level=debug traceID=<_> orgID=<_> msg=\"POST /ingest?aggregationType=&from=1714652227232613927&<_> (200) <_>\"",
"ts=<_> caller=http.go:194 level=debug traceID=<_> orgID=<_> msg=\"POST /ingest?aggregationType=average&from=1714652227232<_> (200) <_>\""

This is available in the image grafana/enterprise-logs:custom-k201-21e25369a1bf-3e6ea71e3efd for further testing.
