Add alternate mechanism for parsing log lines #494
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Our current log line parsing mechanism relies on matching each log
line to one of a set of recognized prefix patterns, then pulling
relevant information out of the prefix "manually" according to their
regex groups.
This works, but is difficult to maintain and requires customers to use
one of our prefixes. Occasionally, we add prefixes (e.g., to support a
provider that does not allow changing log_line_prefix), but this is
error-prone and requires the customer to wait for a collector release.
This patch takes a different approach: we inspect the log_line_prefix,
generate a regular expression to match lines coming with that specific
prefix pattern, and use that to pull relevant metadata out of the log
stream.
Currently, this patch is a proof of concept, with only
ParseLogLineWithPrefix implemented. However, it passes all the tests
of our existing ParseLogLineWithPrefix function. Given it uses a
single purpose-built regex, it can also be more lenient with patterns
for specific fields in a match (e.g., roles with spaces like "Admin
User").
TODO:
ParseLogLineWithPrefix
ParseLogLineWithPrefix
collector setting to have an escape hatch in case of issues
set that to 'auto' (the new behavior), 'legacy' (the current behavior)
or a literal prefix (the new behavior but prefix can't be read from
settings for whatever reason)
ParseLogLineWithPrefix2, but that's not ideal)
allegedly-supported prefixes do not have tests in the old code,
so it's not clear if the new code supports them—they should just
work, but it'd be good to verify this.)
confirm)
I'm not sure about the best path forward for these. I think we could
store log_line_prefix along with the LogTimezone we are already
storing on state.Server. If we follow the same pattern, it would be
easy to refresh it on every full snapshot. We may want to keep the
regex on state.Server as well, so we can recompile it as necessary (if
the prefix changes) but otherwise leave it alone. We'd need to keep
the matchers on the state.Server object as well.
Support for calling both the old and new code may be unnecessary, but
since this is a rewrite of a fundamental piece of collector code, it's
probably wise.
I'd love to hear any thoughts or concerns.