Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Self-hosted: Allow collector to receive logs via OpenTelemetry #544

Merged
merged 10 commits into from
May 15, 2024

Conversation

keiko713
Copy link
Contributor

@keiko713 keiko713 commented Apr 30, 2024

Successor of #503

This is mainly intended to support Kubernetes, by using fluentbit with the opentelemetry output plugin to send pod logs to the collector. The logs received can optionally be filtered by pod name, or via label selectors.

Note this requires the log output to be jsonlog (Postgres 15+), with optional K8s context added, or plain logs without any additional context.

When the collector is deployed to k8s using the helm chart, update service of the values.yaml to create: true so that the service of otel server will be created.

service:
  create: true
  name: pganalyze-collector-otel-service
  type: ClusterIP
  port: 4318
  targetPort: 4318

An example fluentbit output configuration:

  [OUTPUT]
    name  opentelemetry
    match kube.*postgres*
    host  pganalyze-collector-otel-service
    port  4318

With a corresponding collector configuration:

  db_log_otel_server = 0.0.0.0:4318

When utilizing CloudNativePG, a filter like the following can be used to filter only to the primary's logs for a given cluster:

  db_log_otel_k8s_labels = cnpg.io/cluster=cluster-example,cnpg.io/instanceRole=primary

lfittl and others added 5 commits April 30, 2024 15:04
This is mainly intended to support Kubernetes, by using fluentbit with the
opentelemetry output plugin to send pod logs to the collector. The logs
received can optionally be filtered by pod name, or via label selectors.

Note this requires the log output to be jsonlog (Postgres 15+), with
optional K8s context added, or plain logs without any additional context.

An example fluentbit output configuration:

  [OUTPUT]
    name  opentelemetry
    match kube.*postgres*
    host  127.0.0.1
    port  4318

With a corresponding collector configuration:

  db_log_otel_server = 127.0.0.1:4318

When utilizing CloudNativePG, a filter like the following can be used
to filter only to the primary's logs for a given cluster:

  db_log_otel_k8s_labels = cnpg.io/cluster=cluster-example,cnpg.io/instanceRole=primary
@keiko713
Copy link
Contributor Author

@keiko713 keiko713 marked this pull request as ready for review April 30, 2024 12:16
@keiko713 keiko713 requested a review from a team April 30, 2024 12:16
pganalyze-collector-setup
/pganalyze-collector
/pganalyze-collector-helper
/pganalyze-collector-setup
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was causing the contrib/helm/pganalyze-collector dir to be ignored. Adding / will enforce that only the root dir one is going to be ignored.

Copy link
Contributor

@msakrejda msakrejda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments, but looks good (with the caveat that I am not familiar with otel logs). Does this work with --test and --test-logs? If not, what's our plan there?

contrib/helm/pganalyze-collector/README.md Show resolved Hide resolved
return true
}
} else {
prefixedLogger.PrintWarning("Pod specification for OTel server not valid (need zero or one / separator): \"%s\", skipping log record\n", server.Config.LogOtelK8SPod)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should put this in config/read.go? It seems silly to warn about this for every log line. I think we could also consider erroring out on this at startup, no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should also SplitN this while parsing, and set values on Config that don't correspond to config settings directly (one for name, one for namespace)? I doubt it's a bottleneck here, but that might be easier to follow. Lines 102-114 could become

if server.Config.LogOtelK8SPod != "" {
  if server.Config.LogOtelK8SPodNamespace != "" && server.Config.LogOtelK8SPodNamespace != k8sNamespaceName {
    return true
  }
  if server.Config.LogOtelK8SPod != k8sPodName {
    return true
  }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the idea! I refactored this, and also added the test for the label/selector matching part as it was actually in TODO of the original PR. Ideally we want to use the k8s code for this I'd say, but that'll involve lots of pulling deps so I decided not to do that at least in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you push up those changes? I don't see them in this branch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, I swear I did update this (well I did everything to prepare this, which is already pushed), but I don't see them too 🙃 Now I'm modifying, maybe I did in fact forget to update this part 🤦 Thanks for noticing! Update pushed 👍

return true
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to above, I think it'd be better parse this into a structured form when reading the config, and make it easier to work with for consumers.

if detailLine != nil {
parsedLogStream <- state.ParsedLogStreamItem{Identifier: server.Config.Identifier, LogLine: *detailLine}
}
} else if logger == "" && hasErrorSeverity {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we requiring error_severity? It might be good to add a comment here or above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. This part is written by Lukas and actually don't have good context. This is about the following part:

Note this requires the log output to be jsonlog (Postgres 15+), with optional K8s context added, or plain logs without any additional context.

It is "plan logs without any additional context". I actually haven't tested this case. I think error_severity is somewhat "must be there" JSON key as a JSON format logs, therefore if that key exists, we can assume that it's a Postgres log? @lfittl does it sound right?
https://www.postgresql.org/docs/current/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-JSONLOG

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I updated the comment assuming that it's correct, but happy to update)

@keiko713
Copy link
Contributor Author

keiko713 commented May 8, 2024

Thanks for the review, @msakrejda !

Does this work with --test and --test-logs? If not, what's our plan there?

No it does not work, and the plan is to support with the branch that I have in my local machine 😬 (aka planning to do that as a follow-up)

util/kubernetes.go Show resolved Hide resolved
var K8sSelectorRegexp = regexp.MustCompile(`\s*([^!=\s]+)\s*([!=]+)\s*([^\s]+)\s*`)

// CheckLabelSelectorMismatch checks if selectors do not match the given labels
func CheckLabelSelectorMismatch(labels map[string]string, selectors []string) bool {
Copy link
Contributor

@msakrejda msakrejda May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually called anywhere (other than the tests)? I didn't see any call sites.

It might also be clearer to invert the condition here (i.e., CheckLabelSelectorMatch or CheckLabelSelectorMatches or maybe just SelectorsMatchLabels) and then negate it in the caller.

Separately, since we're dealing with multiple selectors and multiple labels, it might be good to include the full expected behavior (i.e., if I'm reading this correctly, that every selector must match its corresponding label, if any) in the comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually called anywhere (other than the tests)? I didn't see any call sites.

No, CheckLabelSelectorMismatch is not called (though K8sSelectorRegexp is called, and that's the reason behind of making this util). I'd like to use the k8s code for this and refactor, so that I can retire this whole util. As this was a good chunk, I decided to keep them together (also makes easier to write tests focusing on this).

It might also be clearer to invert the condition here (i.e., CheckLabelSelectorMatch or CheckLabelSelectorMatches or maybe just SelectorsMatchLabels) and then negate it in the caller.

I thought about it, but the nice part of the current logic is that it can early return easily when something doesn't match (also logic is a bit simpler). Well, given neither labels and selectors won't be that long, maybe I don't need to worry about too much for the performance?

Separately, since we're dealing with multiple selectors and multiple labels, it might be good to include the full expected behavior (i.e., if I'm reading this correctly, that every selector must match its corresponding label, if any) in the comment.

I'll point to k8s label selector docs (that I linked in the other comment) 👍 FYI, the short answer to this is the following:

In the case of multiple requirements, all must be satisfied so the comma separator acts as a logical AND (&&) operator.

@keiko713 keiko713 merged commit 190c5e1 into main May 15, 2024
3 checks passed
@keiko713 keiko713 deleted the logs-otel-server branch May 15, 2024 09:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants