ServiceMonitor using secrets that are created later #6018
Yeah!
Hi @jbnjohnathan, I created the ServiceMonitor before the secrets, then created the secrets one by one, and the Prometheus Operator reconciled after each secret was added, as you can see here:
The Prometheus Operator watches every secret that is added, updated, or removed in the cluster, and then puts the relevant object, in this case the Prometheus (Server or Agent) resource, back on the reconcile queue. You can test it yourself by checking the following expression:
If you don't see any change in the expression above, check whether you have the … I also noticed you're using version 0.60, while the latest one is 0.68.
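The reconcile flow described in this comment (a secret event puts the Prometheus object back on the reconcile queue) can be modeled roughly as below. This is a Python sketch, not the operator's Go code: the queue is loosely modeled on the deduplicating work queues Kubernetes controllers use, and, following this comment, the hypothetical `on_secret_event` handler re-enqueues every Prometheus object without looking at namespaces.

```python
from collections import deque


class WorkQueue:
    """Minimal FIFO queue that holds each key at most once while it waits."""

    def __init__(self) -> None:
        self._items = deque()
        self._queued = set()

    def add(self, key: str) -> None:
        # A key that is already waiting is not added again, so a burst of
        # events for the same object collapses into one pending item.
        if key not in self._queued:
            self._queued.add(key)
            self._items.append(key)

    def get(self) -> str:
        key = self._items.popleft()
        self._queued.discard(key)
        return key

    def __len__(self) -> int:
        return len(self._items)


def on_secret_event(event_type: str, prometheuses: list, queue: WorkQueue) -> None:
    """Hypothetical handler: any secret add/update/delete re-enqueues every
    Prometheus/PrometheusAgent object (namespaces are ignored here)."""
    if event_type in ("ADDED", "MODIFIED", "DELETED"):
        for prom in prometheuses:
            queue.add(f"{prom['namespace']}/{prom['name']}")


# Three secrets created one after another before anything is dequeued:
queue = WorkQueue()
prometheuses = [{"namespace": "monitoring", "name": "k8s"}]
for _ in range(3):
    on_secret_event("ADDED", prometheuses, queue)
print(len(queue), queue.get())  # 1 monitoring/k8s
```

Because a key that is already waiting is not added twice, several secrets created in quick succession can result in a single reconcile rather than one per secret.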
This issue has been automatically marked as stale because it has not had any activity in the last 60 days. Thank you for your contributions.
I am facing a similar problem: the target for a ServiceMonitor is not active.
The target showed the error below before it disappeared:
However, the secret exists, and in the Prometheus pod I can see the file mentioned above that the Prometheus Operator is complaining about.
Below is the ServiceMonitor definition:
sh-4.4$ /usr/bin/operator -version
This looks similar to #6309; I commented there with a possible solution: #6309 (comment)
The cause of the issue is that adding or updating a secret/configmap only triggers a reconciliation of the Prometheus/PrometheusAgent objects living in the same namespace as that secret/configmap. No reconciliation happens when the secret/configmap's namespace differs from the Prometheus/PrometheusAgent namespace. See prometheus-operator/pkg/prometheus/server/operator.go, lines 703 to 712 in ed3aede.
The fix should be to call both … Having said that, we should avoid a thundering-herd problem: reconciling on every secret/configmap update could increase the number of operations significantly. The operator should keep an index of all secrets/configmaps referenced by ServiceMonitors, PodMonitors, etc., and only trigger a reconciliation when there's a match.
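A rough sketch of the two behaviours discussed in this comment, in Python for brevity (the operator itself is written in Go, and none of these names come from its code). `enqueue_same_namespace` models the current namespace-scoped enqueue; the hypothetical `SecretReferenceIndex` models the proposed index, so that only Prometheus objects whose monitors reference the changed secret are reconciled, regardless of namespace. All namespace and secret names are made up for illustration.

```python
from collections import defaultdict


def enqueue_same_namespace(secret_ns: str, prometheus_keys: list[str]) -> list[str]:
    """Current behaviour as described above: only Prometheus/PrometheusAgent
    objects ("namespace/name" keys) in the secret's namespace are re-enqueued."""
    return [k for k in prometheus_keys if k.split("/", 1)[0] == secret_ns]


class SecretReferenceIndex:
    """Hypothetical index: (namespace, secret name) -> keys of the Prometheus
    objects whose selected monitors reference that secret."""

    def __init__(self) -> None:
        self._by_secret: dict[tuple[str, str], set[str]] = defaultdict(set)

    def update(self, prometheus_key: str, references: set[tuple[str, str]]) -> None:
        # After reconciling `prometheus_key`, forget its old references and
        # record the ones found in the monitors it currently selects. The
        # references come from the monitor specs, so they are known even
        # before the secrets themselves exist.
        for keys in self._by_secret.values():
            keys.discard(prometheus_key)
        for ref in references:
            self._by_secret[ref].add(prometheus_key)

    def affected(self, namespace: str, name: str) -> list[str]:
        # On a secret event, only these Prometheus objects need a reconcile.
        return sorted(self._by_secret.get((namespace, name), set()))


prometheus_keys = ["monitoring/k8s"]
# A secret created by the Elastic operator in the "elastic" namespace:
print(enqueue_same_namespace("elastic", prometheus_keys))  # []

index = SecretReferenceIndex()
index.update("monitoring/k8s", {("elastic", "es-metrics-credentials")})
print(index.affected("elastic", "es-metrics-credentials"))  # ['monitoring/k8s']
print(index.affected("elastic", "some-unrelated-secret"))  # []
```

The index keeps the cost of a secret event proportional to the objects that actually depend on it, which is the point of avoiding a reconcile of everything on every update.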
What did you do?
I am deploying Elastic using the Elastic operator.
When the Elastic custom resource is deployed, the operator creates the Elastic pods as well as secrets that contain the certificates and the password for the instance.
I want to scrape metrics from the Elastic instance, so I deploy a ServiceMonitor to instruct the Prometheus Operator to do this. The ServiceMonitor needs the username, password, and CA certificate to connect to the Elastic instance. This information is contained in secrets that are created by the Elastic operator.
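To make the dependency concrete, the sketch below lists the secrets a ServiceMonitor points at through `basicAuth` (username and password) and `tlsConfig.ca.secret` (CA certificate), which are the ServiceMonitor endpoint fields for these credentials. The manifest is written as a Python dict; the namespace, label, port, and secret names are hypothetical and are not necessarily the names the Elastic operator uses.

```python
def referenced_secrets(service_monitor: dict) -> set[tuple[str, str, str]]:
    """Return (namespace, secret name, key) for the basicAuth and CA secrets used
    by a ServiceMonitor's endpoints. Secret selectors are namespace-local, so the
    ServiceMonitor's own namespace is used. Other fields that can reference
    secrets (e.g. client certificates, bearer tokens) are ignored in this sketch."""
    namespace = service_monitor["metadata"]["namespace"]
    refs = set()
    for endpoint in service_monitor.get("spec", {}).get("endpoints", []):
        basic_auth = endpoint.get("basicAuth", {})
        for field in ("username", "password"):
            selector = basic_auth.get(field)
            if selector:
                refs.add((namespace, selector["name"], selector["key"]))
        ca_secret = endpoint.get("tlsConfig", {}).get("ca", {}).get("secret")
        if ca_secret:
            refs.add((namespace, ca_secret["name"], ca_secret["key"]))
    return refs


# Hypothetical ServiceMonitor for the Elastic instance.
service_monitor = {
    "apiVersion": "monitoring.coreos.com/v1",
    "kind": "ServiceMonitor",
    "metadata": {"name": "elasticsearch", "namespace": "elastic"},
    "spec": {
        "selector": {"matchLabels": {"app": "elasticsearch"}},
        "endpoints": [
            {
                "port": "https",
                "scheme": "https",
                "basicAuth": {
                    "username": {"name": "es-metrics-credentials", "key": "username"},
                    "password": {"name": "es-metrics-credentials", "key": "password"},
                },
                "tlsConfig": {
                    "ca": {"secret": {"name": "es-http-certs-public", "key": "ca.crt"}}
                },
            }
        ],
    },
}
print(sorted(referenced_secrets(service_monitor)))
```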
I am deploying both the Elastic custom resource and the ServiceMonitor using Helm. After the Elastic custom resource is applied, it takes a little while for the operator to deploy the Elastic pods and create the secrets containing the certificates and password for the instance.
This means that the ServiceMonitor is created before these secrets exist. When this happens, Prometheus disables the ServiceMonitor and does not retry it.
A possible solution is to deploy the secrets with Helm, but empty. The Elastic operator will then update the secrets with the correct information later.
I did this, but got another error:
Did you expect to see something different?
First, I expected Prometheus to retry adding the ServiceMonitor even though the secret did not exist on the first attempt. This would be the best solution and would require no workaround on my part.
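The expectation above can be modeled as follows: each reconcile splits monitors into usable and skipped depending on whether their referenced secrets exist, so a skipped ServiceMonitor only becomes usable if some later event triggers another reconcile. This is a hypothetical Python sketch of the behaviour described in this issue; treating a reference as satisfied when the secret exists and contains the key is an assumption, not a description of the operator's exact checks.

```python
def partition_monitors(monitors, secrets):
    """Split ServiceMonitors into (usable, skipped) for one reconcile pass.

    monitors: ServiceMonitor key -> set of (namespace, secret name, key) references
    secrets:  (namespace, secret name) -> the secret's data as a dict
    A monitor counts as usable only if every referenced secret exists and
    contains the referenced key (an assumption made for this sketch).
    """
    usable, skipped = [], []
    for name in sorted(monitors):
        ok = all(
            key in secrets.get((namespace, secret), {})
            for namespace, secret, key in monitors[name]
        )
        (usable if ok else skipped).append(name)
    return usable, skipped


# Hypothetical names throughout.
monitors = {"elastic/elasticsearch": {("elastic", "es-metrics-credentials", "password")}}

# Reconcile 1: the ServiceMonitor exists, the secret does not yet.
print(partition_monitors(monitors, {}))  # ([], ['elastic/elasticsearch'])

# Reconcile 2 only happens if something (e.g. a secret event) triggers it.
secrets = {("elastic", "es-metrics-credentials"): {"password": "changeme"}}
print(partition_monitors(monitors, secrets))  # (['elastic/elasticsearch'], [])
```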
I then have a question: if I add empty secrets instead, will Prometheus reload the information from the secrets when their content changes? It looks like Prometheus mounts the secret as a file and reads it. When the secret is updated, the change is not automatically picked up by the Prometheus server, because it needs to reload the secret. Is there a mechanism for this?
And does this mechanism still work if loading the certificate fails on the first attempt because it is empty?
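For the reload question, one generic approach is to poll the mounted directory, hash its contents, and trigger a reload when the hash changes. The Python sketch below shows that idea with a hypothetical `reload` callback; it is not the operator's config-reloader and does not model how the kubelet updates secret volumes. With a watcher of this kind, an empty placeholder file that later receives real content is simply another change.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def directory_digest(path: Path) -> str:
    """SHA-256 over the relative names and contents of all files under `path`."""
    digest = hashlib.sha256()
    for file in sorted(p for p in path.rglob("*") if p.is_file()):
        digest.update(str(file.relative_to(path)).encode())
        digest.update(b"\0")
        digest.update(file.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


class ReloadOnChange:
    """Poll a mounted directory and call `reload` whenever its contents change."""

    def __init__(self, path: Path, reload: Callable[[], None]) -> None:
        self.path = path
        self.reload = reload
        self.last = directory_digest(path)

    def poll(self) -> bool:
        current = directory_digest(self.path)
        if current == self.last:
            return False
        self.last = current
        self.reload()
        return True


with tempfile.TemporaryDirectory() as tmp:
    cert = Path(tmp) / "ca.crt"
    cert.write_text("")  # empty placeholder, as with the Helm workaround
    reloads = []
    watcher = ReloadOnChange(Path(tmp), lambda: reloads.append(1))
    print(watcher.poll())  # False
    cert.write_text("placeholder certificate data")
    print(watcher.poll(), len(reloads))  # True 1
```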
Environment
Prometheus Operator version: 0.60.1
Prometheus version: 2.39.1
Kubernetes version: v1.25.4+a34b9e9
Kubernetes cluster kind: OpenShift
Manifests:
Anything else we need to know?