Getting frequent restarts for Prometheus-msteams [BUG] #267

Open
firoshaq opened this issue Oct 12, 2022 · 4 comments
Labels
bug Something isn't working

Comments

@firoshaq

We are seeing prometheus-msteams being restarted at times, and it is showing OOM errors.

We have increased the CPU and memory to the values below as well, but the pods are still getting restarted.

resources:
  limits:
    cpu: 50m
    memory: 100Mi
  requests:
    cpu: 50m
    memory: 100Mi

Interestingly, we don't see the pod hitting the limits anywhere.

kubectl top pod prometheus-msteams-58bcd967fc-pdpw9
NAME                                  CPU(cores)   MEMORY(bytes)
prometheus-msteams-58bcd967fc-pdpw9   6m           50Mi
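
For what it's worth, kubectl top only shows a point-in-time sample, so a short spike around the restarts could be missed. A quick way to check whether the restarts are actually OOM kills rather than failed liveness probes (standard kubectl, using the pod name above):

kubectl describe pod prometheus-msteams-58bcd967fc-pdpw9 | grep -A 5 "Last State"
kubectl get events --field-selector involvedObject.name=prometheus-msteams-58bcd967fc-pdpw9

If the last state shows Reason: OOMKilled, the limit was hit between samples; if the events show the container being killed for failing its liveness probe, memory is likely not the problem.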

Here are some of the event logs:

13m         Warning   Unhealthy                pod/prometheus-msteams-58bcd967fc-pdpw9                                            Readiness probe failed: Get "http://IP:2000/config": dial tcp IP:2000: connect: connection refused
57m         Warning   Unhealthy                pod/prometheus-msteams-58bcd967fc-pdpw9                                            Liveness probe failed: Get "http://IP:2000/config": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
57m         Warning   Unhealthy                pod/prometheus-msteams-58bcd967fc-pdpw9                                            Readiness probe failed: Get "http://IP:2000/config": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
58m         Warning   Unhealthy                pod/prometheus-msteams-58bcd967fc-pdpw9                                            Liveness probe failed: Get "http://IP:2000/config": dial tcp IP:2000: connect: connection refused
58m         Warning   BackOff                  pod/prometheus-msteams-58bcd967fc-pdpw9                                            Back-off restarting failed container

Version
1.5.0

Expected behavior
A stable prometheus-msteams

@firoshaq firoshaq added the bug Something isn't working label Oct 12, 2022
@ckotzbauer
Collaborator

Can you try increasing the CPU limit? 50m is not that much; maybe the probes are not being answered in time.
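
If raising the CPU limit alone doesn't help, relaxing the probe timing can also give the process more headroom under throttling. A minimal sketch using the standard Kubernetes probe fields (the path and port are taken from the events above; the exact keys to set in the chart values may differ, so treat this as an illustration of the pod spec, not of the chart's API):

livenessProbe:
  httpGet:
    path: /config
    port: 2000
  timeoutSeconds: 5
  periodSeconds: 15
  failureThreshold: 3

A longer timeoutSeconds in particular avoids the "context deadline exceeded" failures when the container is briefly CPU-throttled.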

@firoshaq
Author

Sure, we can do that, but our understanding is that prometheus-msteams is a really lightweight Go web server that only makes API calls, so it shouldn't be consuming that much.

We even started with the default values mentioned here and then increased to the values mentioned above.

So we want to understand whether there is a memory leak that might be causing this issue.

Regards,
Firos Haq

@ckotzbauer
Collaborator

Resource consumption depends on the load and your setup. In general it should be low, but as I said, that depends on your specific setup, so the default values from the chart are only a suggestion.

If the OOMs are gone now with the increased memory, then I don't see where a massive memory leak would be.
Or do the OOMs appear after several days, with memory usage increasing over time?

The probe failures may be caused by the low CPU limit, as I described above.
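
One way to tell whether usage actually grows over time is to sample the pod periodically instead of relying on a single kubectl top reading, e.g.:

watch -n 60 kubectl top pod prometheus-msteams-58bcd967fc-pdpw9

If the memory column climbs steadily toward the limit over hours or days, that would point to a leak; if it stays flat and restarts still happen, the probe/CPU explanation above is more likely.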

@byroncollins

We had a similar problem with prometheus-msteams pods restarting occasionally; we adjusted the container resources and haven't had an issue since.

We overrode the default resources in our values.yml file with:

resources:
  limits:
    cpu: 50m
    memory: 64Mi
  requests:
    cpu: 25m
    memory: 25Mi 
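
For anyone applying the same override, these values go under the chart's top-level resources key and can be rolled out with a normal Helm upgrade; the release and chart names below are just an example and may differ in your setup:

helm upgrade prometheus-msteams prometheus-msteams/prometheus-msteams -f values.yml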
