
[Performance] A render query with many metrics takes a very long time (at least a minute) #823

Open
interfan7 opened this issue Feb 18, 2024 · 0 comments
Labels: performance (performance related issues and questions)

Problem description
A render query for the target a.*b*.c.d.* times out after about a minute.
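
For reference, the request can be reproduced directly against carbonapi with something like the following; the host, port and time range are placeholders based on the access-log entry further down, not an exact reproduction:

# Hypothetical reproduction of the slow render query; host, port and
# time range are assumptions taken from the access log below.
curl -G 'http://graphite-dev:4040/render' \
     --data-urlencode 'target=a.*b*.c.d.*' \
     --data-urlencode 'format=json' \
     --data-urlencode 'from=-5min'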

carbonapi's version
v0.16.0

Did this happen before?
Not applicable (I didn't run an earlier version).

carbonapi's config
Will provide if relevant.

backend software and config
Go-Carbon 0.17.3.
Config will be provided if needed.

carbonapi performance metrics

  1. Please enable expvars in the config and provide its output
    I believe this is the required expvar config:
expvar:
  enabled: true
  pprofEnabled: true
  listen: ""

I'm not sure which output this question refers to.
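
If the question refers to the expvar endpoint's JSON, I assume it can be dumped like this once expvar is enabled; the path and port are my assumptions (/debug/vars is Go's standard expvar path, :4040 is the carbonapi port from the access log below):

# Assumed way to dump carbonapi's expvar counters.
curl -s 'http://localhost:4040/debug/vars' | head -c 2000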

  2. What's the system load on the server? What's the memory consumption?
    top doesn't seem to indicate any kind of throttling. The userspace CPU load appears somewhat higher during the query but is still very low.
    Memory (free -h output):
me@graphite-dev:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:           30Gi       3.1Gi       628Mi       1.0Mi        27Gi        27Gi
Swap:            0B          0B          0B

This is our dev host.

  3. If possible, please provide the profiler's output (see https://blog.golang.org/pprof for instructions on how to do that; debug handlers can be enabled in carbonapi's config on a separate port). Golang allows different types of profiles. It's always better to give a CPU profile, memory profile and lock profile, but that depends on the symptoms.
    Two CPU profiles captured during the request are attached (rough capture commands are sketched after the log excerpts below):
    curl_cpu_3
    curl_cpu_4

  4. Debug logs could be helpful as well
    Setting the log level to debug has a severe impact on performance, so for now I've left it at "info". The only two log entries relevant to this request (grepped for a\.\*b\*\.c\.d\.\*) are:

"level":"ERROR","timestamp":"2024-02-18T13:51:21.454Z","logger":"access","message":"request failed","data":{"handler":"render","carbonapi_uuid":"d1118526-3647-4c52-99ea-49d19b549f74","url":"/render","peer_ip":"10.42.136.247","host":"{our graphite dev host}:4040","format":"json","use_cache":true,"targets":["a.*b*.c.d.*"],"cache_timeout":180,"runtime":59.89556336,"http_code":503,"reason":"timeout while fetching Response","from":1708263920,"until":1708264222,"from_raw":"1708263920","until_raw":"1708264222","uri":"/render","from_cache":false,"used_backend_cache":false,"request_headers":{"X-Grafana-Org-Id":"1","X-Panel-Id":"1"}}}

and an entry tens of megabytes long which starts like this:

{"level":"WARN","timestamp":"2024-02-18T13:52:40.013Z","logger":"zipper","message":"errors occurred while getting results","type":"protoV2Group","name":"http://127.0.0.1:8080","type":"fetch","request":"&MultiFetchRequest{Metrics:[]FetchRequest{FetchRequest{Name:
{{  removed for brevity }}
,"carbonapi_uuid":"d1118526-3647-4c52-99ea-49d19b549f74","errors":"max tries exceeded","errorsVerbose":"max tries exceeded\nHTTP Code: 500\n\ngithub.com/go-graphite/carbonapi/zipper/types.init\n\t/root/go/src/github.com/go-graphite/carbonapi/zipper/types/errors.go:25\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6321\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:233\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594\n\nCaused By: failed to fetch data from server/group\nHTTP Code: 500\n\ngithub.com/go-graphite/carbonapi/zipper/types.init\n\t/root/go/src/github.com/go-graphite/carbonapi/zipper/types/errors.go:27\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6321\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:233\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594\n\nCaused By: error while fetching Response\n\ngithub.com/go-graphite/carbonapi/zipper/types.init\n\t/root/go/src/github.com/go-graphite/carbonapi/zipper/types/errors.go:34\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6321\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:233\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594"

Query that causes problems

  1. Grafana issues the query; it's a render query with the target a.*b*.c.d.*.
  2. According to the go-carbon logs, ~65.5K files (i.e. metrics) are returned.
  3. Resolution: I think it's irrelevant here.
  4. Number of datapoints: also irrelevant, to my mind.
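
As a sanity check on the fan-out, something like the following should show how many nodes the glob expands to on the carbonapi side; the endpoint is the standard Graphite find API, the port is an assumption, and jq is only used to count the returned JSON array:

# Rough way to confirm how many metrics the glob matches.
curl -s 'http://localhost:4040/metrics/find/?query=a.*b*.c.d.*&format=json' | jq 'length'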

Additional context
The carbonapi log reports a 503 status code. The nginx reverse proxy logs 499, which means the client closed the connection before a response was sent. Either way, one minute feels far too long for this query.
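
If it helps narrow this down: I assume the ~60-second cutoff comes from the render timeout towards the backend, which in the upstream example config lives under upstreams.timeouts. The snippet below only illustrates the relevant knobs, it is not our actual configuration:

upstreams:
  timeouts:
    find: "2s"
    render: "60s"
    connect: "200ms"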

interfan7 added the performance label on Feb 18, 2024