-
our team has been using sql-exporter successfully for a long time. we've grown the number of queries (spread out across multiple collector.yaml files), and we are starting to see Prometheus "dropping" metrics. our scrape interval is 3 sec, but we believe that because we have so many queries now, some are still running when Prometheus scrapes next. my question is -- are the queries defined in the collectors executed sequentially or in parallel? and if we increase "max connections", can we make the queries execute in parallel? (i know - one thing i haven't tried is increasing the scrape interval -- we are using this for operations / support monitoring, so we'd prefer to stay close to real-time)
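for context, this is roughly where those knobs live in the exporter's main config -- a sketch with placeholder values only (the DSN, collector names, and file layout below are assumptions, not our actual setup):

```yaml
# sql_exporter.yml -- illustrative values, adjust to your environment
global:
  # queries are cancelled this much before the scrape deadline
  scrape_timeout_offset: 500ms
  # minimum interval between collector runs; 0 means "run on every scrape"
  min_interval: 0s
  # connection pool limits shared by all collectors against the target
  max_connections: 3
  max_idle_connections: 3

target:
  data_source_name: 'sqlserver://user:pass@dbserver:1433'   # placeholder DSN
  collectors: [app_queries, support_queries]                # placeholder names

collector_files:
  - "collectors/*.collector.yaml"
```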
-
Hi @bmrobin! Yeah, each query has its own goroutine, so by design they already run in parallel -- it would actually be harder to make them sequential. I'm curious about the number of queries and/or the general situation; it feels like we might be pushing the limits here, but hey - challenge accepted. 😄 Dropping metrics seems strange, given the way the collection works.
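Roughly, the per-query concurrency looks like the sketch below. This is a simplified illustration, not the exporter's actual code; the driver, DSN, and query strings are placeholders:

```go
package main

import (
	"context"
	"database/sql"
	"log"
	"sync"
	"time"

	_ "github.com/jackc/pgx/v5/stdlib" // placeholder driver; registers "pgx"
)

// runCollectors fires every query in its own goroutine: queries run in
// parallel, bounded only by the connection pool, and the scrape response
// is assembled after all of them return.
func runCollectors(ctx context.Context, db *sql.DB, queries []string) {
	var wg sync.WaitGroup
	for _, q := range queries {
		wg.Add(1)
		go func(query string) {
			defer wg.Done()
			// Each query inherits the scrape's deadline via the context.
			rows, err := db.QueryContext(ctx, query)
			if err != nil {
				log.Printf("query failed: %v", err)
				return
			}
			defer rows.Close()
			for rows.Next() {
				// ... turn each row into a Prometheus metric here ...
			}
		}(q)
	}
	wg.Wait() // wait for all queries before serving the scrape
}

func main() {
	db, err := sql.Open("pgx", "postgres://localhost/example") // placeholder DSN
	if err != nil {
		log.Fatal(err)
	}
	db.SetMaxOpenConns(3) // mirrors the exporter's max_connections setting

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	runCollectors(ctx, db, []string{"SELECT 1", "SELECT 2"})
}
```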
Here are some random ideas to begin our investigation with:
Please try these out and let me know; I'm interested in seeing your scenario succeed. 👍
[1] GOMAXPROCS limits the number of operating system threads that can execute user-level Go code simultaneously. There is no limit to the number of threads that can be blocked in system calls on behalf of Go code; those do not count against the GOMAXPROCS limit. By default it's set to the number of detected cores.
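If it helps, a quick way to check the effective value -- a minimal standalone snippet, not part of the exporter:

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	// GOMAXPROCS(0) reads the current setting without changing it.
	// It defaults to runtime.NumCPU() and can be overridden with the
	// GOMAXPROCS environment variable, e.g. GOMAXPROCS=8 ./sql_exporter.
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0), "NumCPU:", runtime.NumCPU())
}
```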
-
thanks for the helpful description about how the collection works under the hood!
how do you do this? i didn't see any options in the collector template for that. i think one of the issues we may be facing is scrape timeouts; we're going to experiment with those settings a bit
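for reference, these are the Prometheus-side settings we mean -- a sketch only; the job name, target address, and values are placeholders:

```yaml
# prometheus.yml (sketch)
scrape_configs:
  - job_name: sql_exporter
    scrape_interval: 3s   # our current near-real-time interval
    scrape_timeout: 3s    # must be <= scrape_interval; raise both together if queries are slow
    static_configs:
      - targets: ['sql-exporter:9399']   # placeholder address
```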
-
notice the sql query was killed after 10sec, even though we were using scrapeTimeout: 30s. we figured out that it was this part: we traced it down to the Prometheus ServiceMonitor that was doing the actual scraping and set the scrapeTimeout there -- like how you describe here: sql_exporter/helm/templates/servicemonitor.yaml (Line 29 in 881da5b)
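for anyone hitting the same thing, the setting ends up on the ServiceMonitor endpoint -- a sketch with placeholder names and labels (if scrapeTimeout isn't set here, Prometheus falls back to its 10s default scrape_timeout):

```yaml
# ServiceMonitor sketch (Prometheus Operator)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: sql-exporter          # placeholder
spec:
  selector:
    matchLabels:
      app: sql-exporter       # placeholder label
  endpoints:
    - port: http              # placeholder port name
      interval: 30s
      scrapeTimeout: 30s      # the value that actually applies to the scrape
```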