-
our team has been using sql-exporter successfully for a long time. we've grown the number of queries (spread out across multiple collector.yaml files), and we are starting to see Prometheus "dropping" metrics. our scrape interval is 3 sec, but we believe that because we have so many queries now, some are still running when Prometheus scrapes next. my question is -- are the queries defined in the collectors executed sequentially or in parallel? and if we increase "max connections", can we make the queries execute in parallel? (i know - one thing i haven't tried is increasing the scrape interval -- we are using this for operations / support monitoring, so we'd prefer to stay close to real-time)
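for context, this is roughly where those knobs live in the exporter's main config -- a sketch with placeholder values only (the DSN, collector names, and file layout below are assumptions, not our actual setup):

```yaml
# sql_exporter.yml -- illustrative values, adjust to your environment
global:
  # queries are cancelled this much before the scrape deadline
  scrape_timeout_offset: 500ms
  # minimum interval between collector runs; 0 means "run on every scrape"
  min_interval: 0s
  # connection pool limits shared by all collectors against the target
  max_connections: 3
  max_idle_connections: 3

target:
  data_source_name: 'sqlserver://user:pass@dbserver:1433'   # placeholder DSN
  collectors: [app_queries, support_queries]                # placeholder names

collector_files:
  - "collectors/*.collector.yaml"
```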
-
Hi @bmrobin! Yeah, each query has its own goroutine, so by design they already run in parallel -- it would actually be harder to make them sequential. I'm curious about the number of queries and/or the general situation; it feels like we might be pushing the limits here, but hey - challenge accepted. 😄 Dropping metrics seems strange, given the way the collection works.
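Roughly, the per-query concurrency looks like the sketch below. This is a simplified illustration, not the exporter's actual code; the driver, DSN, and query strings are placeholders:

```go
package main

import (
	"context"
	"database/sql"
	"log"
	"sync"
	"time"

	_ "github.com/jackc/pgx/v5/stdlib" // placeholder driver; registers "pgx"
)

// runCollectors fires every query in its own goroutine: queries run in
// parallel, bounded only by the connection pool, and the scrape response
// is assembled after all of them return.
func runCollectors(ctx context.Context, db *sql.DB, queries []string) {
	var wg sync.WaitGroup
	for _, q := range queries {
		wg.Add(1)
		go func(query string) {
			defer wg.Done()
			// Each query inherits the scrape's deadline via the context.
			rows, err := db.QueryContext(ctx, query)
			if err != nil {
				log.Printf("query failed: %v", err)
				return
			}
			defer rows.Close()
			for rows.Next() {
				// ... turn each row into a Prometheus metric here ...
			}
		}(q)
	}
	wg.Wait() // wait for all queries before serving the scrape
}

func main() {
	db, err := sql.Open("pgx", "postgres://localhost/example") // placeholder DSN
	if err != nil {
		log.Fatal(err)
	}
	db.SetMaxOpenConns(3) // mirrors the exporter's max_connections setting

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	runCollectors(ctx, db, []string{"SELECT 1", "SELECT 2"})
}
```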
Here are some random ideas to begin our investigation with:
Please try these out and let me know; I'm interested in seeing your scenario succeed. 👍
[1] GOMAXPROCS limits the number of operating system threads that can execute user-level Go code simultaneously. There is no limit to the number of threads that can be blocked in system calls on behalf of Go code; those do not count against the GOMAXPROCS limit. By default it's set to the number of detected cores.
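If it helps, a quick way to check the effective value -- a minimal standalone snippet, not part of the exporter:

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	// GOMAXPROCS(0) reads the current setting without changing it.
	// It defaults to runtime.NumCPU() and can be overridden with the
	// GOMAXPROCS environment variable, e.g. GOMAXPROCS=8 ./sql_exporter.
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0), "NumCPU:", runtime.NumCPU())
}
```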
-
thanks for the helpful description about how the collection works under the hood!
how do you do this? i didn't see any options in the collector template for that. i think one of the issues we may be facing is scrape timeouts; we're going to experiment with those settings a bit
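for reference, these are the Prometheus-side settings we mean -- a sketch only; the job name, target address, and values are placeholders:

```yaml
# prometheus.yml (sketch)
scrape_configs:
  - job_name: sql_exporter
    scrape_interval: 3s   # our current near-real-time interval
    scrape_timeout: 3s    # must be <= scrape_interval; raise both together if queries are slow
    static_configs:
      - targets: ['sql-exporter:9399']   # placeholder address
```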
-
notice the sql query was killed after 10sec, even though we were using scrapeTimeout: 30s. we figured out that it was this part: we traced it down to the Prometheus ServiceMonitor that was doing the actual scraping and set the scrapeTimeout there -- like how you describe here: sql_exporter/helm/templates/servicemonitor.yaml (Line 29 in 881da5b)
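for anyone hitting the same thing, the setting ends up on the ServiceMonitor endpoint -- a sketch with placeholder names and labels (if scrapeTimeout isn't set here, Prometheus falls back to its 10s default scrape_timeout):

```yaml
# ServiceMonitor sketch (Prometheus Operator)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: sql-exporter          # placeholder
spec:
  selector:
    matchLabels:
      app: sql-exporter       # placeholder label
  endpoints:
    - port: http              # placeholder port name
      interval: 30s
      scrapeTimeout: 30s      # the value that actually applies to the scrape
```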