Produce high-level telemetry #2012

Open
2 of 6 tasks
Nealm03 opened this issue Apr 2, 2024 · 0 comments
Nealm03 commented Apr 2, 2024

Feature description

A key part of understanding and debugging production code is identifying the sources of issues, from errors to latency. It would be great if the "key" steps of PostGraphile's execution model could be exposed as out-of-the-box metrics that can be switched on and ingested by popular metric stacks, e.g. OpenTelemetry / Prometheus.

Motivating example

  • We have noticed high latency in some requests, yet the database reports low utilisation and relatively quick response times. We'd like to identify where our bottleneck is.

Ideally there are some "significant events" in the lifecycle of a request that we could measure and understand better. Perhaps:

  • Planning (internal vs plugins): expose the relative latency that custom plugins add during the planning phase, as well as the significant planning steps in the request-processing pipeline. This would help engineers track down slowness incurred by custom functionality, and better understand the planning model and where their usage patterns are not ideal.
  • Execution (I/O latency): expose the async steps that reach out to the database, and perhaps custom resolver steps. This would help engineers identify whether there is a misconfiguration in connection pooling or general networking overhead.
  • Response (might be considered part of the former): expose response-mapping and validation timing so it can be correlated with large requests. Anecdotally, response validation has often caused performance degradations in my experience, and is largely symptomatic of pathological requests.
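To make the idea concrete, here is a minimal sketch of what timing these lifecycle phases could look like. Everything here is hypothetical illustration, not PostGraphile API: the `timePhase` helper and the in-memory `timings` map are assumptions standing in for whatever hook points the library would expose, and an exporter could later publish the collected samples as Prometheus histograms or OpenTelemetry metrics.

```typescript
// Hypothetical sketch: time "significant events" (planning, execution,
// response) and collect per-phase duration samples in memory.

type PhaseTimings = Map<string, number[]>;

const timings: PhaseTimings = new Map();

// Wrap an async step, recording its wall-clock duration under `phase`.
async function timePhase<T>(phase: string, fn: () => Promise<T>): Promise<T> {
  const start = process.hrtime.bigint();
  try {
    return await fn();
  } finally {
    const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
    const samples = timings.get(phase) ?? [];
    samples.push(elapsedMs);
    timings.set(phase, samples);
  }
}

// Illustrative request handler: each lifecycle phase is wrapped, so the
// relative cost of planning vs execution vs response becomes visible.
async function handleRequest(): Promise<string> {
  const plan = await timePhase("planning", async () => ({ steps: 3 }));
  const rows = await timePhase("execution", async () => [1, 2, 3]);
  return timePhase("response", async () => JSON.stringify({ plan, rows }));
}

handleRequest().then(() => {
  for (const [phase, samples] of timings) {
    console.log(`${phase}: ${samples.length} sample(s)`);
  }
});
```

In a real integration these samples would feed an OpenTelemetry histogram per phase rather than a `Map`, but the shape of the instrumentation points is the same.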

Supporting development

  • I am interested in building this feature myself
  • I am interested in collaborating on building this feature
  • I am willing to help test this feature before it's released
  • I am willing to write a test-driven test suite for this feature (before it exists)
  • I am a Graphile sponsor ❤️
  • I have an active support or consultancy contract with Graphile