feat: implement predicate_query to validate which are the metrics to be collected #4503

gabrosys · 2024-05-09T13:57:32Z

What

Implement validation logic that decides whether a query used to collect metrics should be executed.

ref: [Feature]: For each metric to be collected define a mechanism to check if the query needs to be executed or not #4499

How

Define a predicate_query property.
Define an isCollectable method, invoked by collect with the same tx (to limit the performance overhead), that returns a bool.
Add a new e2e test, with its related files, to cover the predicate_query scenarios.
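
To make the intended control flow concrete, here is a minimal sketch of the gating described above. It is not the code in collector.go: metricQuery and its function fields are hypothetical stand-ins so the example runs without a database, and collect here only mirrors the idea that the metrics query runs when the predicate allows it.

```go
package main

import "fmt"

// metricQuery is a hypothetical, simplified stand-in for a user-defined metric.
type metricQuery struct {
	name      string
	predicate func() (bool, error) // nil means no predicate_query is configured
	run       func() error         // executes the actual metrics query
}

// collect runs the metrics query only when the predicate allows it.
// It reports whether the metrics query was executed.
func collect(q metricQuery) (bool, error) {
	if q.predicate != nil {
		ok, err := q.predicate()
		if err != nil {
			return false, fmt.Errorf("predicate for %s: %w", q.name, err)
		}
		if !ok {
			// A false predicate is a silent skip, not an error.
			return false, nil
		}
	}
	return true, q.run()
}

func main() {
	ran := 0
	q := metricQuery{
		name:      "pg_example",
		predicate: func() (bool, error) { return false, nil },
		run:       func() error { ran++; return nil },
	}
	executed, err := collect(q)
	fmt.Println(executed, err, ran) // prints "false <nil> 0"
}
```

In the real collector the predicate and the metrics query would share the same tx, which is what keeps the extra check cheap.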

Test

Execute e2e-test-local with FEATURE_TYPE=observability and TEST_DEPTH=3.

Open points - Questions

Do you think we need a specific log entry when a query is not executed because predicate_query returned false? I want to avoid bloating the PG logs with this kind of message, which is why there is currently none.
isCollectable is intended to be invoked only within the collect method; should I define it as an anonymous function inside collect instead?
We can simplify the logic in isCollectable a lot if we assume strict rules about the output of predicate_query. What do you think?
Should I update the docs within this PR, or open a separate PR once this one is approved, to document what predicate_query is and how it works?

Thanks, I'm available for any questions 🙇 🚀

github-actions · 2024-05-09T13:58:33Z

❗ By default, the pull request is configured to backport to all release branches.

To stop backporting this pr, remove the label: backport-requested ◀️ or add the label 'do not backport'
To stop backporting this pr to a certain release branch, remove the specific branch label: release-x.y

armru · 2024-05-09T14:09:35Z

pkg/management/postgres/metrics/collector.go

@@ -378,6 +387,66 @@ func (c QueryCollector) collect(conn *sql.DB, ch chan<- prometheus.Metric) error
 return nil
 }

+// isCollectable checks if a query to collect metrics can be executed or not.


Is there any reason why we are accepting multiple types?
In my opinion, we should only evaluate booleans, and it should be the responsibility of the query to return the intended type

I agree with Armando. It's better to rely on booleans to do that.
True means it's collectible, false means it's not, and NULL means we don't know.
And if we don't know, perhaps it's better not to collect it.

So, to summarize: we want to support ONLY bools, so not even numbers are tolerated. Also, do we want to require developers to provide a query that returns ONLY one column? If so, we can simplify the whole logic to a single QueryRow and a Scan into a bool variable.

Yes, I think evaluating a bool should be enough

Updated the comment and the method accordingly. We now assume that the type is a bool; if it is not, we return false and log a warning.

Personally I disagree and would prefer to see it accept both a 0-row result and a null as a false result. I can see the argument both ways though, so what you say goes.

The updated code handles:

a bool, checking the first column of the first row.
no rows, returning false.
a null value, returning false.

pkg/management/postgres/metrics/collector.go

leonardoce · 2024-05-09T14:13:30Z

Thank you, @gabrosys, for this! Welcome to the CNPG community!

pkg/management/postgres/metrics/collector.go

pkg/management/postgres/metrics/parser.go

ringerc · 2024-05-09T21:29:05Z

Do you think there is a need to add any specific log when a query is not executed due to the false response of the predicate_query? I want to avoid bloating PG with this kind of log so this is why currently it is not present.

No, it definitely should not log, or at least not above debug level. That's part of the point: unwanted log spam can be avoided when the target server doesn't support a specific view, column, function, etc.

In general there is IMO a need to improve how CNP's metrics scraper diagnostics work and make it easier to understand its behaviour + detect errors. That's why I wrote a little tool to run a one-shot scrape against a pre-configured postgres recently. I'll see if I can tidy that up for a community contribution. But logging isn't IMO the answer to that. I already have enough problems with it quietly logging a complaint and happily continuing with some confusing behaviour.

It might make sense to emit metrics as scrape output to report on skips, but they should probably be optional, and I don't personally think they're necessary as part of the basic feature. They could easily bloat scrape results and add unwanted scrape cardinality, requiring more filtering by scrape result consumers to discard the expected ones.

We can simplify a lot the logic in the isCollectable method if we assume strong and tight rules regarding the output of the predicate_query, what do you think about that?

The problem with that is that the user has to reliably assume the query has the expected results under all circumstances. Part of the point of this is to make the CNP scraper robust in the face of differing server versions, extension installs, vendor flavours, etc etc.

OTOH if the query does something unexpected, a clear error log might be better than just silently returning false and not running the query. IDK. Right now it's hard to properly test CNP scraper configs across a wide matrix of possible server configs, so I prefer to be error-tolerant, but I can see the argument for strictness too.

I'd personally prefer it to accept a 0-row result as false, but it's not hard to wrap that in a SELECT EXISTS (...) if you want to require a strict boolean result.
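
As an illustration of the SELECT EXISTS (...) wrapping mentioned above, here is a small sketch. wrapInExists is a hypothetical helper, not part of the operator; in practice the wrapping would more likely be written by hand in the predicate_query itself. Note that EXISTS reports whether any row is returned, regardless of the values in that row.

```go
package main

import (
	"fmt"
	"strings"
)

// wrapInExists turns a query that may return zero or more rows into one that
// always returns a single boolean row. Hypothetical helper for illustration.
func wrapInExists(query string) string {
	// Drop surrounding whitespace and a trailing semicolon so the query
	// can be embedded as a subquery.
	trimmed := strings.TrimSuffix(strings.TrimSpace(query), ";")
	return fmt.Sprintf("SELECT EXISTS (%s)", trimmed)
}

func main() {
	fmt.Println(wrapInExists("SELECT 1 FROM pg_extension WHERE extname = 'pg_stat_statements';"))
	// prints: SELECT EXISTS (SELECT 1 FROM pg_extension WHERE extname = 'pg_stat_statements')
}
```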

mnencia · 2024-05-17T17:20:26Z

/test tl=4 l=local

github-actions · 2024-05-17T17:21:06Z

@mnencia, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/9131921366

ringerc · 2024-05-20T22:22:21Z

@mnencia What can we do to progress this PR?

ringerc · 2024-05-21T09:37:01Z

@gabrosys I see a test failure

[FAIL] Metrics [It] can gather metrics according with predicate query [observability]
/home/runner/work/cloudnative-pg/cloudnative-pg/tests/e2e/asserts_test.go:2926

https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/9131921366/job/25112487661#step:9:5716

It failed in some of the other runs too, but passed in others.

Details https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/9131921366/job/25112490237#step:9:2344

   Timeline >>
  Putting Tail on the operator log
  STEP: verifying the custom metrics ConfigMaps and Secrets exist @ 05/17/24 18:30:22.086
  STEP: setting up curl client pod @ 05/17/24 18:30:22.245
  STEP: having a predicate-query-metrics-e2e-9171 namespace @ 05/17/24 18:30:27.285
  STEP: creating a Cluster in the predicate-query-metrics-e2e-9171 namespace @ 05/17/24 18:30:27.291
  STEP: having a Cluster postgresql-metrics with each instance in status ready @ 05/17/24 18:30:27.335
  Cluster ready, took 1m12.298617472s
  STEP: ensuring metrics with positive predicate are collected @ 05/17/24 18:31:39.665
  STEP: checking metrics for pod: postgresql-metrics-1 @ 05/17/24 18:31:39.672
  [FAILED] in [It] - /home/runner/work/cloudnative-pg/cloudnative-pg/tests/e2e/asserts_test.go:2926 @ 05/17/24 18:31:40.062
  DUMPING tailed Operator Logs with error/warning (at most 10 lines ). Failed Spec: Metrics can gather metrics according with predicate query

  ================================================================================
  -- no error / warning logs --
  ================================================================================
  << Timeline

  [FAILED] Found no match for metric cnpg_pg_predicate_query_return_true_and_multiple_rows_fixed

  Priting rawMetricsOutput:
...
  cnpg_errors_total{errorUserQueries="pg_predicate_query_return_true_and_multiple_rows on db app: ERROR: subquery in FROM must have an alias (SQLSTATE 42601)"} 1
...
  # HELP cnpg_pg_predicate_query_return_true_and_multiple_columns_fixed Always 42, used to test predicate_query
  # TYPE cnpg_pg_predicate_query_return_true_and_multiple_columns_fixed gauge
  cnpg_pg_predicate_query_return_true_and_multiple_columns_fixed 42
  # HELP cnpg_pg_predicate_query_return_true_and_multiple_columns_multiple_rows_fixed Always 42, used to test predicate_query
  # TYPE cnpg_pg_predicate_query_return_true_and_multiple_columns_multiple_rows_fixed gauge
  cnpg_pg_predicate_query_return_true_and_multiple_columns_multiple_rows_fixed 42
  # HELP cnpg_pg_predicate_query_return_true_fixed Always 42, used to test predicate_query
  # TYPE cnpg_pg_predicate_query_return_true_fixed gauge
  cnpg_pg_predicate_query_return_true_fixed 42

so it looks like an issue with the spec for pg_predicate_query_return_true_and_multiple_rows

ringerc

Identified query bug causing test failure

tests/e2e/fixtures/metrics/custom-queries-with-predicate-query.yaml

jsilvela

couldn't isCollectable be substantially simplified?

pkg/management/postgres/metrics/collector.go

gabrosys · 2024-05-21T17:08:32Z

Updated the PR as requested. We now accept only a predicate_query that returns at most one row with a single bool column.
In detail, we handle:

true and false
no row as false
null as false
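
The rules above can be sketched with database/sql as follows. This is a minimal illustration, not the merged implementation: interpretPredicate and demoCases are names made up for the example, and treating an empty predicate_query as "always collect" is an assumption. Also note that QueryRow only looks at the first row and that this sketch does not itself enforce the column type.

```go
package main

import (
	"database/sql"
	"errors"
	"fmt"
)

// interpretPredicate maps the outcome of scanning the predicate result into a
// decision: true/false are taken as-is, NULL and "no rows" both mean false.
func interpretPredicate(value sql.NullBool, scanErr error) (bool, error) {
	switch {
	case errors.Is(scanErr, sql.ErrNoRows):
		return false, nil // no row as false
	case scanErr != nil:
		return false, scanErr // any other error is reported to the caller
	case !value.Valid:
		return false, nil // null as false
	default:
		return value.Bool, nil
	}
}

// isCollectable runs the predicate inside the caller's transaction.
// Assumption: an empty predicate means the metric is always collected.
func isCollectable(tx *sql.Tx, predicateQuery string) (bool, error) {
	if predicateQuery == "" {
		return true, nil
	}
	var value sql.NullBool
	// Scan fails if the row does not have exactly one column.
	err := tx.QueryRow(predicateQuery).Scan(&value)
	return interpretPredicate(value, err)
}

// demoCases lists the four handled outcomes (illustrative data only).
var demoCases = []struct {
	label   string
	value   sql.NullBool
	scanErr error
	want    bool
}{
	{"true", sql.NullBool{Bool: true, Valid: true}, nil, true},
	{"false", sql.NullBool{Bool: false, Valid: true}, nil, false},
	{"null", sql.NullBool{}, nil, false},
	{"no row", sql.NullBool{}, sql.ErrNoRows, false},
}

func main() {
	for _, c := range demoCases {
		got, err := interpretPredicate(c.value, c.scanErr)
		fmt.Printf("%s: collect=%v err=%v\n", c.label, got, err)
	}
}
```

Splitting the decision from the database call keeps the four cases testable without a running PostgreSQL instance.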

@jsilvela ready to be reviewed, thanks 🙇

jsilvela

nice work, Gabriele

armru · 2024-05-27T13:43:42Z

/test limit=local

github-actions · 2024-05-27T13:43:53Z

@armru, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/9255900078

armru · 2024-05-28T15:14:30Z

/test limit=local

github-actions · 2024-05-28T15:14:50Z

@armru, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/9271964382

armru · 2024-05-29T09:06:52Z

Hello @gabrosys, could you please extend the "User defined metrics" doc section with the new field and explain how to use it?

docs/src/monitoring.md

gabrosys · 2024-05-29T13:44:52Z

Updated the docs with a new paragraph containing a simple example, and added the new property to the property list. Thank you all for your support 🙇

gabrosys requested a review from a team as a code owner May 9, 2024 13:57

github-actions bot added the backport-requested ◀️, release-1.21, release-1.22, and release-1.23 labels May 9, 2024

armru reviewed May 9, 2024

leonardoce reviewed May 9, 2024

leonardoce changed the title ~~feat(4499): implement predicate_query to validate which are the metrics to be collected~~ feat: implement predicate_query to validate which are the metrics to be collected May 9, 2024

mnencia reviewed May 9, 2024

armru reviewed May 9, 2024

mnencia force-pushed the dev/4499 branch from 65c84fc to ba2fcc3 Compare May 17, 2024 17:19

mnencia requested review from jsilvela, NiccoloFei and litaocdl as code owners May 17, 2024 17:19

ringerc reviewed May 21, 2024

jsilvela reviewed May 21, 2024

jsilvela approved these changes May 22, 2024

gabrosys force-pushed the dev/4499 branch from 6fa1a7c to f31259c Compare May 26, 2024 20:57

armru force-pushed the dev/4499 branch 2 times, most recently from d527164 to 8bb9d0b Compare May 27, 2024 13:43

github-actions bot added the ok to merge 👌 label May 28, 2024

armru removed the backport-requested ◀️, release-1.21, release-1.22, and release-1.23 labels May 29, 2024

armru force-pushed the dev/4499 branch from 8bb9d0b to f09a53e Compare May 29, 2024 06:42

armru approved these changes May 29, 2024

armru self-requested a review May 29, 2024 09:07

gabrosys and others added 9 commits May 29, 2024 16:19

feat(4499): implement predicate_query to validate which are the metrics to be collected

1257893

Signed-off-by: Gabriele <[email protected]>

fix(4499): import order and allow to return bool type only in predicate_query

003f2eb

Signed-off-by: Gabriele <[email protected]>

test(4499): fix multiline predicate query e2e test

80ddc1b

Signed-off-by: Gabriele <[email protected]>

fix(4499): accept only one row with one bool column

5bed23f

Signed-off-by: Gabriele <[email protected]>

chore: review, make the tests and signatures more explicit

834ec22

Signed-off-by: Jaime Silvela <[email protected]>

chore: review

2305e9b

Signed-off-by: Armando Ruocco <[email protected]>

fix(4499): user doc

b9c721e

Signed-off-by: Gabriele <[email protected]>

fix(4499): user doc

66bf187

Signed-off-by: Gabriele <[email protected]>

fix(4499): user doc

f7956ae

Signed-off-by: Gabriele <[email protected]>

armru force-pushed the dev/4499 branch from 4acbb8a to f7956ae Compare May 29, 2024 14:19

armru merged commit c7d241c into cloudnative-pg:main May 29, 2024
24 checks passed
