
PromQL: Multiplication by 1 creates additional floating point errors (but should be a no-op) #14052

Open
gabrieljones opened this issue May 4, 2024 · 6 comments

Comments

@gabrieljones

gabrieljones commented May 4, 2024

What did you do?

Grafana PromQL:

sum by (le) (increase(ingress_timer_millis_bucket[$__interval]) * on (job, instance) group_left target_info)

What did you expect to see?

The same PromQL response as

sum by (le) (increase(ingress_timer_millis_bucket[$__interval]))

What did you see instead? Under which circumstances?

With the group_left join in place, tiny non-zero values are sprinkled throughout the buckets at random, and they differ with each repeated request of the same query.

System information

No response

Prometheus version

prometheus_build_info{
  branch="HEAD", 
  goarch="amd64", 
  goos="linux", 
  goversion="go1.21.6", 
  instance="0.0.0.0:8080", 
  revision="43e14844a33b65e2a396e3944272af8b3a494071", 
  tags="netgo,builtinassets,stringlabels", 
  version="2.49.1"
}

Prometheus configuration file

No response

Alertmanager version

No response

Alertmanager configuration file

No response

Logs

No response

@gabrieljones
Author

gabrieljones commented May 6, 2024

compare (prometheus_http_request_duration_seconds_bucket)

@beorn7
Member

beorn7 commented May 8, 2024

It's not so much the grouping or the join that creates these artifacts, but the multiplication. At least, I can create the same "noisy" heatmap by multiplying by a literal 1.

I'm confused why this is happening, as multiplication by 1 should be a no-op (and with the query linked above, it should be guaranteed that an actual 1.0 float arrives in the engine).

beorn7 changed the title from "group left join on histogram introduces spurious non zero values" to "PromQL: Multiplication by 1 creates additional floating point errors (but should be a no-op)" on May 8, 2024
@beorn7
Member

beorn7 commented May 8, 2024

Fundamentally, I would say there must be a different code path in the PromQL engine for "sum the result of a function call" and "sum the result of a bin-op", but off the top of my head, I cannot imagine how that would happen.

@gabrieljones
Author

gabrieljones commented May 10, 2024

One possible workaround is to use something like round(_, 0.00001). Does this give the same result as the unmultiplied query?

compare:

  • round(sum by (le) (increase(prometheus_http_request_duration_seconds_bucket[1m]) * 1), 0.00001)
  • sum by (le) (increase(prometheus_http_request_duration_seconds_bucket[1m]))

@beorn7
Member

beorn7 commented May 15, 2024

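We discussed this on the chat-system-that-shall-not-be-named. I guess the most likely reason is what @bboreham said there: "At a high level, it’s likely that each new expression changes the order of evaluation, which will give different results since addition is non-commutative." (Strictly speaking, floating-point addition is commutative but not associative: a + b always equals b + a, but (a + b) + c can differ from a + (b + c), so a different evaluation order can change the rounded sum.)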
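A minimal Go sketch of how evaluation order alone can produce tiny non-zero values. The per-series values are hypothetical, and the sketch assumes the heatmap derives per-bucket counts by subtracting adjacent cumulative `le` buckets; it does not reproduce the PromQL engine. Because float64 addition rounds after every step, summing identical values in a different order can give results one ulp apart, and their difference is a tiny non-zero number where the exact answer is 0.

```go
package main

import "fmt"

// sumInOrder adds vals left to right with a plain accumulator (acc += v).
// The result depends on the order of vals because each addition is rounded.
func sumInOrder(vals []float64) float64 {
	var acc float64
	for _, v := range vals {
		acc += v
	}
	return acc
}

func main() {
	// Hypothetical per-series increase() values for two adjacent cumulative
	// buckets. They are identical (no observations fell between the two
	// bucket bounds), but the series are visited in a different order.
	bucketA := []float64{0.1, 0.2, 0.3}
	bucketB := []float64{0.3, 0.2, 0.1}

	sumA := sumInOrder(bucketA)
	sumB := sumInOrder(bucketB)
	fmt.Println(sumA, sumB)  // 0.6000000000000001 0.6
	fmt.Println(sumA - sumB) // 1.1102230246251565e-16, although the exact difference is 0
}
```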

@bboreham
Member

Just wanted to highlight #14074, which I think could improve the optics, at some CPU cost.
I didn't understand this issue well enough to make a test case.
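Without assuming what #14074 actually changes, one well-known technique with exactly this trade-off (better accuracy at some CPU cost) is compensated summation. A minimal Go sketch of Neumaier's variant of Kahan summation follows; the function name `neumaierSum` is hypothetical and the code is not taken from Prometheus. On the same hypothetical bucket values as a naive left-to-right sum, both orders come out identical.

```go
package main

import (
	"fmt"
	"math"
)

// neumaierSum performs compensated summation: c accumulates the low-order
// bits lost by each rounded addition and is added back at the end, which
// makes the result much less sensitive to the order of vals.
func neumaierSum(vals []float64) float64 {
	var sum, c float64
	for _, v := range vals {
		t := sum + v
		if math.Abs(sum) >= math.Abs(v) {
			c += (sum - t) + v // low-order bits of v were lost
		} else {
			c += (v - t) + sum // low-order bits of sum were lost
		}
		sum = t
	}
	return sum + c
}

func main() {
	a := []float64{0.1, 0.2, 0.3}
	b := []float64{0.3, 0.2, 0.1}
	fmt.Println(neumaierSum(a), neumaierSum(b)) // 0.6 0.6
	fmt.Println(neumaierSum(a) - neumaierSum(b)) // 0
}
```

The extra branch and additions per element are the CPU cost; the benefit is that reordering the inputs no longer shifts the result by an ulp in cases like this one.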
