-
Notifications
You must be signed in to change notification settings - Fork 848
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug]: NaN behavior changes in compressed tables #6884
Labels
Comments
Sorry about that, I think it's a bug in implementation of vectorized filters in TimescaleDB. I didn't realize Postgres used a different convention for NaNs. Nice workaround, there's also a more heavy-handed way to work around it by disabling the bulk decompression and therefore all vectorization with |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
What type of bug is this?
Incorrect result
What subsystems and features are affected?
Compression
What happened?
Summary
When data in a compressed chunk is queried for NaN values using a WHERE clause, the clause has no effect.
This is significant when running aggregate queries; for example,
avg()
will return NaN for an entire time bucket if just one NaN is contained therein. As a workaround, NaN values can be ignored by adding... WHERE value <> 'NaN' ...
but this only is working on uncompressed chunks.Details
Table DDL:
Sample data:
Run the below query examples both before and after compressing the chunk, or insert more data that will fall into a different chunk.
To compress the chunk:
Query examples:
select * from sensor_data where time between '2024-01-01' and '2024-01-02' and value = 'inf'::numeric;
select * from sensor_data where time between '2024-01-01' and '2024-01-02' and value = 'nan'::numeric;
select * from sensor_data where time between '2024-01-01' and '2024-01-02' and value <> 'nan'::numeric;
Inconsistent behavior occurs when updating into a compressed chunk: the updates succeed, but then the above example queries behave as expected. As in, they return/ignore NaN values when asked to. I'm assuming this is because TimescaleDB temporarily decompresses the chunk/slice.
I believe the root cause is due to inconsistent assumptions regarding the equality of NaN: in PostgreSQL, NaN == NaN. This can be shown by running
select 'nan'::numeric = 'nan'::numeric as isnan;
. However, the IEEE 754 spec indicates that NaN should NOT equal NaN (which seems to be the behavior when querying compressed chunks).See PostgreSQL note (https://www.postgresql.org/docs/14/datatype-numeric.html):
Workaround
As a workaround, the
nullif()
function can be used in the where clause. For example:Whereas using
...and value <> 'nan' ...
fails (returns NaNs when it should exclude them).The positive (
... and nullif(value, 'nan') is null
behaves as expected.Defining functions like
isnan()
(create or replace function isnan(double precision) returns boolean as $$ select $1 > 0 and -$1 > 0 $$ language sql;
fail as well.TimescaleDB version affected
2.14.2
PostgreSQL version used
14.11
What operating system did you use?
RHEL7, 64 bit
What installation method did you use?
RPM
What platform did you run on?
On prem/Self-hosted
Relevant log output and stack trace
No response
How can we reproduce the bug?
The text was updated successfully, but these errors were encountered: