Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Forward-Filling / Back-Filling Timeseries is Not Possible #4330

Open
jmsmdy opened this issue Mar 12, 2024 · 0 comments
Open

Forward-Filling / Back-Filling Timeseries is Not Possible #4330

jmsmdy opened this issue Mar 12, 2024 · 0 comments

Comments

@jmsmdy
Copy link

jmsmdy commented Mar 12, 2024

What's up?

It appears there is no way to do this in PRQL right now (at least not cleanly):

SELECT 
      location,
      day,
      LAST_VALUE(temperature) IGNORE NULLS OVER (
          PARTITION BY location
          ORDER BY day
          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
     ) AS temperature,
     LAST_VALUE(pressure) IGNORE NULLS OVER  (
          PARTITION BY location
          ORDER BY day
          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
     ) AS pressure
FROM weather_data

Some attempts:

from weather_data
group {location} (
    sort day
    window rows:..0 (
        filter pressure != null
        derive {pres = last pressure}
    )
    window rows:..0 (
        filter temperature != null
        derive {temp = last temperature}
    )
)

Semantically, you would expect a filter in window (filter ... | derive ...) to filter the records in that window used for the later derivation, but instead the filter is applied to the entire table, which is not the right thing.

It doesn't appear possible to move the filter inside the derive clause (like derive {temp = temperature | x -> filter x !=null | last}, because filter is typed as a function relation -> relation. I'm guessing this probably won't end up being added to PRQL (absent a huge amount of work), because while DuckDB supports filter clauses in window functions, most don't.

Another attempt:

let last_non_null = col <array> -> <scalar> s"LAST({col}) IGNORE NULLS"

from weather_data
group {location} (
    sort day
    window range:..0 (
        derive {
            pres = last_non_null pressure,
            temp = last_non_null temperature
        }
    )
)

This fails (no matter how you type hint) because prqlc fails to recognize last_non_null as a window function. In any case, even if this worked, you now need to change the let ... statement depending on what SQL dialect you are compiling to.

The only way I have gotten this to work is to do the windowing (partitioning + sorting + rows-between + aggregation) entirely in the target SQL dialect using a big s-string, which is not very clean. The limitations of s-strings make it impossible to fully parameterize the windowing process (e.g. there is no way to make s-string valued function that expands tuples inside the s-string, in case you want your SQL-valued function to partition by a variable number of columns).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant