Unable to use quantile() in mutate() in DuckDB #1487

Open
blairqin opened this issue Mar 27, 2024 · 4 comments

@blairqin

I'm trying to calculate the 25th percentile in DuckDB using mutate() with dbplyr, but I get the error "ORDER BY is not implemented for window functions!". DuckDB's documentation, however, indicates that the quantile aggregate quantile_cont() can be used as a window function, saying "All aggregate functions can be used in a windowing context" (DuckDB Documentation on Window Functions).

Here's the reprex:

library(tidyverse)
library(DBI)
library(dbplyr)

con <- DBI::dbConnect(duckdb::duckdb())

copy_to(con, mtcars)

tbl(con, "mtcars") |> 
  mutate(mpg_25th = quantile(mpg, 0.25, na.rm = TRUE))
# Shows Error in `collect()`:
# ! Failed to collect lazy table.
# Caused by error:
# ! {"exception_type":"Parser","exception_message":"ORDER BY is not implemented for window functions!"}
# Run `rlang::last_trace()` to see where the error occurred.

dbGetQuery(
  con,
  'SELECT *, quantile_cont(mpg, 0.25) OVER () AS mpg_25th FROM mtcars;'
)
# Works fine
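
Since the raw query above runs, one possible stopgap is to pass the same window expression through dbplyr unchanged with sql(). This is only a sketch: the literal SQL is DuckDB-specific, it bypasses dbplyr's translation of quantile() entirely, and the names cars and result are just illustrative.

```r
library(dplyr)
library(dbplyr)

con <- DBI::dbConnect(duckdb::duckdb())
copy_to(con, mtcars)

cars <- tbl(con, "mtcars")

# sql() inserts the string verbatim into the generated SELECT,
# so DuckDB receives quantile_cont(...) OVER () instead of
# PERCENTILE_CONT(...) WITHIN GROUP (...) OVER ()
result <- cars |>
  mutate(mpg_25th = sql("quantile_cont(mpg, 0.25) OVER ()")) |>
  collect()

DBI::dbDisconnect(con, shutdown = TRUE)
```

Note that the na.rm argument has no counterpart here; this assumes the column's NULL handling in DuckDB is acceptable for the use case.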

Could you please fix this issue? Thank you.

@hadley
Member

hadley commented Apr 4, 2024

This is the SQL that we're generating for duckdb here:

SELECT
  mtcars.*,
  PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY mpg) OVER () AS mpg_25th
FROM mtcars

@blairqin
Author

blairqin commented Apr 5, 2024

Thank you. I still get the following error running the SQL you recommended.

Error: {"exception_type":"Parser","exception_message":"ORDER BY is not implemented for window functions!"}

Here is the code I use.

library(tidyverse)
library(DBI)
library(dbplyr)

con <- DBI::dbConnect(duckdb::duckdb())

copy_to(con, mtcars)

dbGetQuery(
  con,
  'SELECT
  mtcars.*,
  PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY mpg) OVER () AS mpg_25th
  FROM mtcars;'
)
# Error: {"exception_type":"Parser","exception_message":"ORDER BY is not implemented for window functions!"}

Could this be caused by an issue in DuckDB itself?

@hadley
Member

hadley commented Apr 5, 2024

I was just including the SQL that dbplyr generates to complete your reprex.

hadley reopened this Apr 25, 2024

@blairqin
Author

I recently found that summarise() does not trigger the error above. The following code runs without problems:

library(tidyverse)
library(DBI)
library(dbplyr)

con <- DBI::dbConnect(duckdb::duckdb())

copy_to(con, mtcars)

tbl(con, "mtcars") |> 
  summarize(mpg_25th = quantile(mpg, 0.25, na.rm = TRUE))

The corresponding SQL is:

SELECT PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY mpg) AS mpg_25th
FROM mtcars

As a temporary workaround, I can compute the quantile with summarise() and left join the result back onto the original table, which gives the same result as the mutate() call.
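
A minimal sketch of that workaround, for anyone who hits the same error. The ungrouped summary has no natural key, so this version assumes a constant dummy column (join_key, a made-up name) on both sides; with grouped data you would join on the grouping columns instead.

```r
library(dplyr)
library(dbplyr)

con <- DBI::dbConnect(duckdb::duckdb())
copy_to(con, mtcars)

cars <- tbl(con, "mtcars")

# One-row lazy table with the quantile; summarise() generates
# PERCENTILE_CONT(...) WITHIN GROUP (...) without OVER (), which DuckDB accepts
mpg_quantile <- cars |>
  summarise(mpg_25th = quantile(mpg, 0.25, na.rm = TRUE)) |>
  mutate(join_key = 1L)

# Attach the summary value to every row via the dummy key, then drop the key
result <- cars |>
  mutate(join_key = 1L) |>
  left_join(mpg_quantile, by = "join_key") |>
  select(-join_key) |>
  collect()

DBI::dbDisconnect(con, shutdown = TRUE)
```

Everything before collect() stays lazy, so the join runs inside DuckDB rather than in R.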
