Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dplyr 1.1.4 breaks on empty dataframes constructed from matrices #7004

Open
john-b-edwards opened this issue Mar 17, 2024 · 1 comment
Open

Comments

@john-b-edwards
Copy link

john-b-edwards commented Mar 17, 2024

Previous versions of dplyr would allow you to manipulate empty dataframes that had been constructed from empty matrices, e.g.

packageurl <- "https://cran.r-project.org/src/contrib/Archive/dplyr/dplyr_1.1.3.tar.gz"
install.packages(packageurl, repos=NULL, type="source")
#> Installing package into 'C:/Users/edwar/Documents/R/win-library/4.1'
#> (as 'lib' is unspecified)
packageVersion('dplyr')
#> [1] '1.1.3'
as.data.frame(matrix(nrow = 0, ncol = 0)) |> dplyr::slice(1)
#> data frame with 0 columns and 0 rows

In dplyr 1.1.4, this code causes an internal error:

packageurl <- "https://cran.r-project.org/src/contrib/dplyr_1.1.4.tar.gz"
install.packages(packageurl, repos=NULL, type="source")
#> Installing package into 'C:/Users/edwar/Documents/R/win-library/4.1'
#> (as 'lib' is unspecified)
packageVersion('dplyr')
#> [1] '1.1.4'
as.data.frame(matrix(nrow = 0, ncol = 0)) |> dplyr::slice(1)
#> Error: Internal error: `template` must have a `names` attribute.

Both versions are able to reliably manipulate empty dataframes that were not constructed from matrices (e.g. data.frame() |> dplyr::slice(1) works just fine).

@DavisVaughan
Copy link
Member

DavisVaughan commented Mar 21, 2024

I think this may actually be a base R bug. I believe that all data.frames should have a names attribute. If there are 0 columns then it should be set to character(). The fact that that isn't the case here is odd.

attributes(as.data.frame(matrix(nrow = 0, ncol = 0)))
#> $class
#> [1] "data.frame"
#> 
#> $row.names
#> integer(0)

attributes(data.frame())
#> $names
#> character(0)
#> 
#> $row.names
#> integer(0)
#> 
#> $class
#> [1] "data.frame"

Not having a names attribute can seriously break other parts of base R, making my guess that it is a base R bug feel more validated:

# Good OOB error
df <- data.frame()
df[1]
#> Error in `[.data.frame`(df, 1): undefined columns selected

# WTF is this
df <- as.data.frame(matrix(nrow = 0, ncol = 0))
df[1]
#> NULL
#> <0 rows> (or 0-length row.names)

In general dplyr works fine with the 2nd case, i.e. a well formed data frame


I think we should consider switching from names2() to names() in DataMask to try and catch this earlier on:

names_bindings <- chr_unserialise_unicode(names2(data))

i.e. something like

names <- names(data)

if (is.null(names)) {
  abort("Can't transform a data frame with `NULL` names.")
}

which would go nicely with our other names related checks there.

It did use names() in the past, but I switched to names2() here to supposedly fix this exact case. Supposedly not supporting NULL column names broke 2 packages and that was what I was out to fix, but that seems like a false alarm because current CRAN dplyr obviously already doesn't support NULL column names based on this issue. So the change above would just be giving a better error message for this invalid data frame.
#4896

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants