Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ungroup() drops data frame attributes #6774

Open
mkoohafkan opened this issue Mar 4, 2023 · 4 comments · May be fixed by #6817
Open

ungroup() drops data frame attributes #6774

mkoohafkan opened this issue Mar 4, 2023 · 4 comments · May be fixed by #6817
Labels
feature a feature request or enhancement verbs 🏃‍♀️

Comments

@mkoohafkan
Copy link

This is referring to dataframe/tibble attributes, not tibble column attributes. The CRAN install of dplyr 1.1.0 sometimes drops attributes:

d = mtcars
attr(d, "foo") = "bar"
attr(d, "foo")
#> [1] "bar"

## preserved
d |> select(cyl) |> attr("foo")
#> [1] "bar"
d |> rename(cyl2 = cyl) |> attr("foo")
#> [1] "bar"
d |> arrange(cyl) |> attr("foo")
#> [1] "bar"
d |> group_by(cyl) |> attr("foo")
#> [1] "bar"
d |> mutate(mpg2 = mpg) |> attr("foo")
#> [1] "bar"
d |> transmute(mpg2 = mpg) |> attr("foo")
#> [1] "bar"
d |> filter(cyl == 1L) |> attr("foo")
#> [1] "bar"
d |> count(cyl) |> attr("foo") 
#> [1] "bar"
d |> bind_rows(d) |> attr("foo")
#> [1] "bar"
d |> bind_cols(select(d, cyl2 = cyl)) |> attr("foo")
#> [1] "bar"


## dropped
d |> group_by(cyl) |> ungroup() |> attr("foo")
#> NULL
d |> summarize(mpg = max(mpg)) |> attr("foo")
#> NULL

I can understand why it might make sense for summarize() to drop attributes (but then why does count() preserve them?), but I don't see why ungroup(group_by(...)) should drop attributes unilaterally. Given that almost all dplyr verbs preserve the attribute, I'm wondering if this is a bug/regression.

There are a variety of issues regarding attributes but they are all closed.

@DavisVaughan
Copy link
Member

DavisVaughan commented Mar 16, 2023

FWIW this happens in dplyr 1.0.10 and looks to have happened since at least 2020, so I don't consider this an immediate regression.

And I'm not 100% it is a bug. ungroup() is essentially the same as calling as_tibble(<grouped_df>), and we haven't decided yet if that is an operation that should drop "unknown" attributes or not. See also tidyverse/tibble#769

# pak::pak("tidyverse/[email protected]")
library(dplyr, warn.conflicts = FALSE)

d = mtcars
attr(d, "foo") = "bar"
attr(d, "foo")
#> [1] "bar"

# kept after group-by
d |> group_by(cyl) |> attr("foo")
#> [1] "bar"

# dropped here
d |> group_by(cyl) |> ungroup() |> attr("foo")
#> NULL

# really the `ungroup()` case is because of
d |> group_by(cyl) |> as_tibble() |> attr("foo")
#> NULL

# which looks like
dplyr:::as_tibble.grouped_df
#> function (x, ...) 
#> {
#>     new_tibble(dplyr_vec_data(x), nrow = nrow(x))
#> }
#> <bytecode: 0x7fa953f20e68>
#> <environment: namespace:dplyr>

Created on 2023-03-16 with reprex v2.0.2.9000

@DavisVaughan DavisVaughan changed the title dplyr 1.1.0 ungroup() drops data frame attributes ungroup() drops data frame attributes Mar 16, 2023
@mkoohafkan
Copy link
Author

mkoohafkan commented Mar 16, 2023

Interesting, I would have thought count(d, ...) would also drop the attributes then too. I guess because it is using dplyr::reconstruct() instead of ungroup()?

dplyr/R/count-tally.R

Lines 82 to 83 in 68ae922

# Ensure grouping is transient
out <- dplyr_reconstruct(out, x)

even though tally(d) drops the attributes.

@mkoohafkan
Copy link
Author

mkoohafkan commented Mar 16, 2023

Looking at tidyverse/tibble#769, while I understand the reasoning I don't think it's intuitive for ungroup() to drop attributes, because the ungroup() operation does not necessarily mean a structural change to the tibble or accompanying attributes. group()/ungroup() operations can be used for mutate() statements to do within/group calculations without changing the row structure of the tibble. So regardless of the the discussion in tidyverse/tibble#769, I don't think that d |> group_by(cyl) |> as_tibble() is conceptually equivalent to d |> group_by(cyl) |> ungroup() (even if the code is equivalent right now) as I'm not necessarily trying to return to a "bare" tibble when ungrouping.

@mkoohafkan
Copy link
Author

mkoohafkan commented Mar 21, 2023

Another test case for consideration: partial ungrouping does not drop the attribute:

library(dplyr)

d = mtcars
attr(d, "foo") = "bar"

d |> group_by(cyl, am) |> ungroup(cyl) |>attr("foo")
#> [1] "bar"

d |> group_by(cyl, am) |> ungroup(am) |>attr("foo")
#> [1] "bar"

d |> group_by(cyl, am) |> ungroup(am, cyl) |> attr("foo")
#> NULL

d |> group_by(cyl, am) |> ungroup(am) |> ungroup(cyl) |> attr("foo")
#> NULL

nor does ungrouping an ungrouped data frame:

d |> ungroup() |> attr("foo")
#> [1] "bar"

Which feels funny to me, since the treatment of attributes differs based on if I'm dropping some groups or all groups. The core of the matter is that as a user I would generally expect the following to be TRUE:

identical(
    d |> group_by(...) |> ungroup(),
    d
)

Relacing the as_tibble() call with new_tibble(groups = NULL) maintains attributes and kicks the can down the road until a decision is made on tidyverse/tibble#769, so

dplyr/R/group-by.R

Lines 157 to 159 in 3b696db

ungroup.grouped_df <- function(x, ...) {
if (missing(...)) {
as_tibble(x)

becomes

ungroup.grouped_df <- function(x, ...) { 
   if (missing(...)) { 
     new_tibble(x, groups = NULL)

(and the same change would be needed for ungroup.rowwise_df()).

Test:

d = as_tibble(mtcars)
attr(d, "foo") = "bar"

d |> group_by(cyl, am) |>tibble::new_tibble(groups = NULL) |> attr("foo")
#> [1] "bar"

identical(
  d |> group_by(cyl, am) |>tibble::new_tibble(groups = NULL),
  d
)
#> [1] TRUE

@mkoohafkan mkoohafkan linked a pull request Apr 7, 2023 that will close this issue
@DavisVaughan DavisVaughan added feature a feature request or enhancement verbs 🏃‍♀️ labels Nov 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature a feature request or enhancement verbs 🏃‍♀️
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants