group_by not always working #736

DaZaM82 · 2023-04-24T13:06:46Z

For some data frames I get an error when applying group_by() before skim(). It doesn’t seem to depend on the type of the grouping variable.

The issue doesn’t come up for iris, though, where it works.

I’m not sure how to provide a reprex for this. Is this a known issue? Any way to resolve it?

DaZaM82 · 2023-04-27T14:09:55Z

Some additional info...

If I run the code up to group_by() and nothing afterwards, that works fine. The error is thrown by the skimr function.

I tried to see if any of the groupings had zero length and found that none do, so not sure what is going on:

elinw · 2023-04-28T09:18:14Z

I don't think the issue is the number of rows. It's with the classes. Is it possible to isolate which data type is causing the problem by filtering prior to skimming?
Also, is the group_by() necessary to cause the error?

DaZaM82 · 2023-04-30T09:39:15Z

Hi @elinw
Thanks for that. I was able to identify the problem following your suggestion.
The error is when there are date columns that are all NA.

So now I can generate a reprex:
iris %>% mutate(date = NA_Date_) %>% group_by(Species) %>% skimr::skim()

This seems like a bug to me?

elinw · 2023-04-30T19:39:33Z

For sure a bug. Thanks for the report!

elinw · 2023-05-03T00:57:46Z

@michaelquinn32 When all values are NA and na.rm = TRUE (which we use everywhere), functions like min() return Inf with a warning, whereas with na.rm = FALSE they return NA. mean() and others, however, return NA without a warning. So I think we (a) should decide what should be returned when skimming such data and (b) need to think about handling the warning. I suspect the Inf is possibly the ultimate cause of the issue reported.
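For reference, the base R behaviour described here can be checked on a small all-NA numeric vector. This is only a sketch in base R and does not use skimr; the variable names are made up for illustration. Note that mean() on the empty remainder gives NaN, which is.na() also reports as missing.

```r
# An all-NA numeric vector, standing in for an all-NA column.
x <- c(NA_real_, NA_real_, NA_real_)

# With na.rm = TRUE nothing is left to compare, so min() and max() fall
# back to Inf and -Inf and emit a warning (suppressed here for brevity).
min_rm <- suppressWarnings(min(x, na.rm = TRUE))
max_rm <- suppressWarnings(max(x, na.rm = TRUE))

# With na.rm = FALSE the missing values propagate and the result is NA.
min_keep <- min(x, na.rm = FALSE)

# mean() over the empty remainder gives NaN and does not warn.
mean_rm <- mean(x, na.rm = TRUE)

print(c(min_rm = min_rm, max_rm = max_rm, min_keep = min_keep, mean_rm = mean_rm))
```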

elinw · 2023-05-03T01:32:30Z

Okay I take it back, because there is no such thing as NA_Date_ as far as I can tell, unless it is defined in a package. So I think that issue is getting absorbed into the pipe chain and then causing the errors. You can see this if you try

iris %>%  mutate(date = NA_date_)

So then the pipe ends up passing nothing along to the final steps.

Update:
So lubridate does define NA_Date_, and loading it first solves the error, though not the warnings.

library(lubridate)
iris %>%  mutate(date = NA_Date_) |> skimr::skim()

Warning messages:
1: In min.default(c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, :
no non-missing arguments to min; returning Inf
2: In max.default(c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, :
no non-missing arguments to max; returning -Inf

I guess the question is, do we want to silence the warnings? Or would we want to handle it like a numeric, since I don't think min and max are defined for the Date class? Consistency with numeric would return NA for both min and max.
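One way the "return NA" option could look is sketched below. The helper name min_or_na is hypothetical and is not part of skimr; it only illustrates returning a missing value of the input's own class instead of Inf plus a warning.

```r
# Hypothetical helper (not a skimr function): return a missing value of
# the same class as the input when there is nothing to take a minimum of.
min_or_na <- function(x) {
  if (all(is.na(x))) {
    # Indexing with NA_integer_ yields one NA that keeps the class of x.
    return(x[NA_integer_])
  }
  min(x, na.rm = TRUE)
}

min_or_na(c(3, NA, 1))            # ordinary case: smallest non-missing value
min_or_na(c(NA_real_, NA_real_))  # all missing: NA, no warning
min_or_na(as.Date(c(NA, NA)))     # all missing: NA that is still a Date
```

A max counterpart would follow the same pattern. Whether skimr should adopt this or simply silence the warning is the open question in this thread.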

DaZaM82 · 2023-12-03T09:42:10Z

iris %>% mutate(date = NA_date_)

@elinw NA_Date_ is a valid object even without lubridate. Your above code throws an error because you have a lower case d where it should be a capital D. If you change it from NA_date_ to NA_Date_ then it works.
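To sidestep the question of where NA_Date_ comes from, an all-NA Date column can also be built with base R alone. This is a minimal sketch using as.Date(NA) and transform(); it is an alternative way to set up the reprex data, not the code used in the report.

```r
# Base R only: add a Date column in which every value is missing.
# as.Date(NA) yields a length-one missing Date, recycled across all rows.
df <- transform(iris, date = as.Date(NA))

class(df$date)       # "Date"
all(is.na(df$date))  # TRUE
```

The resulting df can then be passed to group_by() and skim() in the same way as in the reprex.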

elinw added the bug label Apr 30, 2023
