Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

group_by not always working #736

Open
DaZaM82 opened this issue Apr 24, 2023 · 7 comments
Open

group_by not always working #736

DaZaM82 opened this issue Apr 24, 2023 · 7 comments
Labels

Comments

@DaZaM82
Copy link

DaZaM82 commented Apr 24, 2023

For some data frames I get an error when applying group_by before skim(). Doesn’t seem to depend on the grouping variable type.
image

this issue doesn’t come up for iris though where it works.

I’m not sure how to provide a reprex for this. Is this a known issue? Any way to resolve it?

@DaZaM82
Copy link
Author

DaZaM82 commented Apr 27, 2023

Some additional info...

If I run the code up to group_by() and nothing afterwards, then that works fine. Error is thrown by the skimr function

I tried to see if any of the groupings had zero length and found that none do, so not sure what is going on:
image

@elinw
Copy link
Collaborator

elinw commented Apr 28, 2023

I don't think the issue is the number of rows. It's with the classes. Is it possible to isolate which data type is causing the problem by filtering prior to skimming?
Also, is the group_by() necessary to cause the error?

@DaZaM82
Copy link
Author

DaZaM82 commented Apr 30, 2023

Hi @elinw
Thanks for that. I was able to identify the problem following your suggestion.
The error is when there are date columns that are all NA.

So now I can generate a reprex:
iris %>% mutate(date = NA_Date_) %>% group_by(Species) %>% skimr::skim()

This seems like a bug to me?

@elinw
Copy link
Collaborator

elinw commented Apr 30, 2023

For sure a bug. Thanks for the report!

@elinw elinw added the bug label Apr 30, 2023
@elinw
Copy link
Collaborator

elinw commented May 3, 2023

@michaelquinn32 When all values are NA and na.rm = TRUE (which we use everywhere) the functions like min() return Inf with a warning. . Whereas if na.rm = FALSE is set they return NA. mean() and others, however, return NA without a warning. So I think we (a) should decide what should be returned when skimming such data and (b) need to think about handling the warning. I suspect the Inf is possibly the ultimate cause of the issue reported.

@elinw
Copy link
Collaborator

elinw commented May 3, 2023

Okay I take it back, because there is no such thing as NA_Date_ as far as I can tell, unless it is defined in a package. So I think that issue is getting absorbed into the pipe chain and then causing the errors. You can see this if you try

iris %>%  mutate(date = NA_date_)  

So then the pipe ends up passing nothing along to the final steps.

Update:
So lubridate does does define NA_Date and loading it first solves the error, though not the warnings.

library(lubridate)
iris %>%  mutate(date = NA_Date_) |> skimr::skim()

Warning messages:
1: In min.default(c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, :
no non-missing arguments to min; returning Inf
2: In max.default(c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, :
no non-missing arguments to max; returning -Inf

I guess the question is, do we want to silence the warnings? Or would we want to handle like a numeric since I don't think min and max are defined for the date class. Consistency with Numeric would return NA for both min and max.

@DaZaM82
Copy link
Author

DaZaM82 commented Dec 3, 2023

iris %>% mutate(date = NA_date_)

@elinw NA_Date_ is a valid object even without lubridate. Your above code throws an error because you have a lower case d where it should be a capital D. If you change it from NA_date_ to NA_Date_ then it works.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants