Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bespoke skim no longer working #738

Open
blueja5 opened this issue Jul 19, 2023 · 16 comments
Open

bespoke skim no longer working #738

blueja5 opened this issue Jul 19, 2023 · 16 comments

Comments

@blueja5
Copy link

blueja5 commented Jul 19, 2023

I had been using the code below for some time with no issues:

my_skim<-skim_with(base = sfl(complete = n_complete), numeric = sfl(median = psych::interp.median, skew = psych::skew, skew.ratio = sur::skew.ratio, range = NULL, kurtosis = psych::kurtosi, iqr = purrr::partial(IQR, na.rm = TRUE), min = purrr::partial(min, na.rm = TRUE), max = purrr::partial(max, na.rm = TRUE), p0 = NULL, p25 = NULL, p75 = NULL, p100 = NULL, mad = NULL, empty = NULL, n_unique = NULL, p50 = NULL,hist = NULL))

Stroops %>% my_skim() %>% yank("numeric") %>% kbl(format = "latex", booktabs = T, caption = "Inhibition Descriptives",digits = 2) %>% kable_styling(latex_options = c("striped","hold_position", "scale_down"),font_size = 12)

However it now throws an error:
Error in dplyr::summarize():
ℹ In argument: skimmed = purrr::map2(...).
Caused by error in purrr::map2():
ℹ In index: 1.
ℹ With name: numeric.
Caused by error in dplyr::summarize():
ℹ In argument: dplyr::across(variable_names, mangled_skimmers$funs).
Caused by error in across():
! .fns must be a function, a formula, or a list of functions/formulas.
Backtrace:

  1. ... %>% ...
  2. purrr::map2(...)
  3. purrr:::map2_("list", .x, .y, .f, ..., .progress = .progress)
  4. skimr:::skim_by_type.data.frame(.x[[i]], .y[[i]], ...)
  5. dplyr:::summarise.data.frame(data, dplyr::across(variable_names, mangled_skimmers$funs))
  6. dplyr:::summarise_cols(.data, dplyr_quosures(...), by, "summarise")
  7. dplyr:::expand_across(dot)

I have spent some time looking at the documentation but I can't work out what it wrong with 'myskim'. I assume there has been a change somewhere that I can't work out. Can you help? It had taken me a very long time to get to this descriptives table I wanted and I'd be very grateful if you can let me know what the problem is. It still occurs if I take out the skims using purrr specifically.

@elinw
Copy link
Collaborator

elinw commented Jul 20, 2023

Okay the other recent issue also seems to involve purrr.
Can you please let me know what version of purr and dplyr you are on?
It would be great if you could reproduce the problem in a simple example in which piping stops when the error occurs and using a data set like iris.

Is there any chance that you have haven labelled data?

Also can you please confirm that plain skim works with your data?

@michaelquinn32

@blueja5
Copy link
Author

blueja5 commented Jul 20, 2023 via email

@elinw
Copy link
Collaborator

elinw commented Jul 21, 2023

I have a feeling that we were on dangerous ground using some internal functions.
dplyr:::expand_across(dot)

across() has had an API change that impacts the use of ... and requires an anonymous function instead ... that I saw other people online complaining about.

! .fns must be a function, a formula, or a list of functions/formulas.

\() mean(.x, na.rm = TRUE)

@elinw
Copy link
Collaborator

elinw commented Jul 21, 2023

Hmm your example works for me and I am on r 4.2.2. I think the native short cut for anonymous functions was introduced in 4.1 so you should have that. But i wonder if there was a bug fix of some kind since 4.1.3. I'm going to look at the change log.

@blueja5
Copy link
Author

blueja5 commented Jul 21, 2023 via email

@elinw
Copy link
Collaborator

elinw commented Jul 22, 2023

Okay comparing your two bespoke skims, which both work for me without error, I notice a few issues.

First in the one that doesn't work you have range = NULL but range returns two values and thus can't be used in skimr without modification to pick one or the other. It is not in the default list of numeric skimmers for this reason. What happens if you take out just range = NULL?

Second I notice that in the custom skim that doesn't work you have
base = sfl(complete = n_complete), numeric = sfl( ... but in the one that does work you have
numeric = sfl(complete = n_complete ...

Could you try changing those one at a time and see if one or both of them fixes the issue?

@elinw
Copy link
Collaborator

elinw commented Jul 22, 2023

Yes I can confirm that for me, including range causes the error. I think we should come up with a more graceful message.

@elinw
Copy link
Collaborator

elinw commented Jul 22, 2023

See this post: https://www.tidyverse.org/blog/2023/02/dplyr-1-1-0-pick-reframe-arrange/

There was 1 warning in dplyr::summarize().
ℹ In argument: skimmed = purrr::map2(...).
ℹ In group 2: skim_type = "numeric".
Caused by warning:
! Returning more (or less) than 1 row per summarise() group was
deprecated in dplyr 1.1.0.
ℹ Please use reframe() instead.
ℹ When switching from summarise() to reframe(), remember that
reframe() always returns an ungrouped data frame and adjust
accordingly.

So we would need to find a way to identify functions that return multiple values.

@elinw
Copy link
Collaborator

elinw commented Jul 23, 2023

@michaelquinn32 Since this is actually working as documented (only functions with a return of length 1 are allowed) what about catching the error and using our own error message?

@blueja5
Copy link
Author

blueja5 commented Jul 23, 2023 via email

@elinw
Copy link
Collaborator

elinw commented Jul 24, 2023

Okay so still, I think there is the issue that the tidyverse made an important change to handling of statistics that return multiple values. For me range "works" in the sense that I get multiple almost identical rows with the exception that they have two different values of range. But I'm not sure why it doesn't error given the tidyverse change (I did get the error one time but can't reproduce it).

So overall it really looks like something is going wrong with the NULLs. Can you try not including NULL setting for statistics that are not part of the default (mad, empty and range are not in the default numeric skimmers).

One thing is that these are the default numerics

mean sd p0 p25 p50 p75 p100 hist

I don't think you should be using NULL on anything besides them.
I also am concerned about doing much of anything with the base sfl since the columns defined by that are used for duck typing skim objects.

I'm assuming that using skim() unmodified works, correct? And also skim_without_charts()?

What I think would be helpful in identifying why you are getting this error is probably start from scratch with creating a skimmer by making one modification at a time and seeing if there is a specific one that throws the error.

If none by itself is causing it, then the next question is what combination is the trigger.

@blueja5
Copy link
Author

blueja5 commented Jul 25, 2023 via email

@blueja5
Copy link
Author

blueja5 commented Jul 25, 2023 via email

@elinw
Copy link
Collaborator

elinw commented Jul 31, 2023

What would be great is if you could make a minimal reproducible example, meaning do the smallest, simplest code ... no functions from other packages, no using purrr partials . That will really help us isolate the problem.
So starting with NULL and numeric

my_skim <- skim_with(numeric = sfl(mean = NULL))
my_skim(iris)

And keep changing what you NULLs until you have been through them all
or you get an error.

Then if you don't trigger the error, you should start adding some base functions like MAD etc. But only ones that only return a single value (not range).

my_skim <- skim_with(numeric = sfl(mad = mad))
my_skim(iris)

@blueja5
Copy link
Author

blueja5 commented Jul 31, 2023 via email

@elinw
Copy link
Collaborator

elinw commented Sep 8, 2023

Okay it's strange but I'll keep trying to reproduce.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants