Warnings when skimming #223

Open
elinw opened this issue Jan 4, 2018 · 4 comments

@elinw
Collaborator

elinw commented Jan 4, 2018

Warnings that occur while skimming are displayed. This is informative for many purposes, particularly for understanding what is in an unknown data set. For other purposes, however, it is undesirable.

One option would be to make a quiet mode that defaults to FALSE so that users could suppress the printing of warnings.
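A minimal base-R sketch of how such a quiet mode could behave, assuming statistics are plain functions applied column by column. `skim_like()` and its `quiet` argument are hypothetical names for illustration, not skimr's API. The idea is simply to wrap the computation in `suppressWarnings()` only when the user asks for it, so the default still shows every warning.

```r
# Hypothetical illustration: skim_like() and `quiet` are not skimr functions.
# Computes each statistic in `funs` for every column of `data`.
skim_like <- function(data, funs, quiet = FALSE) {
  compute <- function() {
    # One named vector of statistics per column, stacked into a matrix
    # with one row per variable and one column per statistic
    do.call(rbind, lapply(data, function(x) {
      vapply(funs, function(f) f(x), numeric(1))
    }))
  }
  # Default (quiet = FALSE) lets warnings print; quiet = TRUE muffles them
  if (quiet) suppressWarnings(compute()) else compute()
}

# Example: column `b` is all NA, so min()/max() warn once NAs are dropped
df <- data.frame(a = c(1, 2, 3), b = c(NA_real_, NA_real_, NA_real_))
funs <- list(
  min = function(x) min(x, na.rm = TRUE),
  max = function(x) max(x, na.rm = TRUE)
)
res_loud  <- skim_like(df, funs)                # warnings are printed
res_quiet <- skim_like(df, funs, quiet = TRUE)  # same values, no warnings
```

Because the flag only controls whether warnings are shown, the computed statistics are the same either way.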

There may be other ideas as well, so I'm marking this for discussion.

@rgayler

rgayler commented Jan 4, 2018

Independently of whether the warning messages get displayed, I think it would be worth adding a warnings count (or logical) to skim_df for each variable in the input frame.

I often get given data sets with hundreds to tens of thousands of variables, about which I know nothing. So I really need a process that I can tip the data into and get some characterisation of what it is without me being forced to manually eye-check some output. (I will do that anyway, but I can't guarantee to notice everything that I should.)
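One way the per-variable warnings count could be produced, sketched in base R under the assumption that each statistic is an ordinary function applied to one column at a time. `skim_with_counts()` and its `n_warnings` column are hypothetical names. The technique is `withCallingHandlers()`: the handler tallies each warning and then resumes via the `muffleWarning` restart, so the statistics are still computed.

```r
# Hypothetical illustration: skim_with_counts() is not a skimr function.
# Returns one row per variable: the statistics plus an `n_warnings` column.
skim_with_counts <- function(data, funs) {
  rows <- lapply(names(data), function(col) {
    n_warnings <- 0L
    vals <- withCallingHandlers(
      vapply(funs, function(f) f(data[[col]]), numeric(1)),
      warning = function(w) {
        n_warnings <<- n_warnings + 1L   # tally the warning for this column
        invokeRestart("muffleWarning")   # then carry on without printing it
      }
    )
    data.frame(variable = col, as.list(vals), n_warnings = n_warnings,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

df <- data.frame(a = c(1, 2, 3), b = c(NA_real_, NA_real_, NA_real_))
funs <- list(
  min = function(x) min(x, na.rm = TRUE),
  max = function(x) max(x, na.rm = TRUE)
)
res <- skim_with_counts(df, funs)
# Variables with problems can then be filtered without eyeballing output
res$variable[res$n_warnings > 0]
```

With thousands of variables, a filter like the last line replaces a manual scan of the printed warnings.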

@elinw
Collaborator Author

elinw commented Jan 5, 2018

That's an interesting concept, though I think it would be complex to implement because of how the processing works. It could potentially be part of the summary. Right now, the warnings go automatically right below the output for each type, and (at least for the ones we have seen) they mention only the statistic, not the variable involved, which is really not that helpful.

Implementation proposals are welcome if you want to dig into the code; I'll also be looking into it and will post anything I come up with in this issue.

@elinw
Collaborator Author

elinw commented Jan 5, 2018

@rgayler

rgayler commented Jan 5, 2018

In Hadley Wickham's readr package, all the problem reports (per row of input) are put in a data frame that is returned as an attribute of the output data frame and accessed via the problems() function. However, I suspect that the problem reports are designed into the column parsers rather than captured from a standard warning message.

Ideally, whatever is developed for skimr ought to be applied automatically to user-supplied statistic functions, so that no special programming is needed by the user to capture warnings and enhance them with the variable and statistic identifiers.
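A sketch of combining both ideas in base R: wrap every statistic function, including user-supplied ones, so that any warning it raises is recorded with the variable and statistic that produced it, and return those records as an attribute, similar in spirit to readr's `problems()`. `skim_capture()` and `skim_problems()` are hypothetical names, and the sketch assumes statistics are computed one column at a time, which may not match how skimr's processing actually works.

```r
# Hypothetical illustration: skim_capture() and skim_problems() are not skimr
# functions. Every function in `funs` is wrapped automatically, so users need
# no special code to get their warnings labelled.
skim_capture <- function(data, funs) {
  problems <- data.frame(variable = character(0), statistic = character(0),
                         message = character(0), stringsAsFactors = FALSE)
  rows <- lapply(names(data), function(col) {
    vapply(names(funs), function(stat) {
      withCallingHandlers(
        funs[[stat]](data[[col]]),
        warning = function(w) {
          # Record which variable and statistic raised the warning
          problems <<- rbind(problems, data.frame(
            variable = col, statistic = stat,
            message = conditionMessage(w), stringsAsFactors = FALSE))
          invokeRestart("muffleWarning")
        }
      )
    }, numeric(1))
  })
  result <- as.data.frame(do.call(rbind, rows))
  rownames(result) <- names(data)
  attr(result, "problems") <- problems
  result
}

# Accessor in the style of readr::problems()
skim_problems <- function(x) attr(x, "problems")

df <- data.frame(a = c(1, 2, 3), b = c(NA_real_, NA_real_, NA_real_))
funs <- list(
  min = function(x) min(x, na.rm = TRUE),
  max = function(x) max(x, na.rm = TRUE)
)
res <- skim_capture(df, funs)
skim_problems(res)  # one row per warning, labelled with variable and statistic
```

Keeping the problems in an attribute leaves the main summary unchanged while still making every warning traceable to a specific variable and statistic.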
