Request: Add native pipeline/verb logging option similar to tidylog package. #6990

Open
orgadish opened this issue Feb 13, 2024 · 0 comments

The tidylog package is an amazing supplement to the dplyr and tidyr packages. It allows immediate tracking and debugging of a pipeline by reporting high-level information about the result of each step (see below). It's especially useful for tracking joins, which I know was an important motivation for the error and warning handling added to the join functions in dplyr 1.1.0+.

The silly example below shows what messages tidylog provides about the process.

I always use tidylog by default and would love to teach it to my students. But I don't, because it works by masking the regular tidyverse verbs with logging versions, which breaks auto-complete. To restore auto-complete, the tidylog maintainer would have to constantly keep up with tidyverse updates, even though the two projects are completely separate (elbersb/tidylog#56).
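
To make the masking approach concrete, here is a rough base-R sketch of the general idea: wrap a verb in a function that calls the original and then reports how the row count changed. The names `with_row_log` and `keep_rows` are made up for illustration, and this is not tidylog's actual implementation, only my assumption about the overall shape of such a wrapper.

```r
# Hypothetical helper (not part of tidylog or dplyr): takes a data-frame verb
# and returns a wrapped version that logs how many rows were removed.
with_row_log <- function(verb, verb_name) {
  function(.data, ...) {
    result <- verb(.data, ...)
    removed <- nrow(.data) - nrow(result)
    pct <- if (nrow(.data) > 0) {
      as.integer(round(100 * removed / nrow(.data)))
    } else {
      0L
    }
    message(sprintf(
      "%s: removed %d rows (%d%%), %d rows remaining",
      verb_name, removed, pct, nrow(result)
    ))
    result
  }
}

# Hypothetical stand-in for a filtering verb, so the sketch needs no packages.
keep_rows <- function(.data, keep) {
  .data[keep, , drop = FALSE]
}

# The wrapped version behaves like the original but also emits a log message.
logged_keep_rows <- with_row_log(keep_rows, "keep_rows")
out <- logged_keep_rows(mtcars, mtcars$mpg > 20)
```

Assigning the wrapper to the same name as the original verb is what produces the masking. Because the wrapper only forwards `...`, the original arguments are no longer visible in its signature, which is presumably why auto-complete suffers.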

It would be great if this kind of functionality were implemented natively and maintained by the relevant teams so that updates to APIs (e.g. new arguments) would also involve updating the log structure.

suppressMessages({
  library(tidyverse)
  library(tidylog)
})

tbl1 <- dplyr::starwars |> 
  filter(height > 90) |> 
  select(name, height, mass, hair_color) |> 
  summarize(
    across(c(height, mass), mean),
    name_list = list(name),
    .by = hair_color
  )
#> filter: removed 9 rows (10%), 78 rows remaining
#> select: dropped 10 variables (skin_color, eye_color, birth_year, sex, gender, …)
#> summarize: now 12 rows and 4 columns, ungrouped
  
tbl2 <- dplyr::starwars |>
  select(hair_color, films) |> 
  unnest_longer(films) |> 
  summarize(
    films_list = list(films),
    .by = hair_color
  )
#> select: dropped 12 variables (name, height, mass, skin_color, eye_color, …)
#> summarize: now 12 rows and 2 columns, ungrouped

joined <- tbl1 |> 
  full_join(
    tbl2,
    by = join_by(hair_color)
  )
#> full_join: added one column (films_list)
#>            > rows only in tbl1   0
#>            > rows only in tbl2   0
#>            > matched rows       12
#>            >                   ====
#>            > rows total         12

Created on 2024-02-13 with reprex v2.0.2

orgadish changed the title from "Add native pipeline/verb logging option similar to tidylog package." to "Request: Add native pipeline/verb logging option similar to tidylog package." on Feb 13, 2024