case_match() is great to have as a more pipe-able alternative to case_when() following modern tidyverse approaches. However, it lacks an equivalent to one nice feature of its predecessor recode(): non-recoded elements could pass through into the output unchanged. While the help examples provide a workaround for this missing feature, I'm realizing it may not generalize well across use cases.
An example similar to what I encountered:
# Some filenames:
filenames <-
  c("juniors_2022_OU.csv", "roster_2023_juniors_notre_dame.csv", "2022_SOU_juniors_roster.csv")

# Begin building tibble to (eventually) import & organize the content from these files:
tibble::tibble(
  # Import filenames (in practice, would be a call to list.files() ):
  file_names = filenames,

  # Get key info like year and school out of filenames:
  # Year is straightforward:
  id_year = file_names |> stringr::str_extract("202[23]"),

  # School, however, needs initialisms expanded to minimize ambiguity:
  # recode() can do this inside a single set of piped instructions. It
  # fills in the original data when items are not recoded (& same
  # type as recoded output):
  id_school_recode = file_names |>
    # Remove non-school-name content (the dot is escaped so it matches literally):
    stringr::str_remove("\\.csv$") |>
    stringr::str_remove("_?202[23]_?") |>
    stringr::str_remove("_?juniors_?") |>
    stringr::str_remove("_?roster_?") |>
    # Recode initialisms:
    dplyr::recode(
      "OU" = "oklahoma",
      "SOU" = "southern_oregon"
    ),

  # Using case_match(), though, non-recoded elements become NA.
  id_school_casematch = file_names |>
    stringr::str_remove("\\.csv$") |>
    stringr::str_remove("_?202[23]_?") |>
    stringr::str_remove("_?juniors_?") |>
    stringr::str_remove("_?roster_?") |>
    dplyr::case_match(
      "OU" ~ "oklahoma",
      "SOU" ~ "southern_oregon"
    )
)
#> # A tibble: 3 × 4
#>   file_names                        id_year id_school_recode id_school_casematch
#>   <chr>                             <chr>   <chr>            <chr>
#> 1 juniors_2022_OU.csv               2022    oklahoma         oklahoma
#> 2 roster_2023_juniors_notre_dame.c… 2023    notre_dame       <NA>
#> 3 2022_SOU_juniors_roster.csv       2022    southern_oregon  southern_oregon
The help docs for case_match() have an example that uses the argument .default = <<varname>> (e.g. species) to fill the original data back in. This would work in mutate(), but not here in tibble(): the approach requires multiple arguments specifying the same column/variable, which tibble() forbids:
tibble::tibble(
  # Locate and remove non-name content (the dot is escaped so it matches literally):
  id_school_casematch = filenames |>
    stringr::str_remove("\\.csv$") |>
    stringr::str_remove("_?202[23]_?") |>
    stringr::str_remove("_?juniors_?") |>
    stringr::str_remove("_?roster_?"),

  # Recode names:
  id_school_casematch = id_school_casematch |>
    dplyr::case_match(
      "OU" ~ "oklahoma",
      "SOU" ~ "southern_oregon",
      .default = id_school_casematch
    )
)
#> Error in `tibble::tibble()`:
#> ! Column name `id_school_casematch` must not be duplicated.
#> Use `.name_repair` to specify repair.
#> Caused by error in `repaired_names()`:
#> ! Names must be unique.
#> ✖ These names are duplicated:
#>   * "id_school_casematch" at locations 1 and 2.
(Also, even when using mutate() or standard variable creation [<-], specifying any new column/variable requires 2 separate arguments/calls. Admittedly this is my personal opinion, but that does seem less concise/readable to me, and it doesn't fully capitalize on the pipe-ability that seems to be case_match()'s key offering.)
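For reference, here is a minimal sketch of one way to keep everything in a single piped expression today: wrapping case_match() in an anonymous function so the piped-in vector has a name that can be reused as .default. This assumes R >= 4.1 (native pipe and `\(x)` lambdas) and dplyr >= 1.1.0; the `cleaned` vector is illustrative input standing in for the output of the str_remove() steps.

```r
# Illustrative input: school names as they look after the filename clean-up steps
cleaned <- c("OU", "notre_dame", "SOU")

id_school <- cleaned |>
  (\(x) dplyr::case_match(
    x,
    "OU" ~ "oklahoma",
    "SOU" ~ "southern_oregon",
    .default = x  # unmatched elements pass through unchanged
  ))()

id_school
```

This can be dropped into a single tibble() argument, but the extra lambda wrapping is arguably no more readable than the two-argument version.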
Could case_match() perhaps have an option/default added, e.g. case_match(.x, ..., .default = .x), that would mirror recode()'s capabilities? I recognize you'd still need equivalents to recode()'s checks ensuring that new & pass-through content have the same type--but wouldn't that be manageable, given the vctrs underpinnings of this function?
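To illustrate the behavior I have in mind, here is a rough sketch of a user-side wrapper. The name case_match_keep() is hypothetical and not part of dplyr; it simply forwards its arguments and passes the input vector as .default, assuming dplyr >= 1.1.0.

```r
# Hypothetical helper (not part of dplyr): case_match() with a pass-through default
case_match_keep <- function(.x, ...) {
  dplyr::case_match(.x, ..., .default = .x)
}

out <- c("OU", "notre_dame", "SOU") |>
  case_match_keep(
    "OU" ~ "oklahoma",
    "SOU" ~ "southern_oregon"
  )

out
```

As far as I can tell, the type check comes along for free in this sketch: if the replacement values and the pass-through input have incompatible types (e.g. numeric replacements for a character vector), I'd expect case_match() to raise an error when it combines them with .default.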