Feature Request: In across, have function for .names argument #6972

Open
kylebutts opened this issue Nov 30, 2023 · 2 comments

@kylebutts

One common thing that I want to do in the dplyr workflow is take a set of variables, apply some function to them, and create a new set of variables. across makes this really easy, but it almost always requires me to update the names in a second step with rename_with. I'm wondering if a function could be passed to .names that accepts the vector of column names and returns the resulting column names.

For example, I would want:

data |> 
  mutate(across(
    matches("emp_[0-9]+"),
    ~ .x / emp_all, 
    .names = "share_{.col}"
  )) |>
  rename_with(
    \(str) gsub("^share_emp_", "share_", str)
  ) 

to become:

data |> 
  mutate(across(
    matches("emp_[0-9]+"),
    ~ .x / emp_all, 
    .names = \(str) gsub("^emp_", "share_", str)
  ))

Admittedly, I haven't thought of how that would play with other features of the function (e.g. multiple functions). Thanks for the consideration!

@kylebutts
Author

Ahh, I think I figured it out. Since .names is passed to glue(), you can call a function inside the {}, like .names = "{gsub('emp_', 'share_', .col)}". Would y'all recommend a PR with an example of this in the colwise vignette?

library(dplyr)
N = 100
data <- tibble(
  draw1 = rnorm(N),
  draw2 = rnorm(N),
  draw3 = rnorm(N)
)

data |> 
  mutate(sum = draw1 + draw2 + draw3) |>
  mutate(across(
    draw1:draw3, 
    function(x) x / sum, 
    # .names passes the string to `glue()`,
    # so you can do fancy things:
    .names = "share_{gsub('draw', '', .col)}"
  ))
#> # A tibble: 100 × 7
#>      draw1  draw2   draw3    sum share_1 share_2 share_3
#>      <dbl>  <dbl>   <dbl>  <dbl>   <dbl>   <dbl>   <dbl>
#>  1 -0.779  -1.38   0.293  -1.87   0.418   0.739  -0.157 
#>  2 -0.184  -0.538 -0.331  -1.05   0.175   0.511   0.315 
#>  3 -1.07   -1.08  -0.160  -2.31   0.463   0.468   0.0693
#>  4 -0.309   1.03   0.0462  0.767 -0.403   1.34    0.0602
#>  5  0.359  -0.484  0.857   0.732  0.491  -0.662   1.17  
#>  6  0.330   0.491  0.871   1.69   0.195   0.290   0.515 
#>  7 -0.0563 -0.579 -1.16   -1.80   0.0313  0.322   0.647 
#>  8  0.608   1.33   0.796   2.74   0.222   0.487   0.291 
#>  9 -1.95   -0.158 -0.752  -2.86   0.682   0.0554  0.263 
#> 10  0.267   0.373 -0.383   0.256  1.04    1.46   -1.50  
#> # ℹ 90 more rows

Created on 2023-11-30 with reprex v2.0.2
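
On the open question about multiple functions: the same glue trick seems to combine with the `{.fn}` placeholder that across documents for `.names`. Below is a minimal sketch, assuming a small made-up tibble (the column values and the `mean`/`max` list names are just for illustration):

```r
library(dplyr)

# Toy data, invented for this example
data <- tibble(
  draw1 = c(1, 2, 3),
  draw2 = c(4, 5, 6)
)

result <- data |>
  summarise(across(
    draw1:draw2,
    list(mean = mean, max = max),
    # {.fn} is the name from the list above; {.col} is the column name,
    # which gsub() trims before glue() builds the final name
    .names = "{.fn}_{gsub('draw', '', .col)}"
  ))

names(result)
#> [1] "mean_1" "max_1"  "mean_2" "max_2"
```

So the glue expression is evaluated once per column/function pair, which may be enough for most renaming needs without a separate function argument.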

@moodymudskipper

You can also rename the output of across since it's a tibble.
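
A rough sketch of what that could look like, assuming toy data with the `emp_*` columns from the original post (the values are made up). Here `rename_with()` is applied directly to the tibble returned by across, and mutate then splices the renamed columns in:

```r
library(dplyr)

# Toy data, invented for this example
data <- tibble(
  emp_1   = c(10, 20),
  emp_2   = c(30, 20),
  emp_all = c(40, 40)
)

result <- data |>
  mutate(
    # across() returns a tibble, so it can be renamed before
    # mutate() adds its columns to the data
    rename_with(
      across(matches("emp_[0-9]+"), \(x) x / emp_all),
      \(nm) gsub("^emp_", "share_", nm)
    )
  )

names(result)
#> [1] "emp_1"   "emp_2"   "emp_all" "share_1" "share_2"
```

This keeps the renaming function next to the across call, at the cost of one extra level of nesting.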
