
Variable scoping issue with .data inside lambda function used in across() #7016

Open
bakaburg1 opened this issue Apr 20, 2024 · 2 comments

@bakaburg1

Hello,

This error was driving me crazy, and it took me a while to isolate it.
If you use .data inside a lambda function passed to across(), you cannot index .data with a variable created inside the lambda itself; otherwise R complains that the variable doesn't exist!

example:

library(dplyr)

iris |> mutate(
    across("Sepal.Length", \(x) {
        other_col_name <- "Sepal.Width"
        
        other_col_val <- .data[[other_col_name]]
        
        x + other_col_val
    })
)
> Error in .f(.x[[i]], ...) : object 'other_col_name' not found

Which was driving me crazy, since the variable clearly exists in the scope.

But if other_col_name is defined outside the lambda, there is no problem:

other_col_name <- "Sepal.Width"
iris |> mutate(
    across("Sepal.Length", \(x) {
        
        other_col_val <- .data[[other_col_name]]
        
        x + other_col_val
    })
)
# All fine

If this is by design and you don't plan to fix it, it could be useful to have a clearer error and some extra documentation somewhere!

There are cases in which, for example, the index name is defined dynamically based on x or the cur_column() value. Now that I know the issue I'll use pick()[1], unless you have better solutions.

@MrFlick

MrFlick commented Apr 24, 2024

I noticed a possibly related problem from this Stack Overflow question. This code results in an error:

library(dplyr)
library(purrr)

iris |>
  group_nest(Species) |>
  mutate(boo = map(data, function(x) {
    colsym <- sym("Sepal.Length")
    x %>% mutate(newcol = !!colsym)
  }))
# Error: object 'colsym' not found

The same issue occurs when using .data:

iris |>
  group_nest(Species) |>
  mutate(boo = map(data, function(x) {
    colname <- "Sepal.Length"
    x %>% mutate(uncle=.data[[colname]])
  }))

But if you define the function first rather than inline, it will run:

helper <- \(x) {
  colsym <- sym("Sepal.Length")
  x %>% mutate(newcol=!!colsym)
}

iris |>
  group_nest(Species) |>
  mutate(data = map(data, helper))

Looking at the trace, it seems the problem is actually coming from rlang::quos. Something else that will trigger the error is:

quos(\(x) {colsym <- rlang::sym("a"); mutate(x, newcol=!!colsym)})

Basically, the !! part is being evaluated when defining the function, not when calling the function. Is there a way to delay the evaluation of the !! or .data[[]] when applied to functions? Tested with rlang_1.1.3, purrr_1.0.2, dplyr_1.1.4.
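
To make the timing concrete, here is a minimal sketch using rlang only. The names outer_col and inner_col are made up for illustration. It assumes a variable called colsym also exists outside the function; without it, the capture step fails with the "object 'colsym' not found" error shown above.

```r
library(rlang)

# A variable visible where the quosure is captured (hypothetical name).
colsym <- sym("outer_col")

# quo() processes every `!!` in the captured expression right away,
# including the one inside the function body.
q <- quo(function(x) {
  colsym <- sym("inner_col")  # defined too late to affect the `!!` below
  expr(!!colsym)
})

# Build the function from the already-processed expression and call it.
f <- eval_tidy(q)
result <- f(1)

# TRUE: the outer value was injected at capture time, not the inner one.
identical(result, sym("outer_col"))
```

So the local colsym inside the function never gets a chance to be used by !!, which matches the behaviour described above.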

@moodymudskipper

You'd have the same issue if you nest bquote() calls: the .() are substituted by the outer call. The same happens if you nest substitute() calls.
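
A small base R sketch of the bquote() case, with made-up values "outer" and "inner" for illustration:

```r
x <- "outer"

# The outer bquote() substitutes .(x) everywhere in its argument,
# including inside the nested bquote() call.
inner_call <- bquote(bquote(.(x)))
identical(inner_call, quote(bquote("outer")))

# Redefining x before the inner bquote() runs makes no difference:
# the substitution has already happened.
nested <- bquote(local({
  x <- "inner"
  bquote(.(x))
}))
nested_value <- eval(nested)
identical(nested_value, "outer")
```

Both comparisons should come out TRUE, since the inner call never sees its own .(x).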

You can use the following trick:

protect <- function(expr) call("!", call("!", substitute(expr)))
rlang::expr(!!protect(hello))
#> !!hello
library(tidyverse)
#> Warning: package 'ggplot2' was built under R version 4.2.3
#> Warning: package 'stringr' was built under R version 4.2.3
iris |>
  group_nest(Species) |>
  mutate(boo = map(data, function(x) {
    colsym <- sym("Sepal.Length")
    x %>% mutate(newcol = !!protect(colsym))
  }))
#> # A tibble: 3 × 3
#>   Species                  data boo              
#>   <fct>      <list<tibble[,4]>> <list>           
#> 1 setosa               [50 × 4] <tibble [50 × 5]>
#> 2 versicolor           [50 × 4] <tibble [50 × 5]>
#> 3 virginica            [50 × 4] <tibble [50 × 5]>

For .data it's a bit different, I think. In the given examples .data shouldn't be used: it is not to be considered an object in scope, but a special operator at the top level of mutate(), which isn't very clear from the dot. It seems a macro is run when we use .data, where "using" means calling it with brackets (so if you set a browser() in the function, it won't be triggered, for instance). This works around it:

iris |>
  group_nest(Species) |>
  mutate(boo = map(data, ~ {
    colname <- "Sepal.Length"
    .x %>% mutate(uncle= (.data)[[colname]])
  }))
#> # A tibble: 3 × 3
#>   Species                  data boo              
#>   <fct>      <list<tibble[,4]>> <list>           
#> 1 setosa               [50 × 4] <tibble [50 × 5]>
#> 2 versicolor           [50 × 4] <tibble [50 × 5]>
#> 3 virginica            [50 × 4] <tibble [50 × 5]>
