A proposed function to port plyr::mapvalues() to dplyr #7027

Closed
thomasjwood opened this issue May 14, 2024 · 1 comment

@thomasjwood

A proposed function to adapt a typical plyr::mapvalues use case to dplyr

The advent of case_match() has brought much of the recoding functionality of plyr::mapvalues() into dplyr.

However, a use case remains where there is a long vector of from values, and a long vector of to values, and plyr::mapvalues lets you concisely express this in a way that case_when() and case_match() don't replicate.
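For comparison, the from/to use case described above can be sketched in base R with match(). This is only an illustration under stated assumptions: recode_by_lookup() is a hypothetical helper name, not part of plyr or dplyr, and it assumes x, from, and to share a compatible type.

```r
# Hypothetical helper (not part of plyr or dplyr): recode `x` using parallel
# `from`/`to` vectors, leaving values that do not appear in `from` unchanged.
recode_by_lookup <- function(x, from, to) {
  stopifnot(length(from) == length(to))
  idx <- match(x, from)      # position of each element of x in `from`, NA if absent
  hit <- !is.na(idx)         # TRUE where a replacement exists
  out <- x
  out[hit] <- to[idx[hit]]   # overwrite only the matched positions
  out
}

recode_by_lookup(c("A", "B", "E"), from = c("A", "B"), to = c("a", "b"))
# [1] "a" "b" "E"
```

Because match() does a single vectorised lookup, the call stays the same length however long from and to become.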

My (primitive! risible!) attempt at a working function (I understand it would have to be adapted to match tidyverse style if deemed a candidate for inclusion):

library(Rcpp)

# Define the C++ function for character vectors
cppFunction('
CharacterVector case_map_cpp(CharacterVector x, CharacterVector from, CharacterVector to) {
    int n = x.size();
    CharacterVector result(n);
    for (int i = 0; i < n; ++i) {
        // Default: keep the original value when nothing in `from` matches
        // (this also covers an empty `from`).
        result[i] = x[i];
        std::string value = Rcpp::as<std::string>(x[i]);
        for (int j = 0; j < from.size(); ++j) {
            if (value == Rcpp::as<std::string>(from[j])) {
                result[i] = to[j];
                break;
            }
        }
    }
    return result;
}
')

# R wrapper function to handle any types by converting them to character vectors
case_map <- function(x, from, to) {
  # Convert all input to character vectors to standardize processing
  x_char <- as.character(x)
  from_char <- as.character(from)
  to_char <- as.character(to)
  
  # Call the C++ function
  mapped_values <- case_map_cpp(x_char, from_char, to_char)
  
  # Check if the original input was numeric and if all 'to' values can be converted to numeric
  # Use 'suppressWarnings' to avoid warnings during conversion checks
  should_convert_back <- is.numeric(x) && !any(is.na(suppressWarnings(as.numeric(to_char))))
  
  # Convert back to numeric if appropriate, otherwise return as character
  if (should_convert_back) {
    return(as.numeric(mapped_values))
  } else {
    return(mapped_values)
  }
}

It seems reasonably performant:

fv <- sample(LETTERS[1:5], size = 1e4, replace = TRUE)

microbenchmark::microbenchmark(
  case_map(
    fv,
    LETTERS[1:4],
    letters[1:4]
    )
  )

microbenchmark::microbenchmark(
  plyr::mapvalues(
    fv,
    LETTERS[1:4],
    letters[1:4]
  )
)

microbenchmark::microbenchmark(
  dplyr::case_match(
    fv,
    "A" ~ "a",
    "B" ~ "b",
    "C" ~ "c",
    "D" ~ "d"
    )
  )

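should return (one result per benchmark, in the order above)

                                      expr   min    lq    mean median    uq    max neval
 case_map(fv, LETTERS[1:4], letters[1:4]) 688.2 710.3 729.576 714.55 720.6 1181.3   100

                                             expr   min    lq    mean median    uq  max neval
 plyr::mapvalues(fv, LETTERS[1:4], letters[1:4]) 545.1 694.5 823.852  759.2 785.4 8377   100

                                                         expr    min     lq     mean median     uq    max neval
 case_match(fv, "A" ~ "a", "B" ~ "b", "C" ~ "c", "D" ~ "d") 1.1532 1.3287 1.517161  1.347 1.3875 9.4159   100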

@DavisVaughan
Member

DavisVaughan commented May 14, 2024

Eventually we will add vec_case_match() to vctrs, a more programmatic version that handles this use case. It is currently prototyped in dplyr and powers case_match(). To get the same results as above, it would look like this:

fv <- sample(LETTERS[1:5], size = 1e4, replace = TRUE)

dplyr:::vec_case_match(
  needles = fv,
  haystacks = as.list(LETTERS[1:4]),
  values = as.list(letters[1:4])
)

See also r-lib/vctrs#1622. Going to close this in favor of that, since it is the ultimate solution.
