Using recode with a named list of named vectors to mutate across multiple columns

hvg*_*pta 13 r recode

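I couldn't find a question similar to the problem I'm having here. I have a very large named list of named vectors whose names match column names in a data frame. I want to use this list to replace values in the data frame columns that match each list element's name. That is, the name of each vector in the list matches a data frame column name, and the key-value pairs in each vector are used to recode that column.

Reprex below: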

library(tidyverse)

# Starting tibble
test <- tibble(Names = c("Alice","Bob","Cindy"),
               A = c(3,"q",7),
               B = c(1,2,"b"),
               C = c("a","g",9))

# Named vector
A <- c("5" = "alpha", "7" = "bravo", "3" = "charlie", "q" = "delta")
B <- c("1" = "yes", "2" = "no", "b" = "bad", "c" = "missing")
C <- c("9" = "beta", "8" = "gamma", "a" = "delta", "g" = "epsilon")

# Named list of named vectors
dicts <- list("A" = A, "B" = B, "C" = C) # Same names as columns

I can use mutate by manually specifying the column and the list item.

# Works when replacement vector is specified
test %>% 
  mutate(across(c("A"), 
                ~recode(., !!!dicts$A)))
#> # A tibble: 3 x 4
#>   Names A       B     C    
#>   <chr> <chr>   <chr> <chr>
#> 1 Alice charlie 1     a    
#> 2 Bob   delta   2     g    
#> 3 Cindy bravo   b     9

However, the following does not work:

# Does not work when replacement vector using column names
test %>% 
  mutate(across(c("A", "B", "C"), 
                ~recode(., !!!dicts$.)))

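Error: Problem with mutate() input ..1. x No replacements provided. i Input ..1 is (function (.cols = everything(), .fns = NULL, ..., .names = NULL) ....

Additionally, I found that map2_dfr works only when the columns to recode are selected explicitly, which drops the other columns: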

# map2_dfr Sort of works, but requires dropping some columns
map2_dfr(test %>% select(names(dicts)), 
         dicts, 
         ~recode(.x, !!!.y))
#> # A tibble: 3 x 3
#>   A       B     C      
#>   <chr>   <chr> <chr>  
#> 1 charlie yes   delta  
#> 2 delta   no    epsilon
#> 3 bravo   bad   beta

I would like to recode the columns using the names in the list, without dropping any columns.

Tho*_*ing 6

You can try the base R code below

idx <- match(names(dicts), names(test))
test[idx] <- Map(`[`, dicts, test[idx])

which gives

> test
# A tibble: 3 x 4
  Names A       B     C
  <chr> <chr>   <chr> <chr>
1 Alice charlie yes   delta
2 Bob   delta   no    epsilon
3 Cindy bravo   bad   beta
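One caveat about the lookup approach: indexing a named vector with a key it does not contain returns NA, whereas recode() leaves unmatched values unchanged when no .default is supplied. The example data has no unmatched values, so both give the same result here. Below is a minimal sketch of a variant that keeps unmatched values; lookup_keep is a hypothetical helper name and the data is a toy example, not taken from the question.

```r
# Hypothetical helper: look up each value of x in a named vector,
# keeping the original value when the key is missing from the dictionary.
lookup_keep <- function(dict, x) {
  hit <- unname(dict[as.character(x)])  # NA where the key is absent
  ifelse(is.na(hit), x, hit)
}

dict <- c("1" = "yes", "2" = "no")
x <- c("1", "2", "z")

plain <- unname(dict[x])      # "z" is not a key, so it becomes NA
kept <- lookup_keep(dict, x)  # "z" is kept as-is
```

Under the same assumption, it could be dropped into the answer's code as test[idx] <- Map(lookup_keep, dicts, test[idx]).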


Tim*_*Fan 5

Here are three approaches:

First, we can use dplyr::across and make it work inside a custom function with dplyr::cur_column():

library(tidyverse)

myfun <- function(x) {
  mycol <- cur_column()
  dplyr::recode(x, !!! dicts[[mycol]])
}

test %>% 
  mutate(across(c("A", "B", "C"), myfun))

#> # A tibble: 3 x 4
#>   Names A       B     C      
#>   <chr> <chr>   <chr> <chr>  
#> 1 Alice charlie yes   delta  
#> 2 Bob   delta   no    epsilon
#> 3 Cindy bravo   bad   beta
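As far as I can tell, the attempt in the question fails for two reasons: dicts$. looks up a list element literally named "." (which is NULL), and !!! is resolved when mutate() captures the expression, before across() has started iterating over columns. Wrapping recode() in a function defers the splice until cur_column() is available. A small variation, offered as a sketch: selecting with all_of(names(dicts)) avoids hard-coding the column names. recode_by_name is a hypothetical name; the data is the question's.

```r
library(dplyr)

test <- tibble(Names = c("Alice", "Bob", "Cindy"),
               A = c(3, "q", 7),
               B = c(1, 2, "b"),
               C = c("a", "g", 9))

dicts <- list(
  A = c("5" = "alpha", "7" = "bravo", "3" = "charlie", "q" = "delta"),
  B = c("1" = "yes", "2" = "no", "b" = "bad", "c" = "missing"),
  C = c("9" = "beta", "8" = "gamma", "a" = "delta", "g" = "epsilon")
)

# dicts has no element named ".", so this lookup yields NULL
no_dot_element <- is.null(dicts$.)

# Hypothetical wrapper: the splice happens when the function is called,
# at which point cur_column() names the column being processed.
recode_by_name <- function(x) {
  recode(x, !!!dicts[[cur_column()]])
}

# Select the columns from the list names instead of typing them out
result <- test %>%
  mutate(across(all_of(names(dicts)), recode_by_name))
```

all_of() raises an error if a name in dicts is not a column of test; any_of() would silently skip such names instead.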

A second option is to convert dicts into a list of expressions and then splice it into mutate with the !!! operator:

expr_ls <-  imap(dicts, ~ quo(recode(!!sym(.y), !!!.x)))

test %>% 
  mutate(!!! expr_ls)

#> # A tibble: 3 x 4
#>   Names A       B     C      
#>   <chr> <chr>   <chr> <chr>  
#> 1 Alice charlie yes   delta  
#> 2 Bob   delta   no    epsilon
#> 3 Cindy bravo   bad   beta
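This overwrites the existing columns rather than adding new ones because imap() preserves the names of dicts, and mutate(!!!expr_ls) treats each name as the target column. Here is a sketch on a reduced, two-column version of the question's data (the reduction is mine, for brevity):

```r
library(dplyr)
library(purrr)
library(rlang)

test <- tibble(Names = c("Alice", "Bob"),
               A = c("3", "q"),
               B = c("1", "2"))

dicts <- list(
  A = c("3" = "charlie", "q" = "delta"),
  B = c("1" = "yes", "2" = "no")
)

# One quosure per list element; .y is the element name, .x the named vector
expr_ls <- imap(dicts, ~ quo(recode(!!sym(.y), !!!.x)))

# The list keeps the names of dicts, which become the mutate() targets
expr_names <- names(expr_ls)

result <- test %>%
  mutate(!!!expr_ls)
```

Renaming the list elements before splicing (for example with a suffix) should create new columns instead of overwriting, though that is not something the original answer covers.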

Finally, in the wider tidyverse we can use purrr::lmap_at, but it makes the underlying function more complex than it needs to be:

myfun2 <- function(x) {
  x_nm <- names(x)
  mutate(x, !! x_nm := recode(!! sym(x_nm), !!! dicts[[x_nm]]))
}

lmap_at(test, 
        names(dicts),
        myfun2)
#> # A tibble: 3 x 4
#>   Names A       B     C      
#>   <chr> <chr>   <chr> <chr>  
#> 1 Alice charlie yes   delta  
#> 2 Bob   delta   no    epsilon
#> 3 Cindy bravo   bad   beta
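For completeness, the map2 approach from the question can also keep the other columns if the recoded columns are assigned back into a copy of the tibble instead of being returned on their own. A sketch under that assumption, using a reduced version of the question's data (cols and result are names introduced here for illustration):

```r
library(dplyr)
library(purrr)

test <- tibble(Names = c("Alice", "Bob", "Cindy"),
               A = c("3", "q", "7"),
               B = c("1", "2", "b"))

dicts <- list(
  A = c("5" = "alpha", "7" = "bravo", "3" = "charlie", "q" = "delta"),
  B = c("1" = "yes", "2" = "no", "b" = "bad", "c" = "missing")
)

cols <- names(dicts)

# Recode only the matching columns, then write them back in place
result <- test
result[cols] <- map2(test[cols], dicts, ~ recode(.x, !!!.y))
```

Because only the columns named in dicts are replaced, Names is carried through untouched.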

Original data

# Starting tibble
test <- tibble(Names = c("Alice","Bob","Cindy"),
               A = c(3,"q",7),
               B = c(1,2,"b"),
               C = c("a","g",9))

# Named vector
A <- c("5" = "alpha", "7" = "bravo", "3" = "charlie", "q" = "delta")
B <- c("1" = "yes", "2" = "no", "b" = "bad", "c" = "missing")
C <- c("9" = "beta", "8" = "gamma", "a" = "delta", "g" = "epsilon")

# Named list of named vectors
dicts <- list("A" = A, "B" = B, "C" = C) # Same names as columns

Created on 2021-12-15 by the reprex package (v2.0.1)