Dynamically referencing column names in a mutate statement - dplyr

Question


I apologize for the long question, but after a long time I still could not find a solution on my own.

I have this toy data frame:

set.seed(23)
df <- tibble::tibble(
  id = paste0("00", 1:6),
  cond = c(1, 1, 2, 2, 3, 3),
  A_1 = sample(0:9, 6, replace = TRUE), A_2 = sample(0:9, 6, replace = TRUE), A_3 = sample(0:9, 6, replace = TRUE),
  B_1 = sample(0:9, 6, replace = TRUE), B_2 = sample(0:9, 6, replace = TRUE), B_3 = sample(0:9, 6, replace = TRUE),
  C_1 = sample(0:9, 6, replace = TRUE), C_2 = sample(0:9, 6, replace = TRUE), C_3 = sample(0:9, 6, replace = TRUE)
)

# A tibble: 6 x 11
#   id     cond   A_1   A_2   A_3   B_1   B_2   B_3   C_1   C_2   C_3
#   <chr> <dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int>
# 1 001       1     6     3     9     5     0     5     6     0     6
# 2 002       1     4     5     0     8     5     0     1     6     6
# 3 003       2     4     2     8     8     8     6     5     2     5
# 4 004       2     4     4     0     7     2     6     7     5     7
# 5 005       3     1     7     0     9     9     0     5     7     8
# 6 006       3     3     8     7     0     2     5     0     9     4


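I would like to create three variables, A_def, B_def and C_def, each of which takes the value of only one of its respective <LETTER_NUMBER> variables: the one whose suffix equals the value of the variable cond.

For example, for rows where cond == 1, A_def should take its values from A_1, B_def from B_1, and C_def from C_1. Likewise, if cond == 2, the *_def columns should take their values from the respective *_2 variables.

I managed to get the desired output in two ways: one hard-coded (probably best avoided if cond contains many values), and one using tidyr's pivoting functions.

The hard-coded solution: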

df %>%
  mutate(
    A_def = ifelse(cond == 1, A_1, ifelse(cond == 2, A_2, A_3)),
    B_def = ifelse(cond == 1, B_1, ifelse(cond == 2, B_2, B_3)),
    C_def = ifelse(cond == 1, C_1, ifelse(cond == 2, C_2, C_3))
  ) %>%
  select(id, cond, contains("_def"))
The tidyr solution:

df %>%
  pivot_longer(cols = contains("_")) %>%
  mutate(
    number = gsub("[A-Za-z_]", "", name),
    name = gsub("[^A-Za-z]", "", name)
  ) %>%
  filter(cond == number) %>%
  pivot_wider(
    id_cols = c(id, cond),
    names_from = name,
    values_from = value,
    names_glue = "{name}_def"
  )
Output in both cases:

# A tibble: 6 x 5
#   id     cond A_def B_def C_def
#   <chr> <dbl> <int> <int> <int>
# 1 001       1     6     5     6
# 2 002       1     4     8     1
# 3 003       2     2     8     2
# 4 004       2     4     2     5
# 5 005       3     0     0     8
# 6 006       3     7     5     4

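Now I am wondering whether the same output can be obtained dynamically using mutate and/or across (perhaps with ifelse statements inside mutate?). I tried the following code snippets, but the results were not as expected. In one of them, I tried to turn the variable names into symbols inside the ifelse statements, but I got an error.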

df %>%
  mutate(across(
    paste0(c("A", "B", "C"), "_1"),
    ~ifelse(cond == 1, cur_column(),
            ifelse(cond == 2, cur_column(),
                   paste0(gsub("[^A-Za-z]", "", cur_column()), "_3")))
  )) %>%
  select(id, cond, contains("_1"))

df %>%
  mutate_at(
    paste0(c("A", "B", "C"), "_1"),
    ~ifelse(cond == 1, ., ifelse(cond == 2, ., paste0(., "_2")))
  ) %>%
  select(id, cond, contains("_1"))

df %>%
  mutate_at(
    paste0(c("A", "B", "C"), "_1"),
    ~ifelse(cond == 1, !!!rlang::syms(paste0(c("A", "B", "C"), "_1")),
            ifelse(cond == 2, !!!rlang::syms(paste0(c("A", "B", "C"), "_2")),
                   !!!rlang::syms(paste0(c("A", "B", "C"), "_3"))))
  )
Question: is there a way to obtain the same desired output as above using dplyr's mutate (or its superseded scoped variants) and/or across?
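
One likely reason the first attempt misbehaves is that cur_column() returns the name of the current column as a string, not the column's values, so ifelse() ends up filling the column with text such as "A_1". The sketch below illustrates this on a hypothetical two-row tibble (toy and named are illustrative names, not part of the question's data):

```r
library(dplyr)

# Hypothetical two-row toy data, not the df from the question
toy <- tibble::tibble(cond = c(1, 2), A_1 = c(6L, 4L), A_2 = c(3L, 5L))

# cur_column() yields the string "A_1"; mutate() recycles it to every row,
# so the integer column A_1 is replaced by a character column
named <- toy %>% mutate(across(A_1, ~ cur_column()))

class(named$A_1)
```

The column's type changes from integer to character, which is consistent with the attempts producing column names rather than looked-up values.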

Answer 1

Ian*_*ell 2

I agree with the other comments that tidyr makes the code more readable, but here is an alternative approach using pmap:

library(purrr)
library(rlang)
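# pmap_dfr() calls the lambda once per row; list(...) exposes that row's
# values by column name, so cond, A_1, A_2, ... are scalars inside with()
pmap_dfr(df, ~with(list(...), 
               set_names(c(id, cond, 
                           # build the name "<LETTER>_<cond>" and evaluate it within the row
                           map_dbl(c("A","B","C"),
                                 ~ eval_tidy(parse_expr(paste(.x,cond,sep = "_"))))),
                          c("id","cond","A_def","B_def","C_def"))
               ))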
# A tibble: 6 x 5
     id  cond A_def B_def C_def
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     1     6     5     6
2     2     1     4     8     1
3     3     2     2     8     2
4     4     2     4     2     5
5     5     3     0     0     8
6     6     3     7     5     4
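
As a further sketch, and not something proposed in the thread itself, the lookup can also be done without row-wise evaluation by using base R matrix indexing and splicing the results into mutate(). The helper pick_by_cond and the names prefixes, defs and result are hypothetical. It assumes that every value of cond has a matching <LETTER>_<cond> column, and the data are typed in from the tibble printed in the question rather than regenerated with set.seed():

```r
library(dplyr)

# The question's data, copied from the printed tibble
df <- tibble::tibble(
  id   = paste0("00", 1:6),
  cond = c(1, 1, 2, 2, 3, 3),
  A_1 = c(6L, 4L, 4L, 4L, 1L, 3L), A_2 = c(3L, 5L, 2L, 4L, 7L, 8L), A_3 = c(9L, 0L, 8L, 0L, 0L, 7L),
  B_1 = c(5L, 8L, 8L, 7L, 9L, 0L), B_2 = c(0L, 5L, 8L, 2L, 9L, 2L), B_3 = c(5L, 0L, 6L, 6L, 0L, 5L),
  C_1 = c(6L, 1L, 5L, 7L, 5L, 0L), C_2 = c(0L, 6L, 2L, 5L, 7L, 9L), C_3 = c(6L, 6L, 5L, 7L, 8L, 4L)
)

# Hypothetical helper: for each row, take the value of <prefix>_<cond>
pick_by_cond <- function(data, prefix, cond_col = "cond") {
  suffixes <- sort(unique(data[[cond_col]]))
  # One matrix column per suffix that occurs in cond
  values <- as.matrix(data[paste0(prefix, "_", suffixes)])
  # A two-column index matrix selects one (row, column) cell per row
  values[cbind(seq_len(nrow(data)), match(data[[cond_col]], suffixes))]
}

prefixes <- c("A", "B", "C")
defs <- lapply(prefixes, pick_by_cond, data = df)
names(defs) <- paste0(prefixes, "_def")

# Splice the named list into mutate() as A_def = ..., B_def = ..., C_def = ...
result <- df %>%
  mutate(!!!defs) %>%
  select(id, cond, ends_with("_def"))

result
```

Because the column names are derived from the prefixes and the values found in cond, adding another letter or another condition value should not require touching the mutate() call, which is the main appeal over the nested ifelse() version.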

