Change column values to NA whenever the column name appears in a different column

use*_*797 6 r dplyr

Suppose I have the following (simplified) data frame:

library(dplyr)  # provides %>% and mutate_if()

set.seed(123)
df <- data.frame("extracolumns" = rep("random", 6),
                 "which.x" = c("x1", "x2", "x3", "x2", "x3", "x1"),
                 "which.y" = c("y2", "y2", "y2", "y1", "y1", "y1"),
                 "which.z" = c("z3", "z3", "z3", "z1", "z2", "z1"),
                 "x1" = rnorm(6),
                 "x2" = rnorm(6),
                 "x3" = rnorm(6),
                 "y1" = rnorm(6),
                 "y2" = rnorm(6),
                 "y3" = rnorm(6),
                 "z1" = rnorm(6),
                 "z2" = rnorm(6),
                 "z3" = rnorm(6)) %>%
  mutate_if(is.numeric, round, 2)

This gives

  extracolumns which.x which.y which.z    x1    x2    x3    y1    y2    y3    z1    z2    z3
1       random      x1      y2      z3 -0.56  0.46  0.40  0.70 -0.63  0.43  0.55 -1.27  0.78
2       random      x2      y2      z3 -0.23 -1.27  0.11 -0.47 -1.69 -0.30 -0.06  2.17 -0.08
3       random      x3      y2      z3  1.56 -0.69 -0.56 -1.07  0.84  0.90 -0.31  1.21  0.25
4       random      x2      y1      z1  0.07 -0.45  1.79 -0.22  0.15  0.88 -0.38 -1.12 -0.03
5       random      x3      y1      z2  0.13  1.22  0.50 -1.03 -1.14  0.82 -0.69 -0.40 -0.04
6       random      x1      y1      z1  1.72  0.36 -1.97 -0.73  1.25  0.69 -0.21 -0.47  1.37

I want to change the values in df so that, in each row, the columns named in "which.x", "which.y" and "which.z" are set to NA. Something like:


# For each row, look up the column named in which.x / which.y / which.z
# and set that cell to NA
for(i in 1:nrow(df)) {
  df[i, match(df$which.x, colnames(df))[i]] <- NA
  df[i, match(df$which.y, colnames(df))[i]] <- NA
  df[i, match(df$which.z, colnames(df))[i]] <- NA
}

This gives the desired output:

> df
  extracolumns which.x which.y which.z    x1    x2    x3    y1    y2    y3    z1    z2    z3
1       random      x1      y2      z3    NA  0.46  0.40  0.70    NA  0.43  0.55 -1.27    NA
2       random      x2      y2      z3 -0.23    NA  0.11 -0.47    NA -0.30 -0.06  2.17    NA
3       random      x3      y2      z3  1.56 -0.69    NA -1.07    NA  0.90 -0.31  1.21    NA
4       random      x2      y1      z1  0.07    NA  1.79    NA  0.15  0.88    NA -1.12 -0.03
5       random      x3      y1      z2  0.13  1.22    NA    NA -1.14  0.82 -0.69    NA -0.04
6       random      x1      y1      z1    NA  0.36 -1.97    NA  1.25  0.69    NA -0.47  1.37
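For comparison, the row-by-row loop can be replaced by a single assignment using base R's two-column matrix indexing, which data frames support in replacement. This is a sketch of a swapped-in base-R technique, not the asker's or the answerer's method; the helper variables which_cols, rows and cols are illustrative names only. It rebuilds the example data with the same seed and draw order so it runs on its own.

```r
# Rebuild the example data from the question (same seed and draw order)
set.seed(123)
df <- data.frame(
  extracolumns = rep("random", 6),
  which.x = c("x1", "x2", "x3", "x2", "x3", "x1"),
  which.y = c("y2", "y2", "y2", "y1", "y1", "y1"),
  which.z = c("z3", "z3", "z3", "z1", "z2", "z1"),
  stringsAsFactors = FALSE
)
for (nm in c("x1", "x2", "x3", "y1", "y2", "y3", "z1", "z2", "z3")) {
  df[[nm]] <- round(rnorm(6), 2)
}

# Illustrative helper names, not part of any package
which_cols <- c("which.x", "which.y", "which.z")

# unlist() stacks the which.* columns one after another, so the row
# indices 1..nrow(df) repeat once per which.* column
rows <- rep(seq_len(nrow(df)), times = length(which_cols))

# Column position of every name stored in the which.* columns
cols <- match(unlist(df[which_cols], use.names = FALSE), names(df))

# A two-column numeric matrix index sets all targeted cells to NA at once
df[cbind(rows, cols)] <- NA
df
```

Because the column lookup is done once for all rows, this avoids recomputing match() inside a loop; the numeric columns stay double after the assignment.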

I would like to get the same result more efficiently or elegantly, perhaps with functions like across() and case_when(), but I can't get it to work.

Thanks in advance!

r2e*_*ans 3

The literal approach:

df %>%
  mutate(
    across(x1:x3, ~ if_else(cur_column() == which.x, .[NA], .)), 
    across(y1:y3, ~ if_else(cur_column() == which.y, .[NA], .)), 
    across(z1:z3, ~ if_else(cur_column() == which.z, .[NA], .))
  ) 
#   extracolumns which.x which.y which.z    x1    x2    x3    y1    y2    y3    z1    z2    z3
# 1       random      x1      y2      z3    NA  0.46  0.40  0.70    NA  0.43  0.55 -1.27    NA
# 2       random      x2      y2      z3 -0.23    NA  0.11 -0.47    NA -0.30 -0.06  2.17    NA
# 3       random      x3      y2      z3  1.56 -0.69    NA -1.07    NA  0.90 -0.31  1.21    NA
# 4       random      x2      y1      z1  0.07    NA  1.79    NA  0.15  0.88    NA -1.12 -0.03
# 5       random      x3      y1      z2  0.13  1.22    NA    NA -1.14  0.82 -0.69    NA -0.04
# 6       random      x1      y1      z1    NA  0.36 -1.97    NA  1.25  0.69    NA -0.47  1.37
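Since the question mentions case_when(), here is a sketch of the same literal approach written with it instead of if_else(). The small data frame is hypothetical stand-in data (only the NA positions matter), and the code assumes dplyr 1.0 or later for across() and cur_column(). TRUE ~ .x is used as the fallback branch so it does not depend on the newer .default argument.

```r
library(dplyr)

# Hypothetical stand-in data: only two groups, two rows
df_small <- data.frame(
  which.x = c("x1", "x2"),
  which.y = c("y2", "y1"),
  x1 = c(1, 2), x2 = c(3, 4),
  y1 = c(5, 6), y2 = c(7, 8),
  stringsAsFactors = FALSE
)

out <- df_small %>%
  mutate(
    # cur_column() is the name of the column being modified; compare it
    # with the per-row name stored in which.x / which.y
    across(x1:x2, ~ case_when(cur_column() == which.x ~ NA_real_, TRUE ~ .x)),
    across(y1:y2, ~ case_when(cur_column() == which.y ~ NA_real_, TRUE ~ .x))
  )
out
```

Here NA_real_ is written out explicitly for the same reason the answer uses .[NA]: the replacement must have the same type as the double columns.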

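I use .[NA] instead of plain NA to be explicit about the type of NA: R has at least 8 different types of NA, and if_else() checks the types of its true and false arguments (older dplyr versions reject a bare logical NA next to a double column), so we need to be clear about which one we mean. In this case it is NA_real_, and .[NA] produces it from the column itself.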
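A quick check of what .[NA] actually returns for a double vector (the values below are arbitrary): indexing by a single logical NA is recycled to the full length, so every element becomes NA, while the vector's type is preserved.

```r
x <- c(0.46, -1.27, -0.69)  # a double vector, like one of the x/y/z columns

typeof(NA)        # "logical": a bare NA is a logical NA
typeof(NA_real_)  # "double"

y <- x[NA]        # logical NA index is recycled, selecting NA for every element
typeof(y)         # "double": same type as x
length(y)         # 3: same length as x
```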