dplyr mutate: fixing the unique name error

Question

I have a data frame with 10 columns. For example, here is a dummy version:

df = tbl_df(replicate(10,sample(0:1,1000,rep=TRUE)))

I want to do this in dplyr:

df %>% mutate(V2 = ifelse(is.na(V6), V2, paste(V2,V3,sep=" ")))

I get:

Error: Each variable must have a unique name.

But if I do this:

df$V2 = ifelse(is.na(df$V6), df$V2, paste(df$V2,df$V3,sep=" "))

it works.

How can I do this last step with a dplyr statement?

Answer 1

Aur*_*èle 8

As @Lamia said, the problem is most likely duplicated column names.

Create an example data frame with duplicated column names. You should never do this:

wrong_df <- data.frame(
  V1 = 1:3,
  V2 = 1:3,
  V3 = 1:3,
  V6 = c(4, NA, 6),
  V1 = 7:9,
  check.names = FALSE
)
wrong_df
#   V1 V2 V3 V6 V1
# 1  1  1  1  4  7
# 2  2  2  2 NA  8
# 3  3  3  3  6  9

Reproduce the problem:

library(dplyr)
wrong_df %>% 
  mutate(V2 = ifelse(is.na(V6), V2, paste(V2, V3, sep = " ")))
# Error: Each variable must have a unique name.
# Problem variables: 'V1'
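
Before renaming anything, it can help to confirm which names are actually duplicated. The following is a minimal sketch in base R, assuming the same `wrong_df` as above; the variable name `dup_names` is only illustrative and not part of the original answer.

```r
# Same example data frame as above, with a deliberately duplicated name.
wrong_df <- data.frame(
  V1 = 1:3, V2 = 1:3, V3 = 1:3, V6 = c(4, NA, 6), V1 = 7:9,
  check.names = FALSE
)

# Names that occur more than once (dup_names is an illustrative name)
dup_names <- unique(names(wrong_df)[duplicated(names(wrong_df))])
dup_names
# [1] "V1"

# Position of the first repeated name, or 0 when all names are unique
anyDuplicated(names(wrong_df))
# [1] 5
```

If `anyDuplicated(names(df))` returns 0 for your own data frame, duplicated names are probably not the cause and the error needs a different explanation.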

Fix the problem by using make.names(). Note that the second V1 column has been renamed V1.1 (see help("make.names")):

wrong_df %>% 
  setNames(make.names(names(.), unique = TRUE)) %>% 
  mutate(V2 = ifelse(is.na(V6), V2, paste(V2, V3, sep = " ")))
#   V1  V2 V3 V6 V1.1
# 1  1 1 1  1  4    7
# 2  2   2  2 NA    8
# 3  3 3 3  3  6    9
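
To see what the renaming step does in isolation, the names can be passed through `make.names()` directly. The sketch below also shows `make.unique()`, a base R alternative that is not used in the answer above: it only deduplicates, whereas `make.names()` additionally rewrites syntactically invalid names. The vector `nms` and the `"my col"` names are made up for illustration.

```r
# Column names from the example above (nms is an illustrative name)
nms <- c("V1", "V2", "V3", "V6", "V1")

fixed_names <- make.names(nms, unique = TRUE)
fixed_names
# [1] "V1"   "V2"   "V3"   "V6"   "V1.1"

unique_names <- make.unique(nms)
unique_names
# [1] "V1"   "V2"   "V3"   "V6"   "V1.1"

# The two differ when a name is not syntactically valid:
# make.names() replaces the space, make.unique() keeps it.
make.names(c("my col", "my col"), unique = TRUE)
# [1] "my.col"   "my.col.1"
make.unique(c("my col", "my col"))
# [1] "my col"   "my col.1"
```

If the original spelling of the column names should be preserved, `names(df) <- make.unique(names(df))` before the `mutate()` call is one option.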
