My df looks like this:
ID Country
55 Poland
55 Romania
55 France
98 Spain
98 Portugal
98 UK
65 Germany
67 Luxembourg
84 Greece
22 Estonia
22 Lithuania
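Some of the IDs are repeated because they belong to the same group. What I want to do is paste together all the Country values that share the same ID, so that each group's countries end up combined into a single string.
So far I have tried
ifelse(df[duplicated(df$ID) | duplicated(df$ID, fromLast = TRUE),], paste('Countries', df$Country), NA)
but this did not return the expected output.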
Using data.table:
library(data.table)
setDT(df)[, New_Name := c(paste0(Country, collapse = " + ")[1L], rep(NA, .N -1)), by = ID]
#df
#ID Country New_Name
#1: 55 Poland Poland + Romania + France
#2: 55 Romania <NA>
#3: 55 France <NA>
#4: 98 Spain Spain + Portugal + UK
#5: 98 Portugal <NA>
#6: 98 UK <NA>
#7: 65 Germany Germany
#8: 67 Luxembourg Luxembourg
#9: 84 Greece Greece
#10: 22 Estonia Estonia + Lithuania
#11: 22 Lithuania <NA>
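The data.table one-liner does two things at once: it collapses the countries per ID, and it pads the rest of each group with NA. A minimal sketch that splits those into two steps may be easier to follow. The column types used to rebuild df and the name dt are assumptions for illustration, not part of the original answer.

```r
library(data.table)

# Rebuild the example data from the question (column types are assumed).
df <- data.frame(
  ID = c(55, 55, 55, 98, 98, 98, 65, 67, 84, 22, 22),
  Country = c("Poland", "Romania", "France", "Spain", "Portugal", "UK",
              "Germany", "Luxembourg", "Greece", "Estonia", "Lithuania"),
  stringsAsFactors = FALSE
)

# Work on a copy so that df itself is left untouched.
dt <- as.data.table(df)

# Step 1: build the combined string once per ID.
# The single string is recycled to every row of its group.
dt[, New_Name := paste(Country, collapse = " + "), by = ID]

# Step 2: keep the string only on the first row of each ID.
dt[duplicated(ID), New_Name := NA_character_]

print(dt)
```

Unlike setDT(df), which converts df by reference, as.data.table() makes a copy, so this version can be run repeatedly without modifying the original data frame.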
Using base R:
replace(v1 <- with(df, ave(as.character(Country), ID, FUN = toString)), duplicated(v1), NA)
#[1] "Poland, Romania, France" NA NA "Spain, Portugal, UK" NA NA "Germany" "Luxembourg" "Greece" "Estonia, Lithuania"
#[11] NA
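The base R one-liner can also be unpacked into separate steps. The sketch below is a variation rather than the answer's exact code: it blanks repeats by testing duplicated() on ID instead of on the combined string, which should behave the same on this data but would not accidentally blank a row if two different IDs happened to produce identical strings. The rebuilt df and the name combined are assumptions for illustration.

```r
# Rebuild the example data from the question (column types are assumed).
df <- data.frame(
  ID = c(55, 55, 55, 98, 98, 98, 65, 67, 84, 22, 22),
  Country = c("Poland", "Romania", "France", "Spain", "Portugal", "UK",
              "Germany", "Luxembourg", "Greece", "Estonia", "Lithuania"),
  stringsAsFactors = FALSE
)

# Step 1: for every row, the comma-separated countries of its ID group.
# ave() applies FUN within each group and returns a vector as long as the input.
combined <- ave(as.character(df$Country), df$ID, FUN = toString)

# Step 2: keep the string on the first row of each ID, NA elsewhere.
combined[duplicated(df$ID)] <- NA

# Attach the result as a new column.
df$New_Name <- combined
print(df)
```

toString() joins with ", "; to get the " + " separator from the data.table answer, FUN = function(x) paste(x, collapse = " + ") could be used instead.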