I have a table as follows:
Table1 <- data.frame(
"Random" = c("A", "B", "C"),
"Genes" = c("Apple", "Candy", "Toothpaste"),
"Extra" = c("Up", "", "Down"),
"Desc" = c("Healthy,Red,Fruit", "Sweet,Cavities,Sugar,Fruity", "Minty,Dentist")
)
which gives:
  Random      Genes Extra                        Desc
1      A      Apple    Up           Healthy,Red,Fruit
2      B      Candy       Sweet,Cavities,Sugar,Fruity
3      C Toothpaste  Down               Minty,Dentist
I have another table of descriptions, and I want to add a Genes column to it. For example, Table2 is:
Table2 <- data.frame(
"Col1" = c(1, 2, 3, 4, 5, 6),
"Desc" = c("Sweet", "Sugar", "Dentist", "Red", "Fruit", "Fruity")
)
which gives:
  Col1    Desc
1    1   Sweet
2    2   Sugar
3    3 Dentist
4    4     Red
5    5   Fruit
6    6  Fruity
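I want to add another column to Table2 that holds the gene: match the "Desc" values between the two tables and take the corresponding Genes value from Table1, to get: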
  Col1    Desc       Gene
1    1   Sweet      Candy
2    2   Sugar      Candy
3    3 Dentist Toothpaste
4    4     Red      Apple
5    5   Fruit      Apple
6    6  Fruity      Candy
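You could try cSplit from splitstackshape to split the "Desc" column of "Table1" and reshape the dataset from "wide" to "long" format. The output will be a data.table, so we can use data.table methods: set 'Desc' as the key column (setkey), join with "Table2", and finally drop the columns we don't need from the output, either by selecting the columns we want or by assigning the unneeded ones to NULL with :=.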
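library(splitstackshape)
library(data.table)  # setkey() comes from data.table, which splitstackshape may not attach
# Split Desc into one row per description (long format), key on Desc, then join
# Table2[2:1] (Desc moved first, since a keyed join matches i's first column to the key)
setkey(cSplit(Table1, 'Desc', ',', 'long'), Desc)[Table2[2:1]][
  , c(5, 4, 2), with = FALSE]   # keep Col1, Desc, Genes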
# Col1 Desc Genes
#1: 1 Sweet Candy
#2: 2 Sugar Candy
#3: 3 Dentist Toothpaste
#4: 4 Red Apple
#5: 5 Fruit Apple
#6: 6 Fruity Candy
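For readers without splitstackshape, the same split-to-long-then-join idea can be sketched in base R. This is an alternative technique, not what cSplit() does internally: stack() flattens a list of split descriptions, named by gene, into a two-column long table, and merge() then plays the role of the keyed data.table join. It assumes the descriptions are separated by a bare comma with no surrounding spaces; the names desc_by_gene, long and res are just illustrative.

```r
Table1 <- data.frame(
  Random = c("A", "B", "C"),
  Genes  = c("Apple", "Candy", "Toothpaste"),
  Extra  = c("Up", "", "Down"),
  Desc   = c("Healthy,Red,Fruit", "Sweet,Cavities,Sugar,Fruity", "Minty,Dentist"),
  stringsAsFactors = FALSE
)
Table2 <- data.frame(
  Col1 = c(1, 2, 3, 4, 5, 6),
  Desc = c("Sweet", "Sugar", "Dentist", "Red", "Fruit", "Fruity"),
  stringsAsFactors = FALSE
)

# Wide -> long: a list of description vectors, one element per gene ...
desc_by_gene <- setNames(strsplit(Table1$Desc, ",", fixed = TRUE), Table1$Genes)
# ... which stack() flattens into columns `values` (Desc) and `ind` (gene, a factor)
long <- stack(desc_by_gene)
long <- data.frame(Desc = long$values, Genes = as.character(long$ind),
                   stringsAsFactors = FALSE)

# Join on Desc; all.x = TRUE keeps every Table2 row (Genes is NA when unmatched),
# as the X[i] join above does with its default nomatch = NA
res <- merge(Table2, long, by = "Desc", all.x = TRUE)

# merge() sorts by the key column, so restore Table2's row order via Col1
res <- res[order(res$Col1), c("Col1", "Desc", "Genes")]
rownames(res) <- NULL
print(res)
```

The explicit reorder on Col1 matters: unlike the data.table join, merge() does not preserve the row order of Table2.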
Here is a base R approach that uses an intermediate link table:
# create an intermediate data.frame with all the key (Desc) / value (Gene) pairs
df <- NULL
for(i in seq(nrow(Table1)))
df <- rbind(df,
data.frame(Gene =Table1$Genes[i],
Desc =strsplit(as.character(Table1$Desc)[i],',')[[1]]))
df
#> Gene Desc
#> 1 Apple Healthy
#> 2 Apple Red
#> 3 Apple Fruit
#> 4 Candy Sweet
#> 5 Candy Cavities
#> 6 Candy Sugar
#> 7 Candy Fruity
#> 8 Toothpaste Minty
#> 9 Toothpaste Dentist
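Growing df with rbind() inside the loop copies the accumulated data frame on every iteration, which is harmless for three rows but gets slow as Table1 grows. A vectorised sketch that should build the same Gene/Desc link table in one step: split every Desc string at once, then repeat each gene once per piece with rep() and lengths() (lengths() assumes R >= 3.2.0).

```r
Table1 <- data.frame(
  Random = c("A", "B", "C"),
  Genes  = c("Apple", "Candy", "Toothpaste"),
  Extra  = c("Up", "", "Down"),
  Desc   = c("Healthy,Red,Fruit", "Sweet,Cavities,Sugar,Fruity", "Minty,Dentist"),
  stringsAsFactors = FALSE
)

# One character vector of descriptions per row of Table1
pieces <- strsplit(as.character(Table1$Desc), ",", fixed = TRUE)

# Repeat each gene as many times as its row has descriptions, then flatten the pieces
df <- data.frame(Gene = rep(as.character(Table1$Genes), lengths(pieces)),
                 Desc = unlist(pieces),
                 stringsAsFactors = FALSE)
print(df)
```

The rows come out in the same order as the loop version, so the match() step that follows works unchanged.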
Now look up each Table2$Desc in it the usual way, with match():
Table2$Gene <- df$Gene[match(Table2$Desc,df$Desc)]
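match(x, table) returns, for each element of x, the position of its first match in table, or NA when there is none; indexing df$Gene with those positions is what turns it into a lookup. A tiny self-contained illustration, using a cut-down link table and a hypothetical description "Crunchy" that no gene lists:

```r
# A cut-down link table: Gene/Desc pairs, laid out like df above
gene <- c("Apple", "Apple", "Apple", "Candy", "Candy")
desc <- c("Healthy", "Red", "Fruit", "Sweet", "Sugar")

# "Crunchy" is a made-up description with no gene
query <- c("Sweet", "Crunchy", "Red")

idx <- match(query, desc)   # positions of the first match: 4, NA, 2
looked_up <- gene[idx]      # "Candy", NA, "Apple"
print(looked_up)
```

One consequence of "first match": if the same description were listed under two genes, match() would return only the first gene, whereas the keyed data.table join above would return one row per matching gene.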