Bio*_*azy 11 split r dataframe
I have a df like this:
df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))
> df
FOO
1 A|B|C
2 A|B
3 B|C
4 A
5 C
I would like output like this:
> df
  X1 X2 X3
1  A  B  C
2  A  B
3     B  C
4  A
5        C
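So far I have tried the example from "Split column at delimiter in data frame", but it does not split the column without repeating values. What I get is: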
df <- data.frame(do.call('rbind', strsplit(as.character(df$FOO),'|',fixed=TRUE)))
> df
X1 X2 X3
1 A B C
2 A B A
3 B C B
4 A A A
5 C C C
I also get this warning:
Warning message:
In rbind(c("A", "B", "C"), c("A", "B"), c("B", "C"), "A", "C") :
  number of columns of result is not a multiple of vector length (arg 2)
What can I do in cases like this? Preferably with base R.
Simply:
splt <- strsplit(as.character(df$FOO),"\\|")
all_val <- sort(unique(unlist(splt)))
t(sapply(splt,function(x){all_val[!(all_val %in% x)]<-NA;all_val}))
# [,1] [,2] [,3]
#[1,] "A" "B" "C"
#[2,] "A" "B" NA
#[3,] NA "B" "C"
#[4,] "A" NA NA
#[5,] NA NA "C"
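The matrix above can be reshaped into a data frame closer to the output shown in the question. This is only a minimal sketch: the blank-string fill and the `X1`, `X2`, ... column names are assumptions taken from the requested output, and `do.call(rbind, lapply(...))` is swapped in for `t(sapply(...))` so that the result stays a matrix even when there is only one distinct value.

```r
df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))

# Split on a literal "|" (fixed = TRUE avoids regex escaping)
splt <- strsplit(as.character(df$FOO), "|", fixed = TRUE)

# All distinct values, sorted; each one gets its own column
all_val <- sort(unique(unlist(splt)))

# For every row, keep a value where it is present and use "" elsewhere
m <- do.call(rbind, lapply(splt, function(x) ifelse(all_val %in% x, all_val, "")))

# Convert to a data frame and name the columns X1, X2, ... (assumed naming)
out <- as.data.frame(m, stringsAsFactors = FALSE)
names(out) <- paste0("X", seq_along(all_val))
out
```

Replacing `""` with `NA_character_` in the `ifelse()` call keeps the missing cells as `NA`, as in the matrix above.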
Data:
df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))
Please note:
My version is base:: only (no libraries needed) and general.
It also works with:
df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C', 'B|D|F'))
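On this six-row data the difference from padding by position is easy to see. The sketch below is not the approach above: it pads each split vector with `NA` to the longest length using the `length<-` idiom. That avoids the recycling warning from `rbind`, but it packs values to the left instead of giving each distinct value its own column, so it is probably not what the question asks for.

```r
df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C', 'B|D|F'))
splt <- strsplit(as.character(df$FOO), "|", fixed = TRUE)

# Number of pieces per row; the unequal lengths are what make rbind recycle
n <- lengths(splt)

# Pad every vector with NA up to the longest length, then bind row-wise
padded <- do.call(rbind, lapply(splt, `length<-`, max(n)))
padded
```

Here `padded` has three columns and row 3 reads `"B" "C" NA`, whereas the value-aligned version produces five columns (A, B, C, D, F) with `B` and `C` in the second and third.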
I overlooked that the OP asked for a base R solution. Please try the solutions by @AndreElrico, @r.user.05apr or @milan.
This can be done with cSplit_e from the splitstackshape package:
library(splitstackshape)
cSplit_e(
data = df,
split.col = "FOO",
sep = "|",
mode = "value",
type = "character",
fill = " ",
drop = TRUE
)
#  FOO_A FOO_B FOO_C
#1     A     B     C
#2     A     B
#3           B     C
#4     A
#5                 C
It also works for the following df (see the OP's comment above).
(df1 <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C', 'B|D|F')))
# FOO
#1 A|B|C
#2 A|B
#3 B|C
#4 A
#5 C
#6 B|D|F
cSplit_e(df1, "FOO", "|", "value", "character", TRUE, fill = " ")
#  FOO_A FOO_B FOO_C FOO_D FOO_F
#1     A     B     C
#2     A     B
#3           B     C
#4     A
#5                 C
#6           B           D     F