mah*_*ood 4 loops r strsplit rbind
I have a data frame like this (4 rows and 5 columns):
Marker ind1 ind2 ind3 ind4
mark1 CT TT CT TT
mark2 AG AA AG AA
mark3 AC AA AC AA
mark4 CT TT CT TT
What I want to do is split every column (except the first one) into two columns, so the output should look like this (4 rows and 9 columns):
Marker ind1 ind1 ind2 ind2 ind3 ind3 ind4 ind4
mark1 C T T T C T T T
mark2 A G A A A G A A
mark3 A C A A A C A A
mark4 C T T T C T T T
I know how to split a single column:
do.call(rbind,strsplit(test$JRP4RA6119.039, ""))
This gives:
[,1] [,2]
[1,] "C" "T"
[2,] "A" "G"
[3,] "A" "C"
[4,] "C" "T"
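For readers unsure why the one-column version works, here is a minimal sketch on a stand-alone vector. The name x is made up for illustration; it holds the same values as the first genotype column above. strsplit(x, "") returns a list with one character vector per element, and do.call(rbind, ...) stacks those vectors as the rows of a matrix.

```r
# Hypothetical stand-in for one genotype column
x <- c("CT", "AG", "AC", "CT")

# An empty split string breaks each element into single characters;
# the result is a list with one character vector per input element
parts <- strsplit(x, "")

# Stack the per-element vectors as rows of a character matrix
m <- do.call(rbind, parts)
print(m)
```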
What I want is to loop over this and do it for all columns of a data frame.
Thanks in advance.
I think this is a bit convoluted, but:
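# Split every column except Marker into single characters, rbind each
# column's pieces into a matrix, then cbind the matrices side by side
test_split <- data.frame(Marker = test$Marker,
                         do.call("cbind", lapply(apply(test[, -1], 2, strsplit, ""),
                                                 function(x) do.call("rbind", x))),
                         stringsAsFactors = FALSE)
# Name the new columns <original name>_1 and <original name>_2
colnames(test_split)[-1] <- paste(rep(colnames(test)[-1], each = 2), 1:2, sep = "_")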
test_split
# Marker JRP4RA6119.039_1 JRP4RA6119.039_2 JRP4RA6124.029_1 JRP4RA6124.029_2 JRP4RA6133.051_1 JRP4RA6133.051_2 JRP4RA6125.009_1 JRP4RA6125.009_2
#1 s7e4419xxx C T T T C T T T
#2 s7e7001s01 A G A A A G A A
#3 s7e3049xxx A C A A A C A A
#4 s7e4727xxx C T T T C T T T
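The same idea can be unpacked into separate steps, which may be easier to follow. This is only a sketch under a few assumptions: the toy data frame below copies two of the question's columns, the columns are character (not factor), and every cell has exactly two characters. The name pieces is made up for illustration.

```r
# Toy data copied from the question (two genotype columns only)
test <- data.frame(
  Marker = c("mark1", "mark2", "mark3", "mark4"),
  ind1 = c("CT", "AG", "AC", "CT"),
  ind2 = c("TT", "AA", "AA", "TT"),
  stringsAsFactors = FALSE
)

# Step 1: turn each non-Marker column into a 4 x 2 character matrix
pieces <- lapply(test[-1], function(col) do.call(rbind, strsplit(col, "")))

# Step 2: bind the matrices side by side and put Marker back in front
test_split <- data.frame(Marker = test$Marker, do.call(cbind, pieces),
                         stringsAsFactors = FALSE)

# Step 3: name the new columns ind1_1, ind1_2, ind2_1, ind2_2
colnames(test_split)[-1] <- paste(rep(names(test)[-1], each = 2), 1:2, sep = "_")
print(test_split)
```

Calling lapply directly on test[-1] avoids the apply step, since a data frame is already a list of columns.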
You could also try cSplit_f from splitstackshape:
library(splitstackshape)
df1[-1] <- lapply(df1[-1] , function(x)
gsub('(?<=\\w)(?=\\w)', ',', x, perl=TRUE))
cSplit_f(df1, 2:ncol(df1), sep=',')
# Marker ind1_1 ind1_2 ind2_1 ind2_2 ind3_1 ind3_2 ind4_1 ind4_2
#1: mark1 C T T T C T T T
#2: mark2 A G A A A G A A
#3: mark3 A C A A A C A A
#4: mark4 C T T T C T T T
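The gsub call above is what prepares the data for a comma-based split, so it may help to see it in isolation. In this small base R sketch (the vector x is made up for illustration), the lookbehind (?<=\\w) and lookahead (?=\\w) together match the empty position between two word characters, and the replacement puts a comma there.

```r
# Hypothetical genotype strings
x <- c("CT", "AG")

# Zero-width match between two adjacent word characters; needs perl = TRUE
# because lookbehind and lookahead are PCRE features
with_sep <- gsub("(?<=\\w)(?=\\w)", ",", x, perl = TRUE)
print(with_sep)
# [1] "C,T" "A,G"
```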
Or, as @Ananda Mahto suggested, cSplit may be more efficient for large datasets, and it can be used directly without changing the separator:
cSplit(df1, names(df1)[-1], sep="", stripWhite = FALSE)
# Marker ind1_1 ind1_2 ind2_1 ind2_2 ind3_1 ind3_2 ind4_1 ind4_2
#1: mark1 C T T T C T T T
#2: mark2 A G A A A G A A
#3: mark3 A C A A A C A A
#4: mark4 C T T T C T T T
Or use tstrsplit from data.table:
library(data.table)  # v1.9.5+
setDT(df1)
cbind(Marker = df1$Marker,
      df1[, unlist(lapply(.SD, function(x) tstrsplit(x, '')),
                   recursive = FALSE), .SDcols = -1])
# Marker ind11 ind12 ind21 ind22 ind31 ind32 ind41 ind42
#1: mark1 C T T T C T T T
#2: mark2 A G A A A G A A
#3: mark3 A C A A A C A A
#4: mark4 C T T T C T T T
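To see what the "t" in tstrsplit refers to, here is a rough base R approximation of a transposed split. It is not data.table's implementation, only a sketch of the shape of the result: one vector per character position instead of one vector per input string. The names x, parts and by_position are made up for illustration.

```r
# Hypothetical stand-in for one genotype column
x <- c("CT", "AG", "AC", "CT")

# Ordinary split: one character vector per input string
parts <- strsplit(x, "")

# "Transpose" by hand: collect the i-th character of every string;
# shorter strings would yield NA at the missing positions
n <- max(lengths(parts))
by_position <- lapply(seq_len(n), function(i) {
  vapply(parts, function(p) p[i], character(1))
})
print(by_position)
```

Each element of by_position can then serve directly as a new column, which is why the data.table answer does not need an rbind step.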
# Data (df1) used in the examples above
df1 <- structure(list(Marker = c("mark1", "mark2", "mark3", "mark4"),
    ind1 = c("CT", "AG", "AC", "CT"), ind2 = c("TT", "AA", "AA",
    "TT"), ind3 = c("CT", "AG", "AC", "CT"), ind4 = c("TT", "AA",
    "AA", "TT")), .Names = c("Marker", "ind1", "ind2", "ind3",
    "ind4"), class = "data.frame", row.names = c(NA, -4L))