My data looks like this:
SNP Geno Allele
marker1 G1 AA
marker2 G1 TT
marker3 G1 TT
marker1 G2 CC
marker2 G2 AA
marker3 G2 TT
marker1 G3 GG
marker2 G3 AA
marker3 G3 TT
I want it to look like this:
SNP Geno Allele1 Allele2
marker1 G1 A A
marker2 G1 T T
marker3 G1 T T
marker1 G2 C C
marker2 G2 A A
marker3 G2 T T
marker1 G3 G G
marker2 G3 A A
marker3 G3 T T
I am using this:
strsplit(Allele, split extended = TRUE)
But this does not work. Do I need a different command?
Ben*_*Ben 12
Another approach, from start to finish:
Make reproducible data:
dat <- read.table(header = TRUE, text = "SNP Geno Allele
marker1 G1 AA
marker2 G1 TT
marker3 G1 TT
marker1 G2 CC
marker2 G2 AA
marker3 G2 TT
marker1 G3 GG
marker2 G3 AA
marker3 G3 TT")
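UPDATED: Extract the Allele column, split each value into single characters, then arrange those characters into the two columns of a data frame.
Either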
dat1 <- data.frame(t(matrix(
unlist(strsplit(as.vector(dat$Allele), split = "")),
ncol = length(dat$Allele), nrow = 2)))
or, following @joran's suggestion,
dat1 <- data.frame(do.call(rbind, strsplit(as.vector(dat$Allele), split = "")))
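To see why the two variants are interchangeable, it can help to look at the intermediate values. This is only a minimal sketch: `alleles` is a small hypothetical vector standing in for `dat$Allele`, and it includes heterozygous values (not present in the question's data) so that the order of the two characters is visible.

```r
# Hypothetical stand-in for dat$Allele
alleles <- c("AG", "TT", "CA")

# split = "" breaks each string into single characters;
# the result is a list with one character vector per input string
parts <- strsplit(alleles, split = "")

# Variant 1: flatten the list, fill a 2-row matrix column by column,
# then transpose so that each input string becomes one row
m1 <- t(matrix(unlist(parts), nrow = 2))

# Variant 2: stack the list elements directly as rows
m2 <- do.call(rbind, parts)

# Compare the two character matrices
identical(m1, m2)
```

The `do.call(rbind, ...)` form avoids having to state the matrix dimensions by hand, which is presumably why it was suggested as the shorter alternative.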
Then add column names to the new columns:
names(dat1) <- c("Allele1", "Allele2")
Bind the two new columns to the SNP and Geno columns of the original data table, as @user1317221 suggested:
dat3 <- cbind(dat$SNP, dat$Geno, dat1)
dat$SNP dat$Geno Allele1 Allele2
1 marker1 G1 A A
2 marker2 G1 T T
3 marker3 G1 T T
4 marker1 G2 C C
5 marker2 G2 A A
6 marker3 G2 T T
7 marker1 G3 G G
8 marker2 G3 A A
9 marker3 G3 T T
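The result above carries the column names `dat$SNP` and `dat$Geno` because the columns were passed to `cbind` as bare vectors. If the original names `SNP` and `Geno` are wanted, one possible variation (a sketch, not part of the original answer) is to pass a data frame subset instead. The example uses a shortened copy of the data so that it runs on its own, and `as.character` in place of `as.vector` as an assumption-free way to handle a factor column.

```r
# Shortened copy of the example data
dat <- read.table(header = TRUE, text = "SNP Geno Allele
marker1 G1 AA
marker2 G1 TT
marker3 G1 TT")

# Split each genotype into its two characters, one row per genotype
dat1 <- data.frame(do.call(rbind, strsplit(as.character(dat$Allele), split = "")))
names(dat1) <- c("Allele1", "Allele2")

# Subsetting with a character vector returns a data frame, so cbind
# keeps the existing column names SNP and Geno
dat3 <- cbind(dat[c("SNP", "Geno")], dat1)
names(dat3)
```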