strsplit one column into two columns

mar*_*rie 6 split r

My data looks like this:

    SNP Geno Allele
marker1   G1    AA
marker2   G1    TT
marker3   G1    TT
marker1   G2    CC
marker2   G2    AA
marker3   G2    TT
marker1   G3    GG
marker2   G3    AA
marker3   G3    TT

I want it to look like this:

    SNP Geno Allele1 Allele2
marker1   G1       A       A
marker2   G1       T       T
marker3   G1       T       T
marker1   G2       C       C
marker2   G2       A       A
marker3   G2       T       T
marker1   G3       G       G
marker2   G3       A       A
marker3   G3       T       T

I tried this:

strsplit(Allele, split extended = TRUE)

But this doesn't work. Do I need a different command?
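For reference, the attempt above is not valid R (there is no comma between `split` and `extended`), and current versions of strsplit() do not accept an `extended` argument at all. A minimal sketch of what strsplit() does return when asked to split into single characters, using a few made-up alleles in the same two-letter format: the result is a list with one character vector per input string, not two columns, so a second step is needed to reshape it.

```r
# A few alleles in the same two-letter format as the Allele column
alleles <- c("AA", "TT", "CC")

# split = "" breaks every string into its individual characters;
# strsplit() returns a list with one character vector per input string
parts <- strsplit(alleles, split = "")
str(parts)

# parts[[1]] is c("A", "A"), parts[[2]] is c("T", "T"), parts[[3]] is c("C", "C")
```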

Ben*_*Ben 12

Another approach, from start to finish:

Make reproducible data:

dat <- read.table(header = TRUE,  text = "SNP Geno    Allele
marker1 G1  AA
marker2 G1  TT
marker3 G1  TT
marker1 G2  CC
marker2 G2  AA
marker3 G2  TT
marker1 G3  GG
marker2 G3  AA
marker3 G3  TT")

UPDATED: extract the Allele column, split it into single characters, and then arrange those characters into two columns of a data frame:

Either

dat1 <- data.frame(t(matrix(
                     unlist(strsplit(as.vector(dat$Allele), split = "")), 
                     ncol = length(dat$Allele), nrow = 2)))
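The matrix version works because matrix() fills column by column: with `nrow = 2`, each column receives the two characters of one allele, and t() then turns each column into a row. A small sketch with made-up alleles makes the fill order visible. It assumes, as in the question, that every entry has exactly two characters; otherwise the characters would be misaligned across rows.

```r
alleles <- c("AA", "TT", "CC")

# Flatten the list from strsplit() into one character vector:
# c("A", "A", "T", "T", "C", "C")
chars <- unlist(strsplit(alleles, split = ""))

# matrix() fills column-wise, so each of the 3 columns holds one allele's 2 characters
m <- matrix(chars, nrow = 2, ncol = length(alleles))

# Transpose so each row is one allele and each column one character position
out <- data.frame(t(m))
out
```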

or, following @joran's suggestion,

dat1 <- data.frame(do.call(rbind, strsplit(as.vector(dat$Allele), split = "")))
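The do.call() version avoids specifying the matrix shape: `do.call(rbind, pieces)` calls rbind() with every element of the list as a separate argument, so each two-character vector becomes one row. A minimal sketch on made-up alleles:

```r
alleles <- c("AA", "TT", "CC")
pieces <- strsplit(alleles, split = "")

# Equivalent to rbind(pieces[[1]], pieces[[2]], pieces[[3]]):
# each character vector becomes one row of a 3 x 2 character matrix
m <- do.call(rbind, pieces)
dim(m)

out <- data.frame(m)
out
```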

Then add column names to the new columns:

names(dat1) <- c("Allele1", "Allele2")
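If you prefer a single expression, setNames() (from the stats package, which is attached by default) returns the object with new names, so the splitting and the renaming can be combined. A sketch on made-up alleles, equivalent to the two separate steps:

```r
alleles <- c("AA", "TT", "CC")

# Split into a two-column data frame and name the columns in one step
dat1 <- setNames(
  data.frame(do.call(rbind, strsplit(alleles, split = ""))),
  c("Allele1", "Allele2")
)
names(dat1)
```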

Bind the two new columns to the SNP and Geno columns of the original data frame, as @user1317221 suggested:

dat3 <- cbind(dat[c("SNP", "Geno")], dat1)
dat3
      SNP Geno Allele1 Allele2
1 marker1   G1       A       A
2 marker2   G1       T       T
3 marker3   G1       T       T
4 marker1   G2       C       C
5 marker2   G2       A       A
6 marker3   G2       T       T
7 marker1   G3       G       G
8 marker2   G3       A       A
9 marker3   G3       T       T
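Because every Allele value has exactly two characters, a different technique, base R's substr(), can take each character position directly without strsplit() or any reshaping. This is a swapped-in alternative rather than part of the answer above, and it relies on that fixed two-character assumption:

```r
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "SNP Geno Allele
marker1 G1 AA
marker2 G1 TT
marker3 G1 TT
marker1 G2 CC
marker2 G2 AA
marker3 G2 TT
marker1 G3 GG
marker2 G3 AA
marker3 G3 TT")

# Take the first and the second character of each two-letter allele
dat$Allele1 <- substr(dat$Allele, 1, 1)
dat$Allele2 <- substr(dat$Allele, 2, 2)

# Keep only the columns in the layout asked for in the question
dat3 <- dat[c("SNP", "Geno", "Allele1", "Allele2")]
dat3
```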