Separate a two-letter string in the 4th column

dat*_*kid 0 r tidyr data-science

I have a data frame, df, with genomic data. The last column holds two-letter variants.

               id crm     pos allele
160841  rs2237282  11 1273948     AG
160842  rs6417577  11 1276796     AC
165677  rs2151342  11 1199626     GT
165678  rs2749240  11 1258025     AG

I want to split the last column into two one-letter columns:

               id crm     pos allele allele2
160841  rs2237282  11 1273948     A       G
160842  rs6417577  11 1276796     A       C
165677  rs2151342  11 1199626     G       T
165678  rs2749240  11 1258025     A       G

I tried the following in RStudio 1.1.419 with R 3.4.3, using dplyr and tidyr, without success:

  • separate(df, allele, into = c("allele", "allele2"))
  • separate(df, allele, into = c("allele", "allele2"), sep = "")
  • separate(df, allele, into = c("allele", "allele2"), sep = "\c")
  • separate(df, allele, into = c("allele", "allele2"), sep = ".")
  • separate(df, allele, into = c("allele", "allele2"), sep = .)
  • separate(df, allele, into = c("allele", "allele2"), sep = \c)

How can I get the desired split?

Ony*_*mbu 6

Using base R:

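# Prototype data frame: one character column per capture group
HERE <- data.frame(A1 = character(), A2 = character())
# Capture each of the two letters and bind them to the original data frame
cbind(df, strcapture("(.)(.)", df$allele, HERE))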
              id crm     pos allele A1 A2
160841 rs2237282  11 1273948     AG  A  G
160842 rs6417577  11 1276796     AC  A  C
165677 rs2151342  11 1199626     GT  G  T
165678 rs2749240  11 1258025     AG  A  G
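For anyone who wants to paste the base R approach into a fresh session, here is a minimal self-contained sketch. The sample data is rebuilt by hand from the question (row names omitted), and the names proto, parts and result are illustrative assumptions rather than part of the answer above. stringsAsFactors = FALSE is set explicitly so the new columns come out as character on older R versions such as 3.4.x as well.

```r
# Sample data rebuilt from the question (row names omitted).
df <- data.frame(
  id = c("rs2237282", "rs6417577", "rs2151342", "rs2749240"),
  crm = 11,
  pos = c(1273948, 1276796, 1199626, 1258025),
  allele = c("AG", "AC", "GT", "AG"),
  stringsAsFactors = FALSE
)

# Prototype: one character column per capture group in the pattern.
proto <- data.frame(A1 = character(), A2 = character(),
                    stringsAsFactors = FALSE)

# Each "(.)" captures one letter; strcapture() returns one column per group.
parts <- strcapture("(.)(.)", df$allele, proto)

# Keep the original allele column and append the two new ones.
result <- cbind(df, parts)
print(result)
```

The prototype is what determines the names and types of the captured columns, so changing A1 and A2 there renames the output columns.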


G. *_*eck 5

The sep argument of separate can be numeric, in which case it gives the character position at which to split:

separate(df, allele, into = c("allele1", "allele2"), sep = 1)

giving:

              id crm     pos allele1 allele2
160841 rs2237282  11 1273948       A       G
160842 rs6417577  11 1276796       A       C
165677 rs2151342  11 1199626       G       T
165678 rs2749240  11 1258025       A       G
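A reproducible sketch of the positional split, assuming tidyr is installed. The sample data is rebuilt by hand from the question (row names omitted), and the name out is only illustrative.

```r
library(tidyr)

# Sample data rebuilt from the question (row names omitted).
df <- data.frame(
  id = c("rs2237282", "rs6417577", "rs2151342", "rs2749240"),
  crm = 11,
  pos = c(1273948, 1276796, 1199626, 1258025),
  allele = c("AG", "AC", "GT", "AG"),
  stringsAsFactors = FALSE
)

# A numeric sep is a position, not a pattern: sep = 1 splits after the
# first character, so "AG" becomes "A" and "G".
out <- separate(df, allele, into = c("allele1", "allele2"), sep = 1)
print(out)
```

When sep is a character string it is interpreted as a regular expression, and whatever it matches is treated as a separator and dropped. That is why a position works better here than a pattern such as ".".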