Is there a tool in R to do the following merge?

Joh*_*han -2 merge r

I have the following intervals:

chr1:004336501-004336560   0.3437   
chr1:004340069-004340128   0.3437   
chr1:004350335-004350394   0.3437   
chr1:004354213-004354272   0.3218   
chr1:004380332-004380391   0.3218   
chr1:004481060-004481119   0.3218   
chr1:004488728-004488787   0.3607   
...

I would like to get the following:

chr1  004336501  004350394  0.3437
chr1  004354213  004481119  0.3218
...

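I would be surprised if R had no function for this. I don't want to use a loop in R because the file is large. I'd appreciate any suggestions I can build on.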

Thanks!

akr*_*run 5

You could also try:

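 library(data.table)
 library(devtools)
 # source_gist() (from devtools) loads cSplit() from a GitHub gist;
 # cSplit() is also available in the splitstackshape package
 source_gist(11380733)

 #Updated based on @Ananda Mahto's comments
 # Split V1 on ':' or '-' into V1_1 (chr), V1_2 (start), V1_3 (end), then keep
 # the first start and the last end per group. Note that by = V2 groups ALL rows
 # sharing a score, even rows that are not adjacent in the file.
 DT <- cSplit(df, "V1", "[:-]", fixed = FALSE)[,
          list(chr = V1_1[1], First = V1_2[1], Last = V1_3[.N]), by = V2]
 setkey(DT,V2)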

 DT
 #      V2  chr     First     Last
 #1: 0.3218 chr1 004354213 004481119
 #2: 0.3437 chr1 004336501 004350394
 #3: 0.3607 chr1 004488728 004488787
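
Because the grouping above is by score value, a score that reappears further down a large file would be folded into its earlier block. If only consecutive rows with the same score should be merged (which is what the expected output suggests), a minimal sketch using data.table's own rleid() and tstrsplit() instead of the gist's cSplit() might look like this. It assumes rows are already sorted by position; the names dt, res, chr, start and end are illustrative choices, not part of the original answer.

```r
library(data.table)

# Sample data from the question (same values as the "Data" section)
dt <- data.table(
  V1 = c("chr1:004336501-004336560", "chr1:004340069-004340128",
         "chr1:004350335-004350394", "chr1:004354213-004354272",
         "chr1:004380332-004380391", "chr1:004481060-004481119",
         "chr1:004488728-004488787"),
  V2 = c(0.3437, 0.3437, 0.3437, 0.3218, 0.3218, 0.3218, 0.3607)
)

# Split "chr:start-end" on '-' or ':' into three character columns;
# tstrsplit() does not type-convert by default, so leading zeros are kept
dt[, c("chr", "start", "end") := tstrsplit(V1, "[-:]")]

# rleid() numbers each run of consecutive equal scores, so a score that
# reappears later in the file starts a new group instead of being merged
res <- dt[, .(chr = chr[1], start = start[1], end = end[.N], V2 = V2[1]),
          by = .(run = rleid(V2))]
res[, run := NULL]
res
```

Unlike setkey(DT, V2) in the answer, this keeps the groups in file order rather than sorting them by score.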

Or use a regex to turn the multiple delimiters into a single delimiter first:

 DT1 <- cSplit(transform(df, V1=gsub(":", "-", V1)),
            "V1", "-")[,list(Chr=V1_1[1], ColN1=V1_2[1], ColN2=V1_3[.N]), by=V2]
 setkey(DT1, V2)
 DT1
 #      V2  Chr     ColN1     ColN2
 #1: 0.3218 chr1 004354213 004481119
 #2: 0.3437 chr1 004336501 004350394
 #3: 0.3607 chr1 004488728 004488787
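
The regex step in this variant only normalizes the delimiters so a plain split on "-" is enough. The same splitting idea in base R, without cSplit(), could be sketched as follows; v1, normalized, pieces and parts are hypothetical names for illustration, and the sketch covers only the splitting, not the grouping.

```r
# Two intervals in the "chr:start-end" format used in the question
v1 <- c("chr1:004336501-004336560", "chr1:004340069-004340128")

# Step 1: turn the ':' after the chromosome into '-' so only one delimiter remains
normalized <- gsub(":", "-", v1, fixed = TRUE)

# Step 2: split on that single delimiter; each element becomes c(chr, start, end)
pieces <- strsplit(normalized, "-", fixed = TRUE)

# Step 3: stack the pieces row-wise into a character matrix (leading zeros are kept)
parts <- do.call(rbind, pieces)
colnames(parts) <- c("chr", "start", "end")
parts
```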

Data

 df <- structure(list(V1 = c("chr1:004336501-004336560", "chr1:004340069-004340128", 
 "chr1:004350335-004350394", "chr1:004354213-004354272", "chr1:004380332-004380391", 
 "chr1:004481060-004481119", "chr1:004488728-004488787"), V2 = c(0.3437, 
 0.3437, 0.3437, 0.3218, 0.3218, 0.3218, 0.3607)), .Names = c("V1", 
 "V2"), class = "data.frame", row.names = c(NA, -7L))
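
Since the question asks to avoid explicit loops on a large file, a package-free option is to run-length encode the scores with base R's rle() and pick the first and last row of each run by index. This is a minimal sketch, assuming rows are already sorted by position within each chromosome; keying runs on chromosome plus score, and the names parts, key, runs and merged, are illustrative assumptions rather than part of the answer above.

```r
# Sample data from the question (same values as the "Data" section)
df <- data.frame(
  V1 = c("chr1:004336501-004336560", "chr1:004340069-004340128",
         "chr1:004350335-004350394", "chr1:004354213-004354272",
         "chr1:004380332-004380391", "chr1:004481060-004481119",
         "chr1:004488728-004488787"),
  V2 = c(0.3437, 0.3437, 0.3437, 0.3218, 0.3218, 0.3218, 0.3607),
  stringsAsFactors = FALSE
)

# Split "chr:start-end" on '-' or ':' into a 3-column character matrix
parts <- do.call(rbind, strsplit(df$V1, "[-:]"))

# Run key = chromosome + score, so a run never spans two chromosomes
key <- paste(parts[, 1], df$V2)

# rle() finds runs of consecutive equal keys without an explicit loop
runs  <- rle(key)
last  <- cumsum(runs$lengths)       # row index of the last row of each run
first <- last - runs$lengths + 1    # row index of the first row of each run

merged <- data.frame(
  chr   = parts[first, 1],
  start = parts[first, 2],
  end   = parts[last, 3],
  score = df$V2[first],
  stringsAsFactors = FALSE
)
merged
```

All steps are vectorized, so the cost grows roughly linearly with the number of rows.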