I have the following intervals:
chr1:004336501-004336560 0.3437
chr1:004340069-004340128 0.3437
chr1:004350335-004350394 0.3437
chr1:004354213-004354272 0.3218
chr1:004380332-004380391 0.3218
chr1:004481060-004481119 0.3218
chr1:004488728-004488787 0.3607
...
I would like to get the following:
chr1 004336501 004350394 0.3437
chr1 004354213 004481119 0.3218
...
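I would be surprised if there were no R function for this. I don't want to use a loop in R because the file is large. I would appreciate any suggestions I can build on.
Thanks!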
You could also try:
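library(data.table)
library(devtools)
# Loads cSplit() from a gist; cSplit() is also available in the splitstackshape package
source_gist(11380733)
#Updated based on @Ananda Mahto's comments
# Split V1 on ":" or "-", then take the first start and the last end per value of V2.
# Note: by = V2 groups every row sharing a value, whether or not the rows are adjacent.
DT <- cSplit(df, "V1", "[:-]", fixed = FALSE)[,
  list(chr = V1_1[1], First = V1_2[1], Last = V1_3[.N]), by = V2]
setkey(DT, V2)  # sorts the result by V2
DT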
# V2 chr First Last
#1: 0.3218 chr1 004354213 004481119
#2: 0.3437 chr1 004336501 004350394
#3: 0.3607 chr1 004488728 004488787
Or use a regex substitution to change the multiple delimiters to a single delimiter.
DT1 <- cSplit(transform(df, V1=gsub(":", "-", V1)),
"V1", "-")[,list(Chr=V1_1[1], ColN1=V1_2[1], ColN2=V1_3[.N]), by=V2]
setkey(DT1, V2)
DT1
# V2 Chr ColN1 ColN2
#1: 0.3218 chr1 004354213 004481119
#2: 0.3437 chr1 004336501 004350394
#3: 0.3607 chr1 004488728 004488787
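To see what the delimiter handling in the two variants above does, here is a minimal base R sketch on a single interval string. The variable names (`x`, `unified`, `parts`, `parts_regex`) are illustrative only, and it assumes the chromosome name itself contains neither ":" nor "-".

```r
# One interval string in the same "chr:start-end" layout as the question
x <- "chr1:004336501-004336560"

# Variant 2: turn ":" into "-" first, then split on the single delimiter
unified <- gsub(":", "-", x, fixed = TRUE)
parts <- strsplit(unified, "-", fixed = TRUE)[[1]]

# Variant 1: split directly on either delimiter with a character class
parts_regex <- strsplit(x, "[:-]")[[1]]

print(parts)
print(identical(parts, parts_regex))
```

Both routes yield the same three character fields, so the leading zeros in the coordinates are kept.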
# Sample data (df) used in the examples above
df <- structure(list(V1 = c("chr1:004336501-004336560", "chr1:004340069-004340128",
"chr1:004350335-004350394", "chr1:004354213-004354272", "chr1:004380332-004380391",
"chr1:004481060-004481119", "chr1:004488728-004488787"), V2 = c(0.3437,
0.3437, 0.3437, 0.3218, 0.3218, 0.3218, 0.3607)), .Names = c("V1",
"V2"), class = "data.frame", row.names = c(NA, -7L))
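If the same value can reappear later in the file, or on another chromosome, grouping by the value alone would merge intervals that are not adjacent. A possible alternative, sketched here under the assumption that the rows are already sorted by position and that only consecutive rows with the same value should be merged, is to group by runs using `rleid()` from data.table. The names `dt`, `run` and `res` are illustrative, not part of the original answer.

```r
library(data.table)

# Same sample data as in the post
df <- data.frame(
  V1 = c("chr1:004336501-004336560", "chr1:004340069-004340128",
         "chr1:004350335-004350394", "chr1:004354213-004354272",
         "chr1:004380332-004380391", "chr1:004481060-004481119",
         "chr1:004488728-004488787"),
  V2 = c(0.3437, 0.3437, 0.3437, 0.3218, 0.3218, 0.3218, 0.3607),
  stringsAsFactors = FALSE
)

dt <- as.data.table(df)

# Split "chr:start-end" into three character columns (leading zeros are kept)
dt[, c("chr", "start", "end") := tstrsplit(V1, "[:-]")]

# rleid() gives a new id each time chr or V2 changes between consecutive rows,
# so only adjacent rows with the same value end up in the same group
res <- dt[, .(start = start[1], end = end[.N]),
          by = .(run = rleid(chr, V2), chr, V2)]

# Keep the columns in the order requested in the question
res <- res[, .(chr, start, end, V2)]
print(res)
```

This stays vectorised, so no explicit R loop is needed, and the result keeps the original file order rather than being sorted by value.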