Sam*_*bus · 2 · Tags: r, character, dataframe
I have a data frame containing long strings, each one associated with a 'Sample':
Sample Data
1 000000000000000000000000000N01000000000000N0N000000000N00N0000NN00N0N000000100000N00N0N0000000NNNN011111111111111111111111111111110000000000000000000N000000N0000000000N
2 000000000000000000000000000N01000000000000N0N000000000N00N0000NN00N0N000000100000N00N0N0000000NNNN011111111111111111111111111111110000000000000000000N000000N0000000000N
I'd like a simple way to break each string into 5 pieces, using the following layout:
Sample X
CCT6 - Characters 1-33
GAT1 - Characters 34-68
IMD3 - Characters 69-99
PDR3 - Characters 100-130
RIM15 - Characters 131-168
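The five ranges are contiguous, so they can also be described as field widths (end - start + 1), which is the form fixed-width readers such as read.fwf expect. A small sketch of that arithmetic; the variable names are illustrative only:

```r
# Start/end positions of each gene region, copied from the ranges above
genes  <- c("CCT6", "GAT1", "IMD3", "PDR3", "RIM15")
starts <- c(1, 34, 69, 100, 131)
ends   <- c(33, 68, 99, 130, 168)

# Width of each field is end - start + 1
widths <- ends - starts + 1
names(widths) <- genes
print(widths)

# Contiguous ranges: each region starts right after the previous one ends,
# so the widths add up to the last end position
stopifnot(all(starts[-1] == ends[-length(ends)] + 1))
stopifnot(sum(widths) == max(ends))
```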
and to get output like this for each sample:
Sample 1
CCT6 - 000000000000000000000000000N01000
GAT1 - 000000000N0N000000000N00N0000NN00N0
IMD3 - N000000100000N00N0N0000000NNNN0
PDR3 - 1111111111111111111111111111111
RIM15 - 0000000000000000000N000000N0000000000N
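I've been able to use the substr function to break the long string into individual pieces, but I'd like to automate this so that I get all 5 pieces in one output. Ideally, that output would also be a data frame.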
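One way to automate the substr idea, offered as an alternative sketch rather than the read.fwf approach used in the answer: base R's substring() is vectorised over its first/last arguments, so a single call cuts one string into all five pieces, and vapply repeats that per row. The test strings here are made up (each region filled with one letter) so the cut points are easy to check; the names demo, pieces and result are hypothetical.

```r
genes  <- c("CCT6", "GAT1", "IMD3", "PDR3", "RIM15")
starts <- c(1, 34, 69, 100, 131)
ends   <- c(33, 68, 99, 130, 168)

# Made-up data: region 1 is all "A", region 2 all "B", and so on
demo <- paste(rep(c("A", "B", "C", "D", "E"), times = ends - starts + 1),
              collapse = "")
x <- data.frame(Sample = c(1, 2), Data = c(demo, demo),
                stringsAsFactors = FALSE)

# Each call returns the 5 substrings of one string; vapply stacks them
# into a 5 x nrow(x) character matrix
pieces <- vapply(x$Data, substring, character(length(genes)),
                 first = starts, last = ends, USE.NAMES = FALSE)

# Transpose so each sample is a row, then attach the Sample column
result <- data.frame(Sample = x$Sample, t(pieces), stringsAsFactors = FALSE)
names(result) <- c("Sample", genes)
print(result)
```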
This is exactly what ?read.fwf is for.
First, some data that looks like the data in your question:
x <- data.frame(Sample = c(1, 2), Data = c("000000000000000000000000000N01000000000000N0N000000000N00N0000NN00N0N000000100000N00N0N0000000NNNN011111111111111111111111111111110000000000000000000N000000N0000000000N",
"000000000000000000000000000N01000000000000N0N000000000N00N0000NN00N0N000000100000N00N0N0000000NNNN011111111111111111111111111111110000000000000000000N000000N0000000000N"),
stringsAsFactors = FALSE)
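Before splitting, it may be worth checking that every string actually reaches the last position used (168, assuming the layout in the question), since a fixed-width split of a short line can simply come back truncated or empty rather than raising an error. A minimal sketch with made-up strings, where the second one is deliberately too short; needed and too_short are hypothetical names:

```r
# Made-up example: sample 1 is long enough, sample 2 is truncated
x <- data.frame(Sample = c(1, 2),
                Data = c(strrep("0", 168), strrep("0", 150)),
                stringsAsFactors = FALSE)

needed <- 168  # last character position used by RIM15 in the question's layout

# Samples whose string is shorter than the full layout
too_short <- x$Sample[nchar(x$Data) < needed]
if (length(too_short) > 0) {
  message("Samples with strings shorter than ", needed, " characters: ",
          paste(too_short, collapse = ", "))
}
```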
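Now use read.fwf, specifying the width and name of each field, and that all fields should be of mode character. We wrap the text column of the example data in textConnection so that it can be treated as a connection, which read.fwf and the other read.* functions understand.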
(strs <- read.fwf(textConnection(x$Data), widths = c(33, 35, 31, 31, 38), colClasses = "character", col.names = c("CCT6", "GAT1", "IMD3", "PDR3", "RIM15")))
CCT6 GAT1 IMD3 PDR3 RIM15
1 000000000000000000000000000N01000 000000000N0N000000000N00N0000NN00N0 N000000100000N00N0N0000000NNNN0 1111111111111111111111111111111 0000000000000000000N000000N0000000000N
2 000000000000000000000000000N01000 000000000N0N000000000N00N0000NN00N0 N000000100000N00N0N0000000NNNN0 1111111111111111111111111111111 0000000000000000000N000000N0000000000N
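Why colClasses = "character" matters: read.fwf hands the split fields to read.table, whose default type conversion turns digit-only fields into numbers. A field such as PDR3 (all 1s) would presumably be read as a lossy numeric value without it. A tiny sketch on a made-up four-character record:

```r
# Made-up record: two fields of width 2
rec <- "0011"

# Default conversion: digit-only fields become numbers ("00" -> 0, "11" -> 11)
as_num <- read.fwf(textConnection(rec), widths = c(2, 2))

# colClasses = "character" keeps the fields exactly as written
as_chr <- read.fwf(textConnection(rec), widths = c(2, 2),
                   colClasses = "character")

str(as_num)
str(as_chr)
```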
Now loop over the rows and print each one in the format from your example:
for (i in 1:nrow(strs)) {
writeLines(paste("Sample", i))
writeLines(paste(names(strs), strs[i, ], sep = " - "))
}
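A variation on the loop, sketched under the assumption that you may want to reuse the text (for example, write it to a file) rather than only print it: a hypothetical helper format_sample builds the lines for one row and labels each block with the Sample value instead of the row index. The small strs table here is made up, but has the same layout as the one returned by read.fwf.

```r
# Hypothetical helper: the printed lines for one sample, returned as a vector
format_sample <- function(id, row) {
  c(paste("Sample", id),
    paste(names(row), unlist(row), sep = " - "))
}

# Made-up split result with the same column layout as strs
strs <- data.frame(CCT6 = c("AAA", "aaa"), GAT1 = c("BBB", "bbb"),
                   stringsAsFactors = FALSE)
ids <- c(1, 2)  # in the answer's setting this would be x$Sample

# seq_len() also behaves correctly when strs has zero rows
out <- unlist(lapply(seq_len(nrow(strs)),
                     function(i) format_sample(ids[i], strs[i, ])))
writeLines(out)
```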
For example, this is the output for the second sample:
Sample 2
CCT6 - 000000000000000000000000000N01000
GAT1 - 000000000N0N000000000N00N0000NN00N0
IMD3 - N000000100000N00N0N0000000NNNN0
PDR3 - 1111111111111111111111111111111
RIM15 - 0000000000000000000N000000N0000000000N
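Since the question mentions that a data frame would be ideal, another option (not part of the original answer) is a long table with one row per sample and gene. A minimal sketch on a made-up two-gene strs table; the names samples and long are hypothetical:

```r
# Made-up split result in the same layout as strs
strs <- data.frame(CCT6 = c("AAA", "aaa"), GAT1 = c("BBB", "bbb"),
                   stringsAsFactors = FALSE)
samples <- c(1, 2)  # e.g. x$Sample

long <- data.frame(
  Sample   = rep(samples, each = ncol(strs)),
  Gene     = rep(names(strs), times = nrow(strs)),
  # t() puts one sample per column, so as.vector() reads sample by sample
  Sequence = as.vector(t(as.matrix(strs))),
  stringsAsFactors = FALSE
)
print(long)
```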