Split a string at regular intervals

Question

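I want to split a string at regular intervals. My question is nearly identical to this one: How to split a string into substrings of a given length? The difference is that I have a column of strings in a data set rather than a single string.

Here is an example data set: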

df = read.table(text = "
my.id   X1    
010101   1
010102   1
010103   1
010104   1
020101   1
020112   1
021701   0
021802   0
133301   0
133302   0  
241114   0
241215   0
", header = TRUE, colClasses=c('character', 'numeric'), stringsAsFactors = FALSE)

Here is the desired result. I would prefer to drop the leading zeros, as shown:

desired.result = read.table(text = "
A1 A2 A3   X1
 1  1  1   1
 1  1  2   1
 1  1  3   1
 1  1  4   1
 2  1  1   1
 2  1 12   1
 2 17  1   0
 2 18  2   0
13 33  1   0
13 33  2   0
24 11 14   0
24 12 15   0
", header = TRUE, colClasses=c('numeric', 'numeric', 'numeric', 'numeric'), stringsAsFactors = FALSE)

Here is a loop that seems close, and perhaps I could use it. However, I suspect there is a more efficient way.

for(i in 1:nrow(df)) {
     print(substring(df$my.id[i], seq(1, 5, 2), seq(2, 6, 2)))
}

This apply statement does not work:

apply(df$my.id, 1,  function(x) substring(df$my.id[x], seq(1, 5, 2), seq(2, 6, 2))   )

Thanks for any suggestions. I would prefer a base R solution.

Answer 1

42-*_*42- 10

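Of the various approaches, I find read.fwf applied to a textConnection to be the most efficient and the easiest to understand. It has the advantage of the automatic class detection that is built into the read.* functions.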

cbind( read.fwf(file=textConnection(df$my.id), 
              widths=c(2,2,2), col.names=paste0("A", 1:3)), 
     X1=df$X1)
#-----------
   A1 A2 A3 X1
1   1  1  1  1
2   1  1  2  1
3   1  1  3  1
4   1  1  4  1
5   2  1  1  1
6   2  1 12  1
7   2 17  1  0
8   2 18  2  0
9  13 33  1  0
10 13 33  2  0
11 24 11 14  0
12 24 12 15  0
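
As a small illustration of the class detection mentioned above, the sketch below checks which classes read.fwf infers and shows one way to override them. The ids vector is made-up sample data in the same format as my.id, and passing colClasses relies on read.fwf forwarding extra arguments to read.table.

```r
# Sketch: inspect the column classes that read.fwf() infers, then override them.
ids <- c("010101", "021701", "241215")   # made-up sample ids (assumption)

# Default behaviour: each 2-character field is converted to a number,
# so the leading zeros are dropped.
con <- textConnection(ids)
as_numbers <- read.fwf(con, widths = c(2, 2, 2), col.names = paste0("A", 1:3))
close(con)
sapply(as_numbers, class)

# colClasses is forwarded to read.table(), so the fields stay as text
# and the leading zeros are kept.
con <- textConnection(ids)
as_text <- read.fwf(con, widths = c(2, 2, 2), col.names = paste0("A", 1:3),
                    colClasses = "character")
close(con)
as_text$A1
```

Closing the connection explicitly avoids the "closing unused connection" warning that R may otherwise emit later.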

(I believe I learned this from Gabor Grothendieck on R-help about 6 years ago.)

If you prefer a regex strategy, then look at inserting a tab after every two characters and running the result through read.table. Very compact:

read.table(text=gsub('(.{2})','\\1\t',df$my.id) )
#---------
   V1 V2 V3
1   1  1  1
2   1  1  2
3   1  1  3
4   1  1  4
5   2  1  1
6   2  1 12
7   2 17  1
8   2 18  2
9  13 33  1
10 13 33  2
11 24 11 14
12 24 12 15
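
The regex version returns default column names (V1 to V3) and leaves out X1. A minimal sketch of how it might be extended to match the desired layout follows; ids and x1 are made-up stand-ins for df$my.id and df$X1.

```r
# Sketch: name the split columns and re-attach X1 (sample data is an assumption).
ids <- c("010101", "020112", "241215")
x1  <- c(1, 1, 0)

# Insert a tab after every two characters, e.g. "01\t01\t01\t".
tabbed <- gsub("(.{2})", "\\1\t", ids)

# With the default sep = "", read.table() splits on whitespace,
# so the trailing tab does not create an extra column.
parts  <- read.table(text = tabbed, col.names = paste0("A", 1:3))
result <- cbind(parts, X1 = x1)
result
```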
