Dre*_*Dre 0 split r strsplit stringr
I want to split my text after the 8 words and numbers that follow each time.
Example text:
s <- 'random random random 19:49 0-2 H 2 ABC TREE LAKE #88 TURTLE random random 03:32 43-21 V 8 XYZ DOG LOG #72 FIRE random random random'
Example of how I want the text to be split:
'random random random 19:49 0-2 H 2 ABC TREE LAKE #88 TURTLE
random random 03:32 43-21 V 8 XYZ DOG LOG #72 FIRE
random random random'
I know I can find the time in several ways, for example:
str_extract(str_extract(s, "[:digit:]*:"), "[:digit:]*")
But I'm not sure how to split after the eight words and numbers that follow the time. Any help would be greatly appreciated.
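As a side note, the nested str_extract() call in the question keeps only the digits before the first colon, not the full time. A minimal base R sketch for pulling out every full time might look like the following; the pattern \\d{1,2}:\\d{2} is an assumption about how times are written in the text, and the variable name times is only illustrative.

```r
s <- 'random random random 19:49 0-2 H 2 ABC TREE LAKE #88 TURTLE random random 03:32 43-21 V 8 XYZ DOG LOG #72 FIRE random random random'

# gregexpr() locates every match of the pattern; regmatches() extracts the matched text.
# Assumed time format: one or two digits, a colon, then exactly two digits.
times <- regmatches(s, gregexpr("\\d{1,2}:\\d{2}", s))[[1]]
times
#[1] "19:49" "03:32"
```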
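We can use gsub to match a : followed by 2 digits and then 8 instances of one or more spaces (\\s+) followed by one or more non-spaces (\\S+), replace the space that comes after that match with a , and then split on that delimiter.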
strsplit(gsub('((?:\\:\\d{2}(\\s+\\S+){8}))\\s', '\\1,',
s, perl=TRUE), ',')[[1]]
#[1] "random random random 19:49 0-2 H 2 ABC TREE LAKE #88 TURTLE"
#[2] "random random 03:32 43-21 V 8 XYZ DOG LOG #72 FIRE"
#[3] "random random random"
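One caveat with the comma delimiter is that it would also split on any comma already present in the text. A hedged variant of the same idea, wrapped in a hypothetical helper split_after_time() (the name and the n and sep parameters are illustrative, not part of the original answer), marks the split points with a control character that is assumed not to occur in the input:

```r
s <- 'random random random 19:49 0-2 H 2 ABC TREE LAKE #88 TURTLE random random 03:32 43-21 V 8 XYZ DOG LOG #72 FIRE random random random'

# Hypothetical helper: split after the n whitespace-separated tokens that follow a time.
# sep is a placeholder delimiter (ASCII unit separator) assumed absent from the text.
split_after_time <- function(x, n = 8, sep = "\x1f") {
  # ":" + two digits, then n groups of (spaces + non-spaces), then the spaces to replace
  pattern <- sprintf("(:\\d{2}(?:\\s+\\S+){%d})\\s+", n)
  marked <- gsub(pattern, paste0("\\1", sep), x, perl = TRUE)
  # fixed = TRUE treats sep literally instead of as a regular expression
  strsplit(marked, sep, fixed = TRUE)[[1]]
}

split_after_time(s)
#[1] "random random random 19:49 0-2 H 2 ABC TREE LAKE #88 TURTLE"
#[2] "random random 03:32 43-21 V 8 XYZ DOG LOG #72 FIRE"
#[3] "random random random"
```

Like the original, this expects a single string; for a character vector, the [[1]] would need to be dropped.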
Data used:
s <- 'random random random 19:49 0-2 H 2 ABC TREE LAKE #88 TURTLE random random 03:32 43-21 V 8 XYZ DOG LOG #72 FIRE random random random'