Tyl*_*ker 11 regex string r stringi
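I'm trying to use the stringi package to split on a delimiter (the delimiter may be repeated) while keeping the delimiter. This is similar to a question I asked many moons ago: "R split on delimiter (split) keep the delimiter (split)", except that here the delimiter can be repeated. I don't think base strsplit can handle this type of regex. The stringi package can, but I can't figure out how to format the regex so that it splits on the delimiter when it is repeated and also does not leave an empty string at the end of the result.
Base R, stringr, and stringi solutions are all welcome.
The latter problem occurs because I use the greedy * on \\s, but the space isn't guaranteed to be there, so I could only think to leave it in: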
MWE
text.var <- c("I want to split here.But also||Why?",
"See! Split at end but no empty.",
"a third string. It has two sentences"
)
library(stringi)
stri_split_regex(text.var, "(?<=([?.!|]{1,10}))\\s*")
# Outcome
## [[1]]
## [1] "I want to split here." "But also|" "|" "Why?"
## [5] ""
##
## [[2]]
## [1] "See!" "Split at end but no empty." ""
##
## [[3]]
## [1] "a third string." "It has two sentences"
# Desired outcome
## [[1]]
## [1] "I want to split here." "But also||" "Why?"
##
## [[2]]
## [1] "See!" "Split at end but no empty."
##
## [[3]]
## [1] "a third string." "It has two sentences"
Using strsplit
strsplit(text.var, "(?<=[.!|])( +|\\b)", perl=TRUE)
#[[1]]
#[1] "I want to split here." "But also||" "Why?"
#[[2]]
#[1] "See!" "Split at end but no empty."
#[[3]]
#[1] "a third string." "It has two sentences"
Or
library(stringi)
stri_split_regex(text.var, "(?<=[.!|])( +|\\b)")
#[[1]]
#[1] "I want to split here." "But also||" "Why?"
#[[2]]
#[1] "See!" "Split at end but no empty."
#[[3]]
#[1] "a third string." "It has two sentences"
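One thing to note: the character class in the two calls above is [.!|], without the ? that the question's own pattern includes. The sample strings do not expose the difference, because their only ? sits at the very end of a string. The following is a minimal sketch with a made-up input (the string x is an assumption for illustration, not part of the question's data) comparing the class as written with one that also contains ?:

```r
# Made-up input for illustration; not part of the question's data.
x <- "Really? Yes. Done"

# Class as used above: there is no split after "?"
without_q <- strsplit(x, "(?<=[.!|])( +|\\b)", perl = TRUE)[[1]]

# Same pattern with "?" added to the character class
with_q <- strsplit(x, "(?<=[?.!|])( +|\\b)", perl = TRUE)[[1]]

without_q
# [1] "Really? Yes." "Done"
with_q
# [1] "Really?" "Yes."    "Done"
```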
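Just use a pattern that finds inter-character positions that (1) are preceded by one of ?.!| and (2) are not followed by one of ?.!|. Tack on \\s* to match and eat up any number of consecutive space characters, and you're good to go.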
## (look-behind)(look-ahead)(spaces)
strsplit(text.var, "(?<=([?.!|]))(?!([?.!|]))\\s*", perl=TRUE)
# [[1]]
# [1] "I want to split here." "But also||" "Why?"
#
# [[2]]
# [1] "See!" "Split at end but no empty."
#
# [[3]]
# [1] "a third string." "It has two sentences"
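If the set of delimiters needs to vary, the same look-behind / look-ahead idea can be wrapped in a small helper. This is only a sketch: split_keep_delim and its delims argument are hypothetical names, not part of base R or stringi, and it assumes the delimiters are plain characters that need no escaping inside a character class (so no ], ^, - or backslash).

```r
# Hypothetical helper (not from base R or stringi): split after a run of
# delimiter characters, keep the run, and swallow any following whitespace.
# Assumes `delims` contains no characters that are special inside [...].
split_keep_delim <- function(x, delims = "?.!|") {
  cls <- paste0("[", delims, "]")
  # (look-behind: a delimiter)(look-ahead: not a delimiter)(optional spaces)
  pattern <- paste0("(?<=", cls, ")(?!", cls, ")\\s*")
  strsplit(x, pattern, perl = TRUE)
}

text.var <- c("I want to split here.But also||Why?",
              "See! Split at end but no empty.",
              "a third string. It has two sentences")

res <- split_keep_delim(text.var)
res[[1]]
# [1] "I want to split here." "But also||"            "Why?"
```

The capture groups from the original pattern are dropped here because nothing refers to them; the look-arounds match the same positions without them.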