Split on a delimiter, keeping repeated delimiters

Tyl*_*ker 11 regex string r stringi

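I'm trying to use the stringi package to split on a delimiter (which may be repeated) while keeping the delimiter. This is similar to a question I asked moons ago: R split on delimiter (split) keep the delimiter (split), except that here the delimiter can be repeated. I don't think base strsplit can handle this type of regex. The stringi package can, but I can't figure out how to write the regex so that it splits after the whole run of delimiters when they repeat and also does not leave an empty string at the end of the result.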

Base R, stringr, stringi, etc. solutions are all welcome.

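The latter problem (the trailing empty string) occurs because I use the greedy * on \\s; a space after the delimiter isn't guaranteed, so I could only think to leave it in: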

MWE

text.var <- c("I want to split here.But also||Why?",
   "See! Split at end but no empty.",
   "a third string.  It has two sentences"
)

library(stringi)   
stri_split_regex(text.var, "(?<=([?.!|]{1,10}))\\s*")

# Outcome

## [[1]]
## [1] "I want to split here." "But also|"     "|"          "Why?"                 
## [5] ""                     
## 
## [[2]]
## [1] "See!"       "Split at end but no empty." ""                          
## 
## [[3]]
## [1] "a third string."      "It has two sentences"

# Desired outcome

## [[1]]
## [1] "I want to split here." "But also||"                     "Why?"                                  
## 
## [[2]]
## [1] "See!"         "Split at end but no empty."                         
## 
## [[3]]
## [1] "a third string."      "It has two sentences"

akr*_*run 8

Using strsplit:

 strsplit(text.var, "(?<=[.!|])( +|\\b)", perl=TRUE)
 #[[1]]
 #[1] "I want to split here." "But also||"            "Why?"                 

 #[[2]]
 #[1] "See!"                       "Split at end but no empty."

 #[[3]]
 #[1] "a third string."      "It has two sentences"

Or

 library(stringi)
 stri_split_regex(text.var, "(?<=[.!|])( +|\\b)")
 #[[1]]
 #[1] "I want to split here." "But also||"            "Why?"                 

 #[[2]]
 #[1] "See!"                       "Split at end but no empty."

 #[[3]]
 #[1] "a third string."      "It has two sentences"


Jos*_*ien 6

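Just use a pattern that finds inter-character positions that (1) are preceded by one of ?.!| and (2) are not followed by one of ?.!|. Tack on \\s* to match and eat up any number of consecutive whitespace characters, and you're good to go.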

##                  (look-behind)(look-ahead)(spaces)
strsplit(text.var, "(?<=([?.!|]))(?!([?.!|]))\\s*", perl=TRUE)
# [[1]]
# [1] "I want to split here." "But also||"            "Why?"                 
# 
# [[2]]
# [1] "See!"                       "Split at end but no empty."
# 
# [[3]]
# [1] "a third string."      "It has two sentences"
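
For comparison, the same pieces can also be obtained by matching the tokens instead of splitting between them. This is a different technique from the lookaround-based split above, shown only as a minimal base R sketch: each token is taken to be a run of non-delimiter characters followed by any run of ?.!|, and the whitespace left around each token is trimmed afterwards. The helper name split_keep_delims is hypothetical, and the sketch assumes a string never needs to keep a leading run of delimiters (such a run would be dropped).

```r
text.var <- c("I want to split here.But also||Why?",
   "See! Split at end but no empty.",
   "a third string.  It has two sentences"
)

# Hypothetical helper (not from the answers above): extract tokens by
# matching rather than splitting.
split_keep_delims <- function(x) {
  # One token = one or more non-delimiters, then zero or more delimiters.
  pattern <- "[^?.!|]+[?.!|]*"
  pieces <- regmatches(x, gregexpr(pattern, x, perl = TRUE))
  # Remove the whitespace that separated the sentences.
  lapply(pieces, trimws)
}

result <- split_keep_delims(text.var)
result
```

Because nothing is split at the end of the string, this approach cannot produce a trailing empty string; the trade-off is that the delimiter set is written twice in the pattern.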