R string removes punctuation on split

pau*_*uez 8 regex r

Suppose I have a string such as the following.

x <- 'The world is at end. What do you think?   I am going crazy!    These people are too calm.'

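I need to split only on the punctuation marks !, ? and . when they are followed by whitespace, and I want to keep the punctuation.

The following removes the punctuation and leaves leading whitespace on the split parts: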

vec <- strsplit(x, '[!?.][:space:]*')
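A likely reason for the leading spaces: `[:space:]` is only a POSIX class when it sits inside a bracket expression, i.e. `[[:space:]]`. Written on its own, R's default regex engine reads it as a set of the literal characters `:`, `s`, `p`, `a`, `c`, `e`, so the spaces are never consumed. The sketch below (the variable name `fixed` is just for illustration) shows that correcting the class removes the leading whitespace, but the punctuation is still dropped because it is part of the delimiter.

```r
x <- 'The world is at end. What do you think?   I am going crazy!    These people are too calm.'

# POSIX class nested inside a bracket expression: [[:space:]]
# The delimiter is now "punctuation plus any following whitespace",
# so both are removed from the result.
fixed <- strsplit(x, '[!?.][[:space:]]*')[[1]]
print(fixed)
# Elements: "The world is at end", "What do you think",
#           "I am going crazy", "These people are too calm"
```

So the whitespace problem goes away, but keeping the punctuation needs a pattern that does not consume it.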

How can I split the sentences while keeping the punctuation?

hwn*_*wnd 14

You can switch on PCRE by using perl=TRUE and use a lookbehind assertion.

strsplit(x, '(?<![^!?.])\\s+', perl=TRUE)

Regular expression:

(?<!          look behind to see if there is not:
 [^!?.]       any character except: '!', '?', '.'
)             end of look-behind
\s+           whitespace (\n, \r, \t, \f, and " ") (1 or more times)
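As a quick check of the pattern explained above, here is a minimal sketch applied to the string from the question. The variable names `parts` and `parts_pos` are only for illustration, and the positive lookbehind `(?<=[!?.])` is an alternative spelling I am adding for comparison, not part of the original answer. On this input both should give the same four sentences.

```r
x <- 'The world is at end. What do you think?   I am going crazy!    These people are too calm.'

# Negative lookbehind: split on a run of whitespace that is NOT preceded
# by a character other than '!', '?' or '.'. The lookbehind consumes
# nothing, so the punctuation stays attached to the sentence.
parts <- strsplit(x, '(?<![^!?.])\\s+', perl = TRUE)[[1]]
print(parts)
# Elements: "The world is at end.", "What do you think?",
#           "I am going crazy!", "These people are too calm."

# Positive lookbehind (illustrative alternative): whitespace preceded by
# one of '!', '?' or '.'.
parts_pos <- strsplit(x, '(?<=[!?.])\\s+', perl = TRUE)[[1]]
stopifnot(identical(parts, parts_pos))
```

One difference worth keeping in mind: the negative form also succeeds at the very start of a string (there is no preceding character at all), so leading whitespace in the input would count as a split point there, while the positive form would ignore it.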



Tyl*_*ker 6

The sentSplit function in the qdap package was created for this task:

library(qdap)
sentSplit(data.frame(text = x), "text")

##   tot                       text
## 1 1.1       The world is at end.
## 2 2.2         What do you think?
## 3 3.3          I am going crazy!
## 4 4.4 These people are too calm.