strsplit on all spaces and punctuation except apostrophes

Tyl*_*ker 6 regex r

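I asked related questions here and here. I tried to generalize those answers but failed.

Basically, I have a string that I want to split into words, numbers, and any kind of punctuation, except that I want to keep apostrophes inside their words. This is what I have tried, and I think I'm close: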

x <- "Raptors don't like robots! I'd pay $500.00 to rid them."

strsplit(x, "(\\s+)|(?=[[:punct:]])", perl = TRUE)

## [[1]]
##  [1] "Raptors" "don"     "'"       "t"       "like"    "robots"  "!"             
##  [8] ""   "I"   "'"    "d"  "pay"     "$"       "500"     "."       "00"      "to"         
## [20] "rid"   "them"    "."  

This is what I'm after:

## [[1]]
##  [1] "Raptors" "don't"       "like"    "robots"  "!"       ""        "I'd"      
##  [8] "pay"     "$"       "500"   "."   "00"  "to"      "rid"     "them"    "."  

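While I want a base R solution, I'd also like to see other solutions (I'm sure someone has a stringr solution), since that makes the question more generalizable for others.

Note: R has its own particular regex system. You will want to be familiar with R to answer this question.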

sgi*_*ibb 5

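You can use a negative lookahead, (?!'), so that the zero-width split before punctuation is skipped when that punctuation is an apostrophe: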

strsplit(x, "(\\s+)|(?!')(?=[[:punct:]])", perl = TRUE)
#  [1] "Raptors" "don't"   "like"    "robots"  "!"       ""        "I'd"     "pay"     "$"       "500"     "."       "00"      "to"      "rid"     "them"    "."
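
For reuse, the pattern above could be wrapped in a small helper. This is only a sketch: the function names `split_keep_apostrophes` and `extract_tokens` are made up for illustration, and dropping the empty string (left where "!" is followed by a space) is an optional extra step that the question did not ask for. The second function swaps in a different technique, matching the tokens with `regmatches()`/`gregexpr()` instead of splitting, which should avoid the empty string altogether.

```r
# Example sentence from the question.
x <- "Raptors don't like robots! I'd pay $500.00 to rid them."

# Hypothetical helper: split on whitespace, or at the zero-width position
# before any punctuation character that is not an apostrophe.
split_keep_apostrophes <- function(s, drop_empty = TRUE) {
  pattern <- "(\\s+)|(?!')(?=[[:punct:]])"
  tokens <- strsplit(s, pattern, perl = TRUE)[[1]]
  # Optionally drop the "" that appears when punctuation is followed by a space.
  if (drop_empty) tokens <- tokens[nzchar(tokens)]
  tokens
}

# Alternative sketch (not from the answer): extract tokens instead of splitting.
# A token is either a run of letters, digits, and apostrophes, or one single
# character that is not alphanumeric, whitespace, or an apostrophe.
extract_tokens <- function(s) {
  pattern <- "[[:alnum:]']+|[^[:alnum:][:space:]']"
  regmatches(s, gregexpr(pattern, s, perl = TRUE))[[1]]
}

tokens_split <- split_keep_apostrophes(x)
tokens_match <- extract_tokens(x)
```

Calling `split_keep_apostrophes(x, drop_empty = FALSE)` keeps the empty string, matching the output shown in the answer.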