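I asked related questions here and here. I tried to generalize those answers, but failed.
Basically, I have a string that I want to split into words, numbers, and any kind of punctuation, while keeping apostrophes inside words. Here is what I tried, and I am so close (I think):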
x <- "Raptors don't like robots! I'd pay $500.00 to rid them."
strsplit(x, "(\\s+)|(?=[[:punct:]])", perl = TRUE)
## [[1]]
##  [1] "Raptors" "don"     "'"       "t"       "like"    "robots"  "!"
##  [8] ""        "I"       "'"       "d"       "pay"     "$"       "500"     "."       "00"      "to"
## [18] "rid"     "them"    "."
This is what I am after:
## [[1]]
##  [1] "Raptors" "don't"   "like"    "robots"  "!"       ""        "I'd"
##  [8] "pay"     "$"       "500"     "."       "00"      "to"      "rid"     "them"    "."
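Although I would like a base R solution, I would be glad to see other solutions too (I am sure someone has one that uses a string-processing package), since that would make this question easier for others to generalize.
Note: R has its own particular regex system. You will want to be familiar with R to answer this question.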
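You can use a negative lookahead (?!') in front of the punctuation lookahead, so that no split happens at an apostrophe: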
strsplit(x, "(\\s+)|(?!')(?=[[:punct:]])", perl = TRUE)
#  [1] "Raptors" "don't"   "like"    "robots"  "!"       ""        "I'd"     "pay"     "$"       "500"     "."       "00"      "to"      "rid"     "them"    "."
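If the empty string left over between "!" and "I'd" is not actually wanted, another base R option is to extract the tokens rather than split on the gaps. The following is only a minimal sketch, not part of the answer above: the names token_pattern and tokens are made up for illustration, and it assumes that a "word" is a run of letters or digits that may contain internal apostrophes, and that every other punctuation character is a token of its own.

```r
x <- "Raptors don't like robots! I'd pay $500.00 to rid them."

# A token is either a run of letters/digits that may contain internal
# apostrophes (don't, I'd), or a single punctuation character.
# (token_pattern is a hypothetical name chosen for this sketch.)
token_pattern <- "[[:alnum:]]+(?:'[[:alnum:]]+)*|[[:punct:]]"

# gregexpr() locates every match; regmatches() pulls the matched text out.
tokens <- regmatches(x, gregexpr(token_pattern, x, perl = TRUE))[[1]]
print(tokens)
```

Because whitespace is never part of a match, this should return the same tokens as the strsplit() call but without the "" element, so it differs slightly from the output requested in the question. An apostrophe that is not between alphanumeric characters (a leading or trailing quote, say) would come out as its own punctuation token under this assumption.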