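I have asked related questions here and here. I tried to generalize those answers, but every attempt failed.
Basically, I have a string that I want to split into words, digits, and any kind of punctuation mark, except that I want to keep apostrophes attached to their words. Here is what I have tried, and I am so close (I think):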
x <- "Raptors don't like robots! I'd pay $500.00 to rid them."
strsplit(x, "(\\s+)|(?=[[:punct:]])", perl = TRUE)
## [[1]]
## [1] "Raptors" "don" "'" "t" "like" "robots" "!"
## [8] "" "I" "'" "d" "pay" "$" "500" "." "00" "to"
## [20] "rid" "them" "."
This is what I am after:
## [[1]]
## [1] "Raptors" "don't" "like" "robots" "!" "" "I'd"
## [8] "pay" "$" "500" "." "00" "to" "rid" "them" "."
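Although I want a base R solution, I would also like to see other approaches (I am sure someone has a stringi solution), so that this question is easier for others to generalize.
Note: R has its own particular regex system. You will want to be familiar with R to answer this question.
You can use a negative lookahead, (?!'), so that the zero-width split in front of a punctuation character is skipped when that character is an apostrophe: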
strsplit(x, "(\\s+)|(?!')(?=[[:punct:]])", perl = TRUE)
# [1] "Raptors" "don't" "like" "robots" "!" "" "I'd" "pay" "$" "500" "." "00" "to" "rid" "them" "."
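The result above still contains an empty string after "!", because the lookahead splits in front of "!" and the following whitespace then produces an empty token. As a minimal sketch (the helper name split_keep_apostrophes and its drop_empty argument are hypothetical, not part of the original answer), the same pattern can be wrapped in a function that optionally drops those empty tokens:

```r
x <- "Raptors don't like robots! I'd pay $500.00 to rid them."

# Hypothetical helper: split on whitespace and in front of punctuation,
# but never in front of an apostrophe.
split_keep_apostrophes <- function(text, drop_empty = TRUE) {
  # (\\s+)            consume runs of whitespace
  # (?!')             refuse to split if the next character is an apostrophe
  # (?=[[:punct:]])   otherwise split (zero-width) before any punctuation
  pattern <- "(\\s+)|(?!')(?=[[:punct:]])"
  tokens <- strsplit(text, pattern, perl = TRUE)
  if (drop_empty) {
    # Remove the "" tokens left where punctuation is followed by whitespace
    tokens <- lapply(tokens, function(tok) tok[nzchar(tok)])
  }
  tokens
}

result <- split_keep_apostrophes(x)
print(result)
```

Calling split_keep_apostrophes(x, drop_empty = FALSE) should reproduce the output shown above, including the "" element.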
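Since the question also asks for other approaches, here is one base R alternative, offered as a sketch under my own assumptions rather than as part of the original answer: match the tokens directly with gregexpr() and regmatches() instead of splitting. The pattern below is an assumption about what counts as a word (letters and digits, optionally joined by internal apostrophes).

```r
x <- "Raptors don't like robots! I'd pay $500.00 to rid them."

# Assumed token definition:
#   [[:alnum:]]+(?:'[[:alnum:]]+)*   a word, allowing internal apostrophes
#   [[:punct:]]                      otherwise, a single punctuation character
token_pattern <- "[[:alnum:]]+(?:'[[:alnum:]]+)*|[[:punct:]]"

# gregexpr() finds every match; regmatches() extracts the matched text
matched <- regmatches(x, gregexpr(token_pattern, x, perl = TRUE))
print(matched)
```

Because whitespace is never matched, this version produces no empty strings. It is not identical to the split-based pattern in every case: an apostrophe that is not between two alphanumeric characters (a trailing possessive such as robots', for example) comes out as its own token here, while the negative lookahead leaves it attached to the word.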