R: remove non-alphanumeric symbols from a string

scr*_*Owl 15 regex grep r

I have a string, and I want to remove all non-alphanumeric symbols and then put the words into a vector. So this:

"This is a string.  In addition, this is a string!" 

would become:

>stringVector1

"This","is","a","string","In","addition","this","is","a","string"

I have looked at grep() but could not find an example that matches this case. Any suggestions?

koh*_*ske 33

Here is an example:

> str <- "This is a string. In addition, this is a string!"
> str
[1] "This is a string. In addition, this is a string!"
> strsplit(gsub("[^[:alnum:] ]", "", str), " +")[[1]]
 [1] "This"     "is"       "a"        "string"   "In"       "addition" "this"     "is"       "a"       
[10] "string"  

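  • Thanks. I am finally no longer shy about using regular expressions in R. `gsub("[^[:alnum:] =\\.]", "", "Oh, blah blah blah. Please be quiet! = 0.42")` is much better than stacking up several `gsub()` calls that each replace one punctuation symbol with `""`. (3 upvotes)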
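
To make the comment's point concrete, here is a minimal sketch comparing a single negated bracket expression with a chain of per-symbol `gsub()` calls. The input string `x` and the variable names are made up for illustration and are not from the thread.

```r
# Hypothetical input, chosen only for illustration
x <- "Rate = 0.42, please be quiet!"

# One pass: drop everything that is not a letter, digit, space, "=" or "."
one_pass <- gsub("[^[:alnum:] =.]", "", x)

# Same result from chaining one fixed-string gsub() per symbol
chained <- gsub("!", "", gsub(",", "", x, fixed = TRUE), fixed = TRUE)

print(one_pass)
print(identical(one_pass, chained))
```

The chained version only works if every unwanted symbol is listed by hand, while the negated class states what to keep and removes everything else.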

小智 5

Another way to handle this:

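library(stringr)
text <- "This is a string.  In addition, this is a string!"
# Replace runs of non-word characters with a space, squish whitespace, then split on spaces.
# str_split() returns a list, so [[1]] extracts the character vector.
str_split(str_squish(str_replace_all(text, regex("\\W+"), " ")), " ")[[1]]
#[1] "This"     "is"       "a"        "string"   "In"       "addition" "this"     "is"       "a"        "string"  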
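  • str_replace_all(text, regex("\\W+"), " "): finds each run of non-word characters and replaces it with " "
  • str_squish(): trims whitespace at both ends and collapses repeated spaces inside the string
  • str_split(): splits the string into pieces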
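
The three steps above can also be sketched one at a time. This version swaps in base R functions (`gsub()`, `trimws()`, `strsplit()`) for the stringr calls so that it runs without extra packages; it is an approximation of the pipeline, not the answer's own code. The input has an underscore added on purpose (an assumption for illustration) to show where `\W` and `[^[:alnum:] ]` behave differently.

```r
# Hypothetical input: an underscore is added to show how the two patterns differ
x <- "This is a_string.  In addition, this is a string!"

# Step 1: replace each run of non-word characters with a single space
step1 <- gsub("\\W+", " ", x)

# Step 2: trim the ends (the trailing "!" left a space behind)
step2 <- trimws(step1)

# Step 3: split on single spaces and take the first (only) list element
words <- strsplit(step2, " ", fixed = TRUE)[[1]]
print(words)

# \W treats "_" as a word character; [^[:alnum:] ] does not, so "_" is removed
alnum_words <- strsplit(gsub("[^[:alnum:] ]", "", x), " +")[[1]]
print(alnum_words)
```

If underscores should count as punctuation, the `[^[:alnum:] ]` pattern from the first answer is probably the safer choice; if they should survive inside words, `\\W+` keeps them.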