Remove punctuation except apostrophes and intra-word dashes in R

use*_*736 3 string text r

I know how to remove punctuation while keeping apostrophes:

gsub( "[^[:alnum:]']", " ", db$text )  

And I know how to keep intra-word dashes using the tm package:

removePunctuation(db$text, preserve_intra_word_dashes = TRUE)

But I can't find a way to do both at the same time. For example, if my original sentence is:

"Interested in energy/the environment/etc.? Congrats to our new e-board! Ben, Nathan, Jenny, and Adam, y'all are sure to lead the club in a great direction next year! #obama #swag"

I would like it to become:

"Interested in energy the environment etc Congrats to our new e-board Ben Nathan Jenny and Adam y'all are sure to lead the club in a great direction next year obama swag"

Of course, there will be extra whitespace, but I can remove that later.

I would greatly appreciate your help.

Dav*_*urg 10

Use a negated character class that lists the characters to keep:

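# Replace everything except letters, digits, apostrophes and hyphens with a space.
# The hyphen goes last in the class so it is read literally, not as a range.
gsub("[^[:alnum:]'-]", " ", db$text)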

## Result (shown with runs of spaces collapsed for readability):
## "Interested in energy the environment etc Congrats to our new e-board Ben Nathan Jenny and Adam y'all are sure to lead the club in a great direction next year obama swag"
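A character class like this keeps every hyphen, including ones that stand alone or sit at the edge of a word, and it leaves runs of spaces behind. The following is a minimal sketch of one way to tighten that up in base R. The helper name `clean_text` and its `strict_dashes` argument are hypothetical and do not come from the question or from tm. It assumes ASCII apostrophes and hyphens.

```r
# Hypothetical helper; the name and the strict_dashes argument are illustrative only.
clean_text <- function(x, strict_dashes = TRUE) {
  # Keep letters, digits, apostrophes and hyphens; turn everything else into a space.
  x <- gsub("[^[:alnum:]'-]", " ", x)
  if (strict_dashes) {
    # Drop hyphens that are not flanked by alphanumerics on both sides.
    # Lookarounds need the PCRE engine, hence perl = TRUE.
    x <- gsub("(?<![[:alnum:]])-|-(?![[:alnum:]])", " ", x, perl = TRUE)
  }
  # Collapse runs of whitespace into one space and trim both ends.
  trimws(gsub("[[:space:]]+", " ", x))
}

clean_text("Congrats to our new e-board! Ben - y'all are #swag")
## "Congrats to our new e-board Ben y'all are swag"
```

With `strict_dashes = FALSE` the helper behaves like the single `gsub()` call plus whitespace cleanup, so the free-standing hyphen in the example would be kept.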