Remove punctuation except apostrophes and intra-word dashes in R

I know how to remove punctuation while keeping apostrophes:

gsub( "[^[:alnum:]']", " ", db$text )  

And I know how to keep intra-word dashes using the tm package:

removePunctuation(db$text, preserve_intra_word_dashes = TRUE)

But I cannot find a way to do both at the same time. For example, if my original sentence is:

"Interested in energy/the environment/etc.? Congrats to our new e-board! Ben, Nathan, Jenny, and Adam, y'all are sure to lead the club in a great direction next year! #obama #swag"

I would like it to become:

"Interested in energy the environment etc Congrats to our new e-board Ben Nathan Jenny and Adam y'all are sure to lead the club in a great direction next year obama swag"

Of course, this will leave extra whitespace, but I can remove that afterwards.

I would greatly appreciate your help.
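
One possible way to combine the two requirements, offered as a minimal sketch and not a definitive answer: use a single PCRE pattern (`perl = TRUE`) that replaces a dash only when it is not flanked by alphanumeric characters on both sides, together with every other character that is not alphanumeric, whitespace, or an apostrophe. The helper name `clean_text` is hypothetical, and the sketch assumes mostly ASCII text like the example sentence.

```r
# Hypothetical helper; the name clean_text is for illustration only.
clean_text <- function(x) {
  # Three alternatives, each replaced by a space:
  #   1. a dash NOT preceded by an alphanumeric character
  #   2. a dash NOT followed by an alphanumeric character
  #   3. any character that is not alphanumeric, whitespace, apostrophe, or dash
  pattern <- "(?<![[:alnum:]])-|-(?![[:alnum:]])|[^[:alnum:][:space:]'-]"
  x <- gsub(pattern, " ", x, perl = TRUE)
  # Collapse the runs of whitespace left behind and trim both ends.
  trimws(gsub("[[:space:]]+", " ", x))
}

s <- "Interested in energy/the environment/etc.? Congrats to our new e-board! Ben, Nathan, Jenny, and Adam, y'all are sure to lead the club in a great direction next year! #obama #swag"
clean_text(s)
```

Replacing with a space instead of an empty string keeps `energy/the` from being fused into one word. Like the `gsub()` call in the question, this keeps every apostrophe, including ones that are not inside a word.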

string text r

3 votes · 1 answer · 2174 views
