Posts by use*_*736

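Pattern replacement in R

I am working with a Twitter dataset in R, and I am finding it difficult to remove usernames from the tweets.

Here are sample tweets from the tweet column of my dataset: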

[1] "@danimottale: 2 bad our inalienable rights offend their sensitivities. U cannot reason with obtuse zealotry. // So very well said."         
[2] "@FreeMktMonkey @drleegross Want to build HSA throughout lifetime for when older thus need HDHP not to deplete it if ill before 65y/o.thanks"

I want to remove/replace every word that starts with "@" to get this output:

[1] "2 bad our inalienable rights offend their sensitivities. U cannot reason with obtuse zealotry. // So very well said."         
[2] "Want to build HSA throughout lifetime for when older thus need HDHP …
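One possible approach, as a minimal sketch rather than a posted answer: `gsub` with a pattern that matches "@", the handle, an optional trailing colon, and the whitespace that follows. The sample tweets are shortened from the question, and the variable names `tweets`, `mention_pattern`, and `cleaned` are made up for illustration.

```r
# Sample tweets shortened from the question; the pattern is an assumption, not a posted answer
tweets <- c(
  "@danimottale: 2 bad our inalienable rights offend their sensitivities.",
  "@FreeMktMonkey @drleegross Want to build HSA throughout lifetime"
)

# "@" followed by letters, digits or underscores, an optional colon, then any whitespace
mention_pattern <- "@[[:alnum:]_]+:?[[:space:]]*"

# Replace every match with the empty string
cleaned <- gsub(mention_pattern, "", tweets)
print(cleaned)
```

Note that this pattern does not check what precedes the "@", so it would also strip part of an email address; if the data may contain those, the pattern would need an extra condition.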

regex twitter r

5 votes, 1 answer, 204 views

Remove punctuation except apostrophes and intra-word dashes in R

I know how to remove punctuation while keeping apostrophes:

gsub( "[^[:alnum:]']", " ", db$text )  

or how to keep intra-word dashes using the tm package:

removePunctuation(db$text, preserve_intra_word_dashes = TRUE)

But I can't find a way to do both at once. For example, if my original sentence is:

"Interested in energy/the environment/etc.? Congrats to our new e-board! Ben, Nathan, Jenny, and Adam, y'all are sure to lead the club in a great direction next year! #obama #swag"

I would like it to become:

"Interested in energy the environment etc Congrats to our new e-board Ben Nathan Jenny and Adam y'all are sure to lead the club in a great direction next year obama swag"

Of course, there will be extra whitespace, but I can remove that later.

I would greatly appreciate your help.
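A hedged sketch of one way to combine the two requirements, using base R only: first blank out every character that is not a letter, digit, whitespace, apostrophe, or dash, then blank out any dash that does not have an alphanumeric character on both sides (this step relies on lookarounds, hence `perl = TRUE`). The names `x`, `step1`, `step2`, and `result` are illustrative, and the behaviour on non-ASCII text has not been considered here.

```r
x <- "Interested in energy/the environment/etc.? Congrats to our new e-board! Ben, Nathan, Jenny, and Adam, y'all are sure to lead the club in a great direction next year! #obama #swag"

# Step 1: replace everything except letters, digits, whitespace, apostrophes and dashes
step1 <- gsub("[^[:alnum:][:space:]'-]", " ", x)

# Step 2: replace dashes that lack an alphanumeric neighbour on either side
step2 <- gsub("(?<![[:alnum:]])-|-(?![[:alnum:]])", " ", step1, perl = TRUE)

# Step 3: squeeze runs of whitespace and trim the ends
result <- trimws(gsub("[[:space:]]+", " ", step2))
print(result)
```

For this sentence step 2 changes nothing, because the only dash is the one inside "e-board"; it matters for input with stray or repeated dashes.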

string text r

3 votes, 1 answer, 2174 views

Remove all dashes except intra-word dashes in R

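I asked a similar question before, but this one is more specific and needs a different solution from the one given earlier, so I hope it is OK to post it. I need to keep only apostrophes and intra-word dashes in my text (removing all other punctuation). For example, I want to get str2 from str1: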

str1<-"I'm dash before word -word, dash &%$,. in-between word, two before word --word just dashes ------, between words word - word"
str2<-"I'm dash before word word dash in-between word two before word  word just dashes  between words word  word"

My solution so far first removes the dashes between words:
gsub(" - ", " ", str1)

and then keeps the alphanumeric characters plus the remaining dashes:
gsub("[^[:alnum:]['-]", " ", str1)

The problem is that it does not remove dashes that sit next to each other, such as "--", or dashes at the beginning or end of a word: "-word" or "word-".
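The cases listed above (repeated dashes, leading dashes, trailing dashes) can be covered by a single pattern with lookarounds, sketched here under the assumption that "intra-word" means a dash with an alphanumeric character directly on both sides. The helper name `keep_intra_word_dashes` is hypothetical, and whitespace is squeezed at the end, so the result matches str2 only after its double spaces are collapsed.

```r
# Hypothetical helper: keep apostrophes and intra-word dashes, drop other punctuation
keep_intra_word_dashes <- function(x) {
  pattern <- paste0(
    "(?<![[:alnum:]])-",          # a dash not preceded by a letter or digit
    "|-(?![[:alnum:]])",          # or a dash not followed by a letter or digit
    "|[^[:alnum:][:space:]'-]"    # or any other punctuation except ' and -
  )
  out <- gsub(pattern, " ", x, perl = TRUE)
  # Collapse the leftover runs of spaces
  trimws(gsub("[[:space:]]+", " ", out))
}

str1 <- "I'm dash before word -word, dash &%$,. in-between word, two before word --word just dashes ------, between words word - word"
result <- keep_intra_word_dashes(str1)
print(result)
```

Because the lookarounds inspect the original string rather than the partially replaced one, every dash in a run such as "------" is matched, while the dash in "in-between" is left alone.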

regex r gsub

3 votes, 1 answer, 1121 views
