Pattern replacement in R

use*_*736 5 regex twitter r

I am working with a Twitter dataset in R, and I am finding it difficult to remove usernames from the tweets.

Here is an example of the tweets in the tweets column of my dataset:

[1] "@danimottale: 2 bad our inalienable rights offend their sensitivities. U cannot reason with obtuse zealotry. // So very well said."         
[2] "@FreeMktMonkey @drleegross Want to build HSA throughout lifetime for when older thus need HDHP not to deplete it if ill before 65y/o.thanks"

I want to remove/replace every word that starts with "@" to get this output:

[1] "2 bad our inalienable rights offend their sensitivities. U cannot reason with obtuse zealotry. // So very well said."         
[2] "Want to build HSA throughout lifetime for when older thus need HDHP not to deplete it if ill before 65y/o.thanks"

This gsub call only removes the "@" symbol itself:

gsub("@", "", tweetdata$tweets)

What I want to express is: remove the characters that follow the "@" symbol until a space or punctuation mark is encountered.

I started by trying to handle the space, but to no avail:

gsub("@.*[:space:]$", "", tweetdata$tweets)

This removes the second tweet entirely.

gsub("@.*[:blank:]$", "", tweetdata$tweets)

This does not change the output at all.

I would really appreciate your help.

hwn*_*wnd 9

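You can use the following. \S+ matches one or more non-whitespace characters, and \s then matches the single whitespace character that follows them.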

gsub('@\\S+\\s', '', noRT$text)
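To check the pattern against the data from the question, here is a minimal sketch that applies it to the two sample tweets. The vector name `tweets` is a stand-in introduced for illustration; with the real data you would pass the relevant column (for example tweetdata$tweets) instead.

```r
# Sample tweets copied from the question. `tweets` is a hypothetical name
# standing in for the real column of the data frame.
tweets <- c(
  "@danimottale: 2 bad our inalienable rights offend their sensitivities. U cannot reason with obtuse zealotry. // So very well said.",
  "@FreeMktMonkey @drleegross Want to build HSA throughout lifetime for when older thus need HDHP not to deplete it if ill before 65y/o.thanks"
)

# Remove every "@", the run of non-whitespace characters after it,
# and the single whitespace character that follows.
cleaned <- gsub("@\\S+\\s", "", tweets)

print(cleaned)
```

Because \S+ also consumes trailing punctuation such as the colon in "@danimottale:", the colon is removed together with the username. Note that gsub returns a new vector; assign the result back to the column if you want to keep it.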

Working demo

EDIT: A negated character class also works fine (this version treats only the literal space character as the delimiter):

gsub('@[^ ]+ ', '', noRT$text)
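Both patterns above require a whitespace character after the username, so a mention at the very end of a tweet would be left in place. The sketch below shows one possible variant, not part of the original answer, that makes the trailing whitespace optional and uses the POSIX class form. As the R regex documentation notes, POSIX classes such as [:space:] must themselves sit inside a bracket expression, i.e. [[:space:]]; this, together with the greedy .* and the $ anchor, is the likely reason the attempts in the question did not behave as intended. The vector `x` and its contents are made-up illustration data.

```r
# Hypothetical example data: one mention at the start, one at the end.
x <- c("@alice: hello there", "see you soon @bob")

# The space-delimited pattern leaves the trailing "@bob" untouched,
# because no space follows it.
strict <- gsub("@[^ ]+ ", "", x)

# Variant (an assumption, not the answerer's code): match "@", then one or
# more non-whitespace characters, then zero or more whitespace characters.
# trimws() removes any whitespace left at the ends of the string.
relaxed <- trimws(gsub("@[^[:space:]]+[[:space:]]*", "", x))

print(strict)
print(relaxed)
```

Whether the relaxed variant is appropriate depends on the data: it will also strip anything else that starts with "@" and contains no whitespace, such as the domain part of an e-mail address.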