How do I remove words that have no capital letter in R?

pac*_*ese 3 r tm stringi

I'm doing text analysis in R. Is there a way to remove all words that are not capitalized, using tm or stringi?

If I have something like this

Albert Einstein went to the store and saw his friend Nikola Tesla ... + 200 pages

to be converted into

Albert Einstein Nikola Tesla

Best regards

Dav*_*urg 8

You can remove these words with a simple regular expression

x <- "Albert Einstein went to the store and saw his friend Nikola Tesla"
gsub("\\b[a-z]+\\s+", "", x)
# [1] "Albert Einstein Nikola Tesla"

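This simply looks for a word boundary, followed by one or more lowercase letters, followed by one or more whitespace characters, and removes the whole match.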


However, if you have words like don't, you will need a more complicated regex. Something like

x <- "if Albert Einstein didn't see his friend Nikola Tesla leavin'"
gsub("\\b[a-z][^ ]*(\\s+)?", "", x)
# [1] "Albert Einstein Nikola Tesla "
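Note that the second pattern leaves a trailing space when the last word of the string is removed. A minimal sketch of one way to tidy that up, assuming base R's trimws() is available (R 3.2.0 or later); the variable names kept and result are only illustrative:

```r
x <- "if Albert Einstein didn't see his friend Nikola Tesla leavin'"

# Remove every token that starts with a lowercase letter,
# together with any whitespace that follows it
kept <- gsub("\\b[a-z][^ ]*(\\s+)?", "", x)

# Drop the trailing space left behind when the last word was removed
result <- trimws(kept)
result
# [1] "Albert Einstein Nikola Tesla"
```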


arv*_*000 6

Just use grep and a regular expression:

words <- 'Albert Einstein went to the store and saw his friend Nikola Tesla'

# split to vector of individual words
vec <- unlist(strsplit(words, ' '))
# just the capitalized ones
caps <- grep('^[A-Z]', vec, value = TRUE)
# assemble back to a single string, if you want
paste(caps, collapse=' ')
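Since the question mentions stringi, the same idea can probably be expressed by extracting the capitalized words directly instead of splitting and filtering. This is only a sketch, assuming the stringi package is installed. The pattern uses \p{Lu} (any uppercase letter) followed by word characters, so a token such as Einstein's would be cut at the apostrophe.

```r
library(stringi)

words <- 'Albert Einstein went to the store and saw his friend Nikola Tesla'

# extract every word that starts with an uppercase letter;
# the result is a list with one character vector per input string
caps <- stri_extract_all_regex(words, "\\b\\p{Lu}\\w*")[[1]]

# assemble back to a single string
result <- stri_paste(caps, collapse = " ")
result
```

Extracting the matches avoids the trailing-space issue of the gsub approach, because nothing is deleted from the original string.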