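I've run into a problem that I'm sure has a very simple fix, but I've been searching for an answer for about an hour and can't seem to solve it.
I have a character vector whose data looks a bit like this: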
[5] "Toronto, ON" "Manchester, UK"
[7] "New York City, NY" "Newark, NJ"
[9] "Melbourne" "Los Angeles, CA"
[11] "New York, USA" "Liverpool, England"
[13] "Fort Collins, CO" "London, UK"
[15] "New York, NY"
Basically, I want to get rid of every word that is 2 characters or shorter, so that the data looks like this:
[5] "Toronto, " "Manchester, "
[7] "New York City, " "Newark, "
[9] "Melbourne" "Los Angeles, "
[11] "New York, USA" "Liverpool, England"
[13] "Fort Collins, " "London, "
[15] "New York, "
I know how to get rid of the commas. As I said, I'm sure this is very simple, and any help would be greatly appreciated. Thanks!
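You can use a quantifier on the word character \\w together with word boundaries: \\b\\w{1,2}\\b matches a whole word that is one or two characters long. Use gsub to remove the matches, so that every occurrence in a string is removed and not just the first: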
gsub("\\b\\w{1,2}\\b", "", v)
# [1] "Toronto, " "Manchester, " "New York City, " "Newark, " "Melbourne" "Los Angeles, " "New York, USA"
# [8] "Liverpool, England" "Fort Collins, " "London, " "New York, "
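To see why gsub is the safer choice over sub here, a minimal sketch follows. The string in x is a made-up example (not from the question's data) that contains two short words, so the difference is visible:

```r
# Hypothetical input with two short words ("FR" and "EU") in one string
x <- "Paris, FR, EU"

# sub() replaces only the first match in each string
first_only <- sub("\\b\\w{1,2}\\b", "", x)

# gsub() replaces every match in each string
all_matches <- gsub("\\b\\w{1,2}\\b", "", x)

print(first_only)   # "Paris, , EU"
print(all_matches)  # "Paris, , "
```

For the vector in the question each element has at most one short word, so sub would give the same result there; gsub simply covers the general case.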
Note that \\w matches letters, digits, and the underscore. If you only want to consider alphabetic letters, you can use gsub("\\b[a-zA-Z]{1,2}\\b", "", v) instead.
v <- c("Toronto, ON", "Manchester, UK", "New York City, NY", "Newark, NJ", "Melbourne", "Los Angeles, CA", "New York, USA", "Liverpool, England", "Fort Collins, CO", "London, UK", "New York, NY")
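The difference between the \\w pattern and the letters-only pattern only shows up when the data contains short numbers. As a rough illustration, assuming some made-up addresses with digits (the vector addr is hypothetical and not part of the question's data):

```r
# Hypothetical data containing short numbers as well as short letter codes
addr <- c("Route 66, AZ", "Apt 7, Boston, MA")

# \\w also matches digits, so "66" and "7" are removed along with "AZ" and "MA"
with_w <- gsub("\\b\\w{1,2}\\b", "", addr)

# [a-zA-Z] matches letters only, so the numbers are kept
letters_only <- gsub("\\b[a-zA-Z]{1,2}\\b", "", addr)

print(with_w)        # "Route , "   "Apt , Boston, "
print(letters_only)  # "Route 66, " "Apt 7, Boston, "
```

Which pattern is appropriate depends on whether short numbers in the data should be kept.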