R - Remove all strings of 2 characters or fewer using a regular expression

nik*_*UoM -1 regex string r

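I've run into a problem that I'm sure has a very simple fix, but I've been searching for an answer for about an hour and can't seem to work it out.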

I have a character vector whose data looks something like this:

  [5] "Toronto, ON"                    "Manchester, UK"                    
  [7] "New York City, NY"              "Newark, NJ"             
  [9] "Melbourne"                      "Los Angeles, CA"                         
 [11] "New York, USA"                  "Liverpool, England"            
 [13] "Fort Collins, CO"               "London, UK"                              
 [15] "New York, NY" 

Basically, I want to get rid of every word that is 2 characters or shorter, so that the data looks like this:

  [5] "Toronto, "                      "Manchester, "                    
  [7] "New York City, "                "Newark, "             
  [9] "Melbourne"                      "Los Angeles, "                         
 [11] "New York, USA"                  "Liverpool, England"            
 [13] "Fort Collins, "                 "London, "                              
 [15] "New York, " 

I know how to get rid of the commas. As I said, I'm sure this is very simple, and any help would be greatly appreciated. Thanks!

Psi*_*dom 5

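You can apply a quantifier to the word character \\w and wrap it in word boundaries: \\b\\w{1,2}\\b matches a whole word of one or two characters. Because a string may contain more than one such match, use gsub rather than sub to remove them all: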

gsub("\\b\\w{1,2}\\b", "", v)
# [1] "Toronto, "          "Manchester, "       "New York City, "    "Newark, "           "Melbourne"          "Los Angeles, "      "New York, USA"     
# [8] "Liverpool, England" "Fort Collins, "     "London, "           "New York, "  

Note that \\w matches letters, digits, and the underscore; if you only want to consider letters, you can use gsub("\\b[a-zA-Z]{1,2}\\b", "", v) instead.
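To make the difference between the two character classes concrete, here is a minimal sketch. The vector x is hypothetical (it is not from the question) and was chosen only because it contains short tokens with digits; any_word and letters_only are illustrative names.

```r
# Hypothetical input (not from the question): short tokens that contain digits.
x <- c("Route 66, US", "Apt 5B, NJ")

# \\w counts digits (and "_") as word characters, so "66" and "5B" are removed too.
any_word <- gsub("\\b\\w{1,2}\\b", "", x)

# Restricting the class to ASCII letters leaves "66" and "5B" untouched.
letters_only <- gsub("\\b[a-zA-Z]{1,2}\\b", "", x)

print(any_word)      # "Route , " "Apt , "
print(letters_only)  # "Route 66, " "Apt 5B, "
```

Which variant is preferable probably depends on whether short numeric tokens such as house or route numbers should survive in your data.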


v <- c("Toronto, ON", "Manchester, UK", "New York City, NY", "Newark, NJ", "Melbourne", "Los Angeles, CA", "New York, USA", "Liverpool, England", "Fort Collins, CO", "London, UK", "New York, NY")
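Before deleting anything, it can help to look at what the pattern actually matches. The sketch below uses base R's gregexpr and regmatches for that; v_demo is a subset of the question's vector, repeated only so the snippet runs on its own, and the names pattern and matched are illustrative.

```r
# A subset of the question's vector, repeated here so the snippet runs on its own.
v_demo <- c("Toronto, ON", "Melbourne", "New York, USA", "Los Angeles, CA")

pattern <- "\\b\\w{1,2}\\b"

# gregexpr() locates every match; regmatches() extracts the matched text.
# Elements with no match contribute an empty character vector.
matched <- regmatches(v_demo, gregexpr(pattern, v_demo))

print(unlist(matched))  # "ON" "CA"
```

Three-letter words such as "Los" and "USA" are left alone because the quantifier {1,2} combined with the two \\b anchors only accepts whole words of one or two characters.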