Regular expression to exclude a word in R

Duy*_*Bui 9 regex r

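I really can't figure out how to write a regular expression in R that finds strings which do not contain a given word.

For example, the data are: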

x =  c("hail", "small hail", "wind hail",  "deep hail",  "thunderstorm hail", "tstm wind hail", "gusty wind hail", "late season hail", "non severe hail", "marine hail")

I want to find all elements that contain "hail" but not "marine".

My attempt:

x[grep("[^(marine)] hail", x)]

-> I only get 5:

"small hail"      "wind hail"       "deep hail"       "tstm wind hail"  "gusty wind hail"

I don't know what happened to the other 4.

Avi*_*Raj 17

Use lookaround assertions.

> x =  c("hail", "small hail", "wind hail",  "deep hail",  "thunderstorm hail", "tstm wind hail", "gusty wind hail", "late season hail", "non severe hail", "marine hail")
> x[grep("^(?=.*hail)(?!.*marine)", x, perl=TRUE)]
[1] "hail"              "small hail"        "wind hail"        
[4] "deep hail"         "thunderstorm hail" "tstm wind hail"   
[7] "gusty wind hail"   "late season hail"  "non severe hail" 

Or

Add \b word boundaries if necessary. \b matches between a word character and a non-word character.

> x[grep("^(?=.*\\bhail\\b)(?!.*\\bmarine\\b)", x, perl=TRUE)]
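  • ^ asserts that we are at the start of the string.

  • (?=.*hail) is a positive lookahead asserting that the string must contain hail.

  • (?!.*marine) is a negative lookahead asserting that the string does not contain marine.

  • So the regex above matches at the start anchor (the beginning of the line) only when both conditions are satisfied.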
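A quick way to see what the \b variant changes is to try strings in which the target words appear only as part of a longer word. The sketch below uses a made-up vector `y` (not from the question) and the hypothetical names `plain` and `bounded` for the two patterns:

```r
# Hypothetical test vector: "submarine" and "hailstorm" contain the target
# words only as substrings of a longer word.
y <- c("small hail", "marine hail", "submarine hail", "hailstorm damage")

plain   <- "^(?=.*hail)(?!.*marine)"
bounded <- "^(?=.*\\bhail\\b)(?!.*\\bmarine\\b)"

# Without boundaries, any occurrence of the letters counts.
res_plain <- y[grepl(plain, y, perl = TRUE)]
print(res_plain)    # "small hail" "hailstorm damage"

# With boundaries, only whole words count.
res_bounded <- y[grepl(bounded, y, perl = TRUE)]
print(res_bounded)  # "small hail" "submarine hail"
```

Which behaviour is wanted depends on the data; for the vector in the question both patterns give the same nine elements.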


hwn*_*wnd 7

In this case you want to use a lookahead assertion. Your negated character class does not do what you expect; instead, it matches the following:

[^(marine)]  # any character except: '(', 'm', 'a', 'r', 'i', 'n', 'e', ')'
 hail        # ' hail'
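To see why exactly these elements go missing, it can help to look at the character that sits right before " hail" in each string. This is a small base R sketch for illustration; the names `matched` and `prev_char` are made up here:

```r
x <- c("hail", "small hail", "wind hail", "deep hail", "thunderstorm hail",
       "tstm wind hail", "gusty wind hail", "late season hail",
       "non severe hail", "marine hail")

# [^(marine)] is a character class: it matches ONE character that is not
# '(', 'm', 'a', 'r', 'i', 'n', 'e' or ')'. It does not mean "not the word marine".
matched <- grepl("[^(marine)] hail", x)

# Character immediately before " hail" ("" when the string is just "hail").
prev_char <- ifelse(grepl(". hail$", x), sub("^.*(.) hail$", "\\1", x), "")

# Elements rejected by the original pattern, with the character that caused it.
print(data.frame(x = x[!matched], prev_char = prev_char[!matched]))
```

The rejected rows are "hail" (nothing precedes it) and every string whose word before "hail" happens to end in one of the letters m, a, r, i, n, e.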

To fix this, you can simply do:

> x[grep('^(?!.*marine).*hail', x, perl=TRUE)]
# [1] "hail"              "small hail"        "wind hail"        
# [4] "deep hail"         "thunderstorm hail" "tstm wind hail"   
# [7] "gusty wind hail"   "late season hail"  "non severe hail"

  • You must be good with regular expressions to answer from your phone without being able to test the result. (3 upvotes)

And*_*lor 6

If every element of x is some type of hail, then:

x[-grep("marine", x)] 

should work just fine.

Edit: per G. Grothendieck's suggestion:

 x[ ! grepl("marine", x) ] 

is a better solution.

  • This only works if at least one element contains "marine". Try `x[!grepl("marine", x)]` instead. (2 upvotes)
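The pitfall raised in the comment can be reproduced with a vector that has no "marine" entry at all. The vector `z` below is made up for illustration, and the last line shows one further base R option (`invert = TRUE` with `value = TRUE`) that was not part of the original answers:

```r
# Hypothetical vector in which nothing matches "marine".
z <- c("hail", "small hail", "wind hail")

idx <- grep("marine", z)   # integer(0): nothing matched
print(z[-idx])             # character(0): a negated empty index selects nothing

print(z[!grepl("marine", z)])  # logical indexing keeps all three elements

# grep() can also invert the match itself and return the values directly.
print(grep("marine", z, invert = TRUE, value = TRUE))
```

Logical indexing is the safer default because `!grepl(...)` always has the same length as the input, whether or not anything matches.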