I really can't figure out how to use a negated regular expression in R to find words correctly.
For example, the data is:
x = c("hail", "small hail", "wind hail", "deep hail", "thunderstorm hail", "tstm wind hail", "gusty wind hail", "late season hail", "non severe hail", "marine hail")
I want to find all elements that contain "hail" but not "marine".
My attempt:
x[grep("[^(marine)] hail", x)]
-> I only get 5:
"small hail" "wind hail" "deep hail" "tstm wind hail" "gusty wind hail"
I don't know what happened to the other 4.
Avi*_*Raj 17
Use lookaround assertions.
> x = c("hail", "small hail", "wind hail", "deep hail", "thunderstorm hail", "tstm wind hail", "gusty wind hail", "late season hail", "non severe hail", "marine hail")
> x[grep("^(?=.*hail)(?!.*marine)", x, perl=TRUE)]
[1] "hail" "small hail" "wind hail"
[4] "deep hail" "thunderstorm hail" "tstm wind hail"
[7] "gusty wind hail" "late season hail" "non severe hail"
Or
Add \b word boundaries if necessary. \b matches between a word character and a non-word character.
> x[grep("^(?=.*\\bhail\\b)(?!.*\\bmarine\\b)", x, perl=TRUE)]
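^ asserts that we are at the start of the string.
(?=.*hail) is a positive lookahead asserting that the string must contain hail.
(?!.*marine) is a negative lookahead asserting that the string does not contain marine.
So the regex above matches at the start anchor (the beginning of the line) only if both conditions are satisfied.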
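To see what the \b version changes, here is a small sketch on made-up strings. The vector y and its contents are hypothetical and not part of the question's data:

```r
# Hypothetical test strings, chosen so the two patterns disagree
y <- c("marine hail", "submarine hail", "hailstorm", "small hail")

# Without word boundaries: any occurrence of "hail" counts,
# and any occurrence of "marine" (even inside "submarine") excludes
plain <- y[grepl("^(?=.*hail)(?!.*marine)", y, perl = TRUE)]
print(plain)    # "hailstorm" "small hail"

# With word boundaries: "hail" and "marine" must be whole words
bounded <- y[grepl("^(?=.*\\bhail\\b)(?!.*\\bmarine\\b)", y, perl = TRUE)]
print(bounded)  # "submarine hail" "small hail"
```

Which of the two you want depends on whether substrings like "hailstorm" or "submarine" should count.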
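In this case, you want to use a lookahead assertion. Your current negated character class does not do what you expect; instead, it matches the following: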
[^(marine)] # any character except: '(', 'm', 'a', 'r', 'i', 'n', 'e', ')'
hail # ' hail'
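As a quick check against the question's data, the sketch below lists the elements the original pattern rejects. The variable names hit and rejected are only for illustration:

```r
x <- c("hail", "small hail", "wind hail", "deep hail", "thunderstorm hail",
       "tstm wind hail", "gusty wind hail", "late season hail",
       "non severe hail", "marine hail")

# TRUE where one allowed character is followed by " hail"
hit <- grepl("[^(marine)] hail", x)

# Elements that fail the pattern
rejected <- x[!hit]
print(rejected)
# "hail"              has no character and space before it
# "thunderstorm hail" has 'm' before the space
# "late season hail"  has 'n' before the space
# "non severe hail"   has 'e' before the space
# "marine hail"       has 'e' before the space
```

So "marine hail" is dropped only because it happens to end in 'e', and the other four are dropped for the same kind of accidental reason.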
To fix this, you can simply do:
> x[grep('^(?!.*marine).*hail', x, perl=TRUE)]
# [1] "hail" "small hail" "wind hail"
# [4] "deep hail" "thunderstorm hail" "tstm wind hail"
# [7] "gusty wind hail" "late season hail" "non severe hail"
If every element of x is already some kind of hail, then:
x[-grep("marine", x)]
should work just fine.
Edit: per G. Grothendieck's suggestion:
x[ ! grepl("marine", x) ]
is a better solution.
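One likely reason the grepl form is preferred: when nothing matches, grep() returns integer(0), and negative indexing with an empty vector selects nothing instead of everything. A minimal sketch, using a made-up vector z with no "marine" entries:

```r
# Hypothetical data with no "marine" element
z <- c("hail", "small hail")

# grep() finds no match, so the index is -integer(0), i.e. integer(0),
# and the result is empty rather than the full vector
dropped <- z[-grep("marine", z)]
print(dropped)  # character(0)

# grepl() returns one logical per element, so negating it keeps everything
kept <- z[!grepl("marine", z)]
print(kept)     # "hail" "small hail"
```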