It seems that grep is "greedy" in the way it returns matches. Suppose I have the following data:
Sources <- c(
"Coal burning plant",
"General plant",
"coalescent plantation",
"Charcoal burning plant"
)
Registry <- seq(from = 1100, to = 1103, by = 1)
df <- data.frame(Registry, Sources)
If I run grep("(?=.*[Pp]lant)(?=.*[Cc]oal)", df$Sources, perl = TRUE, value = TRUE), it returns
"Coal burning plant"
"coalescent plantation"
"Charcoal burning plant"
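However, I only want exact matches, i.e. only the entries where the whole words "coal" and "plant" occur. I don't want "coalescent", "plantation", and so on. So for this data, I only want to see "Coal burning plant".
You want to use the word boundary \b around your word patterns. A word boundary does not consume any characters. It asserts that there is a word character on one side of the position and no word character on the other. You may also want to consider the inline (?i) modifier for case-insensitive matching.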
grep('(?i)(?=.*\\bplant\\b)(?=.*\\bcoal\\b)', df$Sources, perl = TRUE, value = TRUE)
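To see the effect of the word boundaries, here is a minimal sketch that runs the original pattern and the \b version side by side on the sample data. The names loose, strict, has_word and strict_alt are illustrative and not part of the original post. The has_word helper is a hypothetical alternative that tests each word separately instead of using lookaheads.

```r
# Sample data from the question.
Sources <- c(
  "Coal burning plant",
  "General plant",
  "coalescent plantation",
  "Charcoal burning plant"
)

# Original pattern: each lookahead finds "coal" / "plant" anywhere in the
# string, including inside longer words such as "Charcoal" or "plantation".
loose <- grep("(?=.*[Pp]lant)(?=.*[Cc]oal)", Sources,
              perl = TRUE, value = TRUE)

# Word-boundary pattern: both words must appear as whole words, in any order.
strict <- grep("(?i)(?=.*\\bplant\\b)(?=.*\\bcoal\\b)", Sources,
               perl = TRUE, value = TRUE)

# Hypothetical alternative without lookaheads: one whole-word test per word,
# combined with a logical AND.
has_word <- function(word, x) {
  grepl(paste0("\\b", word, "\\b"), x, perl = TRUE, ignore.case = TRUE)
}
strict_alt <- Sources[has_word("coal", Sources) & has_word("plant", Sources)]

print(loose)   # three entries, as in the question
print(strict)  # only "Coal burning plant"
```

The separate-test variant is often easier to read and extend when more than two words are required, at the cost of one grepl call per word.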