Tags: regex, r, stringr, regex-lookarounds
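I have a data frame in R. I want to match and keep a row if "woman" is the first word, if "woman" is the second word, or if "woman" is the third word and "no", "not" or "never" appears before it.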
phrases_with_woman <- structure(list(phrase = c("woman get degree", "woman obtain justice",
"session woman vote for member", "woman have to end", "woman have no existence",
"woman lose right", "woman be much", "woman mix at dance", "woman vote as member",
"woman have power", "woman act only", "she be woman", "no committee woman passed vote")), row.names = c(NA,
-13L), class = "data.frame")
In the example above, I want to match every row except "she be woman".
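Here is my code so far. I have a positive lookbehind, "(?<=woman\\s)\\w+", which seems to be on the right track, but it does not restrict how many words can precede "woman". I tried using {1} to match only one preceding word, but that syntax does not work.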
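A small sketch of why the lookbehind does not enforce a position, and of where a quantifier such as {1} has to go. It uses stringr's str_detect(); the phrase "she be woman today" is a made-up test string, and first_or_second is a hypothetical name for a pattern that covers only the "first or second word" cases, not the no/not/never one.

```r
library(stringr)

# The lookbehind only asks "is 'woman ' followed by a word?", so where
# "woman" sits in the phrase is never checked.
lookbehind <- "^woman|(?<=woman\\s)\\w+"
str_detect(c("she be woman", "she be woman today"), lookbehind)
# FALSE TRUE: the second phrase is kept although "woman" is the third word

# A quantifier repeats only the token directly before it, so wrap
# "one word plus a space" in a group, allow it at most once ({0,1}),
# and anchor the whole thing at the start of the string.
# Hypothetical pattern: covers only "woman" as the first or second word.
first_or_second <- "^(\\w+\\s){0,1}woman\\b"
str_detect(c("woman get degree", "session woman vote", "she be woman"), first_or_second)
# TRUE TRUE FALSE
```

With the sample data, the lookbehind pattern happens to drop only "she be woman", because nothing follows "woman" in that phrase; it would still keep any phrase in which "woman" is the third word but not the last.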
library(dplyr)    # filter() and %>%
library(stringr)  # str_detect()

matches <- phrases_with_woman %>%
  filter(str_detect(phrase, "^woman|(?<=woman\\s)\\w+"))
Thanks for any help.
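Each condition can be written as one alternative in the regex, although the last condition needs two alternatives, assuming "no", "not" or "never" can be either the first or the second word.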
library(dplyr)
# Anchored at the start; the trailing \b stops "woman" from matching "womanhood"
pat <- "^(woman|\\w+ woman|\\w+ (no|not|never) woman|(no|not|never) \\w+ woman)\\b"
phrases_with_woman %>%
  filter(grepl(pat, phrase))
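As a sketch of the "one alternative per condition" idea, the same pattern can be assembled from named pieces and checked against the question's data. The names negation, alternatives and kept are illustrative, not part of the answer; the resulting string is identical to pat above.

```r
library(dplyr)

phrases_with_woman <- data.frame(phrase = c(
  "woman get degree", "woman obtain justice", "session woman vote for member",
  "woman have to end", "woman have no existence", "woman lose right",
  "woman be much", "woman mix at dance", "woman vote as member",
  "woman have power", "woman act only", "she be woman",
  "no committee woman passed vote"
))

negation <- "(no|not|never)"
alternatives <- c(
  "woman",                              # "woman" is the first word
  "\\w+ woman",                         # "woman" is the second word
  paste0("\\w+ ", negation, " woman"),  # third word, negation is the second word
  paste0(negation, " \\w+ woman")       # third word, negation is the first word
)
# Anchor the whole group at the start and end it at a word boundary
pat <- paste0("^(", paste(alternatives, collapse = "|"), ")\\b")

kept <- phrases_with_woman %>% filter(grepl(pat, phrase))
setdiff(phrases_with_woman$phrase, kept$phrase)
# "she be woman"
```

Building the pattern with paste0() keeps each condition on its own line, which makes it easier to add another negation word later; the ^ anchor and the trailing \\b still wrap the whole group.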