grepl("instance|percentage", labelTest$Text)
This returns TRUE if either instance or percentage is present.
How can I make it return TRUE only when both terms are present?
Aks*_*elA 11
Text <- c("instance", "percentage", "n",
"instance percentage", "percentage instance")
grepl("instance|percentage", Text)
# TRUE TRUE FALSE TRUE TRUE
grepl("instance.*percentage|percentage.*instance", Text)
# FALSE FALSE FALSE TRUE TRUE
The latter works by looking for:
('instance')(any character sequence)('percentage')
OR
('percentage')(any character sequence)('instance')
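Naturally, this becomes very complicated if you need to find any combination of more than two words, because every possible ordering has to be listed. In that case the solution mentioned in the comments is easier to implement and to read.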
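For the two-word case, the approach from the comments can be sketched as one grepl call per word, combined with &. The Text vector is redefined here only so that the snippet runs on its own.

```r
# Same example vector as in the answer, repeated so this runs standalone
Text <- c("instance", "percentage", "n",
          "instance percentage", "percentage instance")

# One grepl() per word; & keeps only elements where both matched
both <- grepl("instance", Text) & grepl("percentage", Text)
both
# FALSE FALSE FALSE  TRUE  TRUE
```

Word order does not matter with this form, so no alternation over orderings is needed.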
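Another option when matching several words is to use positive look-ahead (which can be thought of as a "non-consuming" match). For this you have to enable Perl-compatible regular expressions with perl=TRUE.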
# create a vector of word combinations
set.seed(1)
words <- c("instance", "percentage", "element",
"character", "n", "o", "p")
Text2 <- replicate(10, paste(sample(words, 5), collapse=" "))
# grepl with multiple positive look-ahead
longperl <- grepl("(?=.*instance)(?=.*percentage)(?=.*element)(?=.*character)",
Text2, perl=TRUE)
# this is equivalent to the solution proposed in the comments
longstrd <- grepl("instance", Text2) &
grepl("percentage", Text2) &
grepl("element", Text2) &
grepl("character", Text2)
# they produce identical results
identical(longperl, longstrd)
Furthermore, if you store the patterns in a vector, the expressions can be condensed considerably, giving you
pat <- c("instance", "percentage", "element", "character")
longperl <- grepl(paste0("(?=.*", pat, ")", collapse=""), Text2, perl=TRUE)
longstrd <- rowSums(sapply(pat, grepl, Text2) - 1L) == 0L
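If this check is needed in several places, the vectorised form can be wrapped in a small function. The following is only a sketch: contains_all and Text3 are hypothetical names that do not appear in the original answer, and Reduce is used here in place of the rowSums trick.

```r
# Hypothetical helper (not part of the original answer): returns TRUE for
# each element of x that matches every pattern in `patterns`.
# Extra arguments (e.g. perl = TRUE, fixed = TRUE) are passed on to grepl().
contains_all <- function(x, patterns, ...) {
  hits <- lapply(patterns, grepl, x = x, ...)  # one logical vector per pattern
  Reduce(`&`, hits)                            # element-wise AND across patterns
}

Text3 <- c("instance percentage", "instance", "percentage element instance")
contains_all(Text3, c("instance", "percentage"))
# TRUE FALSE TRUE
contains_all(Text3, c("instance", "element"), fixed = TRUE)
# FALSE FALSE TRUE
```

Note that Reduce returns NULL for an empty pattern vector, so callers may want to handle that case explicitly.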
As requested in the comments, if you want to match exact words, i.e. not substrings, you can specify word boundaries with \\b. For example:
tx <- c("cent element", "percentage element", "element cent", "element centimetre")
grepl("(?=.*\\bcent\\b)(?=.*element)", tx, perl=TRUE)
# TRUE FALSE TRUE FALSE
grepl("element", tx) & grepl("\\bcent\\b", tx)
# TRUE FALSE TRUE FALSE
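The word-boundary idea can be combined with the pattern-vector form shown earlier. This is a minimal sketch that assumes every word should be matched as a whole word and contains no regex metacharacters; the names pat_words and whole are introduced here for illustration.

```r
tx <- c("cent element", "percentage element", "element cent", "element centimetre")
pat_words <- c("cent", "element")

# Wrap each word in \\b ... \\b so that only whole words match
whole <- paste0("\\b", pat_words, "\\b")

# Build one look-ahead per word: (?=.*\\bcent\\b)(?=.*\\belement\\b)
grepl(paste0("(?=.*", whole, ")", collapse = ""), tx, perl = TRUE)
# TRUE FALSE TRUE FALSE
```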