R: using grepl to check that multiple strings are present

Question


grepl("instance|percentage", labelTest$Text)


This returns TRUE if either instance or percentage is present.

How can I make it return TRUE only when both terms are present?

Answer 1

Aks*_*elA 11

Text <- c("instance", "percentage", "n", 
          "instance percentage", "percentage instance")

grepl("instance|percentage", Text)
# TRUE  TRUE FALSE  TRUE  TRUE

grepl("instance.*percentage|percentage.*instance", Text)
# FALSE FALSE FALSE TRUE  TRUE

The latter works by looking for:

('instance')(any character sequence)('percentage')  
OR  
('percentage')(any character sequence)('instance')
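
To see why both orderings are needed, each half of the alternation can be run on its own. This is only an illustrative sketch that reuses the Text vector from above; the names first_order, second_order and both_orders are made up for the example.

```r
Text <- c("instance", "percentage", "n",
          "instance percentage", "percentage instance")

# matches only when 'instance' appears before 'percentage'
first_order  <- grepl("instance.*percentage", Text)

# matches only when 'percentage' appears before 'instance'
second_order <- grepl("percentage.*instance", Text)

# OR-ing the two halves gives the same result as the single alternation pattern
both_orders <- first_order | second_order
print(both_orders)
# FALSE FALSE FALSE  TRUE  TRUE
```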


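Naturally, this becomes very complicated if you need to find any combination of more than two words. In that case, the solution mentioned in the comments is easier to implement and read.

Another option that can be useful when matching many words is positive look-ahead (which can be thought of as a "non-consuming" match). For this you have to enable Perl-style regular expressions.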
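
For the two-word case in the question, the approach from the comments (shown for four words further down as a chain of grepl calls joined with &) would presumably look like the following sketch. The variable name both is only illustrative.

```r
Text <- c("instance", "percentage", "n",
          "instance percentage", "percentage instance")

# one grepl call per term, combined element-wise with a logical AND
both <- grepl("instance", Text) & grepl("percentage", Text)
print(both)
# FALSE FALSE FALSE  TRUE  TRUE
```

Because each term is tested separately, the order in which the words appear in the text does not matter.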

# create a vector of word combinations
set.seed(1)
words <- c("instance", "percentage", "element",
           "character", "n", "o", "p")
Text2 <- replicate(10, paste(sample(words, 5), collapse=" "))

# grepl with multiple positive look-ahead
longperl <- grepl("(?=.*instance)(?=.*percentage)(?=.*element)(?=.*character)",
  Text2, perl=TRUE)

# this is equivalent to the solution proposed in the comments
longstrd <- grepl("instance", Text2) & 
          grepl("percentage", Text2) & 
             grepl("element", Text2) & 
           grepl("character", Text2)

# they produce identical results
identical(longperl, longstrd)
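
As a smaller illustration of the same idea, the look-ahead pattern can be applied to the two-word Text vector from the start of this answer. This is a minimal sketch; since each look-ahead is checked from the same starting position without consuming characters, the word order in the text should not matter.

```r
Text <- c("instance", "percentage", "n",
          "instance percentage", "percentage instance")

# each (?=...) asserts that the word occurs somewhere ahead, without consuming input
two_word_perl <- grepl("(?=.*instance)(?=.*percentage)", Text, perl = TRUE)
print(two_word_perl)
# FALSE FALSE FALSE  TRUE  TRUE
```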

Furthermore, if you store the patterns in a vector, the expressions can be condensed considerably, giving you

pat <- c("instance", "percentage", "element", "character")

longperl <- grepl(paste0("(?=.*", pat, ")", collapse=""), Text2, perl=TRUE)
longstrd <- rowSums(sapply(pat, grepl, Text2) - 1L) == 0L
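
One possible variation, not part of the original answer, is to fold the individual grepl results together with Reduce. The helper name grepl_all is hypothetical. A side benefit of this sketch is that it also works when the text vector has length one, where sapply would return a plain vector instead of a matrix and rowSums would fail.

```r
# hypothetical helper: TRUE where every pattern in 'patterns' matches the element of 'x'
grepl_all <- function(patterns, x) {
  # lapply yields one logical vector per pattern; Reduce ANDs them element-wise
  Reduce(`&`, lapply(patterns, grepl, x))
}

pat  <- c("instance", "percentage")
Text <- c("instance", "percentage", "n",
          "instance percentage", "percentage instance")

res_vec    <- grepl_all(pat, Text)
res_single <- grepl_all(pat, "percentage instance")
print(res_vec)
# FALSE FALSE FALSE  TRUE  TRUE
```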

As requested in the comments, if you want to match exact words, i.e. not substrings, you can specify word boundaries with \\b. For example:

tx <- c("cent element", "percentage element", "element cent", "element centimetre")

grepl("(?=.*\\bcent\\b)(?=.*element)", tx, perl=TRUE)
# TRUE FALSE  TRUE FALSE
grepl("element", tx) & grepl("\\bcent\\b", tx)
# TRUE FALSE  TRUE FALSE
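
The word-boundary idea can be combined with the pattern vector from earlier. The sketch below wraps every word in \\b, so unlike the example above it also requires element to match as a whole word; the names pat_exact and lookahead are illustrative only.

```r
tx <- c("cent element", "percentage element", "element cent", "element centimetre")

# words that must all appear as whole words
pat_exact <- c("cent", "element")

# build "(?=.*\\bcent\\b)(?=.*\\belement\\b)" from the vector
lookahead <- paste0("(?=.*\\b", pat_exact, "\\b)", collapse = "")

exact_match <- grepl(lookahead, tx, perl = TRUE)
print(exact_match)
# TRUE FALSE  TRUE FALSE
```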

