Using grepl in R to find matches to any of a list of character strings

Mar*_*lla 13 regex grep r grepl

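Is it possible to use grepl with a list of values, perhaps together with the %in% operator? I want to take the data below and, if the animal name contains "dog" or "cat", return a value such as "keep"; if it contains neither "dog" nor "cat", I want to return "discard".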

data <- data.frame(animal = sample(c("cat", "dog", "bird", "doggy", "kittycat"), 50, replace = TRUE))

Now, if I only wanted to match the values exactly, say "cat" and "dog", I could use the following:

matches <- c("cat","dog")

data$keep <- ifelse(data$animal %in% matches, "Keep", "Discard")

But grep or grepl only uses the first element of the vector as the pattern:

data$keep <- ifelse(grepl(matches, data$animal), "Keep","Discard")

which returns

Warning message:
In grepl(matches, data$animal) :
  argument 'pattern' has length > 1 and only the first element will be used

Note: I found this thread while searching, but it does not seem to work for me: grep using a character vector with multiple patterns

Ric*_*ven 21

You can use an "or" (|) operator in the regular expression you pass to grepl.

ifelse(grepl("dog|cat", data$animal), "keep", "discard")
# [1] "keep"    "keep"    "discard" "keep"    "keep"    "keep"    "keep"    "discard"
# [9] "keep"    "keep"    "keep"    "keep"    "keep"    "keep"    "discard" "keep"   
#[17] "discard" "keep"    "keep"    "discard" "keep"    "keep"    "discard" "keep"   
#[25] "keep"    "keep"    "keep"    "keep"    "keep"    "keep"    "keep"    "keep"   
#[33] "keep"    "discard" "keep"    "discard" "keep"    "discard" "keep"    "keep"   
#[41] "keep"    "keep"    "keep"    "keep"    "keep"    "keep"    "keep"    "keep"   
#[49] "keep"    "discard"

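The regular expression dog|cat tells the regex engine to look for either "dog" or "cat", so grepl returns TRUE for any element that contains either one. Because this is a substring match, "doggy" and "kittycat" are kept as well.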
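To see which values the alternation actually matches, here is a minimal sketch on the five distinct animal names from the question. The variable names animals and hits are only for illustration.

```r
# The five distinct values that appear in the question's sample data
animals <- c("cat", "dog", "bird", "doggy", "kittycat")

# TRUE wherever the string contains "dog" or "cat" anywhere inside it
hits <- grepl("dog|cat", animals)

# Attach the names so the result is easy to read
setNames(hits, animals)
# only "bird" is FALSE; "doggy" and "kittycat" match as substrings
```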


tal*_*lat 14

Not sure what you tried, but this seems to work:

data$keep <- ifelse(grepl(paste(matches, collapse = "|"), data$animal), "Keep","Discard")

It is similar to the answer you linked.

The trick is to use paste:

paste(matches, collapse = "|")
#[1] "cat|dog"

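So it builds a single regular expression that matches either "cat" or "dog", and it also works for a long list of patterns without having to type each one out.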
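If the same lookup is needed in several places, the paste step can be wrapped in a small function. This is only a sketch: match_any is a hypothetical helper name, not a base R function, and it assumes the patterns contain no regex metacharacters that would need escaping.

```r
# Hypothetical helper: TRUE where x contains any of the given patterns.
# Assumes the patterns are plain strings without regex metacharacters.
match_any <- function(x, patterns) {
  grepl(paste(patterns, collapse = "|"), x)
}

animals <- c("cat", "dog", "bird", "doggy", "kittycat")
matches <- c("cat", "dog")

labels <- ifelse(match_any(animals, matches), "Keep", "Discard")
labels
# "Keep" "Keep" "Discard" "Keep" "Keep"
```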

Edit:

If you are only doing this so that you can later subset the data.frame by the "Keep" and "Discard" entries, you can do that more directly with:

data[grepl(paste(matches, collapse = "|"), data$animal),]

This way, the TRUE/FALSE result of grepl is used directly for the subset.


Dav*_*urg 13

Try to avoid ifelse as much as possible. This, for example, works nicely:

c("Discard", "Keep")[grepl("(dog|cat)", data$animal) + 1]

With a seed of 123, you get

##  [1] "Keep"    "Keep"    "Discard" "Keep"    "Keep"    "Keep"    "Discard" "Keep"   
##  [9] "Discard" "Discard" "Keep"    "Discard" "Keep"    "Discard" "Keep"    "Keep"   
## [17] "Keep"    "Keep"    "Keep"    "Keep"    "Keep"    "Keep"    "Keep"    "Keep"   
## [25] "Keep"    "Keep"    "Discard" "Discard" "Keep"    "Keep"    "Keep"    "Keep"   
## [33] "Keep"    "Keep"    "Keep"    "Discard" "Keep"    "Keep"    "Keep"    "Keep"   
## [41] "Keep"    "Discard" "Discard" "Keep"    "Keep"    "Keep"    "Keep"    "Discard"
## [49] "Keep"    "Keep"   
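The indexing trick in this answer works because adding 1 to a logical vector turns FALSE into 1 and TRUE into 2, which then selects "Discard" or "Keep" by position. A small sketch on three made-up values (animals, hit, idx and labels are illustrative names):

```r
animals <- c("cat", "bird", "doggy")

hit <- grepl("(dog|cat)", animals)   # TRUE FALSE TRUE
idx <- hit + 1                       # logical coerced to numeric: 2 1 2

# Position 1 is the label for FALSE, position 2 the label for TRUE
labels <- c("Discard", "Keep")[idx]
labels
# "Keep" "Discard" "Keep"
```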