I'm currently using nested ifelse calls with grepl to check for matches against a vector of strings in a data frame, for example:
# vector of possible words to match
x <- c("Action", "Adventure", "Animation")
# data
my_text <- c("This one has Animation.", "This has none.", "Here is Adventure.")
my_text <- as.data.frame(my_text)
my_text$new_column <- ifelse(
  grepl("Action", my_text$my_text),
  "Action",
  ifelse(
    grepl("Adventure", my_text$my_text),
    "Adventure",
    ifelse(
      grepl("Animation", my_text$my_text),
      "Animation",
      NA
    )
  )
)
> my_text$new_column
[1] "Animation" NA "Adventure"
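This is fine for a small number of elements (e.g., the three here), but how should I do this when the number of possible matches is much larger (e.g., 150)? Nested ifelse looks crazy. I know I can grepl several things at once, as in the code below, but that returns a logical telling me only whether a string matched, not which string matched. I want to know what was matched (in the case of multiple matches, any match is fine).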
x <- c("Action", "Adventure", "Animation")
my_text <- c("This one has Animation.", "This has none.", "Here is Adventure.")
grepl(paste(x, collapse = "|"), my_text)
# returns: [1] TRUE FALSE TRUE
# what I'd like it to return: "Animation" "" (or FALSE) "Adventure"
Following the pattern here, a base R solution:
x <- c("ActionABC", "AdventureDEF", "AnimationGHI")
regmatches(x, regexpr("(Action|Adventure|Animation)", x))
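One thing to watch with the base approach: regmatches() drops elements that have no match, so on the question's data the result would be shorter than the input. A minimal sketch of one way to keep the positions aligned, assuming the words in x contain no regex metacharacters (otherwise they would need escaping before being pasted together); the names pattern, m and result are only illustrative:

```r
x <- c("Action", "Adventure", "Animation")
my_text <- c("This one has Animation.", "This has none.", "Here is Adventure.")

# Build one alternation pattern from the whole vector of words
pattern <- paste(x, collapse = "|")

# regexpr() gives the position of the first match in each string, or -1 for no match
m <- regexpr(pattern, my_text)

# Start from an all-NA vector and fill in only the elements that matched
result <- rep(NA_character_, length(my_text))
result[m != -1] <- regmatches(my_text, m)
result
# "Animation" NA "Adventure"
```

Because the pattern is built with paste(), the same code should work unchanged whether x holds 3 words or 150.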
stringr has a simpler way to do this:
library(stringr)
str_extract(x, "(Action|Adventure|Animation)")
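Applied to the data frame from the question, this might look like the sketch below. It assumes stringr is installed and, again, that the words in x contain no regex metacharacters. str_extract() returns NA where nothing matches, so the new column stays aligned with the rows:

```r
library(stringr)

x <- c("Action", "Adventure", "Animation")
my_text <- data.frame(
  my_text = c("This one has Animation.", "This has none.", "Here is Adventure.")
)

# Build the alternation from the vector instead of typing it out by hand
pattern <- paste(x, collapse = "|")

# First match per row; NA for rows with no match
my_text$new_column <- str_extract(my_text$my_text, pattern)
my_text$new_column
# "Animation" NA "Adventure"
```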