Dan*_*sos 6 if-statement r dplyr
Is it possible to use multiple patterns with %like% inside a nested ifelse? If not, what are the alternatives?
library(data.table)
library(dplyr)

fruits <- c("apple", "pineapple", "grape", "avocado", "banana")
color <- c("red", "yellow", "purple", "green", "yellow")
mydata = data.frame(fruits = fruits, color = color)

mydata %>%
  mutate(group = ifelse(fruits %like% c("%pple%", "%vocado%"), "group 1",
                 ifelse(fruits %like% c("%anana%", "%grape%"), "group 2", "group 3")))
When I run the code above, I get the following warnings:
Warning messages:
1: In grep(pattern, levels(vector)) :
argument 'pattern' has length > 1 and only the first element will be used
2: In grep(pattern, levels(vector)) :
argument 'pattern' has length > 1 and only the first element will be used
Any guidance is appreciated. Thank you!
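data.table's like() function and its operator versions %like%, %ilike%, and %flike% accept only a single pattern argument, but you can use alternation in the regular expression. Alternation is denoted by a vertical bar (|):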
library(data.table)
library(dplyr)
mydata %>%
  mutate(group = ifelse(fruits %ilike% "apple|avocado", "group 1",
                 ifelse(fruits %ilike% "banana|grape", "group 2", "group 3")))
     fruits  color   group
1     apple    red group 1
2 pineapple yellow group 1
3     grape purple group 2
4   avocado  green group 1
5    banana yellow group 2
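So group 1 matches any string in which apple or avocado appears anywhere. Because like() takes a regular expression rather than an SQL pattern, % is not needed to stand for an arbitrary number of arbitrary characters; it would be matched as a literal percent sign.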
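To make the point about % concrete, here is a minimal base-R sketch. It assumes that, for a character vector, like() behaves like base grepl() (the data.table documentation describes it as essentially a wrapper around grepl), so grepl() is used as a stand-in and no packages are required. The variable names are only for illustration.

```r
# Base-R sketch: grepl() stands in for data.table's like() on a character vector.
fruits <- c("apple", "pineapple", "grape", "avocado", "banana")

# "%" has no special meaning in a regular expression, so it is matched literally.
sql_style <- grepl("%pple%", fruits)
# FALSE for every element, because no fruit name contains a "%"

# A bare substring already matches anywhere in the string.
substring_match <- grepl("pple", fruits)
# TRUE TRUE FALSE FALSE FALSE

# Alternation covers several substrings in a single pattern.
either <- grepl("apple|avocado", fruits)
# TRUE TRUE FALSE TRUE FALSE
```

This is also why the SQL-style patterns in the question would not have matched anything even without the length warning.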
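Note that %ilike% has been used instead of %like%. %ilike% is a new convenience function for case-insensitive pattern matching, available in data.table v1.12.4 (on CRAN since 3 October 2019).
%ilike% will also match the word Apple (with a capital A).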
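A short sketch of the difference between the two operators, assuming data.table 1.12.4 or later is installed. The input vector x is made up for illustration and is not part of the question's data.

```r
library(data.table)  # %ilike% needs data.table >= 1.12.4

# Made-up input for illustration
x <- c("Apple", "apple", "grape")

# %like% is case-sensitive, so "Apple" is not matched
case_sensitive <- x %like% "apple"
# FALSE TRUE FALSE

# %ilike% ignores case, so both spellings are matched
case_insensitive <- x %ilike% "apple"
# TRUE TRUE FALSE
```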
Of course, as suggested by r2evans, case_when() is a good alternative to nested ifelse():
mydata %>%
  mutate(group = case_when(fruits %ilike% "apple|avocado" ~ "group 1",
                           fruits %ilike% "banana|grape" ~ "group 2",
                           TRUE ~ "group 3"))
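You can sapply over the patterns and take row sums to find what you need.
Notes:
- Since you are using dplyr, it is better to use its if_else, because that version is stricter about mixing output classes (and avoids some other problems with base ifelse).
- %like% is just an infix operator for the like function (at least in data.table), so I am using the latter (the prefix form) here for clarity.
sapply(c(".*apple.*", ".*vocado.*"), like, vector = fruits)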
#      .*apple.* .*vocado.*
# [1,]      TRUE      FALSE
# [2,]      TRUE      FALSE
# [3,]     FALSE      FALSE
# [4,]     FALSE       TRUE
# [5,]     FALSE      FALSE
rowSums(sapply(c(".*apple.*", ".*vocado.*"), like, vector = fruits)) > 0
# [1] TRUE TRUE FALSE TRUE FALSE
That is what we need: a logical vector. I'll create a helper function for this.
mylike <- function(x, ptns) rowSums(sapply(ptns, like, vector = x)) > 0
mylike(fruits, c(".*apple.*", ".*vocado.*"))
# [1] TRUE TRUE FALSE TRUE FALSE
mydata %>%
  mutate(
    group = if_else(mylike(fruits, c(".*apple.*", ".*vocado.*")), "group 1",
                    if_else(mylike(fruits, c(".*anana.*", ".*grape.*")), "group 2", "group 3"))
  )
#      fruits  color   group
# 1     apple    red group 1
# 2 pineapple yellow group 1
# 3     grape purple group 2
# 4   avocado  green group 1
# 5    banana yellow group 2
However, whenever I see nested ifelse/if_else, I recommend case_when, since it is more readable, especially as the number of conditions grows.
mydata %>%
  mutate(
    group = case_when(
      mylike(fruits, c(".*apple.*", ".*vocado.*")) ~ "group 1",
      mylike(fruits, c(".*anana.*", ".*grape.*")) ~ "group 2",
      TRUE ~ "group 3"
    )
  )
#      fruits  color   group
# 1     apple    red group 1
# 2 pineapple yellow group 1
# 3     grape purple group 2
# 4   avocado  green group 1
# 5    banana yellow group 2
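The two ideas in this thread (a vector of patterns, and regex alternation) can be combined. The sketch below is one possible variant, not something either answer proposes: like_any is a hypothetical helper name, and it uses base grepl() in place of like() so that it runs without data.table. It joins the patterns with | and makes a single call. One reason to consider it is that mylike relies on sapply returning a matrix, which I believe is not the case when x has length one.

```r
# Hypothetical helper (not part of data.table or dplyr): TRUE where any
# of the regex patterns in `ptns` matches the corresponding element of `x`.
like_any <- function(x, ptns, ignore.case = FALSE) {
  # With no patterns nothing can match; paste() would otherwise produce "",
  # which matches every string.
  if (length(ptns) == 0) return(rep(FALSE, length(x)))
  grepl(paste(ptns, collapse = "|"), x, ignore.case = ignore.case)
}

fruits <- c("apple", "pineapple", "grape", "avocado", "banana")

g1 <- like_any(fruits, c("apple", "vocado"))
# TRUE TRUE FALSE TRUE FALSE
g2 <- like_any(fruits, c("anana", "grape"))
# FALSE FALSE TRUE FALSE TRUE

# A length-one input also works, since no matrix is involved
single <- like_any("apple", c("apple", "vocado"))
# TRUE
```

Because it returns a plain logical vector, like_any can be used inside if_else or case_when in the same way as mylike.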
If you already have a set of SQL patterns and don't want to convert them all to regular expressions, here is a quick helper function based on https://codereview.stackexchange.com/a/36864/42300:
# https://codereview.stackexchange.com/a/36864/42300
sql2regex <- function(ptn) {
  paste0(
    "^",
    gsub("_", ".",
         gsub("(?<!\\[)%(?!\\])", ".*", ptn, perl = TRUE)),
    "$")
}
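It tries to be smart about not converting [%], which is one way to "escape" a percent sign and get its literal (ref: http://www.sqlserver.info/syntax/sql-server-like-with-percent-literal/). However, although [% looks incomplete, it is not translated to ^[.*$ but is instead left as ^[%$, which will fail as a regular expression. Again, this is just a quick-hack helper function.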
mydata %>%
  mutate(
    group = case_when(
      mylike(fruits, sql2regex(c("%pple%", "%vocado%"))) ~ "group 1",
      mylike(fruits, sql2regex(c("%anana%", "%grape%"))) ~ "group 2",
      TRUE ~ "group 3"
    )
  )
#      fruits  color   group
# 1     apple    red group 1
# 2 pineapple yellow group 1
# 3     grape purple group 2
# 4   avocado  green group 1
# 5    banana yellow group 2
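To see what sql2regex actually produces, the sketch below repeats the helper from above and runs it on a few patterns. The inputs "a_ocado" and "100[%]" are made up for illustration, and base grepl() is used for the matching so that no packages are needed.

```r
# Same helper as above, repeated so this block runs on its own
sql2regex <- function(ptn) {
  paste0(
    "^",
    gsub("_", ".",
         gsub("(?<!\\[)%(?!\\])", ".*", ptn, perl = TRUE)),
    "$")
}

# Made-up SQL patterns: wildcard %, single-character _, and an escaped [%]
converted <- sql2regex(c("%pple%", "a_ocado", "100[%]"))
# "^.*pple.*$" "^a.ocado$" "^100[%]$"

m1 <- grepl(converted[1], "pineapple")  # TRUE: % became .*
m2 <- grepl(converted[2], "avocado")    # TRUE: _ became .
m3 <- grepl(converted[3], "100%")       # TRUE: [%] stays a literal percent

# The incomplete escape is left untouched and is not a valid regex
broken <- sql2regex("[%")
# "^[%$"
```

Note that other regex metacharacters in the SQL pattern (a literal dot, for example) are passed through unescaped, so this conversion is only safe for simple patterns.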