Why does dplyr throw an error in this nested if_else when the logical condition implies the output should not be evaluated?

Question

I have a nested if_else statement inside mutate. In my example data frame:

tmp_df2 <- data.frame(a = c(1,1,2), b = c(T,F,T), c = c(1,2,3))

  a     b c
1 1  TRUE 1
2 1 FALSE 2
3 2  TRUE 3

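I want to group by a and then perform an operation that depends on whether the group has one row or two. I would have thought this nested if_else would be enough: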

tmp_df2 %>%
    group_by(a) %>%
    mutate(tmp_check = n() == 1) %>%
    mutate(d = if_else(tmp_check, # check for number of entries in group
                       0,
                       if_else(b, sum(c)/c[b == T], sum(c)/c[which(b != T)])
    )
    )

But this throws an error:

Error in eval(substitute(expr), envir, enclos) : 
  `false` is length 2 not 1 or 1.

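The example is set up so that when the first if_else condition (n() == 1) evaluates to true, a single element is returned, but when it evaluates to false, a vector of two elements is returned, which I assume is what causes the error. Logically, though, the statement seems sound to me.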

Both of the following statements produce the (desired) result:

> tmp_df2 %>%
+     group_by(a) %>%
+     mutate(d = ifelse(rep(n() == 1, n()), # avoid undesired recycling
+                        0,
+                        if_else(b, sum(c)/c[b == T], sum(c)/c[which(b != T)])
+     )
+     )
Source: local data frame [3 x 4]
Groups: a [2]

      a     b     c     d
  <dbl> <lgl> <dbl> <dbl>
1     1  TRUE     1   3.0
2     1 FALSE     2   1.5
3     2  TRUE     3   0.0
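
A likely reason the ifelse version above succeeds is that base R's ifelse only evaluates its `no` argument when the test contains at least one FALSE. For the one-row group, rep(n() == 1, n()) is a single TRUE, so the nested if_else is presumably never run. A minimal base R sketch of that behavior (the stop() message is only a marker for this illustration):

```r
# Base R's ifelse() only touches `no` when at least one test value is FALSE.
lazy_result <- ifelse(TRUE, 0, stop("`no` was evaluated"))
print(lazy_result)

# With a mixed test, both `yes` and `no` are evaluated and recycled.
mixed_result <- ifelse(c(TRUE, FALSE), 0, c(10, 20))
print(mixed_result)
```
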

Or simply filter so that only the groups with two rows remain:

> tmp_df2 %>%
+     group_by(a) %>%
+     filter(n() == 2) %>%
+     mutate(d = if_else(b, sum(c)/c[b == T], sum(c)/c[which(b != T)]))
Source: local data frame [2 x 4]
Groups: a [1]

      a     b     c     d
  <dbl> <lgl> <dbl> <dbl>
1     1  TRUE     1   3.0
2     1 FALSE     2   1.5

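I have two questions.

How does dplyr know that the second output, which should not be evaluated given the logical condition, is invalid?
How can I get the desired behavior in dplyr (without using ifelse)?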

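Edit: As noted in the answer, either drop the temporary tmp_check column and use the if ... else construct, or use the following code, which works but produces a warning: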

library(dplyr)
tmp_df2 %>%
    group_by(a) %>%
    mutate(tmp_check = n() == 1) %>%
    mutate(d = if (tmp_check)  # check for number of entries in group
                       0 else
                       if_else(b, sum(c)/c[b == T], sum(c)/c[which(b != T)])
    )
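
The warning presumably arises because tmp_check holds one value per row, so in the two-row group `if` receives a condition of length 2. Older versions of R used only the first element and warned; from R 4.2.0 onward a condition longer than one is an error. The base R sketch below uses made-up stand-in values (tmp_check and ratios are hypothetical, not taken from a dplyr run) to show the idea of handing `if` a single value:

```r
# Hypothetical values a two-row group would hold in tmp_check
# (n() == 1 is FALSE for both rows).
tmp_check <- c(FALSE, FALSE)
ratios <- c(3, 1.5)  # stand-in for that group's if_else(...) result

# Use a single value as the condition so `if` never sees a length-2 vector.
d <- if (tmp_check[1]) 0 else ratios
print(d)
```

Inside mutate, the analogous change would be to test tmp_check[1], or n() == 1 directly.
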

Answer 1

Wei*_*ong 5

dplyr "knows" because if_else checks the values supplied for both the true and the false case. This is stated in ?if_else, and the source shows how it is done:

if_else
# function (condition, true, false, missing = NULL) 
# {
#     if (!is.logical(condition)) {
#         stop("`condition` must be logical", call. = FALSE)
#     }
#     out <- true[rep(NA_integer_, length(condition))]
#     out <- replace_with(out, condition & !is.na(condition), true, 
#         "`true`")
#     out <- replace_with(out, !condition & !is.na(condition), 
#         false, "`false`")
#     out <- replace_with(out, is.na(condition), missing, "`missing`")
#     out
# }
# <environment: namespace:dplyr>

Now look at the source of replace_with:

dplyr:::replace_with
# function (x, i, val, name) 
# {
#     if (is.null(val)) {
#         return(x)
#     }
#     check_length(val, x, name)
#     check_type(val, x, name)
#     check_class(val, x, name)
#     if (length(val) == 1L) {
#         x[i] <- val
#     }
#     else {
#         x[i] <- val[i]
#     }
#     x
# }
# <environment: namespace:dplyr>

So the lengths of the values for both the true and the false case are checked, regardless of what the condition contains.
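
The effect of that check can be sketched in base R. The functions check_len and strict_if_else below are hypothetical, simplified stand-ins written for this illustration; they are not dplyr's actual code, they skip the type and class checks, and they assume the condition contains no NA:

```r
# Hypothetical, simplified stand-in for the length check; not dplyr's code.
check_len <- function(val, n, name) {
  if (!(length(val) %in% c(1L, n))) {
    stop(sprintf("`%s` is length %d, not 1 or %d", name, length(val), n),
         call. = FALSE)
  }
  invisible(TRUE)
}

strict_if_else <- function(condition, true, false) {
  n <- length(condition)
  # Both arguments are forced and checked here, whatever `condition` holds.
  check_len(true, n, "true")
  check_len(false, n, "false")
  out <- rep(true, length.out = n)
  idx <- which(!condition)
  out[idx] <- rep(false, length.out = n)[idx]
  out
}

# The condition is a single TRUE, yet a length-2 `false` is still rejected.
res <- tryCatch(strict_if_else(TRUE, 0, c(1, 2)),
                error = function(e) conditionMessage(e))
print(res)
```
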

To get the behavior you want, you can use if ... else, as another SO user suggested in your previous question:

tmp_df2 %>%
    group_by(a) %>%
    mutate(d = if (n() == 1) 0 else if_else(b, sum(c)/c[b == T], sum(c)/c[which(b != T)])
    )
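
The same idea can be illustrated without dplyr. In the base R sketch below, per_group is a hypothetical helper introduced only for this example; because if ... else is control flow, the two-row branch is never evaluated for a one-row group:

```r
tmp_df2 <- data.frame(a = c(1, 1, 2), b = c(TRUE, FALSE, TRUE), c = c(1, 2, 3))

# Hypothetical helper: computes column d for one group's rows.
per_group <- function(g) {
  if (nrow(g) == 1) {
    0  # one-row group: the else branch is skipped entirely
  } else {
    ifelse(g$b, sum(g$c) / g$c[g$b], sum(g$c) / g$c[!g$b])
  }
}

# Split by a, compute d per group, then put the values back in row order.
tmp_df2$d <- unsplit(lapply(split(tmp_df2, tmp_df2$a), per_group), tmp_df2$a)
print(tmp_df2)
```
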
