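I'm struggling to write a function that works inside dplyr::mutate().
Because rowwise() %>% sum() is quite slow on large datasets, the suggested alternative is to fall back to base R. I'd like to simplify that as shown below, but I'm having trouble getting the data passed along inside the mutate call.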
require(tidyverse)
#> Loading required package: tidyverse
#I'd like to write a function that works inside mutate and replaces the rowSums(select()).
cars <- as_tibble(cars)
cars %>%
mutate(sum = rowSums(select(., speed, dist), na.rm = T))
#> # A tibble: 50 x 3
#> speed dist sum
#> <dbl> <dbl> <dbl>
#> 1 4. 2. 6.
#> 2 4. 10. 14.
#> 3 7. 4. 11.
#> 4 7. 22. 29.
#> 5 8. 16. 24.
#> 6 9. 10. 19.
#> 7 10. 18. 28.
#> 8 10. 26. 36.
#> 9 10. 34. 44.
#> 10 11. 17. 28.
#> # ... with 40 more rows
#Here is my first attempt.
rowwise_sum <- function(data, ..., na.rm = FALSE) {
columns <- rlang::enquos(...)
data %>%
select(!!!columns) %>%
rowSums(na.rm = na.rm)
}
#Doesn't work as expected:
cars %>% mutate(sum = rowwise_sum(speed, dist, na.rm = T))
#> Error in mutate_impl(.data, dots): Evaluation error: no applicable method for 'select_' applied to an object of class "c('double', 'numeric')".
#But alone it is creating a vector.
cars %>% rowwise_sum(speed, dist, na.rm = T)
#> [1] 6 14 11 29 24 19 28 36 44 28 39 26 32 36 40 39 47
#> [18] 47 59 40 50 74 94 35 41 69 48 56 49 57 67 60 74 94
#> [35] 102 55 65 87 52 68 72 76 84 88 77 94 116 117 144 110
#Appears to not be getting the data passed. Specifying with a dot works.
cars %>% mutate(sum = rowwise_sum(., speed, dist, na.rm = T))
#> # A tibble: 50 x 3
#> speed dist sum
#> <dbl> <dbl> <dbl>
#> 1 4. 2. 6.
#> 2 4. 10. 14.
#> 3 7. 4. 11.
#> 4 7. 22. 29.
#> 5 8. 16. 24.
#> 6 9. 10. 19.
#> 7 10. 18. 28.
#> 8 10. 26. 36.
#> 9 10. 34. 44.
#> 10 11. 17. 28.
#> # ... with 40 more rows
So the question becomes: how can I pass the data inside the function, so that I don't need to include a dot every time?
rowwise_sum2 <- function(data, ..., na.rm = FALSE) {
columns <- rlang::enquos(...)
data %>%
select(!!!columns) %>%
rowSums(., na.rm = na.rm)
}
#Same error
cars %>% mutate(sum = rowwise_sum2(speed, dist, na.rm = T))
#> Error in mutate_impl(.data, dots): Evaluation error: no applicable method for 'select_' applied to an object of class "c('double', 'numeric')".
#Same result
cars %>% rowwise_sum2(speed, dist, na.rm = T)
#> [1] 6 14 11 29 24 19 28 36 44 28 39 26 32 36 40 39 47
#> [18] 47 59 40 50 74 94 35 41 69 48 56 49 57 67 60 74 94
#> [35] 102 55 65 87 52 68 72 76 84 88 77 94 116 117 144 110
#Same result
cars %>% mutate(sum = rowwise_sum2(., speed, dist, na.rm = T))
#> # A tibble: 50 x 3
#> speed dist sum
#> <dbl> <dbl> <dbl>
#> 1 4. 2. 6.
#> 2 4. 10. 14.
#> 3 7. 4. 11.
#> 4 7. 22. 29.
#> 5 8. 16. 24.
#> 6 9. 10. 19.
#> 7 10. 18. 28.
#> 8 10. 26. 36.
#> 9 10. 34. 44.
#> 10 11. 17. 28.
#> # ... with 40 more rows
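Created on 2018-05-22 by the reprex package (v0.2.0).
Following akrun's answer (please upvote it):
In other words: just drop mutate() and do everything inside the new function.
Here is my final function, an update to his version that also lets you name the sum column if needed.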
rowwise_sum <- function(data, ..., sum_col = "sum", na.rm = FALSE) {
columns <- rlang::enquos(...)
data %>%
select(!!! columns) %>%
transmute(!!sum_col := rowSums(., na.rm = na.rm)) %>%
bind_cols(data, .)
}
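As a quick check of the function above, here is a minimal usage sketch. The function is restated so the snippet runs on its own; the small tibble df and the column name "total" are made up for illustration and are not from the original post.

```r
library(dplyr)

# Restated from above so this snippet is self-contained.
rowwise_sum <- function(data, ..., sum_col = "sum", na.rm = FALSE) {
  columns <- rlang::enquos(...)
  data %>%
    select(!!!columns) %>%
    transmute(!!sum_col := rowSums(., na.rm = na.rm)) %>%
    bind_cols(data, .)
}

# Hypothetical data with a missing value, to exercise na.rm.
df <- tibble(a = c(1, NA), b = c(2, 3))

# Arguments after `...` must be passed by name.
result <- rowwise_sum(df, a, b, sum_col = "total", na.rm = TRUE)
print(result)
```

Because sum_col and na.rm come after ..., they have to be named in the call; anything unnamed is treated as a column to select.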
We can move the ... to the end of the argument list:
rowwise_sum <- function(data, na.rm = FALSE,...) {
columns <- rlang::enquos(...)
data %>%
select(!!!columns) %>%
rowSums(na.rm = na.rm)
}
cars %>%
mutate(sum = rowwise_sum(., na.rm = TRUE, speed, dist))
# A tibble: 50 x 3
# speed dist sum
# <dbl> <dbl> <dbl>
# 1 4 2 6
# 2 4 10 14
# 3 7 4 11
# 4 7 22 29
# 5 8 16 24
# 6 9 10 19
# 7 10 18 28
# 8 10 26 36
# 9 10 34 44
#10 11 17 28
# ... with 40 more rows
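It also works without changing the position of the ... (although moving it is generally recommended). The main issue here is that data (i.e. .) was not specified in the argument list of the call inside mutate, so the first column name ended up in the data slot.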
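To make that concrete, here is a small sketch of what the function actually receives in each case. The helper first_arg_class is hypothetical and exists only for this illustration; it reports the class of whatever lands in its first argument.

```r
library(dplyr)

# Hypothetical helper, for illustration only: returns the class of
# whatever arrives in the `data` slot. The `...` is accepted but unused.
first_arg_class <- function(data, ...) class(data)[1]

cars_tbl <- as_tibble(cars)

# Without the dot, mutate() evaluates `speed` as a column vector and
# that vector is what fills the `data` argument.
no_dot <- cars_tbl %>% mutate(cls = first_arg_class(speed, dist))

# With the dot, the pipe binds `.` to the whole tibble, so the
# function receives a data frame it can call select() on.
with_dot <- cars_tbl %>% mutate(cls = first_arg_class(., speed, dist))

print(unique(no_dot$cls))
print(unique(with_dot$cls))
```

This is why select() complained about an object of class numeric: it was handed the speed column rather than the tibble.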
It is easier to build the whole pipeline inside the function rather than doing only part of it there:
rowwise_sum2 <- function(data, na.rm = FALSE, ...) {
columns <- rlang::enquos(...)
data %>%
select(!!! columns) %>%
transmute(sum = rowSums(., na.rm = na.rm)) %>%
bind_cols(data, .)
}
rowwise_sum2(cars, na.rm = TRUE, speed, dist)
# A tibble: 50 x 3
# speed dist sum
# <dbl> <dbl> <dbl>
# 1 4 2 6
# 2 4 10 14
# 3 7 4 11
# 4 7 22 29
# 5 8 16 24
# 6 9 10 19
# 7 10 18 28
# 8 10 26 36
# 9 10 34 44
#10 11 17 28
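If a newer dplyr is available, another way to avoid the dot is pick(), which selects columns from the data that mutate() is currently working on. This is an alternative sketch rather than what the answer above uses, and it assumes dplyr 1.1.0 or later, where pick() was introduced.

```r
library(dplyr)

cars_tbl <- as_tibble(cars)

# pick() returns a tibble of the selected columns from the current
# data, so rowSums() gets a data frame without an explicit `.`.
# Assumption: dplyr >= 1.1.0 is installed.
with_sum <- cars_tbl %>%
  mutate(sum = rowSums(pick(speed, dist), na.rm = TRUE))

print(head(with_sum, 3))
```

This keeps the vectorised rowSums() call from base R while staying inside mutate().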