Kon*_*rad 5 parallel-processing aggregate r distributed-computing dataframe
I would like to run the aggregate function inside dmapply, as provided by the ddR package.
The desired result mirrors the simple output generated by aggregate in base R:
aggregate(
x = mtcars$mpg,
FUN = function(x) {
mean(x, na.rm = TRUE)
},
by = list(trans = mtcars$am)
)
which produces:
trans x
1 0 17.14737
2 1 24.39231
dmapply
I would like to arrive at the same result using dmapply, as attempted below:
# ddR
require(ddR)
# ddR object creation
distMtcars <- as.dframe(mtcars)
# Aggregate / ddmapply
dmapply(
FUN = function(x, y) {
aggregate(FUN = mean(x, na.rm = TRUE),
x = x,
by = list(trans = y))
},
distMtcars$mpg,
y = distMtcars$am,
output.type = "dframe",
combine = "rbind"
)
The code fails with:
Error in match.fun(FUN) : 'mean(x, na.rm = TRUE)' is not a function, character or symbol
Called from: match.fun(FUN)
Applying the fix pointed out by @Mike removes the error but does not produce the desired result. The code:
# Avoid namespace conflict with other packages
ddR::collect(
dmapply(
FUN = function(x, y) {
aggregate(
FUN = function(x) {
mean(x, na.rm = TRUE)
},
x = x,
by = list(trans = y)
)
},
distMtcars$mpg,
y = distMtcars$am,
output.type = "dframe",
combine = "rbind"
)
)
yields:
[1] trans x
<0 rows> (or 0-length row.names)
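If you change the FUN argument of aggregate to an actual function, consistent with your earlier call, it works fine for me: FUN = function(x) mean(x, na.rm = TRUE). match.fun rejects mean(x, na.rm = TRUE) because it is a function call, not a function, whereas mean on its own is a function.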
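To illustrate the difference between passing a function and passing a call, here is a small base R sketch. It does not use ddR at all, and the names ok and bad are only for illustration. In the second call, mean(...) is evaluated before aggregate ever sees it, so FUN arrives as a number and match.fun refuses it.

```r
# Passing a function object: aggregate() can call it once per group.
ok <- aggregate(
  x = mtcars$mpg,
  by = list(trans = mtcars$am),
  FUN = function(v) mean(v, na.rm = TRUE)
)

# Passing a call: the call is evaluated first, so FUN is a single number,
# and match.fun() raises an error. tryCatch() captures the message as text.
bad <- tryCatch(
  aggregate(
    x = mtcars$mpg,
    by = list(trans = mtcars$am),
    FUN = mean(mtcars$mpg, na.rm = TRUE)
  ),
  error = function(e) conditionMessage(e)
)

print(ok)
print(bad)
```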
It will also give you NA results unless you change x = distMtcars$mpg to x = collect(distMtcars)$mpg. The same goes for y. With all that said, I think this should work for you:
res <- dmapply(
FUN = function(x, y) {
aggregate(FUN = function(x) mean(x, na.rm = TRUE),
x = x,
by = list(trans = y))
},
x = list(collect(distMtcars)$mpg),
y = list(collect(distMtcars)$am),
output.type = "dframe",
combine = "rbind"
)
You can then call collect(res) to see the result:
collect(res)
# trans x
#1 0 17.14737
#2 1 24.39231
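Note that the call above wraps the whole collected column in a single list element, so the aggregation effectively runs once on all the data. If you want each partition to do part of the work, one possible approach (a sketch in plain base R, not something ddR does for you) is to have every chunk return group sums and counts and then combine those, since per-chunk means cannot simply be averaged. The 4-way split and the names chunk_id, chunks, partials, combined and result are hypothetical stand-ins for real partitions.

```r
# Hypothetical partitioning: assign rows to 4 chunks as a stand-in for partitions.
chunk_id <- rep(1:4, length.out = nrow(mtcars))
chunks <- split(mtcars[, c("mpg", "am")], chunk_id)

# Per chunk: sum of mpg and count of non-missing mpg values for each group.
partials <- lapply(chunks, function(d) {
  aggregate(
    x = data.frame(total = d$mpg, n = as.numeric(!is.na(d$mpg))),
    by = list(trans = d$am),
    FUN = sum,
    na.rm = TRUE
  )
})

# Combine: add up the partial sums and counts per group, then divide.
combined <- aggregate(
  cbind(total, n) ~ trans,
  data = do.call(rbind, partials),
  FUN = sum
)
result <- data.frame(trans = combined$trans, x = combined$total / combined$n)
print(result)
```

The result should match the plain aggregate call on the full data, because sums and counts merge exactly across chunks.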