Running an aggregate function in dmapply (ddR package)

Kon*_*rad 5 parallel-processing aggregate r distributed-computing dataframe

I would like to run the aggregate function via dmapply, which is provided by the ddR package.

Desired results

The desired result mirrors the simple output generated with aggregate in base R:

aggregate(
  x = mtcars$mpg,
  FUN = function(x) {
    mean(x, na.rm = TRUE)
  },
  by = list(trans = mtcars$am)
)

which produces:

  trans        x
1     0 17.14737
2     1 24.39231

Attempt - dmapply

I would like to arrive at the same result using dmapply, as attempted below:

# ddR
require(ddR)

# ddR object creation
distMtcars <- as.dframe(mtcars)

# Aggregate / dmapply
dmapply(
  FUN = function(x, y) {
    aggregate(FUN = mean(x, na.rm = TRUE),
              x = x,
              by = list(trans = y))
  },
  distMtcars$mpg,
  y = distMtcars$am,
  output.type = "dframe",
  combine = "rbind"
)

The code fails:

Error in match.fun(FUN) : 'mean(x, na.rm = TRUE)' is not a function, character or symbol
Called from: match.fun(FUN)


Update

Applying the fix pointed out by @Mike removes the error, but does not produce the desired result. The code:

# Avoid namespace conflict with other packages
ddR::collect(
  dmapply(
    FUN = function(x, y) {
      aggregate(
        FUN = function(x) {
          mean(x, na.rm = TRUE)
        },
        x = x,
        by = list(trans = y)
      )
    },
    distMtcars$mpg,
    y = distMtcars$am,
    output.type = "dframe",
    combine = "rbind"
  )
)

yields:

[1] trans x    
<0 rows> (or 0-length row.names)

Mik*_* H. 2

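It works fine for me if you change the FUN argument of aggregate to match the one in your earlier call: FUN = function(x) mean(x, na.rm = T). The reason it can't find mean(x, na.rm = T) is that this is not a function (it is a function call), whereas mean is a function.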
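To make the function versus function call distinction concrete, here is a minimal base R sketch. The helper apply_fun is hypothetical and only for illustration; it resolves its FUN argument with match.fun(), which is the same step where aggregate raises the error shown in the question.

```r
# Hypothetical helper (not part of base R or ddR): resolves FUN with
# match.fun(), the same step that fails inside aggregate().
apply_fun <- function(x, FUN) {
  FUN <- match.fun(FUN)
  FUN(x)
}

x <- c(1, NA, 3)

# Passing a function object works: match.fun() returns it unchanged.
ok <- apply_fun(x, FUN = function(v) mean(v, na.rm = TRUE))

# Passing a call fails: mean(x, na.rm = TRUE) is evaluated to a number,
# and a number is not a function, character string, or symbol.
err <- tryCatch(
  apply_fun(x, FUN = mean(x, na.rm = TRUE)),
  error = function(e) e
)

print(ok)
print(conditionMessage(err))
```

Wrapping the call in function(x) ... defers evaluation until aggregate applies it to each group, which is why the corrected form works.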

It will also give you NA results unless you change x = distMtcars$mpg to x = collect(distMtcars)$mpg. The same goes for y. With all that said, I think this should work for you:

res <- dmapply(
  FUN = function(x, y) {
    aggregate(FUN = function(x) mean(x, na.rm = TRUE),
              x = x,
              by = list(trans = y))
  },
  x = list(collect(distMtcars)$mpg),
  y = list(collect(distMtcars)$am),
  output.type = "dframe",
  combine = "rbind"
)

Then you can call collect(res) to see the result.

collect(res)
#  trans        x
#1     0 17.14737
#2     1 24.39231
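The list() wrapping in the call above matters. As a rough illustration, and assuming dmapply iterates over its arguments element by element in the same way as base R's mapply (ddR describes it as a distributed counterpart of mapply), the sketch below uses plain mapply rather than ddR to show the difference between passing a vector and passing a one-element list.

```r
# Base R analogue only; no ddR calls are used here.

# Passing vectors: FUN is called once per element (32 times for mtcars),
# so each call sees a single mpg value rather than the whole column.
per_element <- mapply(
  function(x, y) length(x),
  mtcars$mpg,
  mtcars$am
)

# Passing one-element lists: FUN is called once with the full vectors,
# so aggregate() can group all rows by transmission type.
whole <- mapply(
  function(x, y) {
    aggregate(x = x, by = list(trans = y),
              FUN = function(v) mean(v, na.rm = TRUE))
  },
  list(mtcars$mpg),
  list(mtcars$am),
  SIMPLIFY = FALSE
)

print(table(per_element))
print(whole[[1]])
```

Under that assumption, the single call in the second form is what lets aggregate return one row per transmission group.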