tuc*_*son 22 parallel-processing foreach r
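I tried to run the following code on a Unix machine with 20 CPUs, using R with the foreach, parallel, doParallel, and party packages (my goal is to get the party/varimp function to work across multiple CPUs in parallel):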
parallel_compute_varimp <- function (object, mincriterion = 0, conditional = FALSE, threshold = 0.2,
    nperm = 1, OOB = TRUE, pre1.0_0 = conditional)
{
    response <- object@responses
    input <- object@data@get("input")
    xnames <- colnames(input)
    inp <- initVariableFrame(input, trafo = NULL)
    y <- object@responses@variables[[1]]
    error <- function(x, oob) mean((levels(y)[sapply(x, which.max)] != y)[oob])
    w <- object@initweights
    perror <- matrix(0, nrow = nperm * length(object@ensemble), ncol = length(xnames))
    colnames(perror) <- xnames
    data = foreach(b = 1:length(object@ensemble), .packages = c("party","stats"), .combine = rbind) %dopar%
    {
        try({
            tree <- object@ensemble[[b]]
            oob <- object@weights[[b]] == 0
            p <- .Call("R_predict", tree, inp, mincriterion, -1L, PACKAGE = "party")
            eoob <- error(p, oob)
            for (j in unique(varIDs(tree))) {
                for (per in 1:nperm) {
                    if (conditional || pre1.0_0) {
                        tmp <- inp
                        ccl <- create_cond_list(conditional, threshold, xnames[j], input)
                        if (is.null(ccl)) {
                            perm <- sample(which(oob))
                        }
                        else {
                            perm <- conditional_perm(ccl, xnames, input, tree, oob)
                        }
                        tmp@variables[[j]][which(oob)] <- tmp@variables[[j]][perm]
                        p <- .Call("R_predict", tree, tmp, mincriterion, -1L, PACKAGE = "party")
                    }
                    else {
                        p <- .Call("R_predict", tree, inp, mincriterion, as.integer(j), PACKAGE = "party")
                    }
                    perror[b, j] <- (error(p, oob) - eoob)
                }
            }
            ########
            # return data to the %dopar% loop data variable
            perror[b, ]
            ########
        }) # END OF TRY
    } # END OF LOOP WITH PARALLEL COMPUTING
    perror = data
    perror <- as.data.frame(perror)
    return(MeanDecreaseAccuracy = colMeans(perror))
}
environment(parallel_compute_varimp) <- asNamespace('party')
cl <- makeCluster(detectCores())
registerDoParallel(cl, cores = detectCores())
<...>
system.time(data.cforest.varimp <- parallel_compute_varimp(data.cforest, conditional = TRUE))
But I get an error:
> system.time(data.cforest.varimp <- parallel_compute_varimp(data.cforest, conditional = TRUE))
Error in unserialize(socklist[[n]]) : error reading from connection
Timing stopped at: 58.302 13.197 709.307
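The code works with a smaller dataset on 4 CPUs.
I am running out of ideas. Can someone suggest a way to reach my goal of running the party package's varimp function on parallel CPUs?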
Ste*_*ton 33
The error:
Error in unserialize(socklist[[n]]) : error reading from connection
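means that the master process got an error when calling unserialize to read from the socket connection to one of the workers. That probably means the corresponding worker died, dropping its end of the socket connection. Unfortunately, it may have died for any number of reasons, many of which are very system-specific.
You can usually find out why the worker died by using the makeCluster "outfile" option, so that error messages generated by the worker are not thrown away. I usually recommend using outfile="" as described in this answer. Note that the "outfile" option works the same way in both the snow and parallel packages.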
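As a minimal sketch of the "outfile" suggestion, the cluster below is created with outfile = "" so that anything a worker prints (including messages emitted just before it dies) is not discarded. The worker count of 2 and the toy loop body are assumptions for illustration only; they stand in for the real cforest computation.

```r
library(parallel)
library(doParallel)
library(foreach)

# outfile = "" keeps worker output instead of redirecting it to the null device
cl <- makeCluster(2, outfile = "")
registerDoParallel(cl)

# Toy task (hypothetical): each worker reports what it is doing, then returns a value
res <- foreach(i = 1:4, .combine = c) %dopar% {
  message("worker processing task ", i)
  i^2
}

stopCluster(cl)
```

When run from a terminal session, the worker messages should show up alongside the master's output; in some GUI front ends they may not be displayed, in which case passing a file name to outfile is an alternative.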
You can also verify that your foreach loop works correctly when executed sequentially by registering the sequential backend:
registerDoSEQ()
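If you're lucky, the foreach loop will also fail when executed sequentially, since it is usually much easier to figure out what went wrong in that case.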
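A small sketch of that sequential check, assuming a toy loop body in place of the real one: once registerDoSEQ() has been called, the same %dopar% loop runs inside the master R process, so errors surface directly and tools such as traceback() can be used.

```r
library(foreach)

# Replace whatever parallel backend is registered with the sequential one
registerDoSEQ()

# The loop is written exactly as for parallel execution (hypothetical toy body)
res <- foreach(i = 1:3, .combine = rbind) %dopar% {
  c(task = i, square = i^2)
}

# Number of workers reported by the currently registered backend
n_workers <- getDoParWorkers()
```

With the sequential backend registered, calling parallel_compute_varimp(data.cforest, conditional = TRUE) from the question should execute every tree in the master process, which makes a failure inside the loop body far easier to inspect.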