Error when calling the R serialize function

MSa*_*ich · 29 · tags: parallel-processing, r, snow

I load the following packages into R:

library(foreach)
library(doParallel)
library(iterators)

I have been "parallelizing" my code for a long time, but recently the code has been stopping INTERMITTENTLY while it runs. The error is:

Error in serialize(data, node$con) : error writing to connection

My educated guess is that the connection I open with the command below may have expired:

## Register Cluster
##
cores<-8
cl <- makeCluster(cores)
registerDoParallel(cl)

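Looking at the makeCluster man page, I see that by default a connection only expires after 30 days! I could set options(error=recover) to check interactively whether the connection is still open when the code stops, but I decided to post this general question first.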
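A minimal sketch of the registration step above as a self-contained function, assuming a local PSOCK cluster as in the question. It passes makeCluster()'s `timeout` explicitly (in seconds; the documented default is 30 days) and always shuts the workers down with stopCluster(), even when the loop fails. The name `run_parallel()`, the one-hour timeout and the toy `i^2` body are illustrative choices, not part of the original code.

```r
library(parallel)
library(foreach)
library(doParallel)

# Hypothetical helper: create, register and always stop a local cluster.
run_parallel <- function(cores, n, timeout = 60 * 60) {
  # timeout is in seconds; makeCluster() defaults to 30 days
  cl <- makeCluster(cores, timeout = timeout)
  on.exit(stopCluster(cl), add = TRUE)  # release the sockets even on error
  registerDoParallel(cl)
  foreach(i = seq_len(n), .combine = c) %dopar% {
    i^2  # placeholder for the real per-task work
  }
}

# For interactive debugging, open the frame browser whenever an error occurs:
# options(error = recover)
```

Wrapping the cluster in a function with `on.exit()` means a failed run does not leave orphaned workers holding sockets open, which makes repeated reruns of an intermittent failure easier to compare.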

Important:

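1) The error really is intermittent: sometimes I rerun the same code and get no error.
2) I run everything on the same multicore machine (Intel, 8 cores), so it is not a communication (network) problem between cluster nodes.
3) I am a heavy user of CPU and GPU parallelization on both my laptop and my desktop (64 cores). Unfortunately, this is the first time I have run into this type of error.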

Has anyone run into the same type of error?

As requested, here is my sessionInfo():

> sessionInfo()
R version 2.15.3 (2013-03-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] TTR_0.22-0       xts_0.9-3        doParallel_1.0.1 iterators_1.0.6  foreach_1.4.0    zoo_1.7-9        Revobase_6.2.0   RevoMods_6.2.0  

loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_2.15.3 grid_2.15.3     lattice_0.20-13 tools_2.15.3   

@SteveWeston, below is the error from one of the calls (again, it is intermittent):

starting worker pid=8808 on localhost:10187 at 15:21:52.232
starting worker pid=5492 on localhost:10187 at 15:21:53.624
starting worker pid=8804 on localhost:10187 at 15:21:54.997
starting worker pid=8540 on localhost:10187 at 15:21:56.360
starting worker pid=6308 on localhost:10187 at 15:21:57.721
starting worker pid=8164 on localhost:10187 at 15:21:59.137
starting worker pid=8064 on localhost:10187 at 15:22:00.491
starting worker pid=8528 on localhost:10187 at 15:22:01.855
Error in unserialize(node$con) : 
  ReadItem: unknown type 0, perhaps written by later version of R
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted

Adding more information: I set options(error=recover), and it gave the following:

Error in serialize(data, node$con) : error writing to connection

Enter a frame number, or 0 to exit   

1: #51: parallelize(FUN = "ensemble.prism", arg = list(prism = iis.long, instances = oos.instances), vectorize.arg = c("prism", "instances"), cores = cores, .export 
2: parallelize.R#58: foreach.bind(idx = i) %dopar% pFUN(idx)
3: e$fun(obj, substitute(ex), parent.frame(), e$data)
4: clusterCall(cl, workerInit, c.expr, exportenv, obj$packages)
5: sendCall(cl[[i]], fun, list(...))
6: postNode(con, "EXEC", list(fun = fun, args = args, return = return, tag = tag))
7: sendData(con, list(type = type, data = value, tag = tag))
8: sendData.SOCKnode(con, list(type = type, data = value, tag = tag))
9: serialize(data, node$con)

Selection: 9

I tried to check whether the connections were still available, and got:

Browse[1]> showConnections()
   description                class      mode  text     isopen   can read can write
3  "<-www.007guard.com:10187" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
4  "<-www.007guard.com:10187" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
5  "<-www.007guard.com:10187" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
6  "<-www.007guard.com:10187" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
7  "<-www.007guard.com:10187" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
8  "<-www.007guard.com:10187" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
9  "<-www.007guard.com:10187" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
10 "<-www.007guard.com:10187" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
Browse[1]> 

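Since the connections are open, and the "unknown type 0" error points to an R version mismatch (as @SteveWeston noted), I really can't figure out what is happening here.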
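One way to test the version explanation directly is to ask every worker which R it is running and compare it with the master session. This is a hypothetical diagnostic (`check_workers()` is not from the thread): it uses clusterEvalQ() to collect each worker's PID and `R.version.string`, and if a worker has already died, the call's error is caught and reported instead of stopping the session.

```r
library(parallel)

# Hypothetical diagnostic: report each worker's PID and R version, and
# whether that version matches the master session.
check_workers <- function(cl) {
  info <- tryCatch(
    clusterEvalQ(cl, list(pid = Sys.getpid(), version = R.version.string)),
    error = function(e) e  # a dead worker surfaces here as a connection error
  )
  if (inherits(info, "error")) {
    message("At least one worker is unreachable: ", conditionMessage(info))
    return(invisible(NULL))
  }
  data.frame(
    pid = vapply(info, `[[`, integer(1), "pid"),
    version = vapply(info, `[[`, character(1), "version"),
    same_as_master = vapply(info, function(x) identical(x$version, R.version.string),
                            logical(1)),
    stringsAsFactors = FALSE
  )
}
```

Usage: `check_workers(cl)` right after `makeCluster()` and again from the `recover()` browser. If every row shows `same_as_master = TRUE`, a genuine version mismatch is unlikely, and a corrupted or truncated stream (for example, from a worker that crashed) becomes the more plausible reading of "unknown type 0".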

EDIT 1:

My workaround for the problem

The arguments my code passes to the function were fine, so the answer from @MichaelFilosi did not help much in my case. Thanks a lot for answering, anyway!

I could not pin down exactly which call causes the error, but at least I was able to work around the problem.

The trick is to split the function-call arguments for each parallel thread into smaller chunks.

Miraculously, the error disappeared.

Let me know if the same works for you!
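A minimal sketch of the chunking workaround, under the assumption that the work can be expressed as one function applied to each element of an argument list. The helper names (`chunk_args()`, `run_chunked()`, `serialized_bytes()`) are hypothetical, not from the original post. Each task receives only its slice of the arguments, so each `serialize(data, node$con)` call writes a smaller payload to the worker's socket. `serialized_bytes()` measures that payload with `serialize(x, NULL)`, which returns the bytes as a raw vector.

```r
library(parallel)
library(foreach)
library(doParallel)

# Split `args` into `n_chunks` groups of consecutive elements.
chunk_args <- function(args, n_chunks) {
  lapply(splitIndices(length(args), n_chunks), function(i) args[i])
}

# Number of bytes serialize() would write to a worker connection for `x`.
serialized_bytes <- function(x) length(serialize(x, NULL))

# Hypothetical driver: each task gets only its own chunk of arguments,
# applies FUN to every element and returns one number per element.
run_chunked <- function(args, FUN, cores = 2, n_chunks = 4) {
  cl <- makeCluster(cores)
  on.exit(stopCluster(cl), add = TRUE)
  registerDoParallel(cl)
  foreach(chunk = chunk_args(args, n_chunks), .combine = c) %dopar% {
    vapply(chunk, FUN, numeric(1))  # FUN is auto-exported from this frame
  }
}
```

Usage: `run_chunked(list_of_inputs, mean, cores = 8, n_chunks = 32)`. Raising `n_chunks` trades a little scheduling overhead for smaller messages and lower peak memory per worker.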

Max*_*don 13

This is most likely caused by running out of memory (see my blog post for details). Here is an example that triggers this error:

> library(parallel)
> a <- matrix(1, ncol=10^4*2.1, nrow=10^4)   # ~1.68e9 bytes of doubles
> cl <- makeCluster(8, type = "FORK")        # FORK clusters require a Unix-alike
> parSapply(cl, 1:8, function(x) {
+   b <- a + 1
+   mean(b)
+   })
Error in unserialize(node$con) : error reading from connection
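A back-of-the-envelope estimate shows why this example exhausts memory. The calculation assumes 8 bytes per double and counts only the `b` that each worker allocates. With forked workers on Linux, `a` is typically shared copy-on-write, so it is left out of the total.

```r
# Rough memory estimate for the FORK example (8 bytes per double).
n_rows <- 10^4
n_cols <- 10^4 * 2.1
bytes_per_copy <- n_rows * n_cols * 8   # one copy of `a` (or of `b`)
gib_per_copy <- bytes_per_copy / 2^30
workers <- 8
gib_for_b <- workers * gib_per_copy     # every worker allocates its own `b`
round(c(per_copy = gib_per_copy, all_workers = gib_for_b), 2)
```

That is roughly 1.6 GiB per copy and about 12.5 GiB for the eight copies of `b`, on top of the master's `a`. When the operating system kills a worker for lack of memory, its socket closes and the master reports `error reading from connection`.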

  • I also think this is due to a memory issue. I solved it by splitting my problem into smaller threads that need less memory. (3 upvotes)

小智 2

I got a similar error: Error in unserialize(node$con) : error reading from connection

I found that it was caused by a missing argument in a call to a C function via .Call(). Maybe this helps!
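If a worker crashes inside compiled code, the master sees only a closed socket (`error reading from connection`), not the real cause. A hypothetical guard (the routine name `my_c_routine` and its two arguments are placeholders, not from the answer) validates arguments in R before `.Call()`. A missing argument then becomes an ordinary R error, which the worker can report back to the master instead of dying.

```r
# Hypothetical wrapper: `my_c_routine` and its arguments are placeholders.
# Validate in R first so a bad call fails as an R error on the worker
# rather than crashing the worker process inside C code.
call_my_routine <- function(x, n) {
  if (missing(n)) stop("argument 'n' is missing")
  stopifnot(is.numeric(x), is.numeric(n), length(n) == 1L)
  .Call("my_c_routine", as.double(x), as.integer(n))
}
```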

  • @Filosi, could you give more details on how you solved your problem? For example, which argument was missing, which line of code, etc. Cheers (9 upvotes)