nee*_*elp 3 parallel-processing foreach r checkpoint r-future
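I use the checkpoint package for reproducible data analysis. Some computations take a long time, so I want to run them in parallel. However, when running in parallel, the checkpoint is not set on the workers, so I get the error "there is no package called xy" (because the package is not installed in my default library directory).
How can I make sure that every worker uses the package versions from the checkpoint folder? I tried setting .libPaths inside the foreach code, but that does not seem to work. I would also prefer to set the checkpoint/libPaths once globally rather than in every foreach call.
Another option might be to change the .Rprofile file, but I would rather not do that.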
checkpoint::checkpoint("2018-06-01")
library(foreach)
library(doFuture)
library(future)
doFuture::registerDoFuture()
future::plan("multisession")
l <- .libPaths()
# Code to run in parallel does not make much sense of course but I wanted to keep it simple.
res <- foreach::foreach(
  x = unique(iris$Species),
  lib.path = l
) %dopar% {
  .libPaths(lib.path)
  stringr::str_c(x, "_")
}
Error in { : task 2 failed - "there is no package called ‘stringr’"
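Author of future here.
Update 2022-05-25: As of future 1.20.0 (2021-11-03), multisession parallel workers automatically inherit the R library path (= .libPaths()) from the main R session, so the workaround below is no longer needed. However, it might still be needed for other future backends.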
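To decide whether the workaround is still needed in a given setup, the installed version of future can be compared against 1.20.0. The following is only a minimal sketch: `needs_libpath_workaround()` is a hypothetical helper name, not part of future, and it only reflects the multisession case described in the update; other backends may need the workaround regardless of version.

```r
# Hypothetical helper (not part of the future package): returns TRUE when
# the given future version predates 1.20.0, i.e. when multisession workers
# do not yet inherit .libPaths() from the main R session.
needs_libpath_workaround <- function(version = NULL) {
  if (is.null(version)) {
    # Fall back to the installed version of future, if there is one
    if (!requireNamespace("future", quietly = TRUE)) return(NA)
    version <- utils::packageVersion("future")
  }
  package_version(as.character(version)) < "1.20.0"
}

needs_libpath_workaround("1.9.0")   # the version used in this answer
needs_libpath_workaround("1.20.0")  # the first version with inheritance
```

When the helper returns TRUE, `.libPaths(libs)` would still have to be called in the worker code.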
It is sufficient to pass the library path of the master R process as a global variable libs and set it in each worker using .libPaths(libs):
## Use CRAN checkpoint from 2018-07-24 to get future (>= 1.9.0) [1],
## otherwise the below stdout won't be relayed back to the master
## R process, but setting .libPaths() does also work in older
## versions of the future package.
## [1] https://cran.microsoft.com/snapshot/2018-07-24/web/packages/future
checkpoint::checkpoint("2018-07-24")
stopifnot(packageVersion("future") >= "1.9.0")

libs <- .libPaths()
print(libs)
### [1] "/home/hb/.checkpoint/2018-07-24/lib/x86_64-pc-linux-gnu/3.5.1"
### [2] "/home/hb/.checkpoint/R-3.5.1"
### [3] "/usr/lib/R/library"

library(foreach)

doFuture::registerDoFuture()
future::plan("multisession")

res <- foreach::foreach(x = unique(iris$Species)) %dopar% {
  ## Use the same library paths as the master R session
  .libPaths(libs)

  cat(sprintf("Library paths used by worker (PID %d):\n", Sys.getpid()))
  cat(sprintf(" - %s\n", sQuote(.libPaths())))

  stringr::str_c(x, "_")
}

### - ‘/home/hb/.checkpoint/2018-07-24/lib/x86_64-pc-linux-gnu/3.5.1’
### - ‘/home/hb/.checkpoint/R-3.5.1’
### - ‘/usr/lib/R/library’
### Library paths used by worker (PID 9394):
### - ‘/home/hb/.checkpoint/2018-07-24/lib/x86_64-pc-linux-gnu/3.5.1’
### - ‘/home/hb/.checkpoint/R-3.5.1’
### - ‘/usr/lib/R/library’
### Library paths used by worker (PID 9412):
### - ‘/home/hb/.checkpoint/2018-07-24/lib/x86_64-pc-linux-gnu/3.5.1’
### - ‘/home/hb/.checkpoint/R-3.5.1’
### - ‘/usr/lib/R/library’

str(res)
### List of 3
### $ : chr "setosa_"
### $ : chr "versicolor_"
### $ : chr "virginica_"

FYI, it is on the roadmap for future to make it easier to pass down the library path(s) to workers.
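The question also asks for a way to set the library paths once rather than inside every foreach call. One possible approach, sketched here with base R's parallel package only, is to create the workers yourself and set `.libPaths()` on each of them a single time. This is an illustration and not something the answer above prescribes; the worker count of 2 is an arbitrary choice.

```r
library(parallel)

# Library paths of the main R session; in the scenario above this would be
# captured after checkpoint::checkpoint() has been called
libs <- .libPaths()

# Start two background R workers (PSOCK cluster)
cl <- makeCluster(2L)

# Set the library paths once per worker; the setting persists for the
# lifetime of each worker process. The wrapper function is used so that
# .libPaths() is looked up and evaluated on the worker itself.
invisible(clusterCall(cl, function(paths) .libPaths(paths), libs))

# Ask every worker which library paths it now uses
worker_libs <- clusterEvalQ(cl, .libPaths())
str(worker_libs)

stopCluster(cl)
```

Before it is stopped, such a cluster could presumably be handed to future with `future::plan(future::cluster, workers = cl)`, so that the `%dopar%` body no longer needs its own `.libPaths(libs)` call; treat that combination as an untested assumption here.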
My details:
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.1 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
 [6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] foreach_1.4.4

loaded via a namespace (and not attached):
[1] drat_0.1.4 compiler_3.5.1 BiocManager_1.30.2 parallel_3.5.1 tools_3.5.1 listenv_0.7.0 doFuture_0.6.0
[8] codetools_0.2-15 iterators_1.0.10 digest_0.6.15 globals_0.12.1 checkpoint_0.4.5 future_1.9.0