How to fork/parallelize a process in purrr::pmap

nev*_*int 5 parallel-processing r purrr tidyverse

I have the following code, which does serial processing with purrr::pmap:


library(tidyverse)

set.seed(1)
params <- tribble(
  ~mean, ~sd, ~n,
  5,     1,  1,
  10,     5,  3,
  -3,    10,  5
)
params %>% 
  pmap(rnorm)
#> [[1]]
#> [1] 4.373546
#> 
#> [[2]]
#> [1] 10.918217  5.821857 17.976404
#> 
#> [[3]]
#> [1]   0.2950777 -11.2046838   1.8742905   4.3832471   2.7578135

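How can I parallelize (fork) the process above so that it runs faster and produces the same results?

Here I use rnorm for illustration purposes; in reality I have a function that does heavy work, and it needs to be parallelized.

I'm open to non-purrr (non-tidyverse) solutions, as long as they produce the same result given the rnorm function and the params input.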

Aur*_*èle 7

In short: a "parallel pmap()", allowing a syntax similar to pmap(), could look like lift(mcmapply)() or lift(clusterMap)().
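
To make the role of lift() concrete, here is a minimal base-R sketch (not part of the original answer). It assumes that lift(f)(x) behaves roughly like do.call(f, x), and it uses the serial mapply in place of mcmapply so that it runs on any OS. params is rebuilt as a plain data.frame so the block does not need tidyverse. Note that lift() is, as far as I know, deprecated in recent purrr releases, so the do.call form may be the more durable one.

```r
# Same parameters as in the question, as a plain data.frame (assumption: no tidyverse needed here).
params <- data.frame(mean = c(5, 10, -3), sd = c(1, 5, 10), n = c(1, 3, 5))

# Splice the columns of `params` into mapply() as named arguments (mean =, sd =, n =).
# This is the serial counterpart of lift(mcmapply)(params, FUN = rnorm).
set.seed(1)
serial <- do.call(mapply, c(list(FUN = rnorm, SIMPLIFY = FALSE), params))

# The same three calls written out by hand, for comparison.
set.seed(1)
by_hand <- list(rnorm(1, 5, 1), rnorm(3, 10, 5), rnorm(5, -3, 10))

# Both approaches consume the RNG stream in the same order.
same_draws <- identical(serial, by_hand)
```

Swapping mapply for parallel::mcmapply (plus mc.cores) gives the forking variant; the draws then depend on the parallel RNG setup rather than on the serial stream.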


If you are not on Windows, you can:

library(parallel)

# forking

set.seed(1, "L'Ecuyer")
params %>% 
  lift(mcmapply, mc.cores = detectCores() - 1)(FUN = rnorm)

# [[1]]
# [1] 4.514604
# 
# [[2]]
# [1] 0.7022156 0.8734875 5.0250478
# 
# [[3]]
# [1]   8.7704060  11.7217925 -12.8776289 -10.7466152   0.5177089

Edit

Here is a "cleaner" option that should feel more like using pmap:

nc <- max(parallel::detectCores() - 1, 1L)

par_pmap <- function(.l, .f, ..., mc.cores = getOption("mc.cores", 2L)) {
  do.call(
    parallel::mcmapply, 
    c(.l, list(FUN = .f, MoreArgs = list(...), SIMPLIFY = FALSE, mc.cores = mc.cores))
  )
}

f <- function(n, mean, sd, ...) rnorm(n, mean, sd) 

params %>% 
  par_pmap(f, some_other_arg_to_f = "foo", mc.cores = nc)

If you are on Windows (or any other OS), you can:

library(parallel)

# (Parallel SOCKet cluster)

cl <- makeCluster(detectCores() - 1)

clusterSetRNGStream(cl, 1)
params %>% 
  lift(clusterMap, cl = cl)(fun = rnorm)

# [[1]]
# [1] 5.460811
# 
# [[2]]
# [1] 7.573021 6.870994 5.633097
# 
# [[3]]
# [1] -21.595569 -21.253025 -12.949904  -4.817278  -7.650049

stopCluster(cl)
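
The pmap-like wrapper idea can be carried over to a socket cluster as well. The following is only a sketch under stated assumptions: par_pmap_psock is a hypothetical helper name (it is not from the original answer or from any package), the cluster size of 2 is arbitrary, and params is rebuilt as a plain data.frame so the block is self-contained.

```r
library(parallel)

# Hypothetical helper: a clusterMap()-based analogue of a pmap-style wrapper.
# `.l` is a list or data frame whose named columns become arguments of `.f`;
# extra arguments in `...` are passed unchanged to every call via MoreArgs.
par_pmap_psock <- function(.l, .f, ..., cl) {
  do.call(
    parallel::clusterMap,
    c(list(cl = cl, fun = .f), as.list(.l),
      list(MoreArgs = list(...), SIMPLIFY = FALSE))
  )
}

params <- data.frame(mean = c(5, 10, -3), sd = c(1, 5, 10), n = c(1, 3, 5))

cl <- makeCluster(2)           # PSOCK cluster; size chosen only for illustration
clusterSetRNGStream(cl, 1)     # reproducible L'Ecuyer streams on the workers
res <- par_pmap_psock(params, rnorm, cl = cl)
stopCluster(cl)

lengths(res)                   # one element per row of params
```

Because the workers draw from their own RNG streams, the values differ from the serial pmap(rnorm) output, although the shape of the result is the same.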

If you are more inclined to use foreach, you can:

library(doParallel)

# (fork by default on my Linux machine, should PSOCK by default on Windows)

registerDoParallel(cores = detectCores() - 1)

set.seed(1, "L'Ecuyer")
lift(foreach)(params) %dopar%
  rnorm(n, mean, sd)

# [[1]]
# [1] 4.514604
# 
# [[2]]
# [1] 0.7022156 0.8734875 5.0250478
# 
# [[3]]
# [1]   8.7704060  11.7217925 -12.8776289 -10.7466152   0.5177089

stopImplicitCluster()