How can I make purrr's map function run faster?

Question

I am using the map function from the purrr library to apply the segmented function (from the segmented library), as follows:

library(purrr)
library(dplyr)
library(tidyr)      # provides nest()
library(segmented)

# Data frame is nested to create list column
by_veh28_101 <- df101 %>% 
  filter(LCType=="CFonly", Lane %in% c(1,2,3)) %>% 
  group_by(Vehicle.ID2) %>% 
  nest() %>% 
  ungroup()

# Functions:
segf2 <- function(df){
  try(segmented(lm(svel ~ Time, data=df), seg.Z = ~Time,
                psi = list(Time = df$Time[which(df$dssvel != 0)]),
                control = seg.control(seed=2)),
      silent=TRUE)
}


segf2p <- function(df){
  try(segmented(lm(PrecVehVel ~ Time, data=df), seg.Z = ~Time,
                psi = list(Time = df$Time[which(df$dspsvel != 0)]),
                control = seg.control(seed=2)),
      silent=TRUE)
}  

# map function:
models8_101 <- by_veh28_101 %>% 
  mutate(segs = map(data, segf2),
         segsp = map(data, segf2p))


The object by_veh28_101 contains 2457 tibbles. The last step, which applies the two functions with map, takes 16 minutes to complete. Is there any way to make it faster?

Answer 1

Lau*_*azo 5

You can use the function future_map instead of map.

This function comes from the furrr package and is the parallel counterpart of the map family. See the package's README for details.

Because the code in your question is not reproducible, I could not prepare a benchmark comparing map and future_map.

With future_map, your code would look like this:

library(tidyverse)
library(segmented)
library(furrr)


# Data frame stuff....

# Your functions....

# future_map function

# Distribute the work over the cores of your computer.
# You set a "plan" for how the code should run. `multiprocess` (which picked
# plan(multicore) on Mac and plan(multisession) on Windows) is deprecated in
# recent versions of future, so choose `multisession` explicitly.

plan(multisession)

models8_101 <- by_veh28_101 %>% 
  mutate(segs = future_map(data, segf2),
         segsp = future_map(data, segf2p))
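
Since the original data is not available, the comparison between map and future_map can only be sketched on synthetic input. The example below is a minimal illustration, not a benchmark of the segmented fits: `slow_fit` is a hypothetical stand-in that sleeps to imitate an expensive model, and the choice of 2 workers and 8 inputs is an assumption made for the demonstration.

```r
library(purrr)
library(future)
library(furrr)

# Hypothetical stand-in for an expensive model fit (not the asker's segf2):
# it sleeps briefly and then returns a value.
slow_fit <- function(x) {
  Sys.sleep(0.2)
  x^2
}

inputs <- 1:8

# Sequential baseline with purrr::map
t_seq <- system.time(res_seq <- map(inputs, slow_fit))[["elapsed"]]

# Parallel run on 2 background R sessions (assumed worker count)
plan(multisession, workers = 2)
t_par <- system.time(res_par <- future_map(inputs, slow_fit))[["elapsed"]]
plan(sequential)  # return to sequential processing and release the workers

# Both approaches should return the same list of results
same_result <- identical(res_seq, res_par)

# Elapsed times in seconds; the actual speed-up depends on the machine
c(sequential = t_seq, parallel = t_par)
```

Parallel sessions carry start-up and data-transfer overhead, so the gain is usually largest when each call is slow relative to the size of the data it receives, which seems plausible for 2457 model fits taking 16 minutes in total.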


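Note that "multicore" no longer works in RStudio. Forked processing is considered unstable when R is run from certain environments, such as RStudio. Therefore, as of future 1.13.0, "multicore" futures are disabled in those cases. [https://cran.case.edu/web/packages/future/NEWS](https://cran.case.edu/web/packages/future/NEWS)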
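
One way to act on this note is to ask future whether forking is usable in the current session and fall back to background sessions otherwise. The following is a small sketch under that assumption; the variable name `use_fork` is illustrative only.

```r
library(future)

# supportsMulticore() reports whether forked ("multicore") futures can be
# used in the current session; it is FALSE on Windows and, by default,
# when R runs inside RStudio.
use_fork <- supportsMulticore()

if (use_fork) {
  plan(multicore)      # forked processes
} else {
  plan(multisession)   # separate background R sessions
}

# Number of workers the chosen plan will use
n_workers <- nbrOfWorkers()

plan(sequential)       # reset the plan when finished
```

With either plan in place, the future_map calls in the answer stay unchanged; only the backend that executes them differs.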
