mclapply: long vectors not supported yet

Ign*_*cio 11 r

I am trying to run some R code that crashes because it runs out of memory. The error I get is:

Error in sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE)) : 
  long vectors not supported yet: memory.c:3100

The function that causes the trouble is the following:

StationUserX <- function(userNDX){
  # Coordinates of user userNDX, converted to radians
  lat1 = deg2rad(geolocation$latitude[userNDX])
  long1 = deg2rad(geolocation$longitude[userNDX])
  session_user_id = as.character(geolocation$session_user_id[userNDX])
  # Distance from this user to every station in stationNDXs
  Distance2Stations <- unlist(lapply(stationNDXs, Distance2StationX, lat1, long1))
  # One row per station: user ID, station ID, and distance to that station
  stations_userX = data.frame(session_user_id = session_user_id,
                              station = ghcndstations$ID[stationNDXs],
                              Distance2Station = Distance2Stations)
  # Sort by distance, closest station first
  stations_userX = stations_userX[with(stations_userX, order(Distance2Station)), ]
  # Keep only the 100 closest stations (head() avoids NA rows if there are fewer)
  stations_userX = head(stations_userX, 100)
  row.names(stations_userX) <- NULL
  return(stations_userX)
}

I run this function 50k times with mclapply, and each call of StationUserX calls Distance2StationX 90k times (once per station).

Is there an obvious way to optimize the StationUserX function?
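One common way to speed up a function like StationUserX is to compute the distances to all stations in a single vectorized call instead of calling a per-station function through lapply. The sketch below is hypothetical. Distance2StationX is not shown in the question, so it assumes a great-circle (haversine) distance in kilometres. The names haversine_km, closest_stations and deg2rad are illustrative; deg2rad is defined here because it is not part of base R. Station data is passed in as plain vectors rather than read from the globals geolocation and ghcndstations.

```r
# Hypothetical helper: degrees to radians (not part of base R).
deg2rad <- function(deg) deg * pi / 180

# ASSUMPTION: Distance2StationX computes a great-circle distance.
# Haversine formula, vectorized over lat2/long2; inputs in radians, result in km.
haversine_km <- function(lat1, long1, lat2, long2, radius = 6371) {
  dlat <- lat2 - lat1
  dlong <- long2 - long1
  a <- sin(dlat / 2)^2 + cos(lat1) * cos(lat2) * sin(dlong / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))  # pmin guards against rounding above 1
}

# Hypothetical vectorized replacement for the per-station lapply loop:
# one call computes the distances to all stations, then keeps the k nearest.
closest_stations <- function(user_lat, user_long, station_ids,
                             station_lat, station_long, k = 100) {
  d <- haversine_km(deg2rad(user_lat), deg2rad(user_long),
                    deg2rad(station_lat), deg2rad(station_long))
  nearest <- head(order(d), k)  # indices of the k smallest distances
  data.frame(station = station_ids[nearest],
             Distance2Station = d[nearest],
             stringsAsFactors = FALSE)
}
```

Using head(order(d), k) keeps the result at no more than k rows even when there are fewer than k stations. Returning only those rows also keeps each worker's result small.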

Sta*_*tan 14

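mclapply cannot send all of the data from the workers back to the master process. The cause is prescheduling. By default (mc.preschedule = TRUE), the input is split into one chunk per core, and each worker (a forked child process) runs its whole chunk of iterations. It then sends all of its results back to the master in one piece. This is nice and fast, but here it means about 2 GB of data has to be sent back at once, which is not possible: sendMaster does not support long vectors, hence the error.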
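Here is a simplified model of prescheduling in R. It assumes, based on how the parallel package schedules work rather than on any exported function, that element i of X goes to worker ((i - 1) %% cores) + 1. It also assumes each worker serializes its whole chunk's results as one object. The helpers preschedule_chunks, serialized_bytes and fake_result are hypothetical, and fake_result only mimics the shape of a StationUserX result.

```r
# Simplified model (assumption): round-robin assignment of elements to workers.
preschedule_chunks <- function(n, cores) {
  cores <- min(cores, n)  # avoid empty chunks when there are fewer elements than cores
  lapply(seq_len(cores), function(i) seq(i, n, by = cores))
}

# Bytes needed to serialize an object, i.e. roughly what one send carries.
serialized_bytes <- function(obj) length(serialize(obj, NULL))

# Hypothetical stand-in for one StationUserX() result: 100 rows, 3 columns.
fake_result <- function(i) {
  data.frame(session_user_id = as.character(i),
             station = sprintf("ST%05d", 1:100),
             Distance2Station = runif(100),
             stringsAsFactors = FALSE)
}

set.seed(1)
n_users <- 200
cores <- 4
chunks <- preschedule_chunks(n_users, cores)

# Under prescheduling, each worker sends lapply(chunk, FUN) back as ONE object.
chunk_bytes <- vapply(chunks,
                      function(idx) serialized_bytes(lapply(idx, fake_result)),
                      numeric(1))

# Largest length a vector can have without being a long vector: 2^31 - 1.
limit <- .Machine$integer.max
print(data.frame(chunk = seq_along(chunks), bytes = chunk_bytes,
                 over_limit = chunk_bytes > limit))
```

To check a real run, one could measure serialized_bytes() on a few actual results and multiply by the chunk length (about 50000 / cores) to estimate how close one chunk gets to the limit.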

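Run mclapply with mc.preschedule = FALSE to turn off prescheduling. Each element of X then gets its own forked job (with at most mc.cores running at once), and each job sends back only its own result. It won't be as fast, but it solves the problem, as long as no single result is that large on its own.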
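A minimal sketch of the fix, assuming the parallel package, which provides mclapply. per_user is a hypothetical toy stand-in for StationUserX; the point here is only the mc.preschedule = FALSE argument. Because mclapply relies on forking, the sketch uses mc.cores = 1 on Windows, where larger values are not supported.

```r
library(parallel)

# Hypothetical toy stand-in for StationUserX: one small data frame per input.
per_user <- function(i) data.frame(user = i, value = i^2)

# Forking is not available on Windows, where mclapply only accepts mc.cores = 1.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# mc.preschedule = FALSE: one forked job per element of X (at most `cores`
# at a time), so each job sends back a single element's result rather than
# one big list covering a whole chunk.
results <- mclapply(seq_len(8), per_user,
                    mc.preschedule = FALSE, mc.cores = cores)

# Jobs that fail come back as "try-error" objects instead of stopping the call.
failed <- vapply(results, inherits, logical(1), what = "try-error")
stopifnot(!any(failed))

combined <- do.call(rbind, results)
```

Checking for "try-error" elements matters here because mclapply does not stop when a single job fails; it stores the error in that element of the result list and only gives a warning.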