I am trying to run some R code that crashes because of memory. The error I get is:
Error in sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE)) :
long vectors not supported yet: memory.c:3100
The function causing the trouble is as follows:
StationUserX <- function(userNDX) {
  lat1 <- deg2rad(geolocation$latitude[userNDX])
  long1 <- deg2rad(geolocation$longitude[userNDX])
  session_user_id <- as.character(geolocation$session_user_id[userNDX])
  # Compute the distance from this user to every station
  Distance2Stations <- unlist(lapply(stationNDXs, Distance2StationX, lat1, long1))
  # Build a table of station IDs and distances, sorted by distance
  stations_userX <- data.frame(session_user_id = session_user_id,
                               station = ghcndstations$ID[stationNDXs],
                               Distance2Station = Distance2Stations)
  stations_userX <- stations_userX[with(stations_userX, order(Distance2Station)), ]
  stations_userX <- stations_userX[1:100, ]  # only the 100 closest stations
  row.names(stations_userX) <- NULL
  return(stations_userX)
}
I run this function 50k times with mclapply. StationUserX calls Distance2StationX 90k times.
Is there an obvious way to optimize the StationUserX function?
Sta*_*tan 14
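mclapply is failing to send all of the data from the worker processes back to the main process. This is because of prescheduling: each worker runs a large batch of iterations and then sends all of its results back at once. That is nice and fast, but here it results in roughly 2 GB of data being sent back in one go, which it cannot do.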
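To check whether the returned data is really that large, one rough approach is to serialize a single result and scale it up by the number of iterations each worker handles. The sketch below is only an estimate under assumptions: one_result is a made-up stand-in (replace it with an actual StationUserX() result), n_cores is an assumed value for mc.cores, and the limit used is the maximum length of a non-long vector (2^31 - 1).

```r
# Hypothetical stand-in for one StationUserX() result: 100 rows, three columns.
# Replace it with a real result, e.g. one_result <- StationUserX(1).
one_result <- data.frame(
  session_user_id  = rep("user-1", 100),
  station          = sprintf("ST%09d", 1:100),
  Distance2Station = runif(100),
  stringsAsFactors = FALSE
)

# Size in bytes of the serialized form, which approximates what a worker sends back.
bytes_per_result <- length(serialize(one_result, connection = NULL))

n_iterations <- 50000  # from the question
n_cores      <- 2      # assumption: the value passed as mc.cores

# With prescheduling, each worker returns its whole share of results at once.
bytes_per_worker <- as.numeric(bytes_per_result) * ceiling(n_iterations / n_cores)

# Maximum length of a vector that is not a "long vector".
limit <- 2^31 - 1
exceeds_limit <- bytes_per_worker > limit

cat("bytes per result:", bytes_per_result, "\n")
cat("estimated bytes per worker:", bytes_per_worker, "\n")
cat("exceeds limit:", exceeds_limit, "\n")
```

The stand-in here is small, so it says nothing about the real workload. Real results can be much larger, for example if the station column is a factor that carries every level along with it.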
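Run mclapply with mc.preschedule = FALSE to turn off prescheduling. Now each iteration is forked as its own process and returns its own data. It won't be quite as fast, but it gets around the problem.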
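A minimal sketch of that call, with assumptions flagged: StationUserX_demo is a hypothetical stand-in for the real StationUserX, the index vector is shortened to 20 elements, and two cores are assumed.

```r
library(parallel)

# Hypothetical stand-in for StationUserX(): returns a small data frame per index.
StationUserX_demo <- function(userNDX) {
  data.frame(session_user_id = as.character(userNDX),
             Distance2Station = sqrt(userNDX))
}

userNDXs <- 1:20  # the question uses about 50k indices

# mclapply() relies on forking, which is not available on Windows,
# so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# mc.preschedule = FALSE: one forked job per element,
# and each result is sent back to the master separately.
results <- mclapply(userNDXs, StationUserX_demo,
                    mc.preschedule = FALSE, mc.cores = n_cores)

# Combine the per-user data frames in the master process.
stations_all <- do.call(rbind, results)
```

Because every element now costs a fork, this tends to pay off when each call is expensive, as it is here with 90k distance computations per user.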