我正在并行运行大量迭代.某些迭代比其他迭代花费更多(比如说100倍).我想把它们计算出来,但我宁愿不必深入研究函数背后的C代码(称之为fun.c)来做繁重的工作.我希望有一些类似于try()但有一个time.out选项.然后我可以这样做:
for (i in 1:1000) {
try(fun.c(args),time.out=60))->to.return[i]
}
Run Code Online (Sandbox Code Playgroud)
因此,如果fun.c在某个迭代中花费的时间超过60秒,那么经过修改的try()函数就会杀死它并在这些行上返回警告或其他内容.
有人有什么建议吗?提前致谢.
jth*_*zel 26
请参阅此主题:http://r.789695.n4.nabble.com/Time-out-for-aR-Function-td3075686.html
并?evalWithTimeout在R.utils包中.
这是一个例子:
require(R.utils)
## function that can take a long time
fn1 <- function(x)
{
for (i in 1:x^x)
{
rep(x, 1000)
}
return("finished")
}
## test timeout
evalWithTimeout(fn1(3), timeout = 1, onTimeout = "error") # should be fine
evalWithTimeout(fn1(8), timeout = 1, onTimeout = "error") # should timeout
Run Code Online (Sandbox Code Playgroud)
Ric*_*ton 13
这听起来应该是应该通过任何向工作人员发送任务来管理的东西,而不是应该包含在工作线程中的东西.该multicore软件包支持某些功能的超时; snow据我所知,没有.
编辑:如果你真的非常渴望在工作线程中有这个,那么试试这个功能,灵感来自@ jthetzel的答案中的链接.
try_with_time_limit <- function(expr, cpu = Inf, elapsed = Inf)
{
y <- try({setTimeLimit(cpu, elapsed); expr}, silent = TRUE)
if(inherits(y, "try-error")) NULL else y
}
try_with_time_limit(sqrt(1:10), 1) #value returns as normal
try_with_time_limit(for(i in 1:1e7) sqrt(1:10), 1) #returns NULL
Run Code Online (Sandbox Code Playgroud)
您可能希望在超时时自定义行为.目前它刚刚回归NULL.
您在评论中提到您的问题是 C 代码运行时间过长。根据我的经验,基于setTimeLimit/的纯 R 超时解决方案evalWithTimeout都不能停止 C 代码的执行,除非代码提供了中断 R 的机会。
您还在评论中提到您正在对 SNOW 进行并行化。如果您要并行化的机器是支持分叉的操作系统(即,不是 Windows),那么您可以在 SNOW 集群上的节点的命令上下文中使用 mcparallel(在parallel包中,派生自multicore);反过来也是如此 顺便说一句,您可以从multicore分叉的上下文中触发 SNOW 集群。如果您没有通过 SNOW 并行化,这个答案也(当然)成立,前提是需要超时 C 代码的机器可以分叉。
这适用于opencpueval_fork使用的解决方案。在函数主体下方查看Windows 中的一个 hack 的轮廓以及该 hack 的半个版本。eval_fork
eval_fork <- function(..., timeout=60){
#this limit must always be higher than the timeout on the fork!
setTimeLimit(timeout+5);
#dispatch based on method
##NOTE!!!!! Due to a bug in mcparallel, we cannot use silent=TRUE for now.
myfork <- parallel::mcparallel({
eval(...)
}, silent=FALSE);
#wait max n seconds for a result.
myresult <- parallel::mccollect(myfork, wait=FALSE, timeout=timeout);
#try to avoid bug/race condition where mccollect returns null without waiting full timeout.
#see https://github.com/jeroenooms/opencpu/issues/131
#waits for max another 2 seconds if proc looks dead
while(is.null(myresult) && totaltime < timeout && totaltime < 2) {
Sys.sleep(.1)
enddtime <- Sys.time();
totaltime <- as.numeric(enddtime - starttime, units="secs")
myresult <- parallel::mccollect(myfork, wait = FALSE, timeout = timeout);
}
#kill fork after collect has returned
tools::pskill(myfork$pid, tools::SIGKILL);
tools::pskill(-1 * myfork$pid, tools::SIGKILL);
#clean up:
parallel::mccollect(myfork, wait=FALSE);
#timeout?
if(is.null(myresult)){
stop("R call did not return within ", timeout, " seconds. Terminating process.", call.=FALSE);
}
#move this to distinguish between timeout and NULL returns
myresult <- myresult[[1]];
#reset timer
setTimeLimit();
#forks don't throw errors themselves
if(inherits(myresult,"try-error")){
#stop(myresult, call.=FALSE);
stop(attr(myresult, "condition"));
}
#send the buffered response
return(myresult);
}
Run Code Online (Sandbox Code Playgroud)
Windows hack:原则上,特别是对于 SNOW 中的工作节点,您可以通过拥有工作节点来完成类似的事情:
save.image)存储到已知位置RscriptR 脚本,该脚本加载节点保存的工作区,然后保存结果(本质上是对 R 工作区进行慢速内存分叉)。我在很久以前写了一些代码,例如使用慢速内存副本在本地主机上的 Windows 上的 mcparallel。我现在会用完全不同的方式来写它,但它可能会给你一个起点,所以无论如何我都会提供它。需要注意的一些问题,russmisc是我正在编写的一个包,它现在作为repsych. glibrary是一个函数,repsych如果它不可用则安装一个包(如果你的 SNOW 不只是在本地主机上,这可能很重要)。...当然,我没有在 /years/ 中使用此代码,而且我最近还没有对其进行过测试 - 我共享的版本可能包含我在以后的版本中解决的错误。
# Farm has been banished here because it likely violates
# CRAN's rules in regards to where it saves files and is very
# windows specific. Also, the darn thing is buggy.
#' Create a farm
#'
#' A farm is an external self-terminating instance of R to solve a time consuming problem in R.
#' Think of it as a (very) poor-person's multi-core.
#' For a usage example, see checkFarm.
#' Known issues: May have a problem if the library gdata has been loaded.//
#' If a farm produces warnings or errors you won't see them
#' If a farm produces an error... it never will produce a result.
#'
#' @export
#' @param commands A text string of commands including line breaks to run.
#' This must include the result being saved in the object farmName in the file farmResult (both are variables provided by farm() to the farm).
#' @param farmName This is the name of the farm, used for creating and destroying filenames. One is randomly assigned that is plausibly unique.
#' @param Rloc The location of R.exe. The default loads the version of R that is stored in the windows registry as being \"current\".
#' @return The farm name is returned to be stored in an object and then used in checkFarm()
#' @seealso \code{\link{checkFarm}} \code{\link{waitForFarm}}
farm <- function(commands,farmName=paste("farm-",as.integer(Sys.time())+runif(1),sep=""),Rloc = NULL)
{
if (is.null(Rloc)) {Rloc <- paste('\"',readRegistry(paste("Software\\R-core\\R\\",readRegistry("Software\\R-core\\R\\",maxdepth=100)$`Current Version`,"\\",sep=""))$InstallPath,"\\bin",sep="")}
Rloc <- paste(Rloc,"\\R.exe\"",sep="")
farmRda <- paste(farmName,".Rda",sep="")
farmRda.int <- paste(farmName,".int.Rda",sep="") #internal .Rda
farmR <- paste(farmName,".R",sep="")
farmResult <- paste(farmName,".res.Rda",sep="") #result .Rda
unlink(c(farmRda,farmR,farmResult,farmRda.int))
farmwd <- getwd()
cat("setwd(\"",farmwd,"\")\n",file=farmR,append=TRUE,sep="")
#loading the internals to get them, then loading the globals, then reloading the internals to make sure they have haven't been overwritten
cat("
load(\"",farmRda.int,"\")
load(farmRda)
load(\"",farmRda.int,"\")
",file=farmR,append=TRUE,sep="")
cat("library(russmisc)\n",file=farmR,append=TRUE)
cat("glibrary(",paste(c(names(sessionInfo()$loadedOnly),names(sessionInfo()$otherPkgs)),collapse=","),")\n",file=farmR,append=TRUE)
cat(commands,file=farmR,append=TRUE)
cat("
unlink(farmRda)
unlink(farmRda.int)
",file=farmR,append=TRUE,sep="")
save(list = ls(all.names=TRUE,envir=.GlobalEnv), file = farmRda,envir=.GlobalEnv)
save(list = ls(all.names=TRUE), file = farmRda.int)
#have to drop the escaped quotes for file.exists to find the file
if (file.exists(gsub('\"','',Rloc))) {
cmd <- paste(Rloc," --file=",getwd(),"/",farmR,sep="")
} else {
stop(paste("Error in russmisc:farm: Unable to find R.exe at",Rloc))
}
print(cmd)
shell(cmd,wait=FALSE)
return(farmName)
}
NULL
#' Check a farm
#'
#' See farm() for details on farms. This function checks for a file based on the farmName parameter called farmName.res.Rda.
#' If that file exists it loads it and returns the object stored by the farm in the object farmName. If that file does not exist,
#' then the farm is not done processing, and a warning and NULL are returned. Note that a rapid loop through checkFarm() without Sys.sleep produced an error during development.
#'
#' @export
#' @param farmName This is the name of the farm, used for creating and destroying filenames. This should be saved from when the farm() is created
#' @seealso \code{\link{farm}} \code{\link{waitForFarm}}
#' @examples
#' #Example not run
#' #.tmp <- "This is a test of farm()"
#' #exampleFarm <- farm("
#' #print(.tmp)
#' #helloFarm <- 10+2
#' #farmName <- helloFarm
#' #save(farmName,file=farmResult)
#' #")
#' #example.result <- checkFarm(exampleFarm)
#' #while (is.null(example.result)) {
#' # example.result <- checkFarm(exampleFarm)
#' # Sys.sleep(1)
#' #}
#' #print(example.result)
checkFarm <- function(farmName) {
farmResult <- paste(farmName,".res.Rda",sep="")
farmR <- paste(farmName,".r",sep="")
if (!file.exists(farmR)) {
message(paste("Warning in russmisc:checkFarm: There is no evidence that the farm '",farmName,"' exists (no .r file found).\n",sep=""))
}
if (file.exists(farmResult)) {
load(farmResult)
unlink(farmResult) #delete the farmResult file
unlink(farmR) #delete the script file
return(farmName)
} else {
warning(paste("Warning in russmisc:checkFarm: The farm '",farmName,"' is not ready.\n",sep=""))
return(invisible(NULL))
}
}
NULL
#' Wait for a farm result
#'
#' This function repeatedly checks for a farm, when the farm is found it returns the harvest (the farm result object).
#' If the farm terminated with an error or there is some other sort of coding error, waitForFarm will be an infinate loop. As
#' \code{checkFarm} produces errors on checks when the harvest is not ready, waitForFarm hides these errors in the factory error-catching wrapper.
#'
#' @export
#' @param farmName This is the name of the farm, used for creating and destroying filenames. This should be saved from when the farm() is created
#' @param noCheck If this value is TRUE the check for the farm's .r is skipped. If it is FALSE, the existance of the appropriate .r is checked for before entering a potentially unending while loop.
waitForFarm <- function(farmName,noCheck=FALSE) {
f.checkFarm <- factory(checkFarm)
farmR <- paste(farmName,".r",sep="")
if (!file.exists(farmR) & !noCheck) {
stop(paste("Error in russmisc:checkFarm: There is no evidence that the farm '",farmName,"' exists (no .r file found).\n",sep=""))
}
repeat {
harvest <- f.checkFarm(farmName)
if (!is.null(harvest[[1]])) {break}
Sys.sleep(1)
}
return(harvest[[1]])
}
NULL
#' Create a one-line simple farm
#'
#' This is a convience wrapper function that uses farm to create a single farm appropriate for processing single line commands.
#'
#' @export
#' @param command A single command
#' @param farmName This is the name of the farm, used for creating and destroying filenames. One is randomly assigned that is plausibly unique.
#' @param Rloc The location of R.exe. The default loads the version of R that is stored in the windows registry as being \"current\".
#' @return The farm name is returned to be stored in an object and then used in checkFarm()
#' @seealso \code{\link{farm}}, \code{\link{checkFarm}}, and \code{\link{waitForFarm}}
#' @examples
#' #Example not run
#' #a <- 5
#' #b <- 10
#' #farmID <- simpleFarm("a + b")
#' #waitForFarm(farmID)
simpleFarm <- function(command,farmName=paste("farm-",as.integer(Sys.time())+runif(1),sep=""),Rloc = NULL) {
return(farm(paste("farmName <- (",command,");save(farmName,file=farmResult)",collapse=""),farmName=paste("farm-",as.integer(Sys.time())+runif(1),sep=""),Rloc = NULL))
}
NULL
Run Code Online (Sandbox Code Playgroud)
我喜欢R.utils::withTimeout(),但如果可以,我也渴望避免包依赖性。这是基本 R 中的解决方案。请注意on.exit()调用。即使您的表达式抛出错误,它也可以确保删除时间限制。
with_timeout <- function(expr, cpu, elapsed){
expr <- substitute(expr)
envir <- parent.frame()
setTimeLimit(cpu = cpu, elapsed = elapsed, transient = TRUE)
on.exit(setTimeLimit(cpu = Inf, elapsed = Inf, transient = FALSE))
eval(expr, envir = envir)
}
Run Code Online (Sandbox Code Playgroud)