Tom*_*lly 22
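If you right-click the RStudio icon, you should be able to open several separate RStudio "sessions" (whether or not you use projects). By default, each of these will use one core.
Update (July 2018): RStudio v1.2.830-1, available as a preview release, supports a "Jobs" pane. It is dedicated to running R scripts in the background, independently of the interactive R session:
Run any R script as a background job in a clean R session
Monitor progress and see script output in real time
Optionally give the job your global environment when it starts, and export values back when it completes
This will be available in RStudio version 1.2.
If you have several scripts that you know run without errors, I'd recommend running them from the command line, each with different parameters: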
R CMD BATCH script.R
Rscript script.R
R --vanilla < script.R
To run in the background:
nohup Rscript script.R &
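Here "&" runs the script in the background (it can be brought back with fg, monitored with htop, and killed with kill <pid> or pkill rsession), and nohup saves the output in a file (nohup.out by default) and keeps the script running if the terminal is closed.
Passing arguments to a script: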
Rscript script.R 1 2 3
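This passes the values 1, 2 and 3 to R, where commandArgs(trailingOnly = TRUE) returns them as the character vector c("1", "2", "3"), so a bash loop can launch multiple instances of Rscript: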
for ii in 1 2 3
do
nohup Rscript script.R $ii &
done
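For completeness, here is a minimal sketch of how script.R might pick up those values. The contents of script.R are not shown above, so the variable names and the placeholder loop body are assumptions for illustration only.

```r
# script.R (hypothetical contents): read the values passed on the command line
args <- commandArgs(trailingOnly = TRUE)  # character vector of user-supplied arguments

# Arguments always arrive as strings, so convert them before using them as numbers
ids <- as.integer(args)

for (id in ids) {
  # Placeholder for the real work; here we only report which value was received
  message("Processing input ", id)
}
```

Called as `Rscript script.R 1 2 3`, this would handle the three values in turn; with no arguments the loop simply does not run.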
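You will often find that one particular step in your R script is what slows the computation down, so may I suggest running parallel code inside your R script rather than running the scripts separately? I recommend the snow package for running loops in parallel in R. In general, in place of a plain lapply call, use: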
library(snow)
cl <- makeCluster(n)
# n = number of cores (I'd recommend one less than machine capacity)
clusterExport(cl, ls()) # export input data to all cores
output_list <- parLapply(cl, input_list, function(x) ... )
stopCluster(cl) # close cluster when complete (particularly on shared machines)
Use this anywhere you would normally use an lapply function in R, to run it in parallel.
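To make the pattern concrete, here is a small runnable sketch. It uses the parallel package that ships with R (it provides functions with the same names as snow's) rather than snow itself, and the two workers, the inputs and scale_factor are made-up placeholders.

```r
library(parallel)  # ships with R; provides makeCluster(), parLapply(), stopCluster()

n_workers <- 2L    # illustrative; the advice above is one fewer than the machine's cores
cl <- makeCluster(n_workers)

input_list <- list(1, 2, 3, 4)  # hypothetical inputs
scale_factor <- 10              # hypothetical object that the worker function needs

# Workers start as clean R sessions, so copy over the objects the function uses
clusterExport(cl, varlist = "scale_factor", envir = environment())

# Same shape as lapply(input_list, f), but spread across the workers
output_list <- parLapply(cl, input_list, function(x) x * scale_factor)

stopCluster(cl)    # always release the workers when finished
```

As with lapply, the results come back as a list in the same order as the inputs.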
Chr*_*ris 16
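Assuming the results do not need to end up in the same environment, you can achieve this with RStudio projects: https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects
First create two separate projects. You can open both at the same time, which gives you two rsessions. You can then open each script in its own project and run each one separately. Core allocation is then managed by your operating system.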
小智 7
You can use the following code to get multicore parallelism within the same session (as described here: https://cran.r-project.org/web/packages/doMC/vignettes/gettingstartedMC.pdf)
numberOfCores <- 2  # placeholder: set this to the number of cores to use
if (Sys.info()["sysname"] == "Windows") {
  # Windows cannot fork processes, so use a socket cluster
  library(doParallel)
  cl <- makeCluster(numberOfCores)
  registerDoParallel(cl)
} else {
  library(doMC)
  registerDoMC(numberOfCores)
}
library(foreach)
someList <- list("file1", "file2")
returnComputation <-
  foreach(x = someList) %dopar% {
    source(x)
  }
if (Sys.info()["sysname"] == "Windows") stopCluster(cl)
You will still need to adjust how your output is handled.
All you need to do (assuming you use Unix/Linux) is run an R batch command and put it in the background. The operating system will then schedule it on an available CPU.
At the shell, do:
/your/path/$ nohup R CMD BATCH --no-restore my_model1.R &
/your/path/$ nohup R CMD BATCH --no-restore my_model2.R &
/your/path/$ nohup R CMD BATCH --no-restore my_model3.R &
/your/path/$ nohup R CMD BATCH --no-restore my_model4.R &
Each command executes its script, saves the printed output in a file such as my_model1.Rout, and saves all created R objects in the file .RData. Each model runs as a separate process, so the operating system can place each one on a different CPU. The session transcript and output go to the corresponding .Rout files.
If you are working over the Internet in a remote terminal, you will need the nohup command. Otherwise the processes terminate when you exit the session.
/your/path/$ nohup R CMD BATCH --no-restore my_model1.R &
If you want to give processes a low priority, you do:
/your/path/$ nohup nice -n 19 R CMD BATCH --no-restore my_model.R &
You'd do best to include some code at the beginning of the script to load and attach the relevant data file.
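As a rough sketch of that advice, the top of a model script could read its inputs from a named file and save its results explicitly, so nothing depends on a leftover workspace. The file names, the toy data and the lm() call are hypothetical placeholders, and the setup step exists only so the example runs on its own.

```r
# Hypothetical setup so this sketch is self-contained: create a small data file
data_file <- file.path(tempdir(), "model_data.rds")
saveRDS(data.frame(x = 1:5, y = c(2.1, 3.9, 6.2, 7.8, 10.1)), data_file)

# --- what the top of my_model1.R might look like ---
model_data <- readRDS(data_file)     # load exactly the data this model needs
fit <- lm(y ~ x, data = model_data)  # placeholder for the real model

# Save the result under an explicit name instead of relying on .RData
result_file <- file.path(tempdir(), "my_model1_fit.rds")
saveRDS(fit, result_file)
```

Writing each model's result to its own file also avoids several background jobs in one directory competing for the same .RData.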
NEVER do simply
/your/path/$ nohup R CMD BATCH my_model1.R &
This will slurp the .RData file (all the funny objects there too), and will seriously compromise reproducibility. That is to say,
--no-restore
or
--vanilla
are your dear friends.
If you have too many models, I suggest doing computation on a cloud account, because you can have more CPU and RAM. Depending on what you are doing, and the R package, models can take hours on current hardware.
I've learned this the hard way, but there's a nice document here:
http://users.stat.umn.edu/~geyer/parallel/parallel.pdf
HTH.
小智 5
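If you want an embarrassingly parallel setup, you can open as many terminals as you like in the Terminal tab (located just after the Console tab) and run your code with Rscript yourcode.R. By default, each script will run on a separate core. You can also use command-line arguments if needed (as @Tom Kelly mentioned).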