I have a case where foreach with the doMC backend behaves differently on different machines.
On a Linux server running Ubuntu 12.04.4 LTS, the following code (adapted from the foreach vignette) runs the 5 jobs concurrently on a single core, which is not the desired behavior.
library(foreach)
library(doMC)

registerDoMC(cores = 5)   # register 5 forked workers
getDoParWorkers()         # should report 5

x <- iris[which(iris[, 5] != "setosa"), c(1, 5)]
trials <- 10000
r <- foreach(icount(trials), .combine = cbind) %dopar% {
  ind <- sample(100, 100, replace = TRUE)
  result1 <- glm(x[ind, 2] ~ x[ind, 1], family = binomial(logit))
  coefficients(result1)
}
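A quick way to confirm that doMC is actually forking separate worker processes (a diagnostic sketch, not part of the original example): return the worker PID from a trivial foreach loop. Several distinct PIDs means parallel workers are running, even if the OS is scheduling them all onto one core.

library(foreach)
library(doMC)
registerDoMC(cores = 5)

# With forked workers this should return several distinct process IDs.
pids <- foreach(i = 1:20, .combine = c) %dopar% Sys.getpid()
unique(pids)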
Session info:
> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=C LC_COLLATE=C LC_MONETARY=C
[6] LC_MESSAGES=C LC_PAPER=C LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=C LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other …

I am trying to download a large archive (~1 TB) from Glacier using the Python package Boto. The current approach I am using looks like this:
import os
import time

import boto
import boto.glacier

ACCESS_KEY_ID = 'XXXXX'
SECRET_ACCESS_KEY = 'XXXXX'
VAULT_NAME = 'XXXXX'
ARCHIVE_ID = 'XXXXX'
OUTPUT = 'XXXXX'

layer2 = boto.connect_glacier(aws_access_key_id=ACCESS_KEY_ID,
                              aws_secret_access_key=SECRET_ACCESS_KEY)
gv = layer2.get_vault(VAULT_NAME)

# Initiate the archive retrieval job and poll until it completes.
job = gv.retrieve_archive(ARCHIVE_ID)
job_id = job.id
while not job.completed:
    time.sleep(10)
    job = gv.get_job(job_id)

if job.completed:
    print "Downloading archive"
    job.download_to_file(OUTPUT)
The problem is that the job ID expires after 24 hours, which is not enough time to retrieve the entire archive. I need to break the download into at least 4 pieces. How can I do that and still write the output to a single file?
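One possible approach (a sketch only, based on boto 2's Glacier API; download_in_ranges and CHUNK_SIZE are illustrative names introduced here, not part of boto): Job.get_output() accepts a byte_range tuple, so the archive can be pulled down in pieces and appended to a single file, using the current file size as the resume point. If the job output expires before the download finishes, a new job can be initiated with retrieve_archive() and, once it completes, the same loop continues from wherever the file left off.

import os

CHUNK_SIZE = 256 * 1024 * 1024  # bytes per request; adjust as needed

def download_in_ranges(job, output_path, chunk_size=CHUNK_SIZE):
    """Append the archive to output_path in byte-range pieces,
    resuming from whatever has already been written."""
    total_size = job.archive_size
    # Resume from the current file size if a previous run was interrupted.
    start = os.path.getsize(output_path) if os.path.exists(output_path) else 0
    with open(output_path, 'ab') as f:
        while start < total_size:
            end = min(start + chunk_size, total_size) - 1  # byte ranges are inclusive
            response = job.get_output(byte_range=(start, end))
            f.write(response.read())
            start = end + 1

# Once job.completed is True:
# download_in_ranges(job, OUTPUT)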
For each edge in a graph, I want to add a numeric attribute (weight) that is the product of an attribute (prob) of its incident vertices. I can do this by looping over the edges, i.e.:
for (i in E(G)) {
  ind <- V(G)[inc(i)]
  p <- get.vertex.attribute(G, name = "prob", index = ind)
  E(G)[i]$weight <- prod(p)
}
However, this is slow for my graph (|V| ≈ 20,000 and |E| ≈ 200,000). Is there a faster way to do this?
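A vectorized alternative (a sketch, assuming the vertex attribute is named prob as above): fetch the edge list once as numeric vertex indices with get.edgelist() and multiply the prob values of the two endpoint columns, assigning the whole weight vector in a single step instead of looping over edges.

library(igraph)

# Endpoints of every edge as numeric vertex indices (an |E| x 2 matrix).
el <- get.edgelist(G, names = FALSE)

# Edge weight = product of the two endpoints' "prob" values, set in one assignment.
E(G)$weight <- V(G)$prob[el[, 1]] * V(G)$prob[el[, 2]]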