the_one · 10 · tags: r, setattribute, data.table
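I love data.table: it's fast and intuitive, so what could be better? Alas, here is my problem. When I reference a data.table inside a foreach() loop (using the doMC backend), I occasionally get the following error (example in the appendix):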
Error in { :
Internal error: .internal.selfref prot is not itself an extptr
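One of the annoying things here is that I can't reproduce it with any consistency. It does happen during some long-running tasks (several hours), though, so I'd like to make sure it never happens, if possible.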
Since I reference the same data.table, DT, in every loop, I tried running the following at the start of each iteration:
setattr(DT,".internal.selfref",NULL)
...to remove the invalid or corrupted self-reference attribute. This works, and the internal selfref error no longer appears. Still, it is a workaround.
Any ideas on fixing the underlying problem?
Many thanks for your help!
Eric
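A minimal sketch of the setattr() workaround wrapped in a helper, assuming a data.table version where `setattr()` with `value = NULL` removes the attribute and `alloc.col()` rebuilds the column over-allocation. `clear.selfref` is a hypothetical name, not part of data.table. The explicit `alloc.col()` call is an addition made in this sketch, so that the table carries a freshly built self-reference before any later `:=`.

```r
library(data.table)

# Hypothetical helper (not part of data.table): drop a possibly corrupted
# .internal.selfref attribute, then let alloc.col() rebuild it.
clear.selfref <- function(dt) {
  setattr(dt, ".internal.selfref", NULL)  # removes the attribute by reference
  alloc.col(dt)                           # returns an over-allocated data.table
}

DT <- data.table(id = 1:3, value = c(1, 2, 3))
DT <- clear.selfref(DT)

# Adding a column by reference works on the rebuilt table.
DT[, doubled := value * 2]
```

Like the original one-liner, this only treats the symptom and does not address why the attribute became invalid.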
Appendix: abbreviated R session info, confirming up-to-date versions:
R version 2.15.3 (2013-03-01)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
other attached packages:
[1] data.table_1.8.8 doMC_1.3.0
Example using simulated data. You may need to run the history() function many times (on the order of hundreds) before the error appears:
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
## Load packages and Prepare Data
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
require(data.table)
##this is the package we use for multicore
require(doMC)
##register n-2 of your machine's cores (but at least 1)
registerDoMC(max(1, parallel::detectCores() - 2))
## Build simulated data
value.a <- runif(500,0,1)
value.b <- 1-value.a
value <- c(value.a,value.b)
answer.opt <- c(rep("a",500),rep("b",500))
answer.id <- rep( 6000:6499 , 2)
question.id <- rep( sample(c(1001,1010,1041,1121,1124),500,replace=TRUE) ,2)
date <- rep( (Sys.Date() - sample.int(150, size=500, replace=TRUE)) , 2)
user.id <- rep( sample(250:350, size=500, replace=TRUE) ,2)
condition <- substr(as.character(user.id),1,1)
condition[which(condition=="2")] <- "x"
condition[which(condition=="3")] <- "y"
##Put everything in a data.table
DT.full <- data.table(user.id = user.id,
answer.opt = answer.opt,
question.id = question.id,
date = date,
answer.id = answer.id,
condition = condition,
value = value)
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
## Daily Aggregation Function
##
##a basic function that aggregates all the values from
##all users for every question on a given day:
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
each.day <- function(val.date){
DT <- DT.full[ date < val.date ]
#count the number of updates per user (for weighting)
setkey(DT, question.id, user.id)
DT <- DT[ DT[answer.opt=="a",length(value),by="question.id,user.id"] ]
setnames(DT, "V1", "freq")
#retain only the most recent value from each user on each question
setkey(DT, question.id, user.id, answer.id)
DT <- DT[ DT[ ,answer.id == max(answer.id), by="question.id,user.id", ][[3]] ]
#now get a weighted mean (with freq) of the value for each question
records <- lapply(unique(DT$question.id), function(q.id) {
DT <- DT[ question.id == q.id ]
probs <- DT[ ,weighted.mean(value,freq), by="answer.opt" ]
return(data.table(q.id = rep(q.id,nrow(probs)),
ans.opt = probs$answer.opt,
date = rep(val.date,nrow(probs)),
value = probs$V1))
})
return(do.call("rbind",records))
}
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
## foreach History Function
##
##to aggregate across many days quickly
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
history <- function(start, end){
#define a sequence of dates
date.seq <- seq(as.Date(start),as.Date(end),by="day")
#now run a foreach to get the history for each date
hist <- foreach(day = date.seq, .combine = "rbind") %dopar% {
#setattr(DT.full,".internal.selfref",NULL) #resolves occasional internal selfref error (DT.full is the shared table; DT only exists inside each.day)
each.day(val.date = day)
}
}
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
## Examples
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##aggregate only one day
each.day(val.date = "2012-12-13")
##generate a history
hist.example <- history(start = "2012-11-01", end = Sys.Date())
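Because the error only shows up after many runs, a small harness can rerun `history()` and record the first failure. This is a minimal sketch in base R. `run.until.error` and `max.runs` are hypothetical names, not part of the question's code.

```r
# Hypothetical helper: call f() up to max.runs times and return the first
# run number and error message, or NULL if every run succeeded.
run.until.error <- function(f, max.runs = 500) {
  for (i in seq_len(max.runs)) {
    msg <- tryCatch({
      f()
      NULL                          # no error on this run
    }, error = function(e) conditionMessage(e))
    if (!is.null(msg)) return(list(run = i, message = msg))
  }
  NULL                              # no error within max.runs
}

# Usage with the example above (this can take a long time):
# res <- run.until.error(function() history(start = "2012-11-01", end = Sys.Date()))
```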
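A similar problem has been bugging me for months. Maybe by pooling our experiences we can spot a pattern.
I've been holding off posting until I could create a reproducible example, but so far that hasn't been possible. The error does not occur at the same place in the code. In the past, I could often avoid it simply by rerunning exactly the same code; other times, I reformulated the expression and the rerun succeeded. Either way, I'm fairly sure these errors are genuinely internal to data.table.
I saved the last 4 error messages to try to detect a pattern (pasted below).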
---------------------------------------------------
[1] "err msg: location 1"
Error in selfrefok(x) :
Internal error: .internal.selfref prot is not itself an extptr
Calls: my.fun1 ... $<- -> $<-.data.table -> [<-.data.table -> selfrefok
Execution halted
---------------------------------------------------
[1] "err msg: location 1"
Error in alloc.col(newx) :
Internal error: .internal.selfref prot is not itself an extptr
Calls: my.fun1 -> $<- -> $<-.data.table -> copy -> alloc.col
Execution halted
---------------------------------------------------
[1] "err msg: location 2"
Error in shallow(x) :
Internal error: .internal.selfref prot is not itself an extptr
Calls: print ... do.call -> lapply -> as.list -> as.list.data.table -> shallow
Execution halted
---------------------------------------------------
[1] "err msg: location 3"
Error in shallow(x) :
Internal error: .internal.selfref prot is not itself an extptr
Calls: calc.book.summ ... .rbind.data.table -> as.list -> as.list.data.table -> shallow
Execution halted
Another similarity with the example above: I pass data.tables between parallel threads, so they are being serialized and deserialized.
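Here is a minimal sketch of that round trip, assuming data.table is installed. `serialize()`/`unserialize()` stand in for whatever the parallel backend does when it ships a data.table between processes. That is an assumption, since the exact mechanism depends on the backend. R saves external pointers as NULL pointers, so the pointer inside `.internal.selfref` does not survive. The sketch therefore re-over-allocates the received copy with `alloc.col()` before modifying it by reference. It only illustrates the mechanism and does not reproduce the intermittent error.

```r
library(data.table)

DT <- data.table(user.id = c(250L, 301L), value = c(0.25, 0.75))

# Simulate sending DT to a worker and back: serialize to a raw vector,
# then rebuild a new, independent object from it.
received <- unserialize(serialize(DT, connection = NULL))

# The external pointer in .internal.selfref comes back as a NULL pointer,
# so rebuild the over-allocation before adding columns by reference.
received <- alloc.col(received)
received[, weight := value / sum(value)]
```

Because the received object is a separate copy, the original `DT` is left without the new column.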
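I'll try the "setattr" fix mentioned above.
Hope this helps. Thanks, Jason
Below is a simplified version of one of the code segments that seems to trigger this error about once in every 50-100k runs.
Thanks, @MatthewDowle, by the way: data.table is extremely useful. Here is a stripped-down piece of code: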
require(data.table)
require(xts)
book <- data.frame(name='',
s=0,
Value=0.0,
x=0.0,
Qty=0)[0, ]
for (thing in list(1,2,3,4,5)) {
tmp <- xts(1:5, order.by= make.index.unique(rep(Sys.time(), 5)))
colnames(tmp) <- 'A'
tmp <- cbind(coredata(tmp[nrow(tmp), 'A']),
coredata(colSums(tmp[, 'A'])),
coredata(tmp[nrow(tmp), 'A']))
book <- rbind(book,
data.table(name='ALPHA',
s=0*NA,
Value=tmp[1],
x=tmp[2],
Qty=tmp[3]))
}
Code like this seems to be what causes the following error:
Error in shallow(x) :
Internal error: .internal.selfref prot is not itself an extptr
Calls: my.function ... .rbind.data.table -> as.list -> as.list.data.table -> shallow
Execution halted
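One pattern worth trying is to collect the per-iteration rows in a list and bind them once with data.table's `rbindlist()`, rather than growing `book` with `rbind()` inside the loop. This is not confirmed to avoid the error, but it reduces the number of `rbind` calls involving data.tables. The minimal sketch below uses a plain integer vector as a stand-in for the xts column `A`. That is an assumption made to keep the example self-contained.

```r
library(data.table)

rows <- vector("list", 5)
for (i in seq_along(rows)) {
  a <- 1:5  # stand-in for coredata(tmp[, 'A'])
  rows[[i]] <- data.table(name  = "ALPHA",
                          s     = NA_real_,
                          Value = a[length(a)],  # last value of A
                          x     = sum(a),        # column sum of A
                          Qty   = a[length(a)])  # last value of A
}

# Bind all per-iteration rows in a single call.
book <- rbindlist(rows)
```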