Getting a random internal selfref error in data.table for R

the*_*one 10 r setattribute data.table

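I love data.table. It's fast and intuitive; what could be better? Alas, here is my problem: when I reference a data.table inside a foreach() loop (using the doMC backend), I occasionally get the following error (example in the appendix):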

Error in { : 
  Internal error: .internal.selfref prot is not itself an extptr

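One of the annoying things here is that I can't reproduce it with any consistency, but it does happen during some long (multi-hour) tasks, so I'd like to make sure it never happens, if possible.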

Since I reference the same data.table, DT, in every loop, I tried running the following at the start of each iteration:

setattr(DT,".internal.selfref",NULL)   

...which removes the invalid/corrupted self-reference attribute. This works, and the internal selfref error no longer occurs. Still, it's a workaround.
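A minimal sketch of the workaround described above, placed inside a forked worker. It assumes parallel::mclapply() as a stand-in for foreach/doMC (both fork the R process), and the names DT, worker and n.cores are illustrative only. Each iteration drops the self-reference attribute and then adds columns only to copy(DT), never to the shared table. It mirrors the workaround; it does not address the underlying cause.

```r
library(data.table)
library(parallel)

# Illustrative stand-in for the shared table referenced in every iteration.
DT <- data.table(id = 1:6, value = 1:6)

# Hypothetical worker function (not from the original code).
worker <- function(k) {
  # The workaround from the question: drop a possibly stale/corrupted
  # self-reference attribute at the start of each iteration.
  setattr(DT, ".internal.selfref", NULL)
  # copy() returns a fresh, over-allocated data.table, so `:=` below
  # modifies only this private copy.
  local.DT <- copy(DT)
  local.DT[, scaled := value * k]
  sum(local.DT$scaled)
}

# mclapply() forks worker processes, as doMC does; forking is not
# available on Windows, so fall back to a single core there.
n.cores <- if (.Platform$OS.type == "windows") 1L else 2L
res <- unlist(mclapply(1:4, worker, mc.cores = n.cores))
```

In the question's foreach loop, the same two lines would go at the top of the %dopar% body, applied to whichever table is shared across iterations.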

Any ideas on solving the underlying problem?

Many thanks for your help!

Eric

Appendix: abbreviated R session info, to confirm the versions are up to date:

R version 2.15.3 (2013-03-01)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
other attached packages:
 [1] data.table_1.8.8  doMC_1.3.0

Example using simulated data: you may need to run the history() function many times (possibly hundreds) before the error appears:

##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
## Load packages and Prepare Data
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
require(data.table)
##this is the package we use for multicore
require(doMC)
##register n-2 of your machine's cores (at least one);
##detectCores() is provided by the base 'parallel' package
registerDoMC(max(1, parallel::detectCores() - 2))

## Build simulated data
value.a <- runif(500,0,1)
value.b <- 1-value.a
value <- c(value.a,value.b)
answer.opt <- c(rep("a",500),rep("b",500))
answer.id <- rep( 6000:6499 , 2)
question.id <- rep( sample(c(1001,1010,1041,1121,1124),500,replace=TRUE) ,2)
date <- rep( (Sys.Date() - sample.int(150, size=500, replace=TRUE)) , 2)
user.id <- rep( sample(250:350, size=500, replace=TRUE) ,2)
condition <- substr(as.character(user.id),1,1)
condition[which(condition=="2")] <- "x"
condition[which(condition=="3")] <- "y"

##Put everything in a data.table
DT.full <- data.table(user.id = user.id,
                      answer.opt = answer.opt,
                      question.id = question.id,
                      date = date,
                      answer.id = answer.id,
                      condition = condition,
                      value = value)

##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
## Daily Aggregation Function
##
##a basic function that aggregates all the values from
##all users for every question on a given day:
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
each.day <- function(val.date){
  DT <- DT.full[ date < val.date ]

  #count the number of updates per user (for weighting)
  setkey(DT, question.id, user.id)
  DT <- DT[ DT[answer.opt=="a",length(value),by="question.id,user.id"] ]
  setnames(DT, "V1", "freq")

  #retain only the most recent value from each user on each question
  setkey(DT, question.id, user.id, answer.id)
  DT <- DT[ DT[ ,answer.id == max(answer.id), by="question.id,user.id", ][[3]] ]

  #now get a weighted mean (with freq) of the value for each question
  records <- lapply(unique(DT$question.id), function(q.id) {
    DT <- DT[ question.id == q.id ]
    probs <- DT[ ,weighted.mean(value,freq), by="answer.opt" ]
    return(data.table(q.id = rep(q.id,nrow(probs)),
                      ans.opt = probs$answer.opt,
                      date = rep(val.date,nrow(probs)),
                      value = probs$V1))
  })
  return(do.call("rbind",records))
}

##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
## foreach History Function 
##
##to aggregate across many days quickly
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
history <- function(start, end){
  #define a sequence of dates
  date.seq <- seq(as.Date(start),as.Date(end),by="day")

  #now run a foreach to get the history for each date
  hist <- foreach(day = date.seq,  .combine = "rbind") %dopar% {
    #setattr(DT.full,".internal.selfref",NULL) #resolves occasional internal selfref error
    each.day(val.date = day)
  }
}

##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
## Examples
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

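##aggregate only one day
##(dates are relative to Sys.Date() so they fall inside the simulated range)
each.day(val.date = Sys.Date() - 30)

##generate a history over the simulated 150-day window
hist.example <- history(start = Sys.Date() - 150, end = Sys.Date())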

Jas*_*onB 2

Similar problems have been plaguing me for months. Maybe by putting our experiences together we can spot a pattern.

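I had been holding off on posting until I could create a reproducible example. So far that hasn't been possible. The error does not occur at the same place in the code. In the past I could often avoid it simply by re-running exactly the same code; other times I reformulated the expression and the re-run succeeded. Either way, I'm fairly sure these errors really are internal to data.table.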
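Because re-running the same code often succeeded, a retry wrapper is one way to keep long jobs alive while the cause is unknown. The sketch below is hypothetical (retry.selfref is not part of data.table): it retries only when the error message mentions .internal.selfref and re-raises every other error immediately. It papers over the symptom rather than fixing it.

```r
# Hypothetical helper, not part of data.table: call f() and, if it fails
# with an error whose message mentions ".internal.selfref", try again,
# up to max.tries attempts in total. Any other error is re-raised at once.
retry.selfref <- function(f, max.tries = 3) {
  for (attempt in seq_len(max.tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    is.selfref <- grepl(".internal.selfref", conditionMessage(result),
                        fixed = TRUE)
    # Give up on unrelated errors, or when the last attempt has failed.
    if (!is.selfref || attempt == max.tries) stop(result)
    message("selfref error on attempt ", attempt, "; retrying")
  }
}
```

With the question's example, a call would look like retry.selfref(function() history(start = Sys.Date() - 150, end = Sys.Date())).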

I saved the last 4 error messages to try to detect a pattern (pasted below).

---------------------------------------------------
[1] "err msg: location 1"
Error in selfrefok(x) : 
  Internal error: .internal.selfref prot is not itself an extptr
Calls: my.fun1 ... $<- -> $<-.data.table -> [<-.data.table -> selfrefok
Execution halted


---------------------------------------------------
[1] "err msg: location 1"
Error in alloc.col(newx) : 
  Internal error: .internal.selfref prot is not itself an extptr
Calls: my.fun1 -> $<- -> $<-.data.table -> copy -> alloc.col
Execution halted


---------------------------------------------------
[1] "err msg: location 2"
Error in shallow(x) : 
  Internal error: .internal.selfref prot is not itself an extptr
Calls: print ... do.call -> lapply -> as.list -> as.list.data.table -> shallow
Execution halted

---------------------------------------------------
[1] "err msg: location 3"
Error in shallow(x) : 
  Internal error: .internal.selfref prot is not itself an extptr
Calls: calc.book.summ ... .rbind.data.table -> as.list -> as.list.data.table -> shallow
Execution halted

One more similarity with the example above: I pass data.tables between parallel threads, so they are being serialized and deserialized.
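The serialization point can be checked directly. This sketch uses serialize()/unserialize() within one session as an approximation of passing a data.table between processes (an assumption; forked workers in multicore/parallel send their results back to the master in serialized form). It shows the over-allocation reported by truelength() being lost in the round trip, and alloc.col() restoring it before a column is added by reference.

```r
library(data.table)

DT <- data.table(id = 1:3, value = c(10, 20, 30))

# A freshly created data.table is over-allocated: truelength() reports
# spare column slots beyond the columns actually in use.
tl.before <- truelength(DT)

# Round trip through R serialization, roughly what happens when a
# data.table crosses a process boundary. The external pointer stored in
# .internal.selfref does not survive, and the over-allocation is
# typically lost as well.
DT2 <- unserialize(serialize(DT, NULL))
tl.after <- truelength(DT2)

# alloc.col() (newer data.table releases also provide setalloccol())
# re-establishes the over-allocation and a valid self-reference,
# so `:=` can again add a column by reference.
DT2 <- alloc.col(DT2)
DT2[, doubled := value * 2]
```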

I'll try the setattr fix mentioned above.

Hope this helps. Thanks, Jason

Below is a simplified version of one of the code sections that seems to produce this error about once every 50-100k runs:

Thanks, @MatthewDowle, by the way; data.table is extremely useful. Here is a stripped-down piece of the code:

require(data.table)
require(xts)

book <- data.frame(name='',
                   s=0,
                   Value=0.0,
                   x=0.0,
                   Qty=0)[0, ]

for (thing in list(1,2,3,4,5)) {

  tmp <- xts(1:5, order.by= make.index.unique(rep(Sys.time(), 5)))
  colnames(tmp) <- 'A'
  tmp <- cbind(coredata(tmp[nrow(tmp), 'A']),
               coredata(colSums(tmp[, 'A'])),
               coredata(tmp[nrow(tmp), 'A']))

  book <- rbind(book,
                data.table(name='ALPHA',
                           s=0*NA,
                           Value=tmp[1],
                           x=tmp[2],
                           Qty=tmp[3]))

}

Code like this seems to be what triggers the error:

Error in shallow(x) : 
  Internal error: .internal.selfref prot is not itself an extptr
Calls: my.function ... .rbind.data.table -> as.list -> as.list.data.table -> shallow
Execution halted
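A hypothetical restructuring of the loop above, not a confirmed fix: instead of growing book through repeated rbind() calls that mix a data.frame with data.tables, it collects one small data.table per iteration in a list and binds them once with rbindlist(). The xts part is replaced by the same arithmetic on a plain vector, so the sketch does not depend on xts; whether it avoids the intermittent selfref error is unverified.

```r
library(data.table)

# One list slot per iteration of the original loop.
rows <- vector("list", 5)
for (k in seq_along(rows)) {
  tmp <- 1:5
  last.A <- tmp[length(tmp)]  # plays the role of tmp[nrow(tmp), 'A']
  sum.A  <- sum(tmp)          # plays the role of colSums(tmp[, 'A'])
  rows[[k]] <- data.table(name  = "ALPHA",
                          s     = NA_real_,  # same value as 0*NA
                          Value = last.A,
                          x     = sum.A,
                          Qty   = last.A)
}

# Bind all rows once at the end rather than inside the loop.
book <- rbindlist(rows)
```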