tryCatch - namespace?

Jen*_*nny 4 r try-catch assign

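I'm very new to R, and I'm confused about the correct usage of tryCatch. My goal is to make predictions on a large data set. If the predictions don't fit into memory, I want to work around the problem by splitting the data.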

Right now, my code looks roughly like this:

step = 1000
n = nrow(large_data_frame)
tryCatch({
  large_vector = predict(model, large_data_frame)
}, error = function(e) { # I ran out of memory
  tmpfiles = character(0)
  for (i in seq(from = 1, to = n, by = step)) {
    rows = i:min(i + step - 1, n)  # the last chunk may be shorter
    small_vector = predict(model, large_data_frame[rows, ])
    tmpfile = tempfile(fileext = ".RData")
    save(small_vector, file = tmpfile)  # one file per chunk
    tmpfiles = c(tmpfiles, tmpfile)
  }
  rm(large_data_frame) # free memory
  large_vector = NULL
  for (tmpfile in tmpfiles) {
    load(tmpfile)  # restores small_vector
    unlink(tmpfile)
    large_vector = c(large_vector, small_vector)
  }
})

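The point is this: if no error occurs, large_vector is filled with my predictions as expected. If an error occurs, large_vector seems to exist only inside the error handler's own environment (its "namespace"), which makes sense, because the handler is declared as a function. For the same reason, I get a warning that large_data_frame cannot be removed.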
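To make the scoping concrete, here is a small self-contained sketch (the variable names are made up for illustration). The expression given to tryCatch is evaluated in the calling environment, whereas the error handler is an ordinary R function, so a plain assignment inside it only creates a local variable that disappears when the handler returns:

```r
x_from_expr <- "unset"
x_from_handler <- "unset"

invisible(tryCatch({
  x_from_expr <- "set in expr"        # evaluated in the caller's environment
  stop("simulated error")
}, error = function(e) {
  x_from_handler <- "set in handler"  # creates a variable local to the handler
}))

print(x_from_expr)     # "set in expr"
print(x_from_handler)  # "unset"
```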

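Unfortunately, this behavior is not what I want. I want to assign the variable large_vector from within my error function. I thought one possibility would be to specify the environment explicitly and use assign, so I would use the following statements in my error code: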

rm(large_data_frame, envir = parent.env(environment()))
[...]
assign('large_vector', large_vector, parent.env(environment()))

However, this solution seems dirty to me. Is it possible to achieve my goal with "clean" code?
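For comparison, base R's superassignment operator `<<-` is a shorter spelling of the same idea as the `assign(..., parent.env(environment()))` call: it assigns to the first existing binding it finds in the enclosing environments, and creates the variable in the global environment if none exists. It is still assignment into an outer environment, just less verbose. A minimal sketch, simulating the out-of-memory case with stop() and creating large_vector beforehand so the target of `<<-` is unambiguous:

```r
large_vector <- NULL   # create it first so `<<-` has a definite target

invisible(tryCatch({
  stop("simulated error")   # pretend large_vector does not fit into memory
}, error = function(e) {
  # `<<-` searches from the handler's enclosing environment upwards and
  # rebinds the first existing `large_vector` (here: the one created above)
  large_vector <<- rep(3, 1000)
}))

print(head(large_vector))  # 3 3 3 3 3 3
```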

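[Edit] There seems to be some confusion, because I posted the code above mainly to illustrate the problem, not as a working example. Here is a minimal example that shows the namespace issue: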

# Example 1 : large_vector fits into memory
rm(large_vector)
tryCatch({
  large_vector = rep(5, 1000)
}, error = function(e) {
  # do stuff to build the vector
  large_vector = rep(3, 1000)
})
print(large_vector)  # all 5

# Example 2 : pretend large_vector does not fit into memory; solution using parent environment
rm(large_vector)
tryCatch({ 
  stop();  # simulate error
}, error = function(e) {
  # do stuff to build the vector
  large_vector = rep(3, 1000)
  assign('large_vector', large_vector, parent.env(environment()))
})
print(large_vector)  # all 3

# Example 3 : pretend large_vector does not fit into memory; namespace issue
rm(large_vector)
tryCatch({ 
  stop();  # simulate error
}, error = function(e) {
  # do stuff to build the vector
  large_vector = rep(3, 1000)
})
print(large_vector)  # does not exist
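A minimal sketch of Example 3 without any assignment into another environment. It relies on tryCatch returning the value of the expression when no error occurs and the value returned by the handler otherwise, so the result can simply be assigned at the call site:

```r
# Error case: the handler's return value becomes the value of tryCatch()
large_vector <- tryCatch({
  stop("simulated error")   # pretend large_vector does not fit into memory
}, error = function(e) {
  rep(3, 1000)              # build the vector and just return it
})
print(head(large_vector))   # 3 3 3 3 3 3

# No-error case: the expression's value is returned unchanged
ok_vector <- tryCatch(rep(5, 1000), error = function(e) rep(3, 1000))
print(head(ok_vector))      # 5 5 5 5 5 5
```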

ags*_*udy 5

I would do something like this:

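res <- tryCatch({
  predict(model, large_data_frame)
}, error = function(e) { # I ran out of memory
  # split the rows into consecutive chunks of at most 1000 rows
  chunks <- split(large_data_frame,
                  ceiling(seq_len(nrow(large_data_frame)) / 1000))
  # the handler's return value (a list of per-chunk predictions) becomes res
  lapply(chunks, function(x) predict(model, x))
})
rm(large_data_frame)
if (is.list(res))
  res <- unlist(res, use.names = FALSE)  # vector predictions; use do.call(rbind, res) for matrix output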

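The idea is to have the error handler return a list of per-chunk predictions if memory runs out. Whatever the handler returns becomes the value of tryCatch, so nothing has to be assigned into another environment; the list is then combined into a single result afterwards.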
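A self-contained, runnable sketch of this approach, under stated assumptions: the model is a plain lm fit on simulated data, `predict_with_fallback`, `chunk_size` and `simulate_oom` are hypothetical names introduced here, and because a real out-of-memory error is hard to reproduce on demand, `simulate_oom = TRUE` raises an ordinary error with stop() instead:

```r
# Hypothetical helper: predict on all rows at once; if that fails, fall back
# to predicting in chunks of `chunk_size` rows and concatenating the results.
predict_with_fallback <- function(model, newdata, chunk_size = 1000,
                                  simulate_oom = FALSE) {
  tryCatch({
    if (simulate_oom) stop("cannot allocate vector")  # stand-in for a real OOM error
    predict(model, newdata)
  }, error = function(e) {
    # consecutive row groups: 1..chunk_size, chunk_size+1..2*chunk_size, ...
    groups <- ceiling(seq_len(nrow(newdata)) / chunk_size)
    pieces <- lapply(split(newdata, groups), function(chunk) predict(model, chunk))
    unlist(pieces, use.names = FALSE)
  })
}

set.seed(1)
df <- data.frame(x = runif(2500))
df$y <- 2 * df$x + rnorm(2500, sd = 0.1)
model <- lm(y ~ x, data = df)

full    <- predict_with_fallback(model, df)
chunked <- predict_with_fallback(model, df, simulate_oom = TRUE)
print(all.equal(unname(full), chunked))  # TRUE
```

Note that this handler catches every error, not only allocation failures; inspecting conditionMessage(e) before falling back would be one way to narrow it. Also, split() builds all chunks at once, roughly a second copy of the data, so under real memory pressure indexing rows chunk by chunk inside a loop may be the safer choice.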

Note: I'm not sure about the result here, because we don't have a reproducible example.