Related questions and solutions (0)

Slow memory leak in data.table when returning named lists in j (trying to reshape a data.table)

Edit 3:

I created a much shorter example of the memory leak, which I hope makes it easier to reason about what's going on. As the iterations proceed, the Vcells memory use reported by gc() increases steadily, while the memory use reported by tables() stays the same. Somehow, the unlist(.SD) call seems to be responsible. Here it is:

library(data.table)

DT = data.table(k = 1:100, g = 1:20, val = rnorm(2e6))
for (i in 1:100){
  tmp = DT[ , unlist(.SD), by = 'k'] …

r data.table

12
Votes
1
Answers
572
Views

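Memory leak in data.table grouped assignment by reference

I'm seeing odd memory usage when using assignment by reference by group in a data.table. Here is a simple example that demonstrates it (please excuse the triviality of the example):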

library(data.table)

N <- 1e6
dt <- data.table(id=round(rnorm(N)), value=rnorm(N))

gc()
for (i in seq(100)) {
  dt[, value := value+1, by="id"]
}
gc()
tables()

This produces the following output:

> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  303909 16.3     597831 32.0   407500 21.8
Vcells 2442853 18.7    3260814 24.9  2689450 20.6
> for (i in seq(100)) {
+   dt[, value := value+1, by="id"]
+ }
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   315907  16.9     597831  32.0   407500  21.8 …
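
Comparing gc() once before and once after the loop does not show whether memory grows gradually or jumps once. One way to check, sketched below in base R, is to record the Vcells "used" count after every iteration. The helper `track_vcells` is a hypothetical name introduced here for illustration; it is not part of data.table or of the original question.

```r
# Hypothetical helper (not part of data.table): call `step` n times and
# record the Vcells "used" count reported by gc() after each call.
track_vcells <- function(step, n = 10) {
  used <- numeric(n)
  for (i in seq_len(n)) {
    step()
    # gc() returns a matrix with rows "Ncells" and "Vcells";
    # the "used" column holds the number of cells currently in use.
    used[i] <- gc()["Vcells", "used"]
  }
  used
}

# Usage with the example above (assumes data.table is loaded and dt exists):
# growth <- track_vcells(function() dt[, value := value + 1, by = "id"], n = 100)
# diff(growth)  # differences that stay positive would be consistent with a leak
```

Because the helper only wraps gc(), it can be pointed at any repeated operation, which makes it easier to compare the grouped assignment against an ungrouped one under the same conditions.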

r data.table

7
Votes
1
Answers
807
Views

Tag statistics

data.table ×2

r ×2