Freeing memory in R

Pep*_*iCo 38 memory-management r

I create a variable, and once I have finished with it I no longer need it, so I want to delete it and free the memory. But rm() does not seem to help:

memory.size()
30.69
tmp=matrix(rnorm(6e5*20),6e5,20)
memory.size()
207.64
rm(tmp)
memory.size()
207.64

Does this mean that tmp was deleted but its memory was not freed?

Ben*_*Ben 45

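I use gc() to free up RAM between operations. Below is an example of how I use it in a loop, but see here for a more detailed discussion of gc(), and here for more on memory management during an R session.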

# load library
library(topicmodels)

# get data
data("AssociatedPress")

# set number of topics to start with
k <- 20

# set model options
control_LDA_VEM <-
list(estimate.alpha = TRUE, alpha = 50/k, estimate.beta = TRUE,
verbose = 0, prefix = tempfile(), save = 0, keep = 0,
seed = as.integer(100), nstart = 1, best = TRUE,
var = list(iter.max = 10, tol = 10^-6),
em = list(iter.max = 10, tol = 10^-4),
initialize = "random")


# create the sequence that stores the number of topics to 
# iterate over
sequ <- seq(20, 300, by = 20)

# basic loop to iterate over different topic numbers with gc
# after each run to empty out RAM
lda <- vector(mode='list', length = length(sequ))
for(k in sequ) {
  lda[[k]] <- LDA(AssociatedPress[1:20,], k, method= "VEM", control = control_LDA_VEM)
  gc() # here's where I put the garbage collection to free up memory before the next round of the loop
}

# convert list output to dataframe (suggestions for a simpler method are welcome!)
best.model.logLik <- data.frame(logLik = as.matrix(lapply(lda[sequ], logLik)), ntopic = sequ)

# plot
with(best.model.logLik, plot(ntopic, logLik, type = 'l', xlab="Number of topics", ylab="Log likelihood"))


# print ordered dataframe to see which number of topics has the highest log likelihood
(best.model.logLik.sort <- best.model.logLik[order(-as.numeric(best.model.logLik$logLik)), ]) 
    logLik       ntopic
2  -17904.12     40
3  -18105.48     60
1  -18181.84     20
4   -18569.7     80
5  -19736.94    100
6   -21919.6    120
7  -23785.08    140
8  -24914.23    160
9  -25493.76    180
10 -25837.64    200
11 -25964.23    220
12 -26061.01    240
13 -26117.92    260
14 -26149.44    280
15 -26168.91    300
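To tie this back to the question, here is a minimal sketch of rm() followed by gc() on a matrix like the one in the question (made smaller here so it runs quickly). It reads memory use from the table that gc() itself returns rather than from memory.size(), which is Windows-only. The variable names are illustrative, and the exact figures will differ between machines and R versions.

```r
# Allocate a numeric matrix similar to the one in the question, but smaller.
tmp <- matrix(rnorm(1e5 * 20), 1e5, 20)
size_mb <- as.numeric(object.size(tmp)) / 1024^2  # size of tmp in Mb

# gc() runs a collection and returns a usage table;
# its second column is the memory currently in use, in Mb (Ncells and Vcells rows).
used_with_tmp <- sum(gc()[, 2])

rm(tmp)                          # removes the name; the memory is not reclaimed yet
used_after_gc <- sum(gc()[, 2])  # the collection reclaims the now unreachable matrix

cat(sprintf("object: %.1f Mb, in use before: %.1f Mb, in use after rm + gc: %.1f Mb\n",
            size_mb, used_with_tmp, used_after_gc))
```

The "in use" figure reported by gc() should drop by roughly the size of the matrix. That is R's own accounting; how much of that memory the operating system sees as returned is a separate matter.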

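  • This is a classic case of the second circle of R hell. Never grow vectors (or objects) in a loop. `lda <- vector(mode = 'list', length = seq)` (`seq` is the name of a function in R; it is best to avoid using such names for objects, as it can cause confusion). If `k` is always an integer of length 1, you probably want to assign with `[[<-`, not `[<-`. (2 upvotes)
  • On my computer, gc() frees some memory, but not all of it. If I load a large object, work with it, then remove it and call gc(), I do not get back as much free memory as I had at the start. The more work I do, the more memory becomes unrecoverable. Eventually, after many operations on large objects, I can run out of memory. I am on Windows 10 x64 with 16 GB of RAM. (2 upvotes)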
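The first comment's advice can be sketched as follows: preallocate one list slot per model and fill it by position, not by topic count. Indexing with `lda[[k]]` for k = 20, 40, ..., 300 stretches the list to 300 elements, most of them NULL. `fit_stub` is a hypothetical stand-in for the expensive `LDA()` call so that the sketch runs without topicmodels; `fits` and `results` are likewise illustrative names.

```r
sequ <- seq(20, 300, by = 20)

# Hypothetical stand-in for an expensive model fit; returns a made-up score.
fit_stub <- function(k) -1000 - k

# Preallocate exactly one slot per value in sequ.
fits <- vector(mode = "list", length = length(sequ))

for (i in seq_along(sequ)) {
  fits[[i]] <- fit_stub(sequ[i])  # fill by position, so the list never grows
  gc()                            # optional: collect garbage before the next iteration
}

# One row per fitted model, in the same order as sequ.
results <- data.frame(ntopic = sequ, logLik = unlist(fits))
```

With this layout the list keeps its length of 15 throughout the loop, and the results can be collected without the `lda[sequ]` subsetting step.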