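Loading npz files in a loop causes memory to overflow (depending on the length of the file list).
None of the following seems to help:
Deleting the variable that stores the data loaded from the file.
Using mmap.
Calling gc.collect() (garbage collection).
The following code should reproduce the behavior: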
import numpy as np

# generate a file for the demo
X = np.random.randn(1000, 1000)
np.savez('tmp.npz', X=X)

# here comes the overflow:
for i in xrange(1000000):
    data = np.load('tmp.npz')
    data.close()  # avoid the "too many files are open" error
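In my real application the loop runs over a list of files, and memory usage grows beyond 24 GB of RAM! Note that this was tried on Ubuntu 11.10 with both numpy v1.5.1 and v1.6.0.
I have already filed a report as numpy ticket 2048, but this may be of wider interest, so I am posting it here as well (besides, I am not sure that this is a bug; it may be the result of my own bad programming).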
The command
del data.f
should precede the command
data.close()
For more information, and for the method used to find the solution, read HYRY's kind answer below.
I think this is a bug, and I may have found a solution: call "del data.f" before closing the file.
for i in xrange(10000000):
    data = np.load('tmp.npz')
    del data.f
    data.close()  # avoid the "too many files are open" error
To find this kind of memory leak, you can use the following code:
import numpy as np
import gc

# here comes the overflow:
for i in xrange(10000):
    data = np.load('tmp.npz')
    data.close()  # avoid the "too many files are open" error

# count the objects tracked by the garbage collector, grouped by type name
d = dict()
for o in gc.get_objects():
    name = type(o).__name__
    if name not in d:
        d[name] = 1
    else:
        d[name] += 1

# print the type names sorted by count, most frequent last
items = d.items()
items.sort(key=lambda x: x[1])
for key, value in items:
    print key, value
After the test loop, the code builds a dict that counts the objects returned by gc.get_objects() by type name. Here is the output:
...
wrapper_descriptor 1382
function 2330
tuple 9117
BagObj 10000
NpzFile 10000
list 20288
dict 21001
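The counting step can also be written more compactly. The following is only a sketch, written for Python 3 and using collections.Counter; the helper count_objects_by_type and the class Leaky are hypothetical names made up for this illustration, and the list of Leaky instances merely stands in for whatever piles up in a real program. Taking a snapshot before and after the suspicious code and subtracting the two shows which types grew.

```python
import gc
from collections import Counter


def count_objects_by_type():
    """Count the objects currently tracked by the garbage collector, by type name."""
    return Counter(type(o).__name__ for o in gc.get_objects())


class Leaky(object):
    """Hypothetical class, used only to give the demo something to find."""
    pass


before = count_objects_by_type()
keep = [Leaky() for _ in range(100)]   # simulate objects that pile up
after = count_objects_by_type()

# Counter subtraction keeps only the types whose count increased
growth = after - before
for name, n in growth.most_common(5):
    print(name, n)
```

Comparing two snapshots filters out the thousands of objects that exist in any Python process, so the types that actually accumulate stand out.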
The counts of 10000 for BagObj and NpzFile show that something is wrong with these two classes: one instance of each survives every iteration of the loop. Here is the relevant code:
class NpzFile(object):
    def __init__(self, fid, own_fid=False):
        ...
        self.zip = _zip
        self.f = BagObj(self)
        if own_fid:
            self.fid = fid
        else:
            self.fid = None

    def close(self):
        """
        Close the file.
        """
        if self.zip is not None:
            self.zip.close()
            self.zip = None
        if self.fid is not None:
            self.fid.close()
            self.fid = None

    def __del__(self):
        self.close()

class BagObj(object):
    def __init__(self, obj):
        self._obj = obj

    def __getattribute__(self, key):
        try:
            return object.__getattribute__(self, '_obj')[key]
        except KeyError:
            raise AttributeError, key
NpzFile has a __del__() method, NpzFile.f is a BagObj, and BagObj._obj is the NpzFile itself. This is a reference cycle, and because one of the objects in it defines __del__(), the garbage collector treats both the NpzFile and the BagObj as uncollectable. The Python documentation explains this: http://docs.python.org/library/gc.html#gc.garbage
So, to break the reference cycle, you need to call "del data.f".
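To illustrate why deleting the attribute helps, here is a minimal sketch with toy classes instead of numpy. It is written for Python 3 and relies on CPython's reference counting; Holder, Bag and alive_after_drop are hypothetical names that only mimic the NpzFile/BagObj relationship. The cyclic collector is switched off for the demo so that only reference counting decides when an object is freed. (Python 3.4 and later can also reclaim cycles whose objects define __del__, so the toy classes leave it out.)

```python
import gc
import weakref


class Bag(object):
    """Toy stand-in for BagObj: keeps a reference back to its owner."""
    def __init__(self, owner):
        self._owner = owner


class Holder(object):
    """Toy stand-in for NpzFile: owns a Bag that points back at it."""
    def __init__(self):
        self.f = Bag(self)   # Holder -> Bag -> Holder: a reference cycle


def alive_after_drop(break_cycle):
    """Return True if a Holder is still alive after its last name is deleted."""
    h = Holder()
    probe = weakref.ref(h)   # observes the object without keeping it alive
    if break_cycle:
        del h.f              # same idea as "del data.f"
    del h
    return probe() is not None


gc.disable()                 # rely on reference counting alone for the demo
try:
    leaked = alive_after_drop(break_cycle=False)
    freed = not alive_after_drop(break_cycle=True)
finally:
    gc.enable()
    gc.collect()             # clean up the cycle left behind by the first call

print(leaked, freed)
```

With the cycle intact, the object outlives its last name; once the attribute is deleted, the reference count drops to zero and the object is freed immediately, without any help from the garbage collector.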