ddn*_*ddn 31 python serialization pickle
I am running code that creates large objects, containing various user-defined classes, which I must then serialize for later use. From what I can tell, only pickling fits my requirements. I have been using cPickle to store them, but the objects it generates are about 40G in size, from code that runs in 500 MB of memory. Serialization speed is not an issue, but the size of the objects is. Are there any tips or alternative approaches I can use to make the pickles smaller?
Joh*_*yon 42
You can combine the cPickle dump call with gzip compression:
import cPickle
import gzip

def save_zipped_pickle(obj, filename, protocol=-1):
    # protocol=-1 selects the highest pickle protocol available,
    # which produces a more compact stream than the default.
    with gzip.open(filename, 'wb') as f:
        cPickle.dump(obj, f, protocol)
And to reload a gzipped pickled object:
def load_zipped_pickle(filename):
    with gzip.open(filename, 'rb') as f:
        loaded_object = cPickle.load(f)
        return loaded_object
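A quick round-trip check using the two helpers above (the filename and sample object here are hypothetical, just for illustration):

# Hypothetical usage: save an object and read it back.
data = {'weights': range(1000), 'label': 'example'}
save_zipped_pickle(data, 'data.pkl.gz')
restored = load_zipped_pickle('data.pkl.gz')
assert restored == data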
Vik*_*kez 42
If you have to use pickle, and no other serialization method works for you, you can always pipe the pickle through bzip2. The only problem is that bzip2 is a little slow... gzip should be faster, but the file size is almost twice as big:
In [1]: class Test(object):
            def __init__(self):
                self.x = 3841984789317471348934788731984731749374
                self.y = 'kdjsaflkjda;sjfkdjsf;klsdjakfjdafjdskfl;adsjfl;dasjf;ljfdlf'

        l = [Test() for i in range(1000000)]
In [2]: import cPickle as pickle

        with open('test.pickle', 'wb') as f:
            pickle.dump(l, f)

        !ls -lh test.pickle
-rw-r--r--  1 viktor  staff    88M Aug 27 22:45 test.pickle
In [3]: import bz2
        import cPickle as pickle

        with bz2.BZ2File('test.pbz2', 'w') as f:
            pickle.dump(l, f)

        !ls -lh test.pbz2
-rw-r--r--  1 viktor  staff   2.3M Aug 27 22:47 test.pbz2
In [4]: import gzip
        import cPickle as pickle

        with gzip.GzipFile('test.pgz', 'w') as f:
            pickle.dump(l, f)

        !ls -lh test.pgz
-rw-r--r--  1 viktor  staff   4.8M Aug 27 22:51 test.pgz
So we see that the bzip2 file is almost 40 times smaller than the raw pickle, and the gzip file about 20 times smaller. And gzip's performance is quite close to raw cPickle, as you can see:
cPickle : best of 3: 18.9 s per loop
bzip2   : best of 3: 54.6 s per loop
gzip    : best of 3: 24.4 s per loop
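The same approach carries over to Python 3. A minimal sketch, assuming Python 3.3+ (where cPickle is merged into pickle and bz2.open is available in the standard library; the helper names are my own):

import bz2
import pickle

def save_bz2_pickle(obj, filename, protocol=pickle.HIGHEST_PROTOCOL):
    # bz2.open returns a binary file object that pickle can write to directly.
    with bz2.open(filename, 'wb') as f:
        pickle.dump(obj, f, protocol)

def load_bz2_pickle(filename):
    with bz2.open(filename, 'rb') as f:
        return pickle.load(f)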