rth*_*rth 6 python arrays numpy pickle
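I need to organize a data file containing named blocks of data. The data are NumPy arrays. However, I don't want to use the numpy.save or numpy.savez functions, because in some cases the data has to be sent to a server over a pipe or some other interface. So I want to dump a NumPy array into memory, compress it, and then send it to the server.
I tried plain pickle, like this: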
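try:
    import cPickle as pkl  # Python 2: faster C implementation
except ImportError:
    import pickle as pkl
import zlib
import numpy as np

def send_to_db(data, compress=5):
    # send() is assumed to be the transport function, defined elsewhere
    send(zlib.compress(pkl.dumps(data), compress))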
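...but this is an extremely slow process.
Even with compression level 0 (no compression) it is very slow, and the time is spent entirely in pickling.
Is there a way to dump a NumPy array into a string without pickle? I know that NumPy can expose a buffer through numpy.getbuffer, but it is not obvious to me how to turn such a dumped buffer back into an array.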
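For reference, the raw-buffer route asked about here loses the dtype and shape, so both have to travel alongside the bytes. The following is only a minimal sketch under that assumption: it uses `ndarray.tobytes` and `np.frombuffer` rather than `numpy.getbuffer`, and the helper names `array_to_bytes` and `bytes_to_array` are hypothetical, not part of NumPy.

```python
import numpy as np

def array_to_bytes(arr):
    """Return (raw_bytes, dtype_string, shape) for a NumPy array."""
    arr = np.ascontiguousarray(arr)  # make sure the buffer is C-ordered
    return arr.tobytes(), arr.dtype.str, arr.shape

def bytes_to_array(raw, dtype_str, shape):
    """Rebuild the array from raw bytes plus the metadata sent with them."""
    # frombuffer returns a read-only view over `raw`; copy() makes it writable
    return np.frombuffer(raw, dtype=dtype_str).reshape(shape).copy()

arr = np.arange(12, dtype=np.float64).reshape(3, 4)
raw, dtype_str, shape = array_to_bytes(arr)
restored = bytes_to_array(raw, dtype_str, shape)
```

The need to carry dtype and shape by hand is exactly what the .npy format written by numpy.save takes care of.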
You should definitely use numpy.save; you can still do it entirely in memory:
>>> import io
>>> import numpy as np
>>> import zlib
>>> f = io.BytesIO()
>>> arr = np.random.rand(100, 100)
>>> np.save(f, arr)
>>> compressed = zlib.compress(f.getvalue())
To decompress, reverse the process:
>>> np.load(io.BytesIO(zlib.decompress(compressed)))
array([[ 0.80881898, 0.50553303, 0.03859795, ..., 0.05850996,
0.9174782 , 0.48671767],
[ 0.79715979, 0.81465744, 0.93529834, ..., 0.53577085,
0.59098735, 0.22716425],
[ 0.49570713, 0.09599001, 0.74023709, ..., 0.85172897,
0.05066641, 0.10364143],
...,
[ 0.89720137, 0.60616688, 0.62966729, ..., 0.6206728 ,
0.96160519, 0.69746633],
[ 0.59276237, 0.71586014, 0.35959289, ..., 0.46977027,
0.46586237, 0.10949621],
[ 0.8075795 , 0.70107856, 0.81389246, ..., 0.92068768,
0.38013495, 0.21489793]])
>>>
As you can see, this matches what we saved earlier:
>>> arr
array([[ 0.80881898, 0.50553303, 0.03859795, ..., 0.05850996,
0.9174782 , 0.48671767],
[ 0.79715979, 0.81465744, 0.93529834, ..., 0.53577085,
0.59098735, 0.22716425],
[ 0.49570713, 0.09599001, 0.74023709, ..., 0.85172897,
0.05066641, 0.10364143],
...,
[ 0.89720137, 0.60616688, 0.62966729, ..., 0.6206728 ,
0.96160519, 0.69746633],
[ 0.59276237, 0.71586014, 0.35959289, ..., 0.46977027,
0.46586237, 0.10949621],
[ 0.8075795 , 0.70107856, 0.81389246, ..., 0.92068768,
0.38013495, 0.21489793]])
>>>
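Since the question asks for named blocks, the same in-memory approach can probably be extended with numpy.savez, which stores several arrays under names in one archive. This is a sketch only: the helper names `pack_named_arrays` and `unpack_named_arrays` and the sample block names are made up for illustration, and the compression level of 5 simply mirrors the question's default.

```python
import io
import zlib
import numpy as np

def pack_named_arrays(arrays, level=5):
    """Serialize a dict of name -> array into one compressed bytes payload."""
    buf = io.BytesIO()
    np.savez(buf, **arrays)  # each keyword becomes a named entry in the archive
    return zlib.compress(buf.getvalue(), level)

def unpack_named_arrays(payload):
    """Inverse of pack_named_arrays: return a dict of name -> array."""
    buf = io.BytesIO(zlib.decompress(payload))
    # allow_pickle=False refuses object arrays, so no pickle is involved
    with np.load(buf, allow_pickle=False) as npz:
        return {name: npz[name] for name in npz.files}

blocks = {"temperature": np.linspace(0.0, 1.0, 5), "ids": np.arange(3)}
payload = pack_named_arrays(blocks)
restored = unpack_named_arrays(payload)
```

The payload is an ordinary bytes object, so it can be written to a pipe or socket as is.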