Saving arrays of different sizes with h5py

Jos*_*tiz 11 python arrays numpy hdf5 h5py

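I am trying to store about 3000 numpy arrays using the HDF5 data format. The arrays vary in length from 5306 to 121999 np.float64 values.

I get the error Object dtype dtype('O') has no native HDF5 equivalent because, given the irregular data, numpy falls back to the generic object dtype.

My idea was to pad all the arrays to length 121999 and store the sizes in a separate dataset.

However, this seems very space-inefficient. Is there a better way?

Edit: To clarify, I want to store 3126 arrays with dtype = np.float64. I keep them in a list, and when h5py runs its routine it converts the list to an array with dtype = object because the arrays have different lengths. To illustrate: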
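For reference, the padding idea could look roughly like the sketch below. The three short arrays, the NaN fill value, the dataset names `padded` and `lengths`, and the in-memory file are all assumptions made for illustration; they are not from my real data.

```python
import numpy as np
import h5py

# Three short arrays stand in for the ~3000 real ones (illustrative data).
arrays = [
    np.array([0.1, 0.2, 0.3]),
    np.array([0.1, 0.2, 0.3, 0.4, 0.5]),
    np.array([0.1, 0.2]),
]

lengths = np.array([len(x) for x in arrays], dtype=np.int64)
max_len = int(lengths.max())

# Pad every row with NaN up to the longest length.
padded = np.full((len(arrays), max_len), np.nan, dtype=np.float64)
for row, x in zip(padded, arrays):
    row[: len(x)] = x

# driver="core" with backing_store=False keeps the file in memory only.
with h5py.File("padded_sketch.h5", "w", driver="core", backing_store=False) as f:
    f.create_dataset("padded", data=padded)
    f.create_dataset("lengths", data=lengths)

    # Recover the original arrays by slicing each row to its stored length.
    stored_lengths = f["lengths"][:]
    restored = [f["padded"][i, : int(n)] for i, n in enumerate(stored_lengths)]

# Count how many stored values are pure padding.
wasted = padded.size - int(lengths.sum())
print(f"{wasted} of {padded.size} stored values are padding")
```

With lengths ranging from 5306 to 121999 the padded fraction could be large, which is the inefficiency I am worried about. Passing `compression="gzip"` to `create_dataset` might reduce the on-disk cost of the padding, but I have not measured that.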

import numpy as np

a = np.array([0.1, 0.2, 0.3], dtype=np.float64)
b = np.array([0.1, 0.2, 0.3, 0.4, 0.5], dtype=np.float64)
c = np.array([0.1, 0.2], dtype=np.float64)

# h5py does the equivalent of this internally; recent NumPy versions
# require dtype=object to be spelled out for ragged input
arrs = np.array([a, b, c], dtype=object)
print(arrs.dtype)     # object
print(arrs[0].dtype)  # float64

hpa*_*ulj 17

It looks like you tried something like:

In [364]: f=h5py.File('test.hdf5','w')    
In [365]: grp=f.create_group('alist')

In [366]: grp.create_dataset('alist',data=[a,b,c])
...
TypeError: Object dtype dtype('O') has no native HDF5 equivalent

But if you save the arrays as separate datasets, it works:

In [367]: adict=dict(a=a,b=b,c=c)

In [368]: for k,v in adict.items():
   .....:     grp.create_dataset(k,data=v)
   .....:     

In [369]: grp
Out[369]: <HDF5 group "/alist" (3 members)>

In [370]: grp['a'][:]
Out[370]: array([ 0.1,  0.2,  0.3])

And to access all the datasets in the group:

In [389]: [i[:] for i in grp.values()]
Out[389]: 
[array([ 0.1,  0.2,  0.3]),
 array([ 0.1,  0.2,  0.3,  0.4,  0.5]),
 array([ 0.1,  0.2])]
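
For a few thousand arrays, the one-dataset-per-array approach can be scripted along these lines. This is only a sketch: the random stand-in data, the `arr_0000`-style names, and the in-memory file are assumptions for illustration.

```python
import numpy as np
import h5py

rng = np.random.default_rng(0)
# Illustrative stand-ins: ten float64 arrays of differing lengths.
arrays = [rng.random(n) for n in range(3, 13)]

# driver="core" with backing_store=False keeps the file in memory only.
with h5py.File("per_array_sketch.h5", "w", driver="core", backing_store=False) as f:
    grp = f.create_group("alist")
    for i, x in enumerate(arrays):
        # Zero-padded names so that sorting by name matches the original order.
        grp.create_dataset(f"arr_{i:04d}", data=x)

    # Sort the names explicitly instead of relying on iteration order.
    names = sorted(grp.keys())
    restored = [grp[name][:] for name in names]

print(len(restored), restored[0].dtype)
```

Each array keeps its own length and dtype, so nothing is padded. The trade-off is one HDF5 object per array, which adds some per-dataset overhead.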


Jos*_*Lim 5

A cleaner method is h5py's variable-length (vlen) dtype, which stores arrays of different lengths in a single dataset: http://docs.h5py.org/en/latest/special.html?highlight=dtype#arbitrary-vlen-data

import h5py
import numpy as np

# arrs is the object array of float64 arrays from the question
hdf5_file = h5py.File('yourdataset.hdf5', mode='w')
dt = h5py.special_dtype(vlen=np.dtype('float64'))  # variable-length float64
hdf5_file.create_dataset('dataset', (3,), dtype=dt)
hdf5_file['dataset'][...] = arrs

print(hdf5_file['dataset'][...])
# The result is an object array holding the three float64 arrays:
# array([array([0.1, 0.2, 0.3]),
#        array([0.1, 0.2, 0.3, 0.4, 0.5]),
#        array([0.1, 0.2])], dtype=object)
hdf5_file.close()

Note that this only works for 1D arrays; see https://github.com/h5py/h5py/issues/876
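
If the arrays are not 1D, one possible workaround is to flatten each array into the vlen dataset and keep the shapes in a second dataset. This is a sketch under assumptions rather than anything from the h5py docs: the dataset names `flat` and `shapes` are made up, the file is in-memory, and all arrays are assumed to have the same number of dimensions.

```python
import numpy as np
import h5py

# Illustrative 2D arrays with differing shapes.
arrays = [
    np.arange(6, dtype=np.float64).reshape(2, 3),
    np.arange(4, dtype=np.float64).reshape(4, 1),
]

dt = h5py.special_dtype(vlen=np.dtype("float64"))

# driver="core" with backing_store=False keeps the file in memory only.
with h5py.File("vlen_nd_sketch.h5", "w", driver="core", backing_store=False) as f:
    flat = f.create_dataset("flat", (len(arrays),), dtype=dt)
    # One row of shape information per array (same ndim assumed).
    shapes = f.create_dataset(
        "shapes", data=np.array([x.shape for x in arrays], dtype=np.int64)
    )
    for i, x in enumerate(arrays):
        flat[i] = x.ravel()  # store the flattened values

    # Rebuild each array from its flat values and its stored shape.
    restored = [
        flat[i].reshape(tuple(int(s) for s in shapes[i]))
        for i in range(len(arrays))
    ]

print([r.shape for r in restored])
```

Newer h5py releases also provide `h5py.vlen_dtype` as another spelling of the same dtype.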