Joe*_*lip 9 python arrays numpy
我有一个三个大型阵列的Numpy ndarray,我只想将路径存储到某处生成数据的文件中.一些玩具数据:
A = array([[ 6.52479351e-01, 6.54686928e-01, 6.56884432e-01, ...,
2.55901861e+00, 2.56199503e+00, 2.56498647e+00],
[ nan, nan, 9.37914686e-17, ...,
1.01366425e-16, 3.20371075e-16, -6.33655223e-17],
[ nan, nan, 8.52057308e-17, ...,
4.26943463e-16, 1.51422386e-16, 1.55097437e-16]],
dtype=float32)
Run Code Online (Sandbox Code Playgroud)
我不能将它作为数组附加到ndarray,因为它需要与其他三个相同的长度.
我可以添加np.zeros(len(A[0]))并将第一个值作为字符串,以便我可以用A [-1] [0]检索它,但这看起来很荒谬.
是否有一些元数据键我可以用来存储字符串,/Documents/Data/foobar.txt'所以我可以用类似的东西检索它A.metadata.comment?
谢谢!
小智 7
TobiasR 的评论是最简单的方法,但您也可以继承 ndarray。请参阅numpy 文档或此问题
class MetaArray(np.ndarray):
"""Array with metadata."""
def __new__(cls, array, dtype=None, order=None, **kwargs):
obj = np.asarray(array, dtype=dtype, order=order).view(cls)
obj.metadata = kwargs
return obj
def __array_finalize__(self, obj):
if obj is None: return
self.metadata = getattr(obj, 'metadata', None)
Run Code Online (Sandbox Code Playgroud)
用法示例:
>>> a = MetaArray([1,2,3], comment='/Documents/Data/foobar.txt')
>>> a.metadata
{'comment': '/Documents/Data/foobar.txt'}
Run Code Online (Sandbox Code Playgroud)
听起来您可能有兴趣以持久的方式将元数据与数组一起存储。如果是这样,HDF5是用作存储容器的绝佳选择。
例如,让我们创建一个数组并将其保存到带有一些元数据的HDF文件中h5py:
import numpy as np
import h5py
some_data = np.random.random((100, 100))
with h5py.File('data.hdf', 'w') as outfile:
dataset = outfile.create_dataset('my data', data=some_data)
dataset.attrs['an arbitrary key'] = 'arbitrary values'
dataset.attrs['foo'] = 10.2
Run Code Online (Sandbox Code Playgroud)
然后,我们可以将其读回:
import h5py
with h5py.File('data.hdf', 'r') as infile:
dataset = infile['my data']
some_data = dataset[...] # Load it into memory. Could also slice a subset.
print dataset.attrs['an arbitrary key']
print dataset.attrs['foo']
Run Code Online (Sandbox Code Playgroud)
正如其他人提到的那样,如果只考虑将数据+元数据存储在内存中,则更好的选择是一个dict或简单的包装器类。例如:
class Container:
def __init__(self, data, **kwargs):
self.data = data
self.metadata = kwargs
Run Code Online (Sandbox Code Playgroud)
当然,这不会像numpy数组那样直接运行,但是subclass通常不是一个好主意ndarrays。(您可以,但是容易做错。您最好总是设计一个将数组存储为属性的类。)
更好的是,使您正在执行的所有操作都与上述示例具有相似的类。例如:
import scipy.signal
import numpy as np
class SeismicCube(object):
def __init__(self, data, bounds, metadata=None):
self.data = data
self.x0, self.x1, self.y0, self.y1, self.z0, self.z1= bounds
self.bounds = bounds
self.metadata = {} if metadata is None else metadata
def inside(self, x, y, z):
"""Test if a point is inside the cube."""
inx = self.x0 >= x >= self.x1
iny = self.y0 >= y >= self.y1
inz = self.z0 >= z >= self.z1
return inx and iny and inz
def inst_amp(self):
"""Calculate instantaneous amplitude and return a new SeismicCube."""
hilb = scipy.signal.hilbert(self.data, axis=2)
data = np.hypot(hilb.real, hilb.imag)
return type(self)(data, self.bounds, self.metadata)
Run Code Online (Sandbox Code Playgroud)