I am trying to JSON-encode a complex numpy array, and I found a utility from astropy (http://astropy.readthedocs.org/en/latest/_modules/astropy/utils/misc.html#JsonCustomEncoder) for this purpose:
import json

import numpy as np

class JsonCustomEncoder(json.JSONEncoder):
    """ <cropped for brevity> """
    def default(self, obj):
        if isinstance(obj, (np.ndarray, np.number)):
            return obj.tolist()
        elif isinstance(obj, (complex, np.complex)):
            return [obj.real, obj.imag]
        elif isinstance(obj, set):
            return list(obj)
        elif isinstance(obj, bytes):  # pragma: py3
            return obj.decode()
        return json.JSONEncoder.default(self, obj)
This works well for a complex numpy array:
test = {'some_key':np.array([1+1j,2+5j, 3-4j])}
Dumping it yields:
encoded = json.dumps(test, cls=JsonCustomEncoder)
print encoded
>>> {"some key": [[1.0, 1.0], [2.0, 5.0], [3.0, -4.0]]}
The problem is that I can't automatically read it back into a complex array. For example:
json.loads(encoded)
>>> {"some_key": [[1.0, 1.0], [2.0, 5.0], [3.0, -4.0]]}
Can you help me figure out a way to override loading/decoding so that it infers these must be complex arrays? I.e., instead of lists of 2-element items, it should just put them back into a complex array. JsonCustomDecoder has no default() method to override, and the docs on encoding have too much jargon for me.
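For context, json.loads() only calls object_hook for JSON objects (dicts), never for plain lists, so the [real, imag] pairs cannot be intercepted during parsing itself. A minimal post-processing sketch that works for the example above, assuming every list made up of 2-element lists should be turned back into a complex array (the helper decode_complex is illustrative, not from the original post):

import json
import numpy as np

def decode_complex(value):
    """Recursively turn lists of [real, imag] pairs back into complex arrays."""
    if isinstance(value, dict):
        return {k: decode_complex(v) for k, v in value.items()}
    if (isinstance(value, list) and value
            and all(isinstance(v, list) and len(v) == 2 for v in value)):
        return np.array([complex(re, im) for re, im in value])
    return value

decoded = decode_complex(json.loads(encoded))
# decoded['some_key'] -> array([ 1.+1.j,  2.+5.j,  3.-4.j])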
Here is my final solution, adapted from hpaulj's answer and his answer to this thread: https://stackoverflow.com/a/24375113/901925
It will encode/decode arrays nested arbitrarily deep inside dicts of any data type.
import base64
import json
import numpy as np

class NumpyEncoder(json.JSONEncoder):
    def default(self, obj):
        """
        If the input object is an ndarray, convert it into a dict holding
        the dtype, shape and the data, base64 encoded.
        """
        if isinstance(obj, np.ndarray):
            data_b64 = base64.b64encode(obj.data)
            return dict(__ndarray__=data_b64,
                        dtype=str(obj.dtype),
                        shape=obj.shape)
        # Let the base class default method raise the TypeError
        return json.JSONEncoder.default(self, obj)

def json_numpy_obj_hook(dct):
    """
    Decodes a previously encoded numpy ndarray with proper shape and dtype.

    :param dct: (dict) json encoded ndarray
    :return: (ndarray) if input was an encoded ndarray
    """
    if isinstance(dct, dict) and '__ndarray__' in dct:
        data = base64.b64decode(dct['__ndarray__'])
        return np.frombuffer(data, dct['dtype']).reshape(dct['shape'])
    return dct

# Overload dump/load to use this behavior by default.
def dumps(*args, **kwargs):
    kwargs.setdefault('cls', NumpyEncoder)
    return json.dumps(*args, **kwargs)

def loads(*args, **kwargs):
    kwargs.setdefault('object_hook', json_numpy_obj_hook)
    return json.loads(*args, **kwargs)

def dump(*args, **kwargs):
    kwargs.setdefault('cls', NumpyEncoder)
    return json.dump(*args, **kwargs)

def load(*args, **kwargs):
    kwargs.setdefault('object_hook', json_numpy_obj_hook)
    return json.load(*args, **kwargs)

if __name__ == '__main__':
    data = np.arange(3, dtype=np.complex)
    one_level = {'level1': data, 'foo': 'bar'}
    two_level = {'level2': one_level}

    dumped = dumps(two_level)
    result = loads(dumped)

    print '\noriginal data', data
    print '\nnested dict of dict complex array', two_level
    print '\ndecoded nested data', result
This produces the output:
original data [ 0.+0.j 1.+0.j 2.+0.j]
nested dict of dict complex array {'level2': {'level1': array([ 0.+0.j, 1.+0.j, 2.+0.j]), 'foo': 'bar'}}
decoded nested data {u'level2': {u'level1': array([ 0.+0.j, 1.+0.j, 2.+0.j]), u'foo': u'bar'}}
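As a quick sanity check (not part of the original answer), you can verify that the round trip preserves both the values and the dtype:

assert result['level2']['level1'].dtype == data.dtype
assert np.array_equal(result['level2']['level1'], data)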
The accepted answer is great, but it has a flaw: it only works if your data is C_CONTIGUOUS. If you transpose your data, that is no longer true. For example, test the following:
A = np.arange(10).reshape(2,5)
A.flags
# C_CONTIGUOUS : True
# F_CONTIGUOUS : False
# OWNDATA : False
# WRITEABLE : True
# ALIGNED : True
# UPDATEIFCOPY : False
A = A.transpose()
#array([[0, 5],
# [1, 6],
# [2, 7],
# [3, 8],
# [4, 9]])
loads(dumps(A))
#array([[0, 1],
# [2, 3],
# [4, 5],
# [6, 7],
# [8, 9]])
A.flags
# C_CONTIGUOUS : False
# F_CONTIGUOUS : True
# OWNDATA : False
# WRITEABLE : True
# ALIGNED : True
# UPDATEIFCOPY : False
To fix the problem, wrap the object in np.ascontiguousarray() when passing it to b64encode. Specifically, change:
data_b64 = base64.b64encode(obj.data)
to:
data_b64 = base64.b64encode(np.ascontiguousarray(obj).data)
If I understand the function correctly, it does nothing when your data is already C_CONTIGUOUS, so the only performance penalty is when you have F_CONTIGUOUS data.
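This matches np.ascontiguousarray's documented behavior: it only copies when the input is not already C-contiguous. A quick illustrative check (not part of the original answer):

import numpy as np

A = np.arange(10).reshape(2, 5)   # C_CONTIGUOUS
B = A.transpose()                 # F_CONTIGUOUS view of the same buffer

np.ascontiguousarray(A) is A
# True  -> already C-contiguous, returned unchanged, no copy made

np.ascontiguousarray(B) is B
# False -> the F-contiguous array is copied into C order

np.ascontiguousarray(B).flags['C_CONTIGUOUS']
# True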