numpy ndarray hashability

Question

I'm having some trouble understanding how the hashability of numpy objects is managed.

>>> import numpy as np
>>> class Vector(np.ndarray):
...     pass
>>> nparray = np.array([0.])
>>> vector = Vector(shape=(1,), buffer=nparray)
>>> ndarray = np.ndarray(shape=(1,), buffer=nparray)
>>> nparray
array([ 0.])
>>> ndarray
array([ 0.])
>>> vector
Vector([ 0.])
>>> '__hash__' in dir(nparray)
True
>>> '__hash__' in dir(ndarray)
True
>>> '__hash__' in dir(vector)
True
>>> hash(nparray)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'numpy.ndarray'
>>> hash(ndarray)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'numpy.ndarray'
>>> hash(vector)
-9223372036586049780
>>> nparray.__hash__()
269709177
>>> ndarray.__hash__()
269702147
>>> vector.__hash__()
-9223372036586049780
>>> id(nparray)
4315346832
>>> id(ndarray)
4315234352
>>> id(vector)
4299616456
>>> nparray.__hash__() == id(nparray)
False
>>> ndarray.__hash__() == id(ndarray)
False
>>> vector.__hash__() == id(vector)
False
>>> hash(vector) == vector.__hash__()
True


How is it that

numpy objects define a __hash__ method but are not hashable, while
a class deriving from numpy.ndarray defines __hash__ and is hashable?

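Am I missing something?

I'm using Python 2.7.1 and numpy 1.6.1.

Thanks for any help!

EDIT: added the object ids.

EDIT2: Following deinonychusaur's comment, and trying to figure out whether hashing is based on content, I played with numpy.nparray.dtype and found something that seems quite strange to me: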

>>> [Vector(shape=(1,), buffer=np.array([1], dtype=mytype), dtype=mytype) for mytype in ('float', 'int', 'float128')]
[Vector([ 1.]), Vector([1]), Vector([ 1.0], dtype=float128)]
>>> [id(Vector(shape=(1,), buffer=np.array([1], dtype=mytype), dtype=mytype)) for mytype in ('float', 'int', 'float128')]
[4317742576, 4317742576, 4317742576]
>>> [hash(Vector(shape=(1,), buffer=np.array([1], dtype=mytype), dtype=mytype)) for mytype in ('float', 'int', 'float128')]
[269858911, 269858911, 269858911]


I'm puzzled... is there some (type-independent) caching mechanism in numpy?

Answer 1

Jam*_*mes 8

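I get the same results in Python 2.6.6 and numpy 1.3.0. According to the Python glossary, an object should be hashable if __hash__ is defined (and is not None), and either __eq__ or __cmp__ is defined. ndarray.__eq__ and ndarray.__hash__ are both defined and return something meaningful, so I don't see why hash should fail. After a quick google, I found this post on the python.scientific.devel mailing list, which states that arrays were never intended to be hashable; why ndarray.__hash__ is defined at all, I have no idea. Note that isinstance(nparray, collections.Hashable) returns True.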
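The mismatch between the isinstance check and hash() can be reproduced without numpy. The following is a minimal sketch, assuming Python 3 (where the ABC lives in collections.abc rather than collections); the helper name can_hash is made up for illustration. It shows that the Hashable check only looks for a __hash__ attribute, not whether hashing actually succeeds:

```python
from collections.abc import Hashable


def can_hash(obj):
    """Return True only if hash(obj) actually succeeds (hypothetical helper)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


# tuple defines __hash__, so the ABC check passes...
pair = (1, [2, 3])
looks_hashable = isinstance(pair, Hashable)

# ...but hashing fails at call time because the tuple contains a list.
really_hashable = can_hash(pair)

print(looks_hashable, really_hashable)
```

So a True result from the ABC check is only a hint; calling hash() is the reliable test.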

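EDIT: Note that nparray.__hash__() returns the same as id(nparray), so this is just the default implementation. Maybe it was difficult or impossible to remove the implementation of __hash__ in earlier versions of Python (the __hash__ = None technique was apparently introduced in 2.6), so they used some kind of C API magic to achieve this in a way that wouldn't propagate to subclasses and wouldn't stop you from calling ndarray.__hash__ explicitly?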
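For reference, here is a small sketch of the __hash__ = None technique in plain Python 3. The class names and the hash_error helper are hypothetical, chosen only for this illustration. Unlike the old numpy behavior described in the question, this marker is inherited by subclasses unless they define __hash__ themselves:

```python
from collections.abc import Hashable


class Unhashable:
    # Setting __hash__ to None marks instances as unhashable.
    __hash__ = None


class Child(Unhashable):
    # Inherits __hash__ = None from the parent class.
    pass


class OptIn(Unhashable):
    # A subclass can opt back in by defining __hash__ itself.
    def __hash__(self):
        return id(self)


def hash_error(obj):
    """Return the TypeError message from hash(obj), or None if hashing works."""
    try:
        hash(obj)
    except TypeError as exc:
        return str(exc)
    return None


print(hash_error(Unhashable()))        # an "unhashable type" message
print(hash_error(Child()))             # also unhashable
print(hash_error(OptIn()))             # None: hashing works again
print(isinstance(Child(), Hashable))   # the ABC check agrees here
```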

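Things are different in Python 3.2.2 and the current numpy 2.0.0 from the repo. The __cmp__ method no longer exists, so hashability now requires __hash__ and __eq__ (see the Python 3 glossary). In this version of numpy, ndarray.__hash__ is defined, but it is just None, so it cannot be called. hash(nparray) fails and isinstance(nparray, collections.Hashable) returns False, as expected. hash(vector) also fails.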
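The checks described above can be repeated with a short script. This is a sketch that assumes Python 3 and a reasonably recent numpy release; it has not been verified against every version, and the is_hashable helper is hypothetical. Vector is the same empty subclass as in the question:

```python
from collections.abc import Hashable

import numpy as np


class Vector(np.ndarray):
    pass


def is_hashable(obj):
    """Return True only if hash(obj) succeeds (hypothetical helper)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


nparray = np.array([0.])
vector = Vector(shape=(1,), buffer=nparray)

print(np.ndarray.__hash__ is None)    # the slot exists but is None
print(is_hashable(nparray))           # plain array
print(is_hashable(vector))            # subclass instance
print(isinstance(nparray, Hashable))  # ABC check
```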
