How do I resolve this memoryview error in numpy?

use*_*576 10 python numpy python-3.x

In this code snippet, train_dataset, test_dataset and valid_dataset are of type numpy.ndarray.

def check_overlaps(images1, images2):
    images1.flags.writeable=False
    images2.flags.writeable=False
    print(type(images1))
    print(type(images2))
    start = time.clock()
    hash1 = set([hash(image1.data) for image1 in images1])
    hash2 = set([hash(image2.data) for image2 in images2])
    all_overlaps = set.intersection(hash1, hash2)
    return all_overlaps, time.clock()-start

r, execTime = check_overlaps(train_dataset, test_dataset)    
print("# overlaps between training and test sets:", len(r), "execution time:", execTime)
r, execTime = check_overlaps(train_dataset, valid_dataset)   
print("# overlaps between training and validation sets:", len(r), "execution time:", execTime) 
r, execTime = check_overlaps(valid_dataset, test_dataset) 
print("# overlaps between validation and test sets:", len(r), "execution time:", execTime)

But this produces the following error (formatted as code to make it readable!):

ValueError                                Traceback (most recent call last)
<ipython-input-14-337e73a1cb14> in <module>()
     12     return all_overlaps, time.clock()-start
     13 
---> 14 r, execTime = check_overlaps(train_dataset, test_dataset)
     15 print("# overlaps between training and test sets:", len(r), "execution time:", execTime)
     16 r, execTime = check_overlaps(train_dataset, valid_dataset)

<ipython-input-14-337e73a1cb14> in check_overlaps(images1, images2)
      7     print(type(images2))
      8     start = time.clock()
----> 9     hash1 = set([hash(image1.data) for image1 in images1])
     10     hash2 = set([hash(image2.data) for image2 in images2])
     11     all_overlaps = set.intersection(hash1, hash2)

<ipython-input-14-337e73a1cb14> in <listcomp>(.0)
      7     print(type(images2))
      8     start = time.clock()
----> 9     hash1 = set([hash(image1.data) for image1 in images1])
     10     hash2 = set([hash(image2.data) for image2 in images2])
     11     all_overlaps = set.intersection(hash1, hash2)

ValueError: memoryview: hashing is restricted to formats 'B', 'b' or 'c'

The problem is that I don't even know what the error means, let alone how to go about fixing it. Any help?

jot*_*asi 19

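The problem is that your method of hashing arrays only works in Python 2, so your code fails as soon as it tries to compute hash(image1.data). The error message tells you that hashing is only supported for memoryviews with the formats unsigned byte ('B'), signed byte ('b') or single character ('c'), and I have not found a way to get such a view out of an np.ndarray without copying. The only approach I came up with involves copying the array, which might not be feasible in your application, depending on the amount of data. That being said, you can try changing your function to: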

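import time

def check_overlaps(images1, images2):
    # time.clock() was removed in Python 3.8; perf_counter() replaces it
    start = time.perf_counter()
    # tobytes() copies each image into an immutable, hashable bytes object
    hash1 = {hash(image1.tobytes()) for image1 in images1}
    hash2 = {hash(image2.tobytes()) for image2 in images2}
    all_overlaps = hash1 & hash2
    return all_overlaps, time.perf_counter() - start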
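To see where the error comes from, here is a minimal reproduction. It assumes the images are stored as float32, which the question does not state, so treat the dtype and the 28x28 shape as placeholders. The memoryview returned by `.data` reports the element format of the array, and Python 3 refuses to hash it unless that format is one of the byte formats.

```python
import numpy as np

# Assumption: a single float32 image; the real dtype is not given in the question.
image = np.zeros((28, 28), dtype=np.float32)
image.flags.writeable = False  # read-only, as in the question's code

# The memoryview carries the array's element format, not a byte format.
fmt = image.data.format
print("memoryview format:", fmt)

try:
    hash(image.data)
    hashing_failed = False
except ValueError as err:
    hashing_failed = True
    print("hash(image.data) failed:", err)

# tobytes() returns a copy as a plain bytes object, which is hashable.
as_bytes = image.tobytes()
digest = hash(as_bytes)
print(type(as_bytes).__name__, len(as_bytes))
```

The copy made by tobytes() is per image and is discarded once the hash has been computed, so the extra memory in use at any moment is one image, not the whole dataset.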

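  • Yes, you are right, I am working with Python 3+. I used the function `bytes()` as `hash(bytes(image1))` and it works perfectly. Thanks for your help. This is for a large dataset of about 200,000 MNIST images. (2 upvotes)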
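One thing to keep in mind with `hash()` on bytes: Python salts these hashes per process by default, so the values are only comparable within a single run. If the hashes need to be saved or compared across runs, a possible variation (not what the answer above uses) is to swap in `hashlib`. The sketch below runs on small synthetic data; the function name `find_overlaps` and the array shapes are made up for illustration.

```python
import hashlib

import numpy as np


def find_overlaps(images1, images2):
    """Return SHA-1 digests of images that appear in both arrays."""
    # tobytes() copies one image at a time; the digest is only 20 bytes.
    digests1 = {hashlib.sha1(image.tobytes()).digest() for image in images1}
    digests2 = {hashlib.sha1(image.tobytes()).digest() for image in images2}
    return digests1 & digests2


# Hypothetical stand-ins for train_dataset and test_dataset.
rng = np.random.default_rng(0)
train = rng.random((5, 28, 28), dtype=np.float32)
test = rng.random((3, 28, 28), dtype=np.float32)

# Plant two exact duplicates so there is something to find.
test[1] = train[2]
test[2] = train[4]

overlaps = find_overlaps(train, test)
print("# overlaps between training and test sets:", len(overlaps))
```

Like the `hash()` version, this only detects byte-for-byte identical images; near-duplicates would need a different comparison.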