Finding the shape of a saved numpy array (.npy or .npz) without loading it into memory

pir*_*pir 5 python io numpy

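I have a huge compressed numpy array saved to disk (about 20 GB in memory, much less when compressed). I need to know the shape of this array, but I don't have enough memory available to load it. How can I find the shape of the numpy array without loading it into memory?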

hpa*_*ulj 7

Opening the file with `mmap_mode` might do the trick. From the `np.load` docstring:

    If not None, then memory-map the file, using the given mode
    (see `numpy.memmap` for a detailed description of the modes).
    A memory-mapped array is kept on disk. However, it can be accessed
    and sliced like any ndarray.  Memory mapping is especially useful for
    accessing small fragments of large files without reading the entire
    file into memory.
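A minimal sketch of the `mmap_mode` idea, assuming an uncompressed `.npy` file. As far as I know, `np.load` does not memory-map `.npz` archives, so this does not cover the compressed case from the question. The helper name `npy_shape_via_mmap` and the file `demo.npy` are made up for illustration.

```python
import os
import tempfile

import numpy as np


def npy_shape_via_mmap(path):
    """Return (shape, dtype) of a .npy file without reading its data buffer."""
    # mmap_mode='r' maps the file read-only; the array data stays on disk.
    arr = np.load(path, mmap_mode='r')
    shape, dtype = arr.shape, arr.dtype
    del arr  # drop the memmap so the underlying file can be released
    return shape, dtype


# Small demo file standing in for the real (huge) array.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.npy")
    np.save(demo_path, np.zeros((3, 4, 5), dtype=np.float32))
    demo_shape, demo_dtype = npy_shape_via_mmap(demo_path)

print(demo_shape, demo_dtype)
```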

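It is also possible to read the header block without reading the data buffer, but that requires digging further into the underlying `lib/npyio/format` code. I explored that in a recent SO question about storing multiple arrays in a single file (and reading them back).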

/sf/answers/2502690991/
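For a plain `.npy` file, reading only the header can be sketched roughly as below. It uses the public readers in `numpy.lib.format` (`read_magic`, `read_array_header_1_0`, `read_array_header_2_0`). The helper name `npy_header` and the demo file are my own, and format versions other than 1.0 and 2.0 are deliberately not handled here.

```python
import os
import tempfile

import numpy as np
from numpy.lib import format as npy_format


def npy_header(path):
    """Return (shape, fortran_order, dtype) read from the .npy header only."""
    with open(path, "rb") as f:
        # The magic string is followed by the (major, minor) format version.
        version = npy_format.read_magic(f)
        if version == (1, 0):
            return npy_format.read_array_header_1_0(f)
        if version == (2, 0):
            return npy_format.read_array_header_2_0(f)
        # Assumption: other versions are out of scope for this sketch.
        raise ValueError("unsupported .npy format version: %r" % (version,))


# Small demo file standing in for the real array.
with tempfile.TemporaryDirectory() as tmp:
    header_path = os.path.join(tmp, "demo.npy")
    np.save(header_path, np.ones((2, 7), dtype=np.int16))
    header_shape, header_fortran, header_dtype = npy_header(header_path)

print(header_shape, header_fortran, header_dtype)
```

Only the magic bytes and the header are read from the file; the data buffer after the header is never touched.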

  • Indeed. I have implemented it in my answer here. (2 upvotes)

Joh*_*nck 7

This will do it:

    import numpy as np
    import zipfile

    def npz_headers(npz):
        """Takes a path to an .npz file, which is a Zip archive of .npy files.
        Generates a sequence of (name, shape, np.dtype).
        """
        with zipfile.ZipFile(npz) as archive:
            for name in archive.namelist():
                if not name.endswith('.npy'):
                    continue

                # Open the member as a stream; only the header bytes are read.
                with archive.open(name) as npy:
                    version = np.lib.format.read_magic(npy)
                    # Note: _read_array_header is a private NumPy helper.
                    shape, fortran, dtype = np.lib.format._read_array_header(npy, version)
                # Strip the '.npy' suffix to recover the array's key.
                yield name[:-4], shape, dtype
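A quick usage sketch against a small compressed archive. The function is repeated so the snippet runs on its own. The file name `demo.npz` and the array names `a` and `b` are made up for the demo, and the code still relies on the private `_read_array_header` helper, which could change between NumPy releases.

```python
import os
import tempfile
import zipfile

import numpy as np


def npz_headers(npz):
    """Yield (name, shape, dtype) for each array stored in an .npz archive."""
    with zipfile.ZipFile(npz) as archive:
        for name in archive.namelist():
            if not name.endswith('.npy'):
                continue
            with archive.open(name) as npy:
                version = np.lib.format.read_magic(npy)
                # Private NumPy helper, as used in the answer above.
                shape, fortran, dtype = np.lib.format._read_array_header(npy, version)
            yield name[:-4], shape, dtype


# Build a small compressed archive standing in for the real 20 GB one.
with tempfile.TemporaryDirectory() as tmp:
    npz_path = os.path.join(tmp, "demo.npz")
    np.savez_compressed(
        npz_path,
        a=np.zeros((10, 20)),
        b=np.arange(6, dtype=np.int32).reshape(2, 3),
    )
    headers = {name: (shape, dtype) for name, shape, dtype in npz_headers(npz_path)}

for name, (shape, dtype) in headers.items():
    print(name, shape, dtype)
```

Because each archive member is opened as a stream, only the start of each member has to be decompressed to reach its header, not the whole array.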