Incrementally appending numpy.arrays to a save file

Question

I have tried the method outlined by Hpaulji, but it does not seem to work:

Basically, I am iterating over a generator, making some changes to an array, and then trying to save the array from each iteration.

My sample code looks like this:

filename = 'testing.npy'

current_iteration = 0
with open(filename, 'wb') as f:
    for x, _ in train_generator:
        prediction = base_model.predict(x)
        print(prediction[0,0,0,0:5])
        np.save(filename, prediction)

        current_iteration += 1
        if current_iteration == 5:
            break

Here I go through 5 iterations, so I expect 5 different arrays to be saved.

I printed part of each array for debugging purposes:

[ 0.  0.  0.  0.  0.]
[ 0.          3.37349415  0.          0.          1.62561738]
[  0.          20.28489304   0.           0.           0.        ]
[ 0.  0.  0.  0.  0.]
[  0.          21.98013496   0.           0.           0.        ]

However, when I try to load the arrays with repeated calls to np.load, as described in "How to add many numpy files to one numpy file in python", I get an EOFError:

file = r'testing.npy'

with open(file,'rb') as f:
    arr = np.load(f)
    print(arr[0,0,0,0:5])
    arr = np.load(f)
    print(arr[0,0,0,0:5])

It prints only the last array and then raises an EOFError:

[  0.          21.98013496   0.           0.           0.        ]
EOFError: Ran out of input

print(arr[0,0,0,0:5])

I expected all 5 arrays to be saved, but when I load from the saved .npy file multiple times, I only get the last array.

So, how should I save new arrays and append them to the file?

Edit: testing with '.npz' only saves the last array

filename = 'testing.npz'

current_iteration = 0
with open(filename, 'wb') as f:
    for x, _ in train_generator:
        prediction = base_model.predict(x)
        print(prediction[0,0,0,0:5])
        np.savez(f, prediction)



        current_iteration += 1
        if current_iteration == 5:
            break


#loading

file = 'testing.npz'

with open(file,'rb') as f:
    arr = np.load(f)
    print(arr.keys())


>>>['arr_0']

Answer 1

YSe*_*elf 3

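All of your calls to np.save use the filename, not the file handle. Since you do not reuse the file handle, each save overwrites the file instead of appending the array to it.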

This should work:

filename = 'testing.npy'

current_iteration = 0
with open(filename, 'wb') as f:
    for x, _ in train_generator:
        prediction = base_model.predict(x)
        print(prediction[0,0,0,0:5])
        # pass the open handle, not the filename, so each array is appended
        np.save(f, prediction)

        current_iteration += 1
        if current_iteration == 5:
            break
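As a self-contained illustration of the point above, here is a minimal sketch that writes several arrays through one open file handle and reads them back. The helper names save_arrays and load_arrays and the stand-in data are hypothetical (they replace train_generator and base_model, which are not shown in the question). Reading stops once the file position reaches the file size, so the loop does not rely on an exception to detect the end of the file.

```python
import os
import tempfile

import numpy as np


def save_arrays(path, arrays):
    """Write each array to the same open file handle, one after another."""
    with open(path, "wb") as f:
        for arr in arrays:
            np.save(f, arr)


def load_arrays(path):
    """Call np.load repeatedly on one handle until the file is exhausted."""
    loaded = []
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        while f.tell() < size:
            loaded.append(np.load(f))
    return loaded


# Stand-in data (assumption): five small arrays instead of model predictions.
arrays = [np.full((2, 3), i, dtype=np.float32) for i in range(5)]
path = os.path.join(tempfile.mkdtemp(), "testing.npy")

save_arrays(path, arrays)
restored = load_arrays(path)
print(len(restored))
```

A file written this way holds several arrays back to back, so a single np.load(path) call would return only the first one; it has to be read through a handle as in load_arrays.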

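While storing multiple arrays in one .npy file may have some advantages (I imagine it helps when memory is limited), .npy files are technically meant to store a single array, and you can use .npz files (np.savez or np.savez_compressed) to store multiple arrays: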

filename = 'testing.npz'
predictions = []
for (x, _), index in zip(train_generator, range(5)):
    prediction = base_model.predict(x)
    predictions.append(prediction)
np.savez(filename, predictions) # will name it arr_0
# np.savez(filename, predictions=predictions) # would name it predictions
# np.savez(filename, *predictions) # would name it arr_0, arr_1, …, arr_4
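To make the naming behaviour concrete, here is a small sketch of the unpacking variant, again with stand-in arrays in place of the model predictions (an assumption, since the generator and model are not shown). It saves each array under its own key and reads them back by name.

```python
import os
import tempfile

import numpy as np

# Stand-in data (assumption): five same-shaped arrays instead of predictions.
predictions = [np.arange(6, dtype=np.float32).reshape(2, 3) + i for i in range(5)]

path = os.path.join(tempfile.mkdtemp(), "testing.npz")

# Unpacking the list stores each array under its own key: arr_0 ... arr_4.
np.savez(path, *predictions)

# np.load on an .npz file returns a lazy, dict-like archive.
with np.load(path) as data:
    keys = sorted(data.files)
    restored = [data["arr_%d" % i] for i in range(len(keys))]

print(keys)
```

Note that this approach collects all arrays in memory before writing, unlike appending to an open handle one array at a time.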

Archived:	8 years, 3 months ago
Views:	7706
Last activity:	8 years, 3 months ago