TypeError: write() argument must be str, not bytes, when saving a .npy file

Question

Atu*_*aji 7 python file-io deep-learning keras

This code writes to a .npy file as follows:

bottleneck_features_train = model.predict_generator(generator, nb_train_samples // batch_size)
np.save(open('bottleneck_features_train.npy', 'w'),bottleneck_features_train)

It then reads from that file:

def train_top_model():
    train_data = np.load(open('bottleneck_features_train.npy'))

Now I get this error:

Found 2000 images belonging to 2 classes.
Traceback (most recent call last):
  File "kerasbottleneck.py", line 103, in <module>
    save_bottlebeck_features()
  File "kerasbottleneck.py", line 69, in save_bottlebeck_features
    np.save(open('bottleneck_features_train.npy', 'w'),bottleneck_features_train)
  File "/opt/anaconda3/lib/python3.6/site-packages/numpy/lib/npyio.py", line 511, in save
    pickle_kwargs=pickle_kwargs)
  File "/opt/anaconda3/lib/python3.6/site-packages/numpy/lib/format.py", line 565, in write_array
    version)
  File "/opt/anaconda3/lib/python3.6/site-packages/numpy/lib/format.py", line 335, in _write_array_header
    fp.write(header_prefix)
TypeError: write() argument must be str, not bytes

After that, I tried changing the file mode from 'w' to 'wb'. That led to an error when reading the file:

Found 2000 images belonging to 2 classes.
Found 800 images belonging to 2 classes.
Traceback (most recent call last):
  File "kerasbottleneck.py", line 104, in <module>
    train_top_model()
  File "kerasbottleneck.py", line 82, in train_top_model
    train_data = np.load(open('bottleneck_features_train.npy'))
  File "/opt/anaconda3/lib/python3.6/site-packages/numpy/lib/npyio.py", line 404, in load
    magic = fid.read(N)
  File "/opt/anaconda3/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 0: invalid start byte

How can I fix this error?

Answer 1

Mar*_*ers 10

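The code in the blog post targets Python 2, where writing to and reading from a file works with bytestrings. In Python 3, you need to open the file in binary mode, both for writing and for reading it back: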

np.save(
    open('bottleneck_features_train.npy', 'wb'),
    bottleneck_features_train)

And when reading:

train_data = np.load(open('bottleneck_features_train.npy', 'rb'))

Note the b character in the mode argument in both calls.
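Both errors in the question come down to text mode versus binary mode, and they can be reproduced with the standard library alone. The following is a minimal sketch, not part of the original answer: the file name demo.bin and the temporary directory are illustrative assumptions, and the payload simply starts with the byte 0x93 named in the UnicodeDecodeError above.

```python
import os
import tempfile

payload = b"\x93NUMPY"  # illustrative bytes; 0x93 is the byte from the traceback

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.bin")  # hypothetical file name

    # Text mode ('w') only accepts str, so writing bytes raises TypeError.
    write_error = None
    try:
        with open(path, "w") as text_file:
            text_file.write(payload)
    except TypeError as exc:
        write_error = exc

    # Binary mode ('wb') accepts bytes.
    with open(path, "wb") as binary_file:
        binary_file.write(payload)

    # Text mode ('r') tries to decode the bytes and fails on 0x93.
    read_error = None
    try:
        with open(path, "r", encoding="utf-8") as text_file:
            text_file.read()
    except UnicodeDecodeError as exc:
        read_error = exc

    # Binary mode ('rb') returns the bytes unchanged.
    with open(path, "rb") as binary_file:
        round_tripped = binary_file.read()

print(type(write_error).__name__, type(read_error).__name__, round_tripped == payload)
```

The same pairing applies to the NumPy calls: a file written with 'wb' should be read back with 'rb'.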

I'd use the file as a context manager, to make sure it is closed cleanly:

with open('bottleneck_features_train.npy', 'wb') as features_train_file:
    np.save(features_train_file, bottleneck_features_train)

and

with open('bottleneck_features_train.npy', 'rb') as features_train_file:
    train_data = np.load(features_train_file)

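The code in the blog post should have used both of these changes anyway. In Python 2, a file opened without the b flag in the mode is a text file, with platform-specific newline conventions, and on Windows certain characters in the stream take on a special meaning (for example, an EOF character makes the file appear shorter than it really is). With binary data that can be a real problem.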
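Putting the pieces together, here is a small round-trip sketch under stated assumptions: the `features` array is a made-up stand-in for the output of model.predict_generator, and the file is written to a temporary directory rather than the working directory. The last two lines of the `with` block show an alternative that is not from the answer above: np.save and np.load also accept a file path, in which case NumPy opens the file itself and the mode question does not arise.

```python
import os
import tempfile

import numpy as np

# Stand-in for the bottleneck features (assumption: any ndarray works here).
features = np.arange(12, dtype=np.float32).reshape(3, 4)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "bottleneck_features_train.npy")

    # Write in binary mode; the context manager closes the file afterwards.
    with open(path, "wb") as features_train_file:
        np.save(features_train_file, features)

    # Read back in binary mode.
    with open(path, "rb") as features_train_file:
        loaded = np.load(features_train_file)

    # Alternative: pass the path and let NumPy open the file.
    np.save(path, features)
    loaded_from_path = np.load(path)

print(np.array_equal(features, loaded), np.array_equal(features, loaded_from_path))
```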
