How do I import an MNIST dataset that I have already downloaded manually?

use*_*099 4 keras

I have been experimenting with a Keras example that needs to import the MNIST data:

from keras.datasets import mnist
import numpy as np
(x_train, _), (x_test, _) = mnist.load_data()

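It produces an error message such as Exception: URL fetch failure on https://s3.amazonaws.com/img-datasets/mnist.pkl.gz: None -- [Errno 110] Connection timed out

This is probably related to the network environment I am using. Is there a function or some code that lets me directly import an MNIST dataset that I have downloaded manually?

I tried the following approach: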

import sys
import pickle
import gzip
f = gzip.open('/data/mnist.pkl.gz', 'rb')
if sys.version_info < (3,):
    data = pickle.load(f)
else:
    data = pickle.load(f, encoding='bytes')
f.close()
import numpy as np
(x_train, _), (x_test, _) = data

Then I get the following error message:

Traceback (most recent call last):
  File "test.py", line 45, in <module>
    (x_train, _), (x_test, _) = data
ValueError: too many values to unpack (expected 2)

gog*_*sca 8

The Keras dataset file now lives at a new path in Google Cloud Storage (it was previously in AWS S3):

https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz

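When using:

tf.keras.datasets.mnist.load_data()

you can pass a path argument.

load_data() calls get_file(), passing the path as the fname argument; if the path is a full path and the file exists, it will not be downloaded.

Example: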

# gsutil cp gs://tensorflow/tf-keras-datasets/mnist.npz /tmp/data/mnist.npz
# python3
>>> import tensorflow as tf
>>> path = '/tmp/data/mnist.npz'
>>> (train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data(path)
>>> len(train_images)
60000
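If TensorFlow is not installed on the machine, the same file can probably be read with NumPy alone. This is a minimal sketch, assuming the archive uses the array names x_train, y_train, x_test and y_test; load_mnist_npz is a hypothetical helper name and not part of Keras:

```python
import numpy as np


def load_mnist_npz(path):
    """Read a locally stored mnist.npz without going through Keras."""
    # np.load on an .npz file returns a lazy, dict-like archive; the context
    # manager closes the underlying file once the arrays have been extracted.
    # Assumption: the archive stores its arrays under these four key names.
    with np.load(path) as archive:
        x_train, y_train = archive["x_train"], archive["y_train"]
        x_test, y_test = archive["x_test"], archive["y_test"]
    return (x_train, y_train), (x_test, y_test)
```

With the file from the example above, the call would be (train_images, train_labels), (test_images, test_labels) = load_mnist_npz('/tmp/data/mnist.npz').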


syg*_*ygi 6

Well, the keras.datasets.mnist file is really short. You can reproduce the same steps manually, i.e.:

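  1. Download the dataset from https://s3.amazonaws.com/img-datasets/mnist.pkl.gz
  2. Load it with gzip and pickle:

    import gzip
    import pickle
    import sys

    # Open the gzipped pickle and load it; Python 3 needs encoding='bytes'
    # to read a pickle that was written by Python 2.
    f = gzip.open('mnist.pkl.gz', 'rb')
    if sys.version_info < (3,):
        data = pickle.load(f)
    else:
        data = pickle.load(f, encoding='bytes')
    f.close()
    (x_train, _), (x_test, _) = data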
    
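The ValueError reported in the question means that the unpickled object holds more than two top-level items, so a two-way unpack cannot succeed. This is a minimal sketch, assuming (the question does not confirm this) that the downloaded file is a variant of mnist.pkl.gz with more than two splits, for example train, validation and test; load_pickled_splits is a hypothetical helper that returns whatever splits the file contains so they can be inspected before unpacking:

```python
import gzip
import pickle


def load_pickled_splits(path):
    """Return the top-level (images, labels) pairs stored in a gzipped pickle."""
    with gzip.open(path, "rb") as f:
        # encoding="bytes" lets Python 3 read pickles written by Python 2.
        data = pickle.load(f, encoding="bytes")
    splits = list(data)
    # Each split is expected to be an (images, labels) pair.
    for i, split in enumerate(splits):
        if len(split) != 2:
            raise ValueError(f"split {i} has {len(split)} items, expected 2")
    return splits
```

After checking len(splits), something like (x_train, _), (x_test, _) = splits[0], splits[-1] would pick the first and last split, assuming the file orders them with training data first and test data last.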


tar*_*dis 6

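You do not need extra code; you can tell load_data to load a local copy first:

  1. Download the file https://s3.amazonaws.com/img-datasets/mnist.npz from another computer that has suitable (proxy) access (the URL is taken from https://github.com/keras-team/keras/blob/master/keras/datasets/mnist.py),
  2. copy it to the directory ~/.keras/datasets/ (on Linux and macOS),
  3. run load_data(path='mnist.npz') with the correct file name.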
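The copy step above can be scripted. This is a minimal sketch, assuming the file has already been downloaded to a local path and that the default cache location ~/.keras/datasets/ is in use; install_dataset and its cache_dir parameter are hypothetical names introduced for illustration:

```python
import shutil
from pathlib import Path


def install_dataset(src, cache_dir=None):
    """Copy a manually downloaded dataset file into the Keras dataset cache."""
    src = Path(src)
    # Assumption: Keras looks in ~/.keras/datasets/ unless configured otherwise.
    if cache_dir is None:
        cache_dir = Path.home() / ".keras" / "datasets"
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    dest = cache_dir / src.name
    # copy2 keeps the file's timestamps along with its contents.
    shutil.copy2(src, dest)
    return dest
```

After install_dataset('/path/to/mnist.npz'), a call to load_data(path='mnist.npz') should find the cached copy instead of trying to download it.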


Sun*_*501 5

  1. Download the file https://s3.amazonaws.com/img-datasets/mnist.npz
  2. Move mnist.npz to the directory .keras/datasets/
  3. Load the data:

    import keras
    from keras.datasets import mnist
    
    (X_train, y_train), (X_test, y_test) = mnist.load_data()
    