Keras gives nan when training a categorical LSTM sequence-to-sequence model

W.P*_*ill 5 python machine-learning keras

I am trying to write a Keras model (using the Tensorflow backend) that uses an LSTM to predict labels for sequences, like you would in a part-of-speech tagging task. The model I have written returns nan as the loss for all training epochs and for all label predictions. I suspect my model is configured incorrectly, but I can't figure out what I am doing wrong.


The complete program is below.

from random import shuffle, sample
from typing import Tuple, Callable

from numpy import arange, zeros, array, argmax, newaxis


def sequence_to_sequence_model(time_steps: int, labels: int, units: int = 16):
    from keras import Sequential
    from keras.layers import LSTM, TimeDistributed, Dense

    model = Sequential()
    model.add(LSTM(units=units, input_shape=(time_steps, 1), return_sequences=True))
    model.add(TimeDistributed(Dense(labels)))
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    return model


def labeled_sequences(n: int, sequence_sampler: Callable[[], Tuple[array, array]]) -> Tuple[array, array]:
    """
    Create training data for a sequence-to-sequence labeling model.

    The features are an array of size samples * time steps * 1.
    The labels are a one-hot encoding of time step labels of size samples * time steps * number of labels.

    :param n: number of sequence pairs to generate
    :param sequence_sampler: a function that returns two numeric sequences of equal length
    :return: feature and label sequences
    """
    from keras.utils import to_categorical

    xs, ys = sequence_sampler()
    assert len(xs) == len(ys)
    x = zeros((n, len(xs)), int)
    y = zeros((n, len(ys)), int)
    for i in range(n):
        xs, ys = sequence_sampler()
        x[i] = xs
        y[i] = ys
    x = x[:, :, newaxis]
    y = to_categorical(y)
    return x, y


def digits_with_repetition_labels() -> Tuple[array, array]:
    """
    Return a random list of 10 digits from 0 to 9. Two of the digits will be repeated. The rest will be unique.
    Along with this list, return a list of 10 labels, where the label is 0 if the corresponding digits is unique and 1
    if it is repeated.

    :return: digits and labels
    """
    n = 10
    xs = arange(n)
    ys = zeros(n, int)
    shuffle(xs)
    i, j = sample(range(n), 2)
    xs[j] = xs[i]
    ys[i] = ys[j] = 1
    return xs, ys


def main():
    # Train
    x, y = labeled_sequences(1000, digits_with_repetition_labels)
    model = sequence_to_sequence_model(x.shape[1], y.shape[2])
    model.summary()
    model.fit(x, y, epochs=20, verbose=2)
    # Test
    x, y = labeled_sequences(5, digits_with_repetition_labels)
    y_ = model.predict(x, verbose=0)
    x = x[:, :, 0]
    for i in range(x.shape[0]):
        print(' '.join(str(n) for n in x[i]))
        print(' '.join([' ', '*'][int(argmax(n))] for n in y[i]))
        print(y_[i])


if __name__ == '__main__':
    main()

My feature sequences are arrays of 10 digits from 0 to 9. The corresponding label sequences are arrays of 10 zeros and ones, where 0 marks a unique digit and 1 marks a repeated digit. (The idea is to create a simple classification task with long-distance dependencies.)
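For instance, one pair returned by the digits_with_repetition_labels function above might look like the following (the digits are random, so this is only illustrative):

xs, ys = digits_with_repetition_labels()
print(xs)  # e.g. [7 2 5 0 9 2 3 8 1 6]  -- the digit 2 appears at two positions
print(ys)  # e.g. [0 1 0 0 0 1 0 0 0 0]  -- 1 marks both positions of the repeated digit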


Training looks like this

Epoch 1/20
 - 1s - loss: nan
Epoch 2/20
 - 0s - loss: nan
Epoch 3/20
 - 0s - loss: nan

And all the label array predictions look like this

[[nan nan]
 [nan nan]
 [nan nan]
 [nan nan]
 [nan nan]
 [nan nan]
 [nan nan]
 [nan nan]
 [nan nan]
 [nan nan]]

So clearly something is wrong.


The feature matrix passed to model.fit has dimensions samples × time steps × 1. The label matrix has dimensions samples × time steps × 2, where the 2 comes from the one-hot encoding of the labels 0 and 1.
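A quick sanity check of those shapes, using the labeled_sequences function from the program above:

x, y = labeled_sequences(1000, digits_with_repetition_labels)
print(x.shape)  # (1000, 10, 1) -> samples x time steps x 1
print(y.shape)  # (1000, 10, 2) -> samples x time steps x number of labels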


I am using a time-distributed dense layer to predict the sequence, following the Keras documentation and posts like this one. As far as I can tell, the model topology defined in sequence_to_sequence_model is correct. The model summary looks like this

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 10, 16)            1152      
_________________________________________________________________
time_distributed_1 (TimeDist (None, 10, 2)             34        
=================================================================
Total params: 1,186
Trainable params: 1,186
Non-trainable params: 0
_________________________________________________________________

Stack Overflow questions like this one make it sound like nan results are an indicator of numerical problems: runaway gradients and the like. However, since I am working with a tiny data set and every single number my model returns is a nan, I suspect I am not seeing a numerical problem but rather a problem with how I have constructed the model.


Does the code above have the right model and data shape for sequence-to-sequence learning? If so, why do I get nans?


W.P*_*ill 1

By default the Dense layer has no activation. If you specify one, the nans go away. Change the following line in the code above.

model.add(TimeDistributed(Dense(labels, activation='softmax')))
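With that one change applied, the model-building function from the question reads as follows (everything else in the program stays the same):

def sequence_to_sequence_model(time_steps: int, labels: int, units: int = 16):
    from keras import Sequential
    from keras.layers import LSTM, TimeDistributed, Dense

    model = Sequential()
    model.add(LSTM(units=units, input_shape=(time_steps, 1), return_sequences=True))
    # softmax turns each time step's outputs into a probability distribution, which is
    # what categorical_crossentropy expects; without it the Dense layer emits unbounded
    # linear values and the log inside the loss can produce nan
    model.add(TimeDistributed(Dense(labels, activation='softmax')))
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    return model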