Keras gives nan when training a categorical LSTM sequence-to-sequence model

W.P*_*ill 5 python machine-learning keras

I am trying to write a Keras model (using the Tensorflow backend) that uses an LSTM to predict labels for sequences, like you would in a part-of-speech tagging task. The model I have written returns nan as the loss for all training epochs and for all label predictions. I suspect my model is configured incorrectly, but I can't figure out what I am doing wrong.


The complete program is below.

from random import shuffle, sample
from typing import Tuple, Callable

from numpy import arange, zeros, array, argmax, newaxis


def sequence_to_sequence_model(time_steps: int, labels: int, units: int = 16):
    from keras import Sequential
    from keras.layers import LSTM, TimeDistributed, Dense

    model = Sequential()
    model.add(LSTM(units=units, input_shape=(time_steps, 1), return_sequences=True))
    model.add(TimeDistributed(Dense(labels)))
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    return model


def labeled_sequences(n: int, sequence_sampler: Callable[[], Tuple[array, array]]) -> Tuple[array, array]:
    """
    Create training data for a sequence-to-sequence labeling model.

    The features are an array of size samples * time steps * 1.
    The labels are a one-hot encoding of time step labels of size samples * time steps * number of labels.

    :param n: number of sequence pairs to generate
    :param sequence_sampler: a function that returns two numeric sequences of equal length
    :return: feature and label sequences
    """
    from keras.utils import to_categorical

    xs, ys = sequence_sampler()
    assert len(xs) == len(ys)
    x = zeros((n, len(xs)), int)
    y = zeros((n, len(ys)), int)
    for i in range(n):
        xs, ys = sequence_sampler()
        x[i] = xs
        y[i] = ys
    x = x[:, :, newaxis]
    y = to_categorical(y)
    return x, y


def digits_with_repetition_labels() -> Tuple[array, array]:
    """
    Return a random list of 10 digits from 0 to 9. Two of the digits will be repeated. The rest will be unique.
    Along with this list, return a list of 10 labels, where the label is 0 if the corresponding digits is unique and 1
    if it is repeated.

    :return: digits and labels
    """
    n = 10
    xs = arange(n)
    ys = zeros(n, int)
    shuffle(xs)
    i, j = sample(range(n), 2)
    xs[j] = xs[i]
    ys[i] = ys[j] = 1
    return xs, ys


def main():
    # Train
    x, y = labeled_sequences(1000, digits_with_repetition_labels)
    model = sequence_to_sequence_model(x.shape[1], y.shape[2])
    model.summary()
    model.fit(x, y, epochs=20, verbose=2)
    # Test
    x, y = labeled_sequences(5, digits_with_repetition_labels)
    y_ = model.predict(x, verbose=0)
    x = x[:, :, 0]
    for i in range(x.shape[0]):
        print(' '.join(str(n) for n in x[i]))
        print(' '.join([' ', '*'][int(argmax(n))] for n in y[i]))
        print(y_[i])


if __name__ == '__main__':
    main()

My feature sequences are arrays of 10 digits from 0 to 9. The corresponding label sequences are arrays of 10 zeros and ones, where 0 marks a unique digit and 1 marks a repeated digit. (The idea is to create a simple classification task with long-distance dependencies.)
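For instance, one pair returned by the digits_with_repetition_labels function above might look like the following (the digits are random, so this is only illustrative):

xs, ys = digits_with_repetition_labels()
print(xs)  # e.g. [7 2 5 0 9 2 3 8 1 6]  -- the digit 2 appears at two positions
print(ys)  # e.g. [0 1 0 0 0 1 0 0 0 0]  -- 1 marks both positions of the repeated digit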


Training looks like this

Epoch 1/20
 - 1s - loss: nan
Epoch 2/20
 - 0s - loss: nan
Epoch 3/20
 - 0s - loss: nan

And all the label array predictions look like this

[[nan nan]
 [nan nan]
 [nan nan]
 [nan nan]
 [nan nan]
 [nan nan]
 [nan nan]
 [nan nan]
 [nan nan]
 [nan nan]]

So clearly something is wrong.


The feature matrix passed to model.fit has dimensions samples × time steps × 1. The label matrix has dimensions samples × time steps × 2, where the 2 comes from the one-hot encoding of the labels 0 and 1.
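A quick sanity check of those shapes, using the labeled_sequences function from the program above:

x, y = labeled_sequences(1000, digits_with_repetition_labels)
print(x.shape)  # (1000, 10, 1) -> samples x time steps x 1
print(y.shape)  # (1000, 10, 2) -> samples x time steps x number of labels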


I am using a time-distributed dense layer to predict the sequence, following the Keras documentation and posts like this one. As far as I can tell, the model topology defined in sequence_to_sequence_model is correct. The model summary looks like this

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 10, 16)            1152      
_________________________________________________________________
time_distributed_1 (TimeDist (None, 10, 2)             34        
=================================================================
Total params: 1,186
Trainable params: 1,186
Non-trainable params: 0
_________________________________________________________________

Stack Overflow questions like this one make it sound like nan results are an indicator of numerical problems: runaway gradients and the like. However, since I am working with a tiny data set and every single number my model returns is a nan, I suspect I am not seeing a numerical problem but rather a problem with how I have constructed the model.


Does the code above have the right model and data shape for sequence-to-sequence learning? If so, why do I get nans?


W.P*_*ill 1

By default the Dense layer has no activation. If you specify one, the nans go away. Change the following line in the code above.

model.add(TimeDistributed(Dense(labels, activation='softmax')))
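With that one change applied, the model-building function from the question reads as follows (everything else in the program stays the same):

def sequence_to_sequence_model(time_steps: int, labels: int, units: int = 16):
    from keras import Sequential
    from keras.layers import LSTM, TimeDistributed, Dense

    model = Sequential()
    model.add(LSTM(units=units, input_shape=(time_steps, 1), return_sequences=True))
    # softmax turns each time step's outputs into a probability distribution, which is
    # what categorical_crossentropy expects; without it the Dense layer emits unbounded
    # linear values and the log inside the loss can produce nan
    model.add(TimeDistributed(Dense(labels, activation='softmax')))
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    return model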