python machine-learning keras
I'm trying to write a Keras model (using the TensorFlow backend) that uses an LSTM to predict labels for a sequence, as you would in a part-of-speech tagging task. The model I wrote returns nan as the loss for all training epochs and for all label predictions. I suspect my model is configured incorrectly, but I can't figure out what I'm doing wrong.
The complete program is below.
```python
from random import shuffle, sample
from typing import Tuple, Callable

from numpy import arange, zeros, array, argmax, newaxis


def sequence_to_sequence_model(time_steps: int, labels: int, units: int = 16):
    from keras import Sequential
    from keras.layers import LSTM, TimeDistributed, Dense

    model = Sequential()
    model.add(LSTM(units=units, input_shape=(time_steps, 1), return_sequences=True))
    model.add(TimeDistributed(Dense(labels)))
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    return model


def labeled_sequences(n: int, sequence_sampler: Callable[[], Tuple[array, array]]) -> Tuple[array, array]:
    """
    Create training data for a sequence-to-sequence labeling model.

    The features are an array of size samples * time steps * 1.
    The labels are a one-hot encoding of time step labels of size samples * time steps * number of labels.

    :param n: number of sequence pairs to generate
    :param sequence_sampler: a function that returns two numeric sequences of equal length
    :return: feature and label sequences
    """
    from keras.utils import to_categorical

    xs, ys = sequence_sampler()
    assert len(xs) == len(ys)
    x = zeros((n, len(xs)), int)
    y = zeros((n, len(ys)), int)
    for i in range(n):
        xs, ys = sequence_sampler()
        x[i] = xs
        y[i] = ys
    x = x[:, :, newaxis]
    y = to_categorical(y)
    return x, y


def digits_with_repetition_labels() -> Tuple[array, array]:
    """
    Return a random list of 10 digits from 0 to 9. Two of the digits will be repeated. The rest will be unique.
    Along with this list, return a list of 10 labels, where the label is 0 if the corresponding digit is unique and 1
    if it is repeated.

    :return: digits and labels
    """
    n = 10
    xs = arange(n)
    ys = zeros(n, int)
    shuffle(xs)
    i, j = sample(range(n), 2)
    xs[j] = xs[i]
    ys[i] = ys[j] = 1
    return xs, ys


def main():
    # Train
    x, y = labeled_sequences(1000, digits_with_repetition_labels)
    model = sequence_to_sequence_model(x.shape[1], y.shape[2])
    model.summary()
    model.fit(x, y, epochs=20, verbose=2)
    # Test
    x, y = labeled_sequences(5, digits_with_repetition_labels)
    y_ = model.predict(x, verbose=0)
    x = x[:, :, 0]
    for i in range(x.shape[0]):
        print(' '.join(str(n) for n in x[i]))
        print(' '.join([' ', '*'][int(argmax(n))] for n in y[i]))
        print(y_[i])


if __name__ == '__main__':
    main()
```
My feature sequences are arrays of 10 digits from 0 to 9. The corresponding label sequences are arrays of 10 zeros and ones, where 0 indicates a unique digit and 1 indicates a repeated digit. (The idea is to create a simple classification task that involves long-distance dependencies.)
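For illustration, here is a minimal sketch (assuming the digits_with_repetition_labels function defined in the program above) of what a single sampled pair might look like; the digits shown in the comments are hypothetical, since each sequence is shuffled randomly:

```python
# Draw one sequence pair from the sampler defined above and print it.
xs, ys = digits_with_repetition_labels()
print(xs)  # e.g. [3 1 4 1 5 9 2 6 8 7] -- the digit 1 appears twice
print(ys)  # e.g. [0 1 0 1 0 0 0 0 0 0] -- 1 marks the repeated positions
```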
Training looks like this:
```
Epoch 1/20
 - 1s - loss: nan
Epoch 2/20
 - 0s - loss: nan
Epoch 3/20
 - 0s - loss: nan
```
And all of the label array predictions look like this:
```
[[nan nan]
 [nan nan]
 [nan nan]
 [nan nan]
 [nan nan]
 [nan nan]
 [nan nan]
 [nan nan]
 [nan nan]
 [nan nan]]
```
So clearly something is wrong.
The feature matrix passed to model.fit has dimensions samples × time steps × 1. The label matrix has dimensions samples × time steps × 2, where the 2 comes from the one-hot encoding of the labels 0 and 1.
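As a quick sanity check (a minimal sketch, reusing the helper functions from the program above), the shapes can be verified directly:

```python
# Verify the feature and label tensor shapes described above.
x, y = labeled_sequences(1000, digits_with_repetition_labels)
print(x.shape)  # (1000, 10, 1): samples x time steps x 1
print(y.shape)  # (1000, 10, 2): samples x time steps x number of labels
```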
I'm using a time-distributed dense layer to predict the sequences, following the Keras documentation and posts like this one. As far as I can tell, the model topology defined in sequence_to_sequence_model is correct. The model summary looks like this:
```
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm_1 (LSTM)                (None, 10, 16)            1152
_________________________________________________________________
time_distributed_1 (TimeDist (None, 10, 2)             34
=================================================================
Total params: 1,186
Trainable params: 1,186
Non-trainable params: 0
_________________________________________________________________
```
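For what it's worth, the parameter counts are consistent with these shapes: the LSTM layer has 4 × (units × (input_dim + units) + units) = 4 × (16 × (1 + 16) + 16) = 1152 parameters, and the time-distributed Dense layer has 16 × 2 + 2 = 34.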
Stack Overflow questions like this one make it sound like nan results are an indicator of numerical problems: runaway gradients and the like. However, since I'm working with a tiny data set and every single number my model returns is nan, I suspect I'm not seeing a numerical problem but rather a problem with how I've constructed the model.

Does the code above have the correct model and data shapes for sequence-to-sequence learning? If so, why do I get nans?
By default the Dense layer has no activation. If you specify one, the nans go away. Change the following line in the code above:

```python
model.add(TimeDistributed(Dense(labels, activation='softmax')))
```
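For reference, a minimal sketch of the corrected model function (identical to the original sequence_to_sequence_model except for the added activation):

```python
def sequence_to_sequence_model(time_steps: int, labels: int, units: int = 16):
    from keras import Sequential
    from keras.layers import LSTM, TimeDistributed, Dense

    model = Sequential()
    model.add(LSTM(units=units, input_shape=(time_steps, 1), return_sequences=True))
    # softmax makes each time step's Dense output a probability distribution
    # over the labels, which is what categorical_crossentropy expects
    model.add(TimeDistributed(Dense(labels, activation='softmax')))
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    return model
```

Without the activation, the Dense layer emits unbounded linear values, while categorical_crossentropy expects probabilities; normalizing raw outputs whose sum can be zero or negative is presumably what drives the loss (and eventually the weights) to nan.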