TensorFlow model for XOR fails to predict correctly after 500 epochs

Question


gui*_*a84 4 python machine-learning neural-network keras tensorflow

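I am trying to implement a neural network with TensorFlow to solve the XOR problem. I chose sigmoid as the activation function, a (2, 2, 1) layer shape, and optimizer=SGD(). I chose batch_size=1 because the problem universe contains only 4 samples, so it is really small. The problem is that the predictions are far from the correct answers. What am I doing wrong?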

I am running this on Google Colab, and the TensorFlow version is 2.3.0.

import tensorflow as tf
import numpy as np



x = np.array([[0, 0],
              [1, 1],
              [1, 0],
              [0, 1]],  dtype=np.float32)

y = np.array([[0], 
              [0], 
              [1], 
              [1]],     dtype=np.float32)



model =  tf.keras.models.Sequential()
model.add(tf.keras.Input(shape=(2,)))
model.add(tf.keras.layers.Dense(2, activation=tf.keras.activations.sigmoid))
model.add(tf.keras.layers.Dense(2, activation=tf.keras.activations.sigmoid))
model.add(tf.keras.layers.Dense(1, activation=tf.keras.activations.sigmoid))

model.compile(optimizer=tf.keras.optimizers.SGD(), 
              loss=tf.keras.losses.MeanSquaredError(), 
              metrics=['binary_accuracy'])

history = model.fit(x, y, batch_size=1, epochs=500, verbose=False)

print("Tensorflow version: ", tf.__version__)
predictions = model.predict_on_batch(x)
print(predictions)


Output:

Tensorflow version:  2.3.0
WARNING:tensorflow:10 out of the last 10 calls to <function Model.make_predict_function.<locals>.predict_function at 0x7f69f7a83a60> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
[[0.5090364 ]
[0.4890102 ]
[0.50011414]
[0.49678832]]


Answer 1

Nik*_*ido 5

The problem lies in your learning rate and in the way the weights are optimized.

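Another factor to keep in mind when training is the size of the step we take along the gradient direction. If the step is too large, we may end up in the wrong place, jumping out of a local minimum. If it is too small, we may never reach the minimum.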
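To make the step-size trade-off concrete, here is a minimal sketch on a toy problem rather than the XOR network: plain gradient descent on f(w) = w**2. The function name gradient_descent, the starting point, and the three learning rates are illustrative assumptions, not values from the question.

```python
def gradient_descent(learning_rate, steps=50, start=5.0):
    """Minimize f(w) = w**2 with plain gradient descent and return the final w."""
    w = start
    for _ in range(steps):
        grad = 2.0 * w             # derivative of w**2
        w -= learning_rate * grad  # step against the gradient
    return w


# Too small, reasonable, and too large step sizes for this toy function.
for lr in (0.001, 0.1, 1.1):
    print(f"learning_rate={lr}: final w = {gradient_descent(lr):.4g}")
```

With the smallest rate, w has barely moved from its starting point after 50 steps; with the largest, every step overshoots the minimum at 0 and w grows in magnitude instead of shrinking.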

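By default, stochastic gradient descent (SGD) in Keras uses a learning rate of 0.01, and this learning rate stays fixed during training. If you inspect your training, you will see that the loss moves toward the global minimum too slowly, or sometimes jumps to higher values. For your specific problem, reaching the minimum with a fixed learning rate is fairly difficult, because a fixed rate does not take the landscape of the loss function into account.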
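As a rough illustration of why progress is slow at a learning rate of 0.01, the sketch below computes one SGD update for a single sigmoid output unit with squared error. It is a simplified single-unit calculation, not the full network from the question; the pre-activation z = 0 is an assumption chosen because it gives an output of 0.5, close to the predictions shown above.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# One output unit with squared error L = (p - y)**2, where p = sigmoid(z).
z, y = 0.0, 1.0                         # z = 0 gives p = 0.5; the target is 1
p = sigmoid(z)
grad_z = 2.0 * (p - y) * p * (1.0 - p)  # dL/dz by the chain rule
update = 0.01 * grad_z                  # one SGD step at learning rate 0.01

print("output:", p)
print("gradient w.r.t. z:", grad_z)
print("change applied to z:", -update)
```

The sigmoid derivative p * (1 - p) is never larger than 0.25, so each update moves the pre-activation by only a few thousandths even when the prediction is far from the target.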

For example, using Adam as the optimizer with learning_rate = 0.02, I was able to reach an accuracy of 1:

import tensorflow as tf
import numpy as np

x = np.array([[0, 0],
              [1, 1],
              [1, 0],
              [0, 1]],  dtype=np.float32)

y = np.array([[0], 
              [0], 
              [1], 
              [1]],     dtype=np.float32)

model =  tf.keras.models.Sequential()
model.add(tf.keras.Input(shape=(2,)))
model.add(tf.keras.layers.Dense(2, activation=tf.keras.activations.sigmoid))
model.add(tf.keras.layers.Dense(2, activation=tf.keras.activations.sigmoid))
model.add(tf.keras.layers.Dense(1, activation=tf.keras.activations.sigmoid))

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.02), # learning rate was 0.001 prior to this change
              loss=tf.keras.losses.MeanSquaredError(), 
              metrics=['mse', 'binary_accuracy'])
model.summary()
print("Tensorflow version: ", tf.__version__)
# Train first, then predict, so the predictions reflect the trained weights
history = model.fit(x, y, batch_size=1, epochs=500)
predictions = model.predict_on_batch(x)
print(predictions)

[[0.05162644]
[0.06670767]
[0.9240402 ]
[0.923379  ]]


I used Adam because it has an adaptive learning rate, which is adjusted during training according to how the training is going.
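To show what "adaptive" means here, the following is a minimal sketch of the textbook Adam update for a single scalar parameter. It is not Keras's exact implementation; the function name adam_first_steps and the toy gradient functions are assumptions for illustration.

```python
import math


def adam_first_steps(grad_fn, w, learning_rate=0.02, steps=3,
                     beta1=0.9, beta2=0.999, eps=1e-8):
    """Run a few Adam updates on a scalar parameter and return the step sizes."""
    m = v = 0.0
    sizes = []
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g      # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g  # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)         # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        step = learning_rate * m_hat / (math.sqrt(v_hat) + eps)
        w -= step
        sizes.append(abs(step))
    return sizes


small = adam_first_steps(lambda w: 2.0 * w, w=5.0)     # gradient of w**2
large = adam_first_steps(lambda w: 2000.0 * w, w=5.0)  # same shape, 1000x steeper
print(small)
print(large)
```

Because the step is the gradient mean divided by the root of the squared-gradient mean, both runs take steps of roughly learning_rate in size, even though the raw gradients differ by a factor of 1000. Plain SGD would take steps 1000 times larger in the second case.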

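If you use a larger learning rate (0.1) but stay with SGD, you can see in the training history that the accuracy reaches 1 at some point, but right afterwards it jumps back to lower values. That is because the learning rate is fixed. Another strategy is to stop training, perhaps with a Keras callback, as soon as you reach that value with SGD.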
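One possible way to implement the stop-when-reached strategy is a small custom Keras callback. This is a sketch under assumptions: the class name StopAtAccuracy is hypothetical, and it relies on the model being compiled with metrics=['binary_accuracy'] so that this key appears in the epoch logs.

```python
import tensorflow as tf


class StopAtAccuracy(tf.keras.callbacks.Callback):
    """Stop training once the monitored metric reaches a target value."""

    def __init__(self, monitor="binary_accuracy", target=1.0):
        super().__init__()
        self.monitor = monitor
        self.target = target

    def on_epoch_end(self, epoch, logs=None):
        value = (logs or {}).get(self.monitor)
        if value is not None and value >= self.target:
            # Keras checks this flag after each epoch and ends fit() early.
            self.model.stop_training = True
```

It would be passed to training as model.fit(x, y, batch_size=1, epochs=500, callbacks=[StopAtAccuracy()]). Note that the epoch-level accuracy is accumulated while the weights are still changing, so it is worth re-checking the predictions after training stops.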

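Don't forget to tune your learning rate and to choose the right optimizer. Both are crucial for training quickly and for reaching a good minimum.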

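Also consider changing the network architecture (adding nodes, and using other activation functions, such as ReLU, for the hidden layers).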
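As a sketch of what such an architecture change could look like, the function below builds a wider network with a single ReLU hidden layer. The name build_xor_model and the choice of 8 hidden units are assumptions for illustration and have not been tuned; the loss and optimizer settings are carried over from the code above.

```python
import tensorflow as tf


def build_xor_model(hidden_units=8):
    """Build a wider XOR network with a ReLU hidden layer."""
    model = tf.keras.models.Sequential()
    model.add(tf.keras.Input(shape=(2,)))
    model.add(tf.keras.layers.Dense(hidden_units, activation="relu"))
    # Sigmoid on the output keeps predictions between 0 and 1.
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.02),
                  loss=tf.keras.losses.MeanSquaredError(),
                  metrics=["binary_accuracy"])
    return model
```

The returned model can be trained exactly like the one above, for example with build_xor_model().fit(x, y, batch_size=1, epochs=500).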

Here are some useful details on how to handle the learning rate.

Archived: 5 years, 6 months ago
Views: 756
Last activity: 4 years, 3 months ago