TensorFlow: 2-layer feed-forward neural network

Cir*_*lar 3 python machine-learning neural-network tensorflow

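I am trying to implement a simple fully connected feed-forward neural network in TensorFlow (Python 3 version). The network has 2 inputs and 1 output, and I am trying to train it to output the XOR of the two inputs. My code is as follows: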

import numpy as np
import tensorflow as tf

sess = tf.InteractiveSession()

inputs = tf.placeholder(tf.float32, shape = [None, 2])
desired_outputs = tf.placeholder(tf.float32, shape = [None, 1])

weights_1 = tf.Variable(tf.zeros([2, 3]))
biases_1 = tf.Variable(tf.zeros([1, 3]))
layer_1_outputs = tf.nn.sigmoid(tf.matmul(inputs, weights_1) + biases_1)

weights_2 = tf.Variable(tf.zeros([3, 1]))
biases_2 = tf.Variable(tf.zeros([1, 1]))
layer_2_outputs = tf.nn.sigmoid(tf.matmul(layer_1_outputs, weights_2) + biases_2)

error_function = -tf.reduce_sum(desired_outputs * tf.log(layer_2_outputs))
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(error_function)

sess.run(tf.initialize_all_variables())

training_inputs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
training_outputs = [[0.0], [1.0], [1.0], [0.0]]

for i in range(10000):
    train_step.run(feed_dict = {inputs: np.array(training_inputs), desired_outputs: np.array(training_outputs)})

print(sess.run(layer_2_outputs, feed_dict = {inputs: np.array([[0.0, 0.0]])}))
print(sess.run(layer_2_outputs, feed_dict = {inputs: np.array([[0.0, 1.0]])}))
print(sess.run(layer_2_outputs, feed_dict = {inputs: np.array([[1.0, 0.0]])}))
print(sess.run(layer_2_outputs, feed_dict = {inputs: np.array([[1.0, 1.0]])}))

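This seems simple enough, but the print statements at the end show that the neural network gets nowhere near the desired outputs, regardless of the number of training iterations or the learning rate. Can anyone see what I am doing wrong?

Thank you.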

EDIT: I have also tried the following alternative error function:

error_function = 0.5 * tf.reduce_sum(tf.sub(layer_2_outputs, desired_outputs) * tf.sub(layer_2_outputs, desired_outputs))

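This error function is the sum of the squared errors. It always results in the network outputting a value of exactly 0.5, which is another sign that there is a mistake somewhere in my code.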

EDIT 2: I have found that my code works fine for AND and OR, but not for XOR. I am now thoroughly puzzled.

nes*_*uno 8

There are several problems in your code. In the following, I comment on each line to guide you to the solution.

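Note: XOR is not linearly separable, so a network without hidden units cannot learn it. The code below uses two hidden layers.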
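To make the point about linear separability concrete, here is a minimal, dependency-free sketch. It searches a small grid of weights for a single threshold unit that reproduces a truth table, so it illustrates the claim rather than proving it. The function names, the grid and the hand-picked weights in `xor_with_hidden_units` are assumptions chosen for illustration; nothing here is learned or taken from the code in the question.

```python
import itertools

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}


def step(z):
    """Threshold unit: 1 if the weighted sum is positive, else 0."""
    return 1 if z > 0 else 0


def linear_unit_fits(table, grid):
    """True if some single unit step(w1*x1 + w2*x2 + b) reproduces the table.

    Only the given grid is searched, so this illustrates rather than proves.
    """
    for w1, w2, b in itertools.product(grid, repeat=3):
        if all(step(w1 * x1 + w2 * x2 + b) == y
               for (x1, x2), y in table.items()):
            return True
    return False


def xor_with_hidden_units(x1, x2):
    """XOR built from two hidden threshold units with hand-picked weights."""
    h_or = step(x1 + x2 - 0.5)       # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)      # fires only if both inputs are 1
    return step(h_or - h_and - 0.5)  # "OR but not AND"


grid = [i / 2 for i in range(-6, 7)]  # -3.0, -2.5, ..., 3.0
print(linear_unit_fits(AND, grid))    # a single unit on this grid can do AND
print(linear_unit_fits(XOR, grid))    # no single unit on this grid does XOR
print([xor_with_hidden_units(*x) for x in XOR])
```

The two hidden units each draw one straight line through the input plane, and the output unit combines them, which is what a single linear unit cannot do on its own.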

Note: the lines that begin with # [!] are the lines where you went wrong.

import numpy as np
import tensorflow as tf

sess = tf.InteractiveSession()

# a batch of inputs of 2 value each
inputs = tf.placeholder(tf.float32, shape=[None, 2])

# a batch of output of 1 value each
desired_outputs = tf.placeholder(tf.float32, shape=[None, 1])

# [!] define the number of hidden units in the first layer
HIDDEN_UNITS = 4

# connect the 2 inputs to HIDDEN_UNITS hidden units
# [!] Initialize weights with random numbers, to make the network learn
weights_1 = tf.Variable(tf.truncated_normal([2, HIDDEN_UNITS]))

# [!] There is one bias value per hidden unit
biases_1 = tf.Variable(tf.zeros([HIDDEN_UNITS]))

# connect 2 inputs to every hidden unit. Add bias
layer_1_outputs = tf.nn.sigmoid(tf.matmul(inputs, weights_1) + biases_1)

# [!] The XOR problem is that the function is not linearly separable
# [!] An MLP (multi-layer perceptron) can learn to separate points that are not linearly
# separable (you can think of it as learning hypercurves, not only hyperplanes)
# [!] Let's add a new layer and change layer 2 to output more than 1 value

# connect the first hidden layer to 2 hidden units in the second hidden layer
weights_2 = tf.Variable(tf.truncated_normal([HIDDEN_UNITS, 2]))
# [!] Same as above: one bias value per unit
biases_2 = tf.Variable(tf.zeros([2]))

# connect the hidden units to the second hidden layer
layer_2_outputs = tf.nn.sigmoid(
    tf.matmul(layer_1_outputs, weights_2) + biases_2)

# [!] create the new layer
weights_3 = tf.Variable(tf.truncated_normal([2, 1]))
biases_3 = tf.Variable(tf.zeros([1]))

logits = tf.nn.sigmoid(tf.matmul(layer_2_outputs, weights_3) + biases_3)

# [!] The error function originally chosen suits a multiclass classification task, not XOR.
error_function = 0.5 * tf.reduce_sum(tf.sub(logits, desired_outputs) * tf.sub(logits, desired_outputs))

train_step = tf.train.GradientDescentOptimizer(0.05).minimize(error_function)

sess.run(tf.initialize_all_variables())

training_inputs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]

training_outputs = [[0.0], [1.0], [1.0], [0.0]]

for i in range(20000):
    _, loss = sess.run([train_step, error_function],
                       feed_dict={inputs: np.array(training_inputs),
                                  desired_outputs: np.array(training_outputs)})
    print(loss)

print(sess.run(logits, feed_dict={inputs: np.array([[0.0, 0.0]])}))
print(sess.run(logits, feed_dict={inputs: np.array([[0.0, 1.0]])}))
print(sess.run(logits, feed_dict={inputs: np.array([[1.0, 0.0]])}))
print(sess.run(logits, feed_dict={inputs: np.array([[1.0, 1.0]])}))

I increased the number of training iterations so that the network converges regardless of the random initial values.
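Random initialization matters because of how the all-zero network from the question behaves. The sketch below is a plain-Python reimplementation (an assumption for illustration, not TensorFlow code) of the gradients of the squared-error function for a 2-3-1 sigmoid network. With every weight and bias at zero, the XOR targets make the output-layer gradients cancel and the hidden-layer gradients are multiplied by zero weights, so nothing ever moves and the output stays at sigmoid(0) = 0.5. With AND targets the output-layer gradient does not cancel, which is consistent with AND and OR training while XOR does not.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def squared_error_gradients(samples, w1, b1, w2, b2):
    """Gradients of 0.5 * sum((out - y)**2) for a 2-H-1 sigmoid network.

    w1[i][j] connects input i to hidden unit j; w2[j] connects hidden
    unit j to the single output.
    Returns (grad_w1, grad_b1, grad_w2, grad_b2).
    """
    hidden = len(w2)
    g_w1 = [[0.0] * hidden for _ in range(2)]
    g_b1 = [0.0] * hidden
    g_w2 = [0.0] * hidden
    g_b2 = 0.0
    for x, y in samples:
        # forward pass
        h = [sigmoid(x[0] * w1[0][j] + x[1] * w1[1][j] + b1[j])
             for j in range(hidden)]
        out = sigmoid(sum(h[j] * w2[j] for j in range(hidden)) + b2)
        # backward pass
        delta_out = (out - y) * out * (1.0 - out)
        g_b2 += delta_out
        for j in range(hidden):
            g_w2[j] += delta_out * h[j]
            delta_h = delta_out * w2[j] * h[j] * (1.0 - h[j])
            g_b1[j] += delta_h
            for i in range(2):
                g_w1[i][j] += delta_h * x[i]
    return g_w1, g_b1, g_w2, g_b2


inputs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
xor_samples = list(zip(inputs, [0.0, 1.0, 1.0, 0.0]))
and_samples = list(zip(inputs, [0.0, 0.0, 0.0, 1.0]))

H = 3  # hidden units, as in the question
zeros = dict(w1=[[0.0] * H for _ in range(2)], b1=[0.0] * H,
             w2=[0.0] * H, b2=0.0)

print(squared_error_gradients(xor_samples, **zeros))  # all gradients vanish
print(squared_error_gradients(and_samples, **zeros))  # output layer gets a signal
```

Starting from `tf.truncated_normal` values instead of zeros breaks this cancellation, because the hidden units no longer produce identical outputs and the second-layer weights are no longer zero.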

The output after 20000 training iterations is:

[[ 0.01759939]]
[[ 0.97418505]]
[[ 0.97734243]]
[[ 0.0310041]]

It looks pretty good.
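A side note on the first error function in the question, `-tf.reduce_sum(desired_outputs * tf.log(layer_2_outputs))`: with a single sigmoid output it only penalises the samples whose target is 1, so a network that outputs close to 1 for every input scores very well. The sketch below compares that form with the usual two-term binary cross-entropy in plain Python. The helper names and the two sets of example outputs are hypothetical values chosen for illustration, not results from any run.

```python
import math


def truncated_cross_entropy(targets, outputs):
    """-sum(y * log(p)): the form of the question's first error function."""
    return -sum(y * math.log(p) for y, p in zip(targets, outputs))


def binary_cross_entropy(targets, outputs):
    """-sum(y * log(p) + (1 - y) * log(1 - p)) for one sigmoid output."""
    return -sum(y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
                for y, p in zip(targets, outputs))


xor_targets = [0.0, 1.0, 1.0, 0.0]
always_one = [0.999, 0.999, 0.999, 0.999]  # wrong on both 0-targets
good_fit = [0.02, 0.97, 0.98, 0.03]        # hypothetical well-trained outputs

for name, outputs in [("always_one", always_one), ("good_fit", good_fit)]:
    print(name,
          round(truncated_cross_entropy(xor_targets, outputs), 4),
          round(binary_cross_entropy(xor_targets, outputs), 4))
```

The truncated form rates `always_one` better than `good_fit`, while the two-term form ranks them the other way round. The answer's code avoids the issue by using the squared error; the two-term cross-entropy would be another option for a single sigmoid output.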