Simple multilayer perceptron model does not converge in TensorFlow

Esi*_*dor 5 deep-learning tensorflow

I'm new to TensorFlow. Today I tried to implement my first model in TF, but it returned strange results. I know I'm missing something here, but I can't figure out what. Here's the story.

The model

I have a simple multilayer perceptron model with a single hidden layer, applied to the MNIST database. The layers are defined as [input(784), hidden_layer(470), output_layer(10)], with a tanh nonlinearity in the hidden layer and a softmax loss at the output layer. The optimizer I use is gradient descent with a learning rate of 0.01. My mini-batch size is 1 (I train the model on one sample at a time).

My implementation:

  1. First, I implemented the model in C++ and got about 96% accuracy. Here is the repository: https://github.com/amin2ros/Artificog
  2. Then I implemented the exact same model in TensorFlow, but surprisingly the model does not converge at all. Here is the code.

Code:

import sys
import input_data
import matplotlib.pyplot as plt
from pylab import *
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
import tensorflow as tf
# Parameters
learning_rate = 0.1
training_epochs = 1
batch_size = 1
display_step = 1
# Network Parameters
n_hidden_1 = 470 # 1st layer num features
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)
# tf Graph input
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])
# Create model
def multilayer_perceptron(_X, _weights, _biases):
    layer_1 = tf.tanh(tf.add(tf.matmul(_X, _weights['h1']), _biases['b1'])) 
    return tf.matmul(layer_1, _weights['out']) + _biases['out']
# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'out': tf.Variable(tf.random_normal([n_hidden_1, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}
# Construct model
pred = multilayer_perceptron(x, weights, biases)
# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax(pred)) # Softmax loss
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(cost) #
# Initializing the variables
init = tf.initialize_all_variables()
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        m= 0 
        total_batch = int(mnist.train.num_examples/batch_size)
        counter=0
        #print 'count = ' , total_batch
        #sys.stdin.read(1)
        # Loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            label = tf.argmax(batch_ys,1).eval()[0] 
            counter+=1
            sess.run(optimizer, feed_dict={x: batch_xs, y: batch_ys})
            wrong_prediction = tf.not_equal(tf.argmax(pred, 1), tf.argmax(y, 1))
            missed=tf.cast(wrong_prediction, "float")
            m += missed.eval({x: batch_xs, y: batch_ys})[0]
            print "Sample #", counter , " - Label : " , label , " - Prediction :" , tf.argmax(pred, 1).eval({x: batch_xs, y: batch_ys})[0]  ,\
             "- Missed = " , m ,  " - Error Rate = " , 100 * float(m)/counter
    print "Optimization Finished!"

I'm curious why this happens. Any help is appreciated.

EDIT:

As pointed out in the comments below, the definition of the cost function was incorrect: tf.reduce_mean(tf.nn.softmax(pred)) only averages the predicted probabilities and never compares them against the labels y, so minimizing it cannot push the network toward the correct class. It should instead be

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred,y))

Now the model converges :)
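
For completeness, below is a minimal sketch of the corrected loss together with an accuracy node, written against the same old-style TensorFlow API used in the code above (where softmax_cross_entropy_with_logits takes the logits first and the labels second; newer releases expect the keyword form logits=..., labels=...). Treat it as an illustration rather than a drop-in replacement:

# Corrected loss: cross-entropy computed from the raw logits and the labels.
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

# A reusable accuracy node, instead of rebuilding tf.not_equal(...) on every
# iteration of the training loop.
correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

# Example evaluation on the test set (inside the session):
# print "Test accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels})

Building the evaluation ops once, outside the training loop, also avoids adding new nodes to the graph on every iteration, which is what the original loop does with tf.not_equal and tf.cast.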