Cost is zero for every epoch, even before training

sig*_*nal 1 neural-network deep-learning tensorflow

I wrote a simple multilayer perceptron program using TensorFlow. It is supposed to predict the number that follows a sequence of 5 numbers (e.g. 1 4 9 14 19 [24]). Yes, it is very simple.

But I have been stuck on it for at least four hours, because the cost is zero for every epoch no matter what I do. Surprisingly, I even made sure the weights and biases are initialized to non-zero values (using tf.ones), and that did not help.

What do I need to change so the cost is no longer zero?

import tensorflow as tf

n_input = 5
n_output = 1
n_hidden1 = 10
n_hidden2 = 10
learning_rate = 0.001
training_epochs = 20
batch_size = 100
display_step = 1

x = tf.placeholder(tf.float32, [None, n_input], name='X')
y = tf.placeholder(tf.float32, [None, n_output], name='Y')

with tf.name_scope('H1'):
    w1 = tf.Variable(tf.ones([n_input, n_hidden1]), name='W1')
    b1 = tf.Variable(tf.ones([n_hidden1]), name='b1')
    h1 = (tf.matmul(x, w1) + b1)

with tf.name_scope('H2'):
    w2 = tf.Variable(tf.ones([n_hidden1, n_hidden2]), name='W2')
    b2 = tf.Variable(tf.ones([n_hidden2]), name='b2')
    h2 = (tf.matmul(h1, w2) + b2)

with tf.name_scope('H3'):
    w3 = tf.Variable(tf.ones([n_hidden2, n_output]), name='W3')
    b3 = tf.Variable(tf.ones([n_output]), name='b3')
    pred = tf.matmul(h2, w3) + b3

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
optimizer = tf.train.AdadeltaOptimizer(learning_rate).minimize(cost)
init = tf.global_variables_initializer()


def generate_sequences(size):
    def generate_sequence():
        from random import uniform
        start = uniform(0, 10000)
        seq = [start + i * (4 + uniform(0, 1)) for i in range(6)]
        return seq[:-1], [seq[-1]]
    seq = list(map(lambda _: generate_sequence(), range(size)))
    return [s[0] for s in seq], [s[1] for s in seq]

with tf.Session() as sess:
    sess.run(init)

    print('Before:', cost.eval(feed_dict={x: [[1, 5, 9, 14, 19]], y: [[24]]}))
    for epoch in range(1, training_epochs + 1):
        batch_x, batch_y = generate_sequences(batch_size)
        _, c = sess.run([optimizer, cost], feed_dict={x: batch_x, y: batch_y})
        if epoch % display_step == 0:
            print('Epoch:', '%04d' % epoch, 'cost=', '{:.9f}'.format(c))
    print('Optimization Finished!')

    print(pred.eval(feed_dict={x: [[8, 12, 16, 20, 24]]}))

Console output:

Before: 0.0
Epoch: 0001 cost= 0.000000000
Epoch: 0002 cost= 0.000000000
Epoch: 0003 cost= 0.000000000
Epoch: 0004 cost= 0.000000000
Epoch: 0005 cost= 0.000000000
Epoch: 0006 cost= 0.000000000
Epoch: 0007 cost= 0.000000000
Epoch: 0008 cost= 0.000000000
Epoch: 0009 cost= 0.000000000
Epoch: 0010 cost= 0.000000000
Epoch: 0011 cost= 0.000000000
Epoch: 0012 cost= 0.000000000
Epoch: 0013 cost= 0.000000000
Epoch: 0014 cost= 0.000000000
Epoch: 0015 cost= 0.000000000
Epoch: 0016 cost= 0.000000000
Epoch: 0017 cost= 0.000000000
Epoch: 0018 cost= 0.000000000
Epoch: 0019 cost= 0.000000000
Epoch: 0020 cost= 0.000000000
Optimization Finished!
[[ 8142.25683594]]

Dmi*_*kiy 5

The problem is that you are using a loss function meant for classification (softmax is normally used for classification), while your network produces a single, arbitrary real number, so this is a regression problem, not a classification problem. With a single output unit, the softmax over one logit is always 1.0, so the cross-entropy is exactly 0 regardless of the weights. Use an appropriate cost (for example, mean squared error) and your network will start to converge.

In this case, simply change this line:

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))

to this:

cost = tf.reduce_mean(tf.squared_difference(y, pred))
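To see why the original cost is identically zero, here is a minimal, illustrative sketch (assuming TensorFlow 1.x graph mode, with made-up example values): with a single output unit, the softmax over one logit is always 1.0, so the cross-entropy is 0 for any label and any weights.

import tensorflow as tf

# One logit per example (n_output = 1), as in the network above.
logit = tf.constant([[42.0]])
label = tf.constant([[24.0]])   # a regression target misused as a class "label"

# softmax over a single logit is always [[1.0]], so the cross-entropy
# -sum(label * log(softmax(logit))) = -24 * log(1) = 0, whatever the values.
loss = tf.nn.softmax_cross_entropy_with_logits(labels=label, logits=logit)

with tf.Session() as sess:
    print(sess.run(tf.nn.softmax(logit)))  # [[1.]]
    print(sess.run(loss))                  # [0.]

tf.squared_difference(y, pred), by contrast, measures the actual distance between the prediction and the target, so its gradient is non-zero and the optimizer has something to minimize.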

  • I don't really see what you mean. Deep learning isn't really special; it uses the same principles as classical machine learning, just with larger networks. (3 upvotes)