Loss function works with reduce_mean but not reduce_sum

J. *_*sen 9 tensorflow

I'm new to TensorFlow and have been looking at the examples here. I wanted to rewrite the multilayer perceptron classification model as a regression model. However, I ran into some strange behaviour when modifying the loss function. It works fine with tf.reduce_mean, but if I try to use tf.reduce_sum it gives nan in the output. This seems very strange, since the functions are so similar - the only difference is that the mean divides the summed result by the number of elements? So I can't see how nan could be introduced by this change?

import numpy as np
import tensorflow as tf

# Parameters
learning_rate = 0.001

# Network Parameters
n_hidden_1 = 32 # 1st layer number of features
n_hidden_2 = 32 # 2nd layer number of features
n_input = 2 # number of inputs
n_output = 1 # number of outputs

# Make artificial data
SAMPLES = 1000
X = np.random.rand(SAMPLES, n_input)
T = np.c_[X[:,0]**2 + np.sin(X[:,1])]

# tf Graph input
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_output])

# Create model
def multilayer_perceptron(x, weights, biases):
    # Hidden layer with tanh activation
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.tanh(layer_1)
    # Hidden layer with tanh activation
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.tanh(layer_2)
    # Output layer with linear activation
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
    return out_layer

# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_output]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_output]))
}

pred = multilayer_perceptron(x, weights, biases)

# Define loss and optimizer
#se = tf.reduce_sum(tf.square(pred - y))   # Why does this give nans?
mse = tf.reduce_mean(tf.square(pred - y))  # When this doesn't?
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(mse)

# Initializing the variables
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

training_epochs = 10
display_step = 1

# Training cycle
for epoch in range(training_epochs):
    avg_cost = 0.
    # Loop over all batches
    for i in range(100):
        # Run optimization op (backprop) and cost op (to get loss value)
        _, msev = sess.run([optimizer, mse], feed_dict={x: X, y: T})
    # Display logs per epoch step
    if epoch % display_step == 0:
        print("Epoch:", '%04d' % (epoch+1), "mse=", \
            "{:.9f}".format(msev))

The problematic variable se is commented out. It should be used in place of mse.

With mse the output looks like this:

Epoch: 0001 mse= 0.051669389
Epoch: 0002 mse= 0.031438075
Epoch: 0003 mse= 0.026629323
...

And with se it ends up like this:

Epoch: 0001 se= nan
Epoch: 0002 se= nan
Epoch: 0003 se= nan
...

use*_*291 21

The loss summed across the batch is 1000 times larger (from skimming the code, I believe your training batch size is 1000), so your gradients and parameter updates are also 1000 times larger. The larger updates apparently lead to nans.

Generally learning rates are expressed per-example, so the loss used to compute the update gradients should be per-example as well. If the loss is per-batch, the learning rate needs to be reduced by the batch size to get comparable training results.
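The factor-of-N relationship between the two losses is easy to check by hand. Here is a minimal NumPy sketch (toy linear model and numbers of my own choosing, not the question's network) showing that the gradient of the summed squared error is exactly N times the gradient of the mean squared error, so a sum-loss step with learning rate lr/N equals a mean-loss step with learning rate lr:

```python
import numpy as np

# Toy model: prediction = w * x, squared-error loss over a batch of N examples.
N = 1000
rng = np.random.default_rng(0)
x = rng.random(N)
t = 3.0 * x          # targets
w = 0.0              # current weight

# Gradient of the MEAN squared error w.r.t. w:
#   d/dw mean((w*x - t)^2) = mean(2 * (w*x - t) * x)
grad_mean = np.mean(2 * (w * x - t) * x)

# Gradient of the SUMMED squared error; same terms, no division by N:
#   d/dw sum((w*x - t)^2) = sum(2 * (w*x - t) * x)
grad_sum = np.sum(2 * (w * x - t) * x)

# The sum gradient is exactly N times larger.
assert np.isclose(grad_sum, N * grad_mean)

# Hence a gradient-descent step with the sum loss and lr/N matches
# a step with the mean loss and lr.
lr = 0.001
step_mean = w - lr * grad_mean
step_sum = w - (lr / N) * grad_sum
assert np.isclose(step_mean, step_sum)
```

With the original learning rate of 0.001 and the sum loss, the effective step is 1000 times too large, which is enough to make the weights diverge to nan.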