What is the difference between Keras' Model.train_on_batch and TensorFlow's Session.run([train_optimizer])?

Luc*_*ucG 9 python machine-learning keras tensorflow

In the following Keras and TensorFlow implementations of neural-network training, how does the call model.train_on_batch([x], [y]) in the Keras version differ from sess.run([train_optimizer, cross_entropy, accuracy_op], feed_dict=feed_dict) in the TensorFlow version? In particular: how can these two lines lead to different computations during training?

keras_version.py

from keras.layers import Input, Dense
from keras.models import Model
from keras.optimizers import Adam

input_x = Input(shape=input_shape, name="x")
c = Dense(num_classes, activation="softmax")(input_x)

model = Model([input_x], [c])
opt = Adam(lr)
# metrics=['accuracy'] is needed so train_on_batch returns (loss, acc)
model.compile(loss=['categorical_crossentropy'], optimizer=opt, metrics=['accuracy'])

nb_batchs = int(len(x_train)/batch_size)

for epoch in range(epochs):
    loss = 0.0
    for batch in range(nb_batchs):
        x = x_train[batch*batch_size:(batch+1)*batch_size]
        y = y_train[batch*batch_size:(batch+1)*batch_size]

        loss_batch, acc_batch = model.train_on_batch([x], [y])

        loss += loss_batch
    print(epoch, loss / nb_batchs)

tensorflow_version.py

import tensorflow as tf
from keras.layers import Input, Dense

input_x = Input(shape=input_shape, name="x")
c = Dense(num_classes)(input_x)  # no softmax here: c holds raw logits

input_y = tf.placeholder(tf.float32, shape=[None, num_classes], name="label")
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=input_y, logits=c, name="xentropy"),
    name="xentropy_mean"
)
train_optimizer = tf.train.AdamOptimizer(learning_rate=lr).minimize(cross_entropy)

nb_batchs = int(len(x_train)/batch_size)

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(epochs):
        loss = 0.0
        acc = 0.0

        for batch in range(nb_batchs):
            x = x_train[batch*batch_size:(batch+1)*batch_size]
            y = y_train[batch*batch_size:(batch+1)*batch_size]

            feed_dict = {input_x: x,
                         input_y: y}
            _, loss_batch = sess.run([train_optimizer, cross_entropy], feed_dict=feed_dict)

            loss += loss_batch
        print(epoch, loss / nb_batchs)

Note: this question follows on from Same (?) model converges in Keras but not in Tensorflow, which was closed as too broad, but in which I spelled out exactly why I believe these two statements are somehow different and lead to different computations.

mlR*_*cks 7

Yes, the results can be different. They shouldn't be surprising if you know the following in advance:

  1. The cross-entropy implementations in Tensorflow and Keras are different. Tensorflow's tf.nn.softmax_cross_entropy_with_logits_v2 treats its input as raw, unnormalized logits, while Keras' categorical_crossentropy expects probabilities (here, the output of the softmax activation).
  2. The optimizer implementations in Keras and Tensorflow are different; even with the same algorithm (Adam), defaults such as epsilon need not match between the two libraries.
  3. It may be that you are shuffling the data, so the batches are passed in a different order. Over a long run this doesn't matter, but the first few epochs can be entirely different. Make sure you pass the same batches to both, and then compare the results.
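
Point 1 is the key one: the two loss formulations are mathematically equivalent (Keras applies the log to softmax probabilities, Tensorflow folds the softmax into the loss), so any divergence comes from numerics and from points 2 and 3, not from the math. A small NumPy sketch (with made-up logits and a one-hot label, not data from the question) shows the two paths computing the same value:

```python
import numpy as np

# Hypothetical logits for one sample over 3 classes, and its one-hot label.
logits = np.array([2.0, 1.0, 0.1])
label = np.array([1.0, 0.0, 0.0])

# Tensorflow path: softmax_cross_entropy_with_logits_v2 works on raw logits,
# computing log-sum-exp(logits) - sum(label * logits) in a stable form.
m = logits.max()
log_sum_exp = m + np.log(np.sum(np.exp(logits - m)))
tf_style_loss = log_sum_exp - np.sum(label * logits)

# Keras path: the Dense layer already applied softmax, and
# categorical_crossentropy then computes -sum(label * log(prob)).
probs = np.exp(logits - m) / np.sum(np.exp(logits - m))
keras_style_loss = -np.sum(label * np.log(probs))

# Mathematically identical; only rounding behavior can differ.
print(np.isclose(tf_style_loss, keras_style_loss))  # True
```

Because the fused logits form avoids taking log(softmax(x)) explicitly, it is the numerically safer of the two, which is one reason tiny per-step differences can accumulate between the implementations.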