How to use feed_dict with multiple GPUs in TensorFlow

Sea*_*ean 5 python distributed tensorflow

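Recently I have been trying to learn how to use TensorFlow on multiple GPUs to speed up training. I found the official tutorial on training a classification model on the CIFAR-10 dataset. However, that tutorial reads images through queues. Out of curiosity, how can I use multiple GPUs when I feed values into the Session with feed_dict instead? It seems hard to feed different values from the same dataset to different GPUs. Thanks, everyone! The code below is part of the official tutorial.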

images, labels = cifar10.distorted_inputs()
batch_queue = tf.contrib.slim.prefetch_queue.prefetch_queue(
      [images, labels], capacity=2 * FLAGS.num_gpus)
# Calculate the gradients for each model tower.
tower_grads = []
with tf.variable_scope(tf.get_variable_scope()):
  for i in xrange(FLAGS.num_gpus):
    with tf.device('/gpu:%d' % i):
      with tf.name_scope('%s_%d' % (cifar10.TOWER_NAME, i)) as scope:
        # Dequeues one batch for the GPU
        image_batch, label_batch = batch_queue.dequeue()
        # Calculate the loss for one tower of the CIFAR model. This function
        # constructs the entire CIFAR model but shares the variables across
        # all towers.
        loss = tower_loss(scope, image_batch, label_batch)

        # Reuse variables for the next tower.
        tf.get_variable_scope().reuse_variables()

        # Retain the summaries from the final tower.
        summaries = tf.get_collection(tf.GraphKeys.SUMMARIES, scope)

        # Calculate the gradients for the batch of data on this CIFAR tower.
        grads = opt.compute_gradients(loss)

        # Keep track of the gradients across all towers.
        tower_grads.append(grads)

Ami*_*mir 0

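QueueRunner and the queue-based APIs are relatively outdated. The TensorFlow documentation says so explicitly:

Input pipelines using the queue-based APIs can be cleanly replaced by the tf.data API.

It is therefore recommended to use the tf.data API, which is optimized for multi-GPU and TPU use.

How do you use it?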

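import tensorflow as tf  # TensorFlow 1.x API

# x_train and y_train are assumed to be NumPy arrays of features and one-hot labels.
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
dataset = dataset.batch(32)  # tf.layers.dense needs batched input; 32 is illustrative
iterator = dataset.make_one_shot_iterator()
x, y = iterator.get_next()
# Define your model
logits = tf.layers.dense(x, 2)  # use x directly in your model
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
train_step = tf.train.AdamOptimizer().minimize(cost)
with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())  # variables must be initialized first
  sess.run(train_step)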

You can create a separate iterator for each GPU with Dataset.shard(), or, more easily, use the Estimator API.
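As a rough illustration of the Dataset.shard() route, here is a minimal sketch with one iterator per GPU. It is not taken from the tutorial: the toy data, NUM_GPUS, BATCH_SIZE, the 'model' scope, and the single linear layer are all assumptions made for the example. It also uses tensorflow.compat.v1 so that the TF1-style graph code can run on TensorFlow 2.x, and it minimizes the mean of the tower losses rather than averaging gradients explicitly.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # TF1-style graphs and sessions
tf.reset_default_graph()

NUM_GPUS = 2    # hypothetical number of GPUs
BATCH_SIZE = 4  # hypothetical per-GPU batch size

# Toy stand-ins for x_train / y_train (8 examples, 2 features, 2 classes).
x_train = np.arange(16, dtype=np.float32).reshape(8, 2)
y_train = np.eye(2, dtype=np.float32)[np.arange(8) % 2]

dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))

tower_inputs, tower_losses = [], []
for i in range(NUM_GPUS):
    # Shard i keeps elements i, i + NUM_GPUS, i + 2 * NUM_GPUS, ...
    shard = dataset.shard(NUM_GPUS, i).repeat().batch(BATCH_SIZE)
    x, y = tf.data.make_one_shot_iterator(shard).get_next()
    tower_inputs.append(x)
    with tf.device('/gpu:%d' % i):
        # AUTO_REUSE makes every tower share the same weights.
        with tf.variable_scope('model', reuse=tf.AUTO_REUSE):
            w = tf.get_variable('w', [2, 2])
            b = tf.get_variable('b', [2], initializer=tf.zeros_initializer())
        logits = tf.matmul(x, w) + b
        tower_losses.append(tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=logits)))

mean_loss = tf.add_n(tower_losses) / NUM_GPUS
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(mean_loss)

# Soft placement lets the sketch fall back to CPU when fewer GPUs exist.
config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    _, loss_value, first_batches = sess.run(
        [train_step, mean_loss, tower_inputs])
```

Each tower pulls from its own shard, so no two GPUs see the same example within a step, and no feed_dict is needed at all.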

For a complete tutorial, see here.
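If feeding through feed_dict is still preferred, as the question asks, one possible approach is to give every tower its own pair of placeholders and split each global batch across them. The sketch below is an assumption-laden illustration, not the tutorial's method: NUM_GPUS, the feature and class counts, the 'model' scope, the make_feed_dict helper, and the random toy batch are all hypothetical. It uses tensorflow.compat.v1 so that the TF1-style code can run on TensorFlow 2.x.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # TF1-style graphs and sessions
tf.reset_default_graph()

NUM_GPUS = 2      # hypothetical number of GPUs
NUM_FEATURES = 4  # hypothetical input size
NUM_CLASSES = 2   # hypothetical number of classes

tower_placeholders, tower_losses = [], []
for i in range(NUM_GPUS):
    with tf.device('/gpu:%d' % i):
        # One placeholder pair per tower, so each GPU can get its own data.
        x = tf.placeholder(tf.float32, [None, NUM_FEATURES])
        y = tf.placeholder(tf.float32, [None, NUM_CLASSES])
        # AUTO_REUSE makes every tower share the same weights.
        with tf.variable_scope('model', reuse=tf.AUTO_REUSE):
            w = tf.get_variable('w', [NUM_FEATURES, NUM_CLASSES])
            b = tf.get_variable('b', [NUM_CLASSES],
                                initializer=tf.zeros_initializer())
        logits = tf.matmul(x, w) + b
        loss = tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=logits))
    tower_placeholders.append((x, y))
    tower_losses.append(loss)

mean_loss = tf.add_n(tower_losses) / NUM_GPUS
train_step = tf.train.GradientDescentOptimizer(0.1).minimize(mean_loss)


def make_feed_dict(x_batch, y_batch):
    """Splits one global batch into NUM_GPUS chunks, one per tower."""
    feed = {}
    x_parts = np.array_split(x_batch, NUM_GPUS)
    y_parts = np.array_split(y_batch, NUM_GPUS)
    for (x, y), x_part, y_part in zip(tower_placeholders, x_parts, y_parts):
        feed[x] = x_part
        feed[y] = y_part
    return feed


# Toy global batch of 6 examples with one-hot labels.
rng = np.random.RandomState(0)
x_batch = rng.rand(6, NUM_FEATURES).astype(np.float32)
y_batch = np.eye(NUM_CLASSES, dtype=np.float32)[rng.randint(NUM_CLASSES, size=6)]
feed = make_feed_dict(x_batch, y_batch)

# Soft placement lets the sketch fall back to CPU when fewer GPUs exist.
config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    _, loss_value = sess.run([train_step, mean_loss], feed_dict=feed)
```

The trade-off is that every step copies data from Python into the graph, which is the overhead the tf.data pipeline above is meant to avoid.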