Tensorflow Queues - Switching between train and validation data

Question

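I am trying to use queues to load data from files in Tensorflow.

I would like to run the graph with validation data at the end of each epoch, to get a better sense of how training is progressing.

This is where I am running into problems. I cannot figure out how to switch between training data and validation data when using queues.

I have stripped my code down to a simple toy example to make it easier to get help. Instead of including all the code for loading image files, running inference, and training, I only load the filenames into the queue.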

import tensorflow as tf

#  DATA
train_items = ["train_file_{}".format(i) for i in range(6)]
valid_items = ["valid_file_{}".format(i) for i in range(3)]

# SETTINGS
batch_size = 3
batches_per_epoch = 2
epochs = 2

# CREATE GRAPH
graph = tf.Graph()
with graph.as_default():
    file_list = tf.placeholder(dtype=tf.string, shape=None)

    # Create a queue consisting of the strings in `file_list`
    q = tf.train.string_input_producer(train_items, shuffle=False, num_epochs=None)

    # Create batch of items.
    x = q.dequeue_many(batch_size)

    # Inference, train op, and accuracy calculation after this point
    # ...


# RUN SESSION
with tf.Session(graph=graph) as sess:
    # Initialize variables
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())

    # Start populating the queue.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)

    try:
        for epoch in range(epochs):
            print("-"*60)
            for step in range(batches_per_epoch):
                if coord.should_stop():
                    break
                train_batch = sess.run(x, feed_dict={file_list: train_items})
                print("TRAIN_BATCH: {}".format(train_batch))

            valid_batch = sess.run(x, feed_dict={file_list: valid_items})
            print("\nVALID_BATCH : {} \n".format(valid_batch))

    except Exception as e:
        coord.request_stop(e)
    finally:
        coord.request_stop()
        coord.join(threads)

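Variations and experiments

Trying different values of `num_epochs`

num_epochs = None

If I set the num_epochs argument of tf.train.string_input_producer() to None, I get the following output. It shows that two epochs run as intended, but evaluation uses data from the training set.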

------------------------------------------------------------
TRAIN_BATCH: ['train_file_0' 'train_file_1' 'train_file_2']
TRAIN_BATCH: ['train_file_3' 'train_file_4' 'train_file_5']

VALID_BATCH : ['train_file_0' 'train_file_1' 'train_file_2']

------------------------------------------------------------
TRAIN_BATCH: ['train_file_3' 'train_file_4' 'train_file_5']
TRAIN_BATCH: ['train_file_0' 'train_file_1' 'train_file_2']

VALID_BATCH : ['train_file_3' 'train_file_4' 'train_file_5']
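
Both symptoms follow from how the toy graph is wired. The queue is filled only from `train_items`, and the `file_list` placeholder is not an input to any op. Feeding it therefore changes nothing, and every `sess.run(x, ...)` just dequeues the next three training filenames. With `num_epochs=None` the producer cycles through the list forever. The pure-Python sketch below is an illustration, not TensorFlow code. `itertools.cycle` stands in for the queue runner (an assumption), and the sketch reproduces the batches above:

```python
import itertools

train_items = ["train_file_{}".format(i) for i in range(6)]
valid_items = ["valid_file_{}".format(i) for i in range(3)]  # never reaches the queue
batch_size = 3
batches_per_epoch = 2
epochs = 2

# With num_epochs=None the producer re-enqueues the training list endlessly.
fifo = itertools.cycle(train_items)

def dequeue_many(n):
    """Take the next n filenames, like q.dequeue_many(n)."""
    return list(itertools.islice(fifo, n))

log = []
for epoch in range(epochs):
    for step in range(batches_per_epoch):
        log.append(("TRAIN", dequeue_many(batch_size)))
    # Feeding file_list has no effect: this is the same dequeue as above.
    log.append(("VALID", dequeue_many(batch_size)))

for kind, batch in log:
    print(kind, batch)
```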

num_epochs = 2

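If I set the num_epochs argument of tf.train.string_input_producer() to 2, I get the following output. It shows that the second epoch does not even complete its two batches, and evaluation still uses training data.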

------------------------------------------------------------
TRAIN_BATCH: ['train_file_0' 'train_file_1' 'train_file_2']
TRAIN_BATCH: ['train_file_3' 'train_file_4' 'train_file_5']

VALID_BATCH : ['train_file_0' 'train_file_1' 'train_file_2']

------------------------------------------------------------
TRAIN_BATCH: ['train_file_3' 'train_file_4' 'train_file_5']

num_epochs = 1

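I also set the num_epochs argument of tf.train.string_input_producer() to 1. I hoped this would flush any additional training data out of the queue so that the validation data could be used. I get the following output, which shows that the run stops as soon as it gets through one epoch of training data and never loads the evaluation data.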

------------------------------------------------------------
TRAIN_BATCH: ['train_file_0' 'train_file_1' 'train_file_2']
TRAIN_BATCH: ['train_file_3' 'train_file_4' 'train_file_5']
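
Both finite runs can be explained by counting. With a finite `num_epochs`, the producer enqueues the 6 training filenames `num_epochs` times and then closes the queue. Only 2 × `num_epochs` batches of 3 ever exist. The loop consumes them in the order train, train, valid, train, and so on. The first `dequeue_many` that finds the closed queue with fewer than 3 filenames left raises `tf.errors.OutOfRangeError`, and the `except` clause turns that into a coordinator stop. With `num_epochs=2` the fifth dequeue fails. With `num_epochs=1` the third one fails, which is the validation step. The pure-Python sketch below models that bookkeeping. The `OutOfRange` class is a hypothetical stand-in for TensorFlow's error:

```python
class OutOfRange(Exception):
    """Hypothetical stand-in for tf.errors.OutOfRangeError."""

def run_toy_loop(num_epochs, epochs=2, batches_per_epoch=2, batch_size=3):
    train_items = ["train_file_{}".format(i) for i in range(6)]
    # The producer enqueues the list num_epochs times, then closes the queue.
    fifo = train_items * num_epochs
    log = []

    def dequeue_many(n):
        if len(fifo) < n:  # queue closed and not enough elements left
            raise OutOfRange()
        batch = fifo[:n]
        del fifo[:n]
        return batch

    try:
        for epoch in range(epochs):
            for step in range(batches_per_epoch):
                log.append(("TRAIN", dequeue_many(batch_size)))
            log.append(("VALID", dequeue_many(batch_size)))
    except OutOfRange:
        log.append(("STOP", None))  # where coord.request_stop(e) takes over
    return log

for n in (2, 1):
    print(n, run_toy_loop(n))
```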

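Setting the `capacity` argument to various values

I also tried setting the capacity argument of tf.train.string_input_producer() to small values, such as 3 and 1, but these had no effect on the results.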
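
That is consistent with what `capacity` does. It only bounds how many filenames the producer thread can enqueue ahead of the consumer. It throttles the producer but does not change which filenames are enqueued or in what order. As an analogy in plain Python (not TensorFlow), a thread filling a `queue.Queue(maxsize=capacity)` yields the same batches whatever the capacity. The `consume` helper is hypothetical:

```python
import queue
import threading

def consume(capacity, num_epochs=2, n_batches=4, batch_size=3):
    train_items = ["train_file_{}".format(i) for i in range(6)]
    q = queue.Queue(maxsize=capacity)  # put() blocks while the queue is full

    def producer():
        # Like string_input_producer: enqueue the list once per epoch.
        for _ in range(num_epochs):
            for item in train_items:
                q.put(item)

    t = threading.Thread(target=producer, daemon=True)
    t.start()
    batches = [[q.get() for _ in range(batch_size)] for _ in range(n_batches)]
    t.join()
    return batches

for cap in (1, 3, 32):
    print(cap, consume(cap))
```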

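What other approaches should I take?

What other approaches could I take to switch between training and validation data? Do I have to create separate queues? I cannot figure out how to get that to work. Would I also need to create additional coordinators and queue runners?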

Answer 1

ron*_*est 10

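I am compiling a list of potential approaches that might solve this problem. Most of them are only vague suggestions, with no actual code examples showing how to use them.

Placeholder with default

Suggested here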

Using tf.cond()

Suggested here

sygi also suggested this in this stackoverflow thread. link
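
Here is a minimal runnable sketch of the `tf.cond()` approach, under the same assumptions (TensorFlow 1.15 or 2.x through `tf.compat.v1`, toy filenames from the question). Each split gets its own `string_input_producer`. Both `dequeue_many` calls are created inside the branch functions, so only the selected queue is dequeued on each step. The flag `use_valid` and the `decode` helper are hypothetical names.

```python
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

train_items = ["train_file_{}".format(i) for i in range(6)]
valid_items = ["valid_file_{}".format(i) for i in range(3)]
batch_size = 3

graph = tf.Graph()
with graph.as_default():
    train_q = tf.train.string_input_producer(train_items, shuffle=False)
    valid_q = tf.train.string_input_producer(valid_items, shuffle=False)
    use_valid = tf.placeholder_with_default(False, shape=[])
    # Ops built inside the lambdas run only for the branch that is taken.
    x = tf.cond(use_valid,
                lambda: valid_q.dequeue_many(batch_size),
                lambda: train_q.dequeue_many(batch_size))

def decode(batch):
    # Session.run returns string tensors as bytes.
    return [s.decode() for s in batch]

log = []
with tf.Session(graph=graph) as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        for epoch in range(2):
            for step in range(2):
                log.append(("TRAIN", decode(sess.run(x))))
            valid = sess.run(x, feed_dict={use_valid: True})
            log.append(("VALID", decode(valid)))
    finally:
        coord.request_stop()
        coord.join(threads)

for kind, batch in log:
    print(kind, batch)
```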

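Using tf.group() and tf.cond()

Suggested here

make_template() method

Suggested here and here

Shared weights method

Suggested by sygi in this stackoverflow thread (link). This might be the same as the make_template() method.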
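
Here is a minimal sketch of the shared-weights idea using `tf.make_template()`, assuming TensorFlow 1.15 or 2.x through `tf.compat.v1`. The two constant tensors are hypothetical stand-ins for batches dequeued from a training pipeline and a validation pipeline. The one-weight `model` function is a placeholder for real inference code. The template creates its variables on the first call and reuses them on later calls, so the validation output is computed with the same weight that the training step updates.

```python
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

def model(x):
    # A single trainable weight stands in for the real network.
    w = tf.get_variable("w", shape=[], initializer=tf.ones_initializer())
    return x * w

graph = tf.Graph()
with graph.as_default():
    # Hypothetical stand-ins for batches from two separate input pipelines.
    train_x = tf.constant([1.0, 2.0, 3.0])
    valid_x = tf.constant([10.0, 20.0])

    shared_model = tf.make_template("model", model)
    train_out = shared_model(train_x)  # first call creates model/w
    valid_out = shared_model(valid_x)  # second call reuses model/w

    loss = tf.reduce_mean(tf.square(train_out))
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
    init = tf.global_variables_initializer()
    var_names = [v.name for v in tf.trainable_variables()]

with tf.Session(graph=graph) as sess:
    sess.run(init)
    before = sess.run(valid_out)
    sess.run(train_op)  # updates w using only the training batch
    after = sess.run(valid_out)

print(var_names, before, after)
```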

QueueBase() method

Suggested here, with example code. My adaptation of that code to the problem in this question is in this thread. link
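
Here is a minimal sketch of the `QueueBase.from_list()` approach applied to the toy code from the question, assuming TensorFlow 1.15 or 2.x through `tf.compat.v1`. One producer queue is built per split, and an integer placeholder selects which one `dequeue_many` reads from. Each `string_input_producer` registers its own queue runner, so the single `start_queue_runners` call starts both, and no extra coordinator is needed. The names `queue_index` and `decode` are hypothetical.

```python
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

train_items = ["train_file_{}".format(i) for i in range(6)]
valid_items = ["valid_file_{}".format(i) for i in range(3)]
batch_size = 3

graph = tf.Graph()
with graph.as_default():
    train_q = tf.train.string_input_producer(train_items, shuffle=False)
    valid_q = tf.train.string_input_producer(valid_items, shuffle=False)
    # 0 selects the training queue, 1 the validation queue.
    queue_index = tf.placeholder(tf.int32, shape=[])
    q = tf.queue.QueueBase.from_list(queue_index, [train_q, valid_q])
    x = q.dequeue_many(batch_size)

def decode(batch):
    # Session.run returns string tensors as bytes.
    return [s.decode() for s in batch]

log = []
with tf.Session(graph=graph) as sess:
    coord = tf.train.Coordinator()
    # Starts the queue runners of both producers.
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        for epoch in range(2):
            for step in range(2):
                batch = sess.run(x, feed_dict={queue_index: 0})
                log.append(("TRAIN", decode(batch)))
            batch = sess.run(x, feed_dict={queue_index: 1})
            log.append(("VALID", decode(batch)))
    finally:
        coord.request_stop()
        coord.join(threads)

for kind, batch in log:
    print(kind, batch)
```

Index 0 reads training filenames and index 1 reads validation filenames. After each validation step, training batches continue where they left off.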

Training buckets method

Suggested here

Answer 2

syg*_*ygi 8

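First, you can read the examples manually in your code (into numpy arrays) and feed them in any way you want:

# DATA_SHAPE, read_some_data(), read_some_test_data(), train_op and eval_op stand
# for your own feature size, data loaders (returning numpy arrays) and model ops.
data = tf.placeholder(tf.float32, [None, DATA_SHAPE])
for _ in range(num_epochs):
  some_training = read_some_data()  # numpy array of shape [batch, DATA_SHAPE]
  sess.run(train_op, feed_dict={data: some_training})
  some_testing = read_some_test_data()
  sess.run(eval_op, feed_dict={data: some_testing})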

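If you need to use queues, you can try to conditionally switch the queue from "training" to "testing":

# some_reader() and model() are placeholders for your own reading and model code.
train_filenames = tf.train.string_input_producer(["training_file"])
train_q = some_reader(train_filenames)
test_filenames = tf.train.string_input_producer(["testing_file"])
test_q = some_reader(test_filenames)

am_testing = tf.placeholder(dtype=tf.bool, shape=())
# Caution: train_q and test_q are created outside the lambdas, so tf.cond
# executes both readers on every step, whichever branch is selected.
data = tf.cond(am_testing, lambda: test_q, lambda: train_q)
train_op, accuracy = model(data)

for _ in range(num_epochs):
  sess.run(train_op, feed_dict={am_testing: False})
  sess.run(accuracy, feed_dict={am_testing: True})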

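The second approach is considered unsafe. This post recommends building two separate graphs for training and testing (with shared weights) instead, which is another way to achieve what you want.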

Archived: 9 years, 2 months ago
Views: 7712
Last activity: 8 years, 8 months ago
