How do I convert string labels to one-hot vectors in TensorFlow?

Question

so_*_*ser 6 python machine-learning tensorflow

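I am new to TensorFlow and want to read a comma-separated values (CSV) file with 2 columns: column 1 is the index and column 2 is a label string. The code below reads the CSV file line by line, and print statements confirm that I get the data correctly. However, I want to convert the string labels to one-hot encodings and do not know how to do that in TensorFlow. The end goal is to use the tf.train.batch() function so that I can get a batch of one-hot label vectors to train a neural network.

As you can see in the code below, I can manually create a one-hot vector for each label entry inside the TensorFlow session. But how do I use the tf.train.batch() function? If I move the line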

label_batch = tf.train.batch([col2], batch_size=5)

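into the TensorFlow session block (replacing col2 with label_one_hot), the program blocks and does nothing. I also tried moving the one-hot conversion outside the TensorFlow session, but could not get it to work. What is the correct way to do this? Please help.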

label_files = []
label_files.append(LABEL_FILE)
print "label_files: ", label_files

filename_queue = tf.train.string_input_producer(label_files)

reader = tf.TextLineReader()
key, value = reader.read(filename_queue)
print "key:", key, ", value:", value

record_defaults = [['default_id'], ['default_label']]
col1, col2 = tf.decode_csv(value, record_defaults=record_defaults)

num_lines = sum(1 for line in open(LABEL_FILE))

label_batch = tf.train.batch([col2], batch_size=5)

with tf.Session() as sess:
    coordinator = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coordinator)

    for i in range(100):
        column1, column2 = sess.run([col1, col2])

        index = 0
        if column2 == 'airplane':
            index = 0
        elif column2 == 'automobile':
            index = 1
        elif column2 == 'bird':
            index = 2
        elif column2 == 'cat':
            index = 3
        elif column2 == 'deer':
            index = 4
        elif column2 == 'dog':
            index = 5
        elif column2 == 'frog':
            index = 6
        elif column2 == 'horse':
            index = 7
        elif column2 == 'ship':
            index = 8
        elif column2 == 'truck':
            index = 9

        label_one_hot = tf.one_hot([index], 10)  # depth=10 for 10 categories
        print "column1:", column1, ", column2:", column2
        # print "onehot label:", sess.run([label_one_hot])

    print sess.run(label_batch)

    coordinator.request_stop()
    coordinator.join(threads)

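The if/elif chain in the code above maps each label string to a class index. As a minimal sketch of the same mapping in plain Python, a dictionary lookup can replace the chain. The names CLASS_NAMES, LABEL_TO_INDEX and label_to_index are hypothetical and not part of the original code. Unlike the chain, which falls back to index 0, this version raises a KeyError for an unknown label.

```python
# Class names in the order used by the if/elif chain in the question.
CLASS_NAMES = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# Hypothetical helper table: class name -> position in CLASS_NAMES.
LABEL_TO_INDEX = {name: i for i, name in enumerate(CLASS_NAMES)}


def label_to_index(label):
    """Return the class index for a label given as str or bytes."""
    # sess.run() can return string tensors as bytes, so decode if needed.
    if isinstance(label, bytes):
        label = label.decode('utf-8')
    return LABEL_TO_INDEX[label]
```

Note that this still runs in Python, outside the TensorFlow graph, so on its own it does not make the labels available to tf.train.batch().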

Answer 1

VS_*_*_FF 2

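You may want to try feeding the index variable into a placeholder, which is in turn converted into a one-hot vector via tf.one_hot. Something along these lines: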

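# Placeholder for a batch of integer class indices (TF 1.x API).
lbl = tf.placeholder(tf.uint8, [YOUR_BATCH_SIZE])
# on_value=1.0 and off_value=0.0 give float one-hot rows of width YOUR_VOCAB_SIZE.
lbl_one_hot = tf.one_hot(lbl, YOUR_VOCAB_SIZE, 1.0, 0.0)
# index must hold YOUR_BATCH_SIZE class indices to match the placeholder shape.
lb_h = sess.run([lbl_one_hot], feed_dict={lbl: index})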

I am not sure whether you are doing this in batches, so YOUR_BATCH_SIZE may be irrelevant if that is not your case. You can also do this with numpy.zeros, but I find the approach above cleaner and easier, especially for batching.
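
For comparison, the numpy.zeros alternative mentioned above might look roughly like the sketch below. The function name one_hot_batch and the example indices are assumptions for illustration, not something from the answer.

```python
import numpy as np


def one_hot_batch(indices, depth):
    """Build a (batch, depth) float32 one-hot matrix from integer class indices."""
    indices = np.asarray(indices, dtype=np.int64)
    encoded = np.zeros((indices.size, depth), dtype=np.float32)
    # Set one entry per row: row i gets a 1.0 in column indices[i].
    encoded[np.arange(indices.size), indices] = 1.0
    return encoded


# Example: three labels (airplane, cat, truck) with 10 classes.
batch = one_hot_batch([0, 3, 9], depth=10)
```

The resulting array could then be fed to a float placeholder instead of going through tf.one_hot.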

With this approach, we prepare the list of label indices outside TF? How can I do everything inside TF? (4 upvotes)
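
One possible way to keep the whole conversion inside TensorFlow, offered as a sketch rather than a confirmed answer from this thread, is to compare the string tensor against a constant tensor of class names and cast the boolean result to floats. The names CLASS_NAMES and one_hot_from_strings are hypothetical, and the example assumes TensorFlow 2.x with eager execution so the result can be inspected directly.

```python
import tensorflow as tf

# Class names in the order used by the question's if/elif chain.
CLASS_NAMES = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']


def one_hot_from_strings(labels):
    """Convert a string tensor of labels to one-hot rows using only TF ops."""
    labels = tf.convert_to_tensor(labels, dtype=tf.string)
    # Broadcasting compares every label with every class name, giving a
    # boolean tensor whose last dimension has one entry per class.
    matches = tf.equal(tf.expand_dims(labels, -1), tf.constant(CLASS_NAMES))
    # True/False becomes 1.0/0.0; an unknown label yields an all-zero row.
    return tf.cast(matches, tf.float32)


one_hot = one_hot_from_strings(['cat', 'ship'])
```

Because tf.equal and tf.cast are ordinary graph ops, the same expression applied to the scalar col2 in the question's TF 1.x pipeline should in principle produce a tensor of shape [10] that can be passed to tf.train.batch() in place of col2. The batch op would need to be created before tf.train.start_queue_runners() is called, since queue runners added afterwards are not started, which is a likely reason for the hang described in the question.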
