How do I encode labels in TensorFlow?

Vla*_*tiy 5 tensorflow

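I need to convert string labels into vectors such as [0, 0, ..., 1, ..., 0].
As far as I understand, this is called a "one-hot vector".
I have 10 classes, and therefore 10 distinct string labels.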

Could someone help with the forward and inverse transforms?
I am new to TensorFlow, so please be kind.

All*_*oie 4

The forward direction is easy, since there is the tf.one_hot op:

import tensorflow as tf

original_indices = tf.constant([1, 5, 3])
depth = tf.constant(10)
one_hot_encoded = tf.one_hot(indices=original_indices, depth=depth)

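# tf.Session is the TensorFlow 1.x graph-mode API. In 2.x eager mode,
# print(one_hot_encoded.numpy()) works without a session.
with tf.Session():
  print(one_hot_encoded.eval())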

Output:

[[ 0.  1.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  1.  0.  0.  0.  0.]
 [ 0.  0.  0.  1.  0.  0.  0.  0.  0.  0.]]
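
The question starts from string labels, while tf.one_hot takes integer indices, so a lookup step is needed first. Below is a minimal sketch in plain Python. The class names and the helper names (`class_names`, `labels_to_indices`, `indices_to_labels`) are hypothetical placeholders, not part of the original answer.

```python
# Hypothetical class names; substitute the 10 real string labels here.
class_names = ["cat", "dog", "bird"]

# Lookup table from string label to integer index.
label_to_index = {name: i for i, name in enumerate(class_names)}


def labels_to_indices(labels):
    """Maps string labels to the integer indices that tf.one_hot expects."""
    return [label_to_index[label] for label in labels]


def indices_to_labels(indices):
    """Inverse lookup: maps integer indices back to string labels."""
    return [class_names[i] for i in indices]


indices = labels_to_indices(["dog", "cat", "bird", "dog"])
print(indices)
print(indices_to_labels(indices))
```

The resulting list can be wrapped in tf.constant and passed as `indices` to tf.one_hot, with `depth` set to `len(class_names)`.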

The reverse isn't too bad either; tf.where can find the non-zero indices:

def decode_one_hot(batch_of_vectors):
  """Computes indices for the non-zero entries in batched one-hot vectors.

  Args:
    batch_of_vectors: A Tensor with length-N vectors, having shape [..., N].
  Returns:
    An integer Tensor with shape [...] indicating the index of the non-zero
    value in each vector.
  """
  nonzero_indices = tf.where(tf.not_equal(
      batch_of_vectors, tf.zeros_like(batch_of_vectors)))
  reshaped_nonzero_indices = tf.reshape(
      nonzero_indices[:, -1], tf.shape(batch_of_vectors)[:-1])
  return reshaped_nonzero_indices

with tf.Session():
  print(decode_one_hot(one_hot_encoded).eval())

Prints:

[1 5 3]
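
The tf.where approach relies on each vector having exactly one non-zero entry; otherwise the reshape no longer matches. When that holds, taking the position of the largest value in each vector gives the same indices, which in TensorFlow would correspond to tf.argmax along the last axis. The following plain-Python sketch illustrates the idea on nested lists. The helper name `decode_one_hot_rows` is hypothetical, and this is an alternative technique rather than the method used in the answer above.

```python
def decode_one_hot_rows(rows):
    """Returns, for each row, the index of its largest entry (an argmax)."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


# The same three one-hot vectors as in the output shown earlier.
rows = [
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]

print(decode_one_hot_rows(rows))  # [1, 5, 3]
```

Note that an argmax returns index 0 for an all-zero vector, so it silently accepts input that is not a valid one-hot encoding.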