Unicode to standard TensorFlow format

Rus*_*ell 5 python unicode protocol-buffers tensorflow

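Following the documentation here, I am trying to create a feature from a unicode string. This is what the feature creation method looks like: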

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

This raises an exception:

  File "/home/rklopfer/.virtualenvs/tf/local/lib/python2.7/site-packages/google/protobuf/internal/python_message.py", line 512, in init
    copy.extend(field_value)
  File "/home/rklopfer/.virtualenvs/tf/local/lib/python2.7/site-packages/google/protobuf/internal/containers.py", line 275, in extend
    new_values = [self._type_checker.CheckValue(elem) for elem in elem_seq_iter]
  File "/home/rklopfer/.virtualenvs/tf/local/lib/python2.7/site-packages/google/protobuf/internal/type_checkers.py", line 108, in CheckValue
    raise TypeError(message)
TypeError: u'Gross' has type <type 'unicode'>, but expected one of: (<type 'str'>,)

Naturally, if I wrap value in str(), it fails on the first actual non-ASCII unicode character it encounters.

Yar*_*tov 6

BytesList is defined in feature.proto and has type repeated bytes, which means you need to pass it something that can be converted to a list of byte sequences.

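There is more than one way to convert unicode to a sequence of bytes, so the conversion is ambiguous and is not done for you. You can do it manually, i.e., to use the UTF-8 encoding: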

value.encode("utf-8")
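To illustrate why the encoding has to be chosen explicitly, here is a minimal sketch in plain Python (no TensorFlow needed). The helper name to_utf8_bytes and the sample string are assumptions made for this example; they do not come from the question or from any library.

```python
# -*- coding: utf-8 -*-

def to_utf8_bytes(value):
    """Hypothetical helper: return value as UTF-8 bytes.

    Values that are already bytes (str on Python 2) are passed through unchanged.
    """
    if isinstance(value, bytes):
        return value
    return value.encode("utf-8")


text = u"Gro\u00df"  # sample text containing one non-ASCII character

utf8 = to_utf8_bytes(text)
utf16 = text.encode("utf-16-le")

# The same text yields different byte sequences under different encodings,
# which is why the conversion is not performed implicitly.
print(repr(utf8))
print(repr(utf16))

# Decoding with the same codec recovers the original text.
assert utf8.decode("utf-8") == text
assert utf16.decode("utf-16-le") == text
```

With a helper like this, the question's method could be called as _bytes_feature(to_utf8_bytes(value)). Whatever encoding is used when writing has to be used again to decode the bytes when the feature is read back.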