Is six.text_type the same as text.decode('utf8')?

Question

Given a function like the following:

import six

def convert_to_unicode(text):
  """Converts `text` to Unicode (if it's not already), assuming utf-8 input."""
  if six.PY3:
    if isinstance(text, str):
      return text
    elif isinstance(text, bytes):
      return text.decode("utf-8", "ignore")
    else:
      raise ValueError("Unsupported string type: %s" % (type(text)))
  elif six.PY2:
    if isinstance(text, str):
      return text.decode("utf-8", "ignore")
    elif isinstance(text, unicode):
      return text
    else:
      raise ValueError("Unsupported string type: %s" % (type(text)))
  else:
    raise ValueError("Not running on Python2 or Python 3?")

Since six handles Python 2 and Python 3 compatibility, is the convert_to_unicode(text) function above equivalent to six.text_type(text)? I.e.:

def convert_to_unicode(text):
    return six.text_type(text)

Are there cases that the original convert_to_unicode catches but six.text_type does not?

Answer 1

len*_*enz 5

Since six.text_type is just a reference to the str or unicode type, an equivalent function would be this:

def convert_to_unicode(text):
    return six.text_type(text, encoding='utf8', errors='ignore')

But it doesn't behave the same in the corner cases, e.g. it will happily convert an integer, so you'd have to put some checks in there first.
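To make the corner cases concrete, here is a minimal sketch, assuming Python 3, where six.text_type is simply str. The fallback to str when six is not installed is a convenience added for this illustration, not part of the answer. It suggests that the plain one-argument call from the question converts almost anything, while the call with encoding and errors only accepts bytes-like input:

```python
try:
    import six
    text_type = six.text_type  # str on Python 3, unicode on Python 2
except ImportError:
    text_type = str  # assumption: Python 3 without six installed

# One-argument form: any object is converted via its string representation.
as_int = text_type(5)

# On Python 3, bytes are NOT decoded by the one-argument form;
# the result is the repr of the bytes object.
as_bytes_repr = text_type(b"abc")

# With encoding/errors, bytes are actually decoded as UTF-8.
decoded = text_type(b"caf\xc3\xa9", encoding="utf8", errors="ignore")

# With encoding/errors, a non-bytes argument is rejected with TypeError.
try:
    text_type(5, encoding="utf8", errors="ignore")
    int_rejected = False
except TypeError:
    int_rejected = True
```

So six.text_type(text) on its own is not a drop-in replacement for the original function on Python 3: bytes input comes back as the text "b'abc'" rather than "abc".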

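Also, I don't see why you would want errors='ignore'. You say you assume UTF-8. But if that assumption is violated, you are silently dropping data. I would strongly suggest using errors='strict'.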
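A small sketch of the difference, using a made-up input that is Latin-1 rather than UTF-8 (the sample bytes are an assumption chosen for illustration):

```python
# 'café au lait' encoded as Latin-1; the byte 0xE9 is not valid UTF-8 here.
raw = b"caf\xe9 au lait"

# errors='ignore' drops the undecodable byte without any warning.
lossy = raw.decode("utf-8", "ignore")

# errors='strict' (the default) raises, so the bad input gets noticed.
try:
    raw.decode("utf-8", "strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

With 'ignore' the caller receives "caf au lait" and has no way to tell that a character went missing.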

Edit:

I just realised that this doesn't work if text is already what you want. Also, it will happily raise a TypeError for any non-string input. So how about this:

def convert_to_unicode(text):
    if isinstance(text, six.text_type):
        return text
    return six.text_type(text, encoding='utf8', errors='ignore')

The only corner case left uncovered here is that of the Python version being neither 2 nor 3. And I still think you should use errors='strict'.
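For reference, a hedged sketch of how that last version might look with the strict setting as the default. The errors parameter and the fallback to str when six is missing are additions for this illustration, not part of the answer's code:

```python
try:
    import six
    text_type = six.text_type  # str on Python 3, unicode on Python 2
except ImportError:
    text_type = str  # assumption: Python 3 without six installed


def convert_to_unicode(text, errors="strict"):
    """Return `text` as unicode text, decoding byte strings as UTF-8.

    `errors` is a hypothetical extra parameter; pass "ignore" to get the
    lossy behaviour of the function in the question.
    """
    if isinstance(text, text_type):
        return text
    # Raises TypeError for input that cannot be decoded (e.g. an int).
    return text_type(text, encoding="utf8", errors=errors)


already_text = convert_to_unicode(u"abc")
from_bytes = convert_to_unicode(b"abc")

try:
    convert_to_unicode(5)
    int_error = None
except TypeError as exc:
    int_error = type(exc).__name__
```

Two differences from the function in the question remain, as far as I can tell: unsupported input raises TypeError rather than ValueError, and on Python 3 other bytes-like objects such as bytearray are decoded instead of rejected.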
