unicode() vs. str.decode() for a utf8 encoded byte string (python 2.x)

Question

Is there any reason to prefer unicode(somestring, 'utf8') over somestring.decode('utf8')?

My only thought is that .decode() is a bound method, so python may be able to resolve it more efficiently, but correct me if I'm wrong.

Answer 1

It's easy to benchmark:

>>> from timeit import Timer
>>> ts = Timer("s.decode('utf-8')", "s = 'ééé'")
>>> ts.timeit()
8.9185450077056885
>>> tu = Timer("unicode(s, 'utf-8')", "s = 'ééé'") 
>>> tu.timeit()
2.7656929492950439
>>>

Clearly, unicode() is faster here: about 2.8 s versus 8.9 s for timeit's default of one million calls.

FWIW, I don't know where you got the idea that methods would be faster - it's quite the contrary.
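If you want to repeat the measurement on your own interpreter, here is a minimal sketch that takes the best of several runs instead of a single one. The names text_type, via_method, via_constructor and best_of are mine, not part of the benchmark above, and on Python 3 the sketch assumes str(data, 'utf-8') as the stand-in for unicode(), since unicode() does not exist there. The numbers will differ by machine and Python version, so treat the result as indicative only.

```python
import sys
import timeit

# Assumption: on Python 3, str(bytes, encoding) plays the role of unicode().
# The conditional is lazy, so the name `unicode` is only looked up on Python 2.
text_type = unicode if sys.version_info[0] == 2 else str  # noqa: F821

# UTF-8 bytes for three 'é' characters, built without a non-ASCII source literal.
data = u'\xe9\xe9\xe9'.encode('utf-8')


def via_method():
    """Decode through the method on the byte string."""
    return data.decode('utf-8')


def via_constructor():
    """Decode through the text type's constructor."""
    return text_type(data, 'utf-8')


def best_of(func, repeat=5, number=100000):
    """Return the fastest total time in seconds over `repeat` runs."""
    return min(timeit.repeat(func, repeat=repeat, number=number))


if __name__ == '__main__':
    print('decode():      %.4f s' % best_of(via_method))
    print('constructor(): %.4f s' % best_of(via_constructor))
```

Taking the minimum of several repeats reduces noise from other processes, which matters when the two timings are close.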

Answer 2

dF.*_*dF. 23

I prefer 'something'.decode(...) because the unicode type no longer exists in Python 3.0, while text = b'binarydata'.decode(encoding) is still valid.

Good point. Also, note that strings are unicode by default in python 3: http://docs.python.org/3.0/whatsnew/3.0.html (4 upvotes)
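To make the portability point in this answer concrete, here is a small sketch that decodes the same bytes with .decode() and then checks what is available on the running interpreter. The sample bytes and the variable names raw and text are illustrative, and the b'...' literal assumes Python 2.6 or later.

```python
import sys

raw = b'caf\xc3\xa9'        # UTF-8 encoded bytes for "café"
text = raw.decode('utf-8')  # same spelling on Python 2.6+ and Python 3

if sys.version_info[0] == 2:
    # Python 2 only: the unicode builtin exists and gives the same result.
    assert unicode(raw, 'utf-8') == text  # noqa: F821
else:
    # Python 3: the unicode name is gone from the builtins;
    # str(raw, 'utf-8') is the closest equivalent.
    import builtins
    assert not hasattr(builtins, 'unicode')
    assert str(raw, 'utf-8') == text
```

Code that sticks to .decode() needs no version check at all, which is the practical argument for preferring it when the code may later move to Python 3.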
