fly*_*eep 43 python python-3.x
When iterating over a bytes object in Python 3, you get the individual bytes as ints:
>>> [b for b in b'123']
[49, 50, 51]
How can I get bytes objects of length 1 instead?
The following works, but it is not very obvious to the reader and most likely performs poorly:
>>> [bytes([b]) for b in b'123']
[b'1', b'2', b'3']
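One reason the list wrapper in bytes([b]) is easy to misread: passing a bare int to bytes() is interpreted as a length, not as a byte value. A minimal sketch of the difference (the variable names are only illustrative):

```python
# bytes(n) with an int n creates n zero bytes;
# bytes([n]) creates a single byte whose value is n.
value = 49  # ord('1')

zero_filled = bytes(value)    # 49 null bytes, not b'1'
single_byte = bytes([value])  # one byte with value 49

print(len(zero_filled), single_byte)
```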
jfs*_*jfs 34
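If you are concerned about the performance of this code and an int as a byte is not a suitable interface in your case, then you should probably reconsider the data structures you use, e.g., use str objects instead.
You can slice the bytes object to get bytes objects of length 1: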
L = [bytes_obj[i:i+1] for i in range(len(bytes_obj))]
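The slicing approach relies on the fact that indexing a bytes object returns an int while slicing returns bytes. A small sketch with illustrative variable names:

```python
data = b'123'

first_by_index = data[0]    # indexing yields an int
first_by_slice = data[0:1]  # slicing yields a bytes object of length 1

# Slicing does not raise for out-of-range positions;
# it returns an empty bytes object instead.
past_the_end = data[10:11]

chunks = [data[i:i+1] for i in range(len(data))]
print(first_by_index, first_by_slice, past_the_end, chunks)
```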
There is PEP 467, "Minor API improvements for binary sequences", which proposes a bytes.iterbytes() method:
>>> list(b'123'.iterbytes())
[b'1', b'2', b'3']
sna*_*erb 13
int.to_bytes
int objects have a to_bytes method that can be used to convert an int to its corresponding byte:
>>> import sys
>>> [i.to_bytes(1, sys.byteorder) for i in b'123']
[b'1', b'2', b'3']
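As with some of the other answers, it is not clear that this is more readable than the OP's original solution: I think the length and byteorder arguments make it noisier.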
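For a single byte the byte order has no effect, so 'big' or 'little' could be hard-coded instead of importing sys. As far as I know, Python 3.11 also added defaults for both arguments (length=1, byteorder='big'), which would allow a bare i.to_bytes() there. The sketch below sticks to explicit arguments so that it also runs on older versions:

```python
data = b'123'

# With length 1 there is only one byte to place,
# so the byte order cannot change the result.
big = [i.to_bytes(1, 'big') for i in data]
little = [i.to_bytes(1, 'little') for i in data]

print(big == little, big)
```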
struct.unpack
Another approach is to use struct.unpack, though this may also be hard to read unless you are familiar with the struct module:
>>> import struct
>>> struct.unpack('3c', b'123')
(b'1', b'2', b'3')
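(As jfs observes in the comments, the format string for struct.unpack can be constructed dynamically; in this case we know that the number of individual bytes in the result must equal the number of bytes in the original bytestring, so struct.unpack(str(len(bytestring)) + 'c', bytestring) is possible.)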
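The dynamically built format string could be wrapped in a small helper so that callers do not need to know the struct syntax. This is only a sketch; the name split_bytes is made up for illustration:

```python
import struct


def split_bytes(bytestring):
    """Return a tuple of length-1 bytes objects (hypothetical helper)."""
    # '<n>c' means n items of the 'c' (char) format,
    # each of which is unpacked as a bytes object of length 1.
    return struct.unpack('%dc' % len(bytestring), bytestring)


print(split_bytes(b'123'))
```

Note that the result is a tuple; wrap it in list() if a list is required.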
Performance
>>> import random, timeit
>>> bs = bytes(random.randint(0, 255) for i in range(100))
>>> # OP's solution
>>> timeit.timeit(setup="from __main__ import bs",
...               stmt="[bytes([b]) for b in bs]")
46.49886950897053
>>> # Accepted answer from jfs
>>> timeit.timeit(setup="from __main__ import bs",
...               stmt="[bs[i:i+1] for i in range(len(bs))]")
20.91463226894848
>>> # Leon's answer
>>> timeit.timeit(setup="from __main__ import bs",
...               stmt="list(map(bytes, zip(bs)))")
27.476876026019454
>>> # guettli's answer
>>> timeit.timeit(setup="from __main__ import iter_bytes, bs",
...               stmt="list(iter_bytes(bs))")
24.107485140906647
>>> # user38's answer (with Leon's suggested fix)
>>> timeit.timeit(setup="from __main__ import bs",
...               stmt="[chr(i).encode('latin-1') for i in bs]")
45.937552741961554
>>> # Using int.to_bytes
>>> timeit.timeit(setup="from __main__ import bs;from sys import byteorder",
...               stmt="[x.to_bytes(1, byteorder) for x in bs]")
32.197659170022234
>>> # Using struct.unpack, converting the resulting tuple to list
>>> # to be fair to other methods
>>> timeit.timeit(setup="from __main__ import bs;from struct import unpack",
...               stmt="list(unpack('100c', bs))")
1.902243083808571
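struct.unpack appears to be at least an order of magnitude faster than the other methods, presumably because it operates at the byte level. int.to_bytes, on the other hand, performs worse than most of the "obvious" approaches.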
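Before comparing speed, it can be worth confirming that the approaches actually agree on the result. The following is a rough sketch rather than a rigorous benchmark: the labels and the dictionary layout are my own, the repeat count is kept small so it runs quickly, and the lambda wrappers add a little call overhead to every candidate. Absolute numbers will depend on the machine and the Python version.

```python
import random
import struct
import timeit

bs = bytes(random.randint(0, 255) for _ in range(100))

# Candidate implementations, keyed by a descriptive label.
candidates = {
    'bytes([b])': lambda: [bytes([b]) for b in bs],
    'slicing': lambda: [bs[i:i+1] for i in range(len(bs))],
    'map/zip': lambda: list(map(bytes, zip(bs))),
    'int.to_bytes': lambda: [x.to_bytes(1, 'big') for x in bs],
    'struct.unpack': lambda: list(struct.unpack('%dc' % len(bs), bs)),
}

# Check that every approach produces the same list before timing anything.
expected = candidates['bytes([b])']()
agree = all(func() == expected for func in candidates.values())

# timeit.timeit accepts a callable as the statement to run.
timings = {name: timeit.timeit(func, number=1000)
           for name, func in candidates.items()}

for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print('%-14s %.4f s' % (name, seconds))
```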
I use this helper method:
def iter_bytes(my_bytes):
    for i in range(len(my_bytes)):
        yield my_bytes[i:i+1]
It works on both Python 2 and Python 3.
Starting with Python 3.5, you can use % formatting with bytes and bytearray:
[b'%c' % i for i in b'123']
Output:
[b'1', b'2', b'3']
The above solution is 2-3 times faster than your initial approach. If you want an even faster solution, I suggest using numpy.frombuffer:
import numpy as np
np.frombuffer(b'123', dtype='S1')
Output:
array([b'1', b'2', b'3'],
dtype='|S1')
The second solution is 10% faster than struct.unpack (I used the same performance test as @snakecharmerb, with 100 random bytes).
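Both solutions in this answer can be sanity-checked against arbitrary byte values, not just ASCII digits. The sketch below first compares the %c conversion with bytes([i]) for every value in range(256). It then illustrates a caveat that, to my knowledge, applies to NumPy's fixed-width 'S' dtype: trailing NUL bytes are treated as padding, so a b'\x00' element comes back empty. NumPy is third-party, so that part is skipped if it is not installed.

```python
# %c accepts an int in range(256) and produces exactly that byte.
all_values = bytes(range(256))
formatted = [b'%c' % i for i in all_values]
reference = [bytes([i]) for i in all_values]
print(formatted == reference)

try:
    import numpy as np  # third-party; optional for this sketch
except ImportError:
    np = None

if np is not None:
    # The 'S1' dtype strips trailing NUL bytes when items are read back,
    # so the middle element here is b'' rather than b'\x00'.
    as_list = np.frombuffer(b'a\x00b', dtype='S1').tolist()
    print(as_list)
```

If the data can contain null bytes and a list of length-1 bytes objects is required, one of the pure-Python approaches or struct.unpack may be the safer choice.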
I thought it might be useful to compare the runtimes of the different approaches, so I made a benchmark (using my library simple_benchmark).
Probably unsurprisingly, the NumPy solution is by far the fastest for large bytes objects.
But if a list is desired as the result, then both the NumPy solution (with tolist()) and the struct solution are much faster than the other alternatives.
I didn't include guettli's answer because it is almost identical to jfs's solution; it just uses a generator function instead of a comprehension.
import numpy as np
import struct
import sys
from simple_benchmark import BenchmarkBuilder

b = BenchmarkBuilder()

@b.add_function()
def jfs(bytes_obj):
    return [bytes_obj[i:i+1] for i in range(len(bytes_obj))]

@b.add_function()
def snakecharmerb_tobytes(bytes_obj):
    return [i.to_bytes(1, sys.byteorder) for i in bytes_obj]

@b.add_function()
def snakecharmerb_struct(bytes_obj):
    return struct.unpack(str(len(bytes_obj)) + 'c', bytes_obj)

@b.add_function()
def Leon(bytes_obj):
    return list(map(bytes, zip(bytes_obj)))

@b.add_function()
def rusu_ro1_format(bytes_obj):
    return [b'%c' % i for i in bytes_obj]

@b.add_function()
def rusu_ro1_numpy(bytes_obj):
    return np.frombuffer(bytes_obj, dtype='S1')

@b.add_function()
def rusu_ro1_numpy_tolist(bytes_obj):
    return np.frombuffer(bytes_obj, dtype='S1').tolist()

@b.add_function()
def User38(bytes_obj):
    return [chr(i).encode() for i in bytes_obj]

@b.add_arguments('byte object length')
def argument_provider():
    for exp in range(2, 18):
        size = 2**exp
        yield size, b'a' * size

r = b.run()
r.plot()
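One detail worth keeping in mind when reading the benchmark: the User38 function calls encode() with the default UTF-8 encoding, which only yields a single byte for values below 0x80. The benchmark input, b'a' * size, does not exercise that case. A short sketch of the difference, using an arbitrarily chosen non-ASCII value:

```python
data = bytes([0x61, 0xC8])  # b'a' followed by a non-ASCII byte

# The default encoding is UTF-8: code points >= 0x80 encode to two bytes.
utf8_pieces = [chr(i).encode() for i in data]

# Latin-1 maps code points 0-255 one-to-one onto byte values.
latin1_pieces = [chr(i).encode('latin-1') for i in data]

print(utf8_pieces, latin1_pieces)
```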
The trio of map(), bytes() and zip() does the trick:
>>> list(map(bytes, zip(b'123')))
[b'1', b'2', b'3']
However, I don't think it is any more readable than [bytes([b]) for b in b'123'], or that it performs better.
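To make the map/zip combination easier to follow, the intermediate step can be spelled out. This is just an illustrative breakdown with made-up variable names:

```python
data = b'123'

# zip() over a single iterable wraps each item in a 1-tuple.
tuples = list(zip(data))

# bytes() accepts any iterable of ints, so each 1-tuple
# becomes a bytes object of length 1.
pieces = list(map(bytes, tuples))

print(tuples, pieces)
```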