DNS*_*DNS 75 python performance data-structures
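I'm optimizing some code whose main bottleneck is iterating over and accessing a very large list of struct-like objects. I currently use namedtuples for readability, but some quick benchmarks with 'timeit' suggest this is the wrong choice where performance matters:
Named tuple with a, b, c: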
>>> timeit("z = a.c", "from __main__ import a")
0.38655471766332994
Class using __slots__, with a, b, c:
>>> timeit("z = b.c", "from __main__ import b")
0.14527461047146062
Dictionary with keys a, b, c:
>>> timeit("z = c['c']", "from __main__ import c")
0.11588272541098377
Tuple with three values, using a constant key:
>>> timeit("z = d[2]", "from __main__ import d")
0.11106188992948773
List with three values, using a constant key:
>>> timeit("z = e[2]", "from __main__ import e")
0.086038238242508669
Tuple with three values, using a local key:
>>> timeit("z = d[key]", "from __main__ import d, key")
0.11187358437882722
List with three values, using a local key:
>>> timeit("z = e[key]", "from __main__ import e, key")
0.088604143037173344
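First of all, is there anything about these small timeit tests that would render them invalid? I ran each one several times to make sure no random system event had thrown them off, and the results were almost identical.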
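One way to make "ran each several times" systematic is `timeit.repeat`, taking the minimum of the runs as the least noisy figure. This is only a sketch: the `Point` namedtuple and the repeat/number values are stand-ins chosen for illustration, not the objects used in the measurements above.

```python
import timeit
from collections import namedtuple

# Hypothetical stand-in for the namedtuple being benchmarked.
Point = namedtuple("Point", "a b c")
p = Point(1, 2, 3)

# repeat() returns one total time per run; slower runs mostly reflect
# interference from the rest of the system, so the minimum is reported.
runs = timeit.repeat("z = p.c", globals={"p": p}, repeat=5, number=100000)
best = min(runs)
print("best of %d runs: %.4fs per 100000 accesses" % (len(runs), best))
```

The `globals` argument requires Python 3.5 or later; on older versions the `"from __main__ import ..."` setup string used above does the same job.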
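It would appear that dictionaries offer the best balance between performance and readability, with classes coming in second. This is unfortunate, since for my purposes I also need the objects to be sequence-like; hence my choice of namedtuple.
Lists are substantially faster, but constant keys are unmaintainable; I'd have to create a bunch of index constants, i.e. KEY_1 = 1, KEY_2 = 2, etc., which is also not ideal.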
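For reference, the index-constant approach described here might look like the following sketch. The names `A`, `B`, `C` and `record` are hypothetical; the question only mentions constants of the form KEY_1, KEY_2.

```python
# Hypothetical index constants, one per field position.
A, B, C = range(3)

record = [10, 20, 30]

# Reads a little like a field access, but is a plain list index underneath.
value = record[C]
print(value)
```

The maintenance cost is that the constants and the list layout must be kept in sync by hand, which is the drawback the question points out.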
Am I stuck with these choices, or is there an alternative that I've missed?
Bri*_*ian 50
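One thing to bear in mind is that namedtuples are optimised for access as tuples. If you change your accessor to a[2] instead of a.c, you'll see performance similar to plain tuples. The reason is that the name accessors effectively translate into calls to self[idx], so they pay both the indexing price and the name lookup price.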
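To make the "name accessor calls self[idx]" point concrete, here is a hand-written approximation of the kind of class that namedtuple generated in CPython versions before 3.8: a tuple subclass whose named fields are properties wrapping `operator.itemgetter`. The class name `Point` is hypothetical, and the real generated class has more methods than this sketch.

```python
from operator import itemgetter


class Point(tuple):
    """Simplified, hand-written stand-in for a generated namedtuple class."""
    __slots__ = ()

    def __new__(cls, a, b, c):
        return tuple.__new__(cls, (a, b, c))

    # Each named field is a property that indexes into the tuple, so
    # p.c costs an attribute lookup plus an index operation.
    a = property(itemgetter(0))
    b = property(itemgetter(1))
    c = property(itemgetter(2))


p = Point(1, 2, 3)
print(p.c, p[2])
```

Indexing with `p[2]` skips the property machinery entirely, which is consistent with the index access timing matching a plain tuple.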
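If your usage pattern is such that access by name is common but access as a tuple isn't, you could write a quick equivalent to namedtuple that does things the opposite way: it defers index lookups to access by name. However, you then pay the price on the index lookups. For example, here's a quick implementation: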
def makestruct(name, fields):
    fields = fields.split()
    import textwrap
    # Generate the class source, with one slot and one __init__ argument per field.
    template = textwrap.dedent("""\
    class {name}(object):
        __slots__ = {fields!r}
        def __init__(self, {args}):
            {self_fields} = {args}
        def __getitem__(self, idx):
            return getattr(self, fields[idx])
    """).format(
        name=name,
        fields=fields,
        args=','.join(fields),
        self_fields=','.join('self.' + f for f in fields))
    d = {'fields': fields}
    # The function-call form works on Python 3 and Python 2
    # (originally written as the Python 2 statement "exec template in d").
    exec(template, d)
    return d[name]
But the timings are very bad when __getitem__ must be called:
namedtuple.a : 0.473686933517
namedtuple[0] : 0.180409193039
struct.a : 0.180846214294
struct[0] : 1.32191514969
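That is, attribute access performs the same as a __slots__ class (unsurprisingly, since that's what it is), but index-based access carries a huge penalty because of the double lookup. (It's worth noting that __slots__ doesn't actually help much speed-wise. It saves memory, but the access time is about the same without it.)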
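The double lookup can be seen in a hand-written version of the same idea, without the `exec`-based factory. This is a sketch with a hypothetical class name `Struct`; it measures both access paths with timeit, and the absolute numbers will differ from the ones quoted above depending on machine and Python version.

```python
import timeit


class Struct(object):
    __slots__ = ("a", "b", "c")

    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c

    def __getitem__(self, idx):
        # Two steps: find the slot name for this position,
        # then look the attribute up by that name.
        return getattr(self, self.__slots__[idx])


s = Struct(1, 2, 3)

by_name = timeit.timeit("s.a", globals={"s": s}, number=100000)
by_index = timeit.timeit("s[0]", globals={"s": s}, number=100000)
print("s.a : %.4fs" % by_name)
print("s[0]: %.4fs" % by_index)
```

Note that this `__getitem__` handles integer indexes only; slices would need extra handling.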
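A third option would be to duplicate the data, e.g. subclass list and store the values both in attributes and in the list data. However, you don't actually get list-equivalent performance. There's a big speed hit just from having subclassed (which brings in checks for pure-Python overloads). Thus struct[0] still takes around 0.5s in this case (compared with 0.18 for a raw list), and you double the memory usage, so this may not be worth it.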
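A minimal sketch of that duplicated-data idea, assuming a hypothetical `ListStruct` class with three fixed fields. It also shows a practical hazard of the approach: the two copies are independent, so a write through one path is not seen through the other.

```python
class ListStruct(list):
    """Stores each value twice: in the list data and as an attribute."""

    def __init__(self, a, b, c):
        list.__init__(self, (a, b, c))
        self.a, self.b, self.c = a, b, c


s = ListStruct(1, 2, 3)
print(s[2], s.c)

# Writing through the index updates only the list copy.
s[2] = 99
print(s[2], s.c)
```

Keeping the copies consistent would require overriding `__setitem__` and `__setattr__`, which adds further overhead on top of the subclassing cost mentioned above.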
Ger*_*rat 44
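This question is fairly old (in internet time), so I thought I'd try duplicating your test today, with both regular CPython (2.7.6) and PyPy (2.2.1), to see how the various methods compare. (I also added an indexed lookup for the named tuple.)
This is a micro-benchmark, so YMMV, but PyPy seemed to speed up named tuple access by a factor of 30 compared with CPython (whereas dictionary access was only sped up by a factor of 3).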
from collections import namedtuple

STest = namedtuple("TEST", "a b c")
a = STest(a=1, b=2, c=3)

class Test(object):
    __slots__ = ["a", "b", "c"]

    def __init__(self):
        # Set the values per instance: class-level "a = 1" alongside
        # __slots__ raises ValueError (slot conflicts with class variable).
        self.a = 1
        self.b = 2
        self.c = 3

b = Test()

c = {'a': 1, 'b': 2, 'c': 3}

d = (1, 2, 3)
e = [1, 2, 3]
f = (1, 2, 3)
g = [1, 2, 3]
key = 2

if __name__ == '__main__':
    from timeit import timeit

    print("Named tuple with a, b, c:")
    print(timeit("z = a.c", "from __main__ import a"))

    print("Named tuple, using index:")
    print(timeit("z = a[2]", "from __main__ import a"))

    print("Class using __slots__, with a, b, c:")
    print(timeit("z = b.c", "from __main__ import b"))

    print("Dictionary with keys a, b, c:")
    print(timeit("z = c['c']", "from __main__ import c"))

    print("Tuple with three values, using a constant key:")
    print(timeit("z = d[2]", "from __main__ import d"))

    print("List with three values, using a constant key:")
    print(timeit("z = e[2]", "from __main__ import e"))

    print("Tuple with three values, using a local key:")
    print(timeit("z = d[key]", "from __main__ import d, key"))

    print("List with three values, using a local key:")
    print(timeit("z = e[key]", "from __main__ import e, key"))
Python results:
Named tuple with a, b, c:
0.124072679784
Named tuple, using index:
0.0447055962367
Class using __slots__, with a, b, c:
0.0409136944224
Dictionary with keys a, b, c:
0.0412045334915
Tuple with three values, using a constant key:
0.0449477955531
List with three values, using a constant key:
0.0331083467148
Tuple with three values, using a local key:
0.0453569025139
List with three values, using a local key:
0.033030056702
PyPy results:
Named tuple with a, b, c:
0.00444889068604
Named tuple, using index:
0.00265598297119
Class using __slots__, with a, b, c:
0.00208616256714
Dictionary with keys a, b, c:
0.013897895813
Tuple with three values, using a constant key:
0.00275301933289
List with three values, using a constant key:
0.002760887146
Tuple with three values, using a local key:
0.002769947052
List with three values, using a local key:
0.00278806686401
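This question may soon be obsolete. The CPython developers have evidently made significant improvements to the performance of accessing named tuple values by attribute name. The changes are scheduled for release in Python 3.8, near the end of October 2019.
See: https://bugs.python.org/issue32492 and https://github.com/python/cpython/pull/10495.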
Since this is an old question and we now have newer data structures available (such as dataclasses), we should revisit it a little :)
Tested on an AMD 5950x
Python 3.11:
test_slots 0.082s
test_dataclass 0.085s
test_dataclass_slots 0.086s
test_namedtuple_index 0.143s
test_dict 0.144s
test_namedtuple_attr 0.169s
test_namedtuple_unpack 0.314s
test_enum_attr 0.615s
test_enum_item 1.082s
test_enum_call 3.018s
Python 3.10:
test_dataclass_slots 0.155s
test_slots 0.156s
test_dataclass 0.177s
test_namedtuple_index 0.210s
test_dict 0.214s
test_namedtuple_attr 0.261s
test_namedtuple_unpack 0.473s
test_enum_attr 0.989s
test_enum_item 1.790s
test_enum_call 4.476s
Based on these results, I would recommend using a dataclass for named access, or a tuple/namedtuple for index access.
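If named access is the hot path but the objects still need to behave like sequences now and then, one possible compromise is a dataclass with a few sequence methods added by hand. This is a sketch only: the `Point` class and its `__iter__`, `__getitem__`, and `__len__` methods are assumptions for illustration and are not part of the benchmark, and the index path is expected to be slower than attribute access, for the same double-lookup reason discussed earlier.

```python
import dataclasses


@dataclasses.dataclass
class Point:
    x: int
    y: int
    z: int

    def __iter__(self):
        # Yield field values in declaration order, so unpacking works.
        return (getattr(self, f.name) for f in dataclasses.fields(self))

    def __getitem__(self, idx):
        # Map a position to a field name, then read the attribute.
        return getattr(self, dataclasses.fields(self)[idx].name)

    def __len__(self):
        return len(dataclasses.fields(self))


point = Point(1234, 5678, 9012)
x, y, z = point
print(point.x, point[2], len(point))
```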
The test code can be forked here: https://gist.github.com/WoLpH/02fae0b20b914354734aaac01c06d23b
import sys
import enum
import math
import random
import timeit
import typing
import dataclasses
import collections

repeat = 5
number = 1000
N = 5000

class PointTuple(typing.NamedTuple):
    x: int
    y: int
    z: int

@dataclasses.dataclass
class PointDataclass:
    x: int
    y: int
    z: int

@dataclasses.dataclass(slots=True)
class PointDataclassSlots:
    x: int
    y: int
    z: int

class PointObject:
    __slots__ = 'x', 'y', 'z'
    x: int
    y: int
    z: int

def test_namedtuple_attr():
    point = PointTuple(1234, 5678, 9012)
    for i in range(N):
        x, y, z = point.x, point.y, point.z

def test_namedtuple_index():
    point = PointTuple(1234, 5678, 9012)
    for i in range(N):
        x, y, z = point

def test_namedtuple_unpack():
    point = PointTuple(1234, 5678, 9012)
    for i in range(N):
        x, *y = point

def test_dataclass():
    point = PointDataclass(1234, 5678, 9012)
    for i in range(N):
        x, y, z = point.x, point.y, point.z

def test_dataclass_slots():
    point = PointDataclassSlots(1234, 5678, 9012)
    for i in range(N):
        x, y, z = point.x, point.y, point.z

def test_dict():
    point = dict(x=1234, y=5678, z=9012)
    for i in range(N):
        x, y, z = point['x'], point['y'], point['z']

def test_slots():
    point = PointObject()
    point.x = 1234
    point.y = 5678
    point.z = 9012
    for i in range(N):
        x, y, z = point.x, point.y, point.z

class PointEnum(enum.Enum):
    x = 1
    y = 2
    z = 3

def test_enum_attr():
    point = PointEnum
    for i in range(N):
        x, y, z = point.x, point.y, point.z

def test_enum_call():
    point = PointEnum
    for i in range(N):
        x, y, z = point(1), point(2), point(3)

def test_enum_item():
    point = PointEnum
    for i in range(N):
        x, y, z = point['x'], point['y'], point['z']

if __name__ == '__main__':
    tests = [
        test_namedtuple_attr,
        test_namedtuple_index,
        test_namedtuple_unpack,
        test_dataclass,
        test_dataclass_slots,
        test_dict,
        test_slots,
        test_enum_attr,
        test_enum_call,
        test_enum_item,
    ]

    print(f'Running tests {repeat} times with {number} calls.')
    print(f'Using {N} iterations in the loop')

    results = collections.defaultdict(lambda: math.inf)

    for i in range(repeat):
        # Shuffling tests to prevent skewed results due to CPU boosting or
        # thermal throttling
        random.shuffle(tests)

        print(f'Run {i}:', end=' ')
        for t in tests:
            name = t.__name__
            print(name, end=', ')
            sys.stdout.flush()

            timer = timeit.Timer(f'{name}()', f'from __main__ import {name}')
            results[name] = min(results[name], timer.timeit(number))

        print()

    for name, result in sorted(results.items(), key=lambda x: x[::-1]):
        print(f'{name:30} {result:.3f}s')