Interesting performance differences when creating objects with a normal class, a dataclass and a namedtuple

big*_*nty 7 python python-3.x python-internals python-performance python-dataclasses

I was going through dataclasses and namedtuples, and I noticed that creating objects with these different Python features performs differently.

Dataclass:

In [1]: from dataclasses import dataclass
   ...:
   ...: @dataclass
   ...: class Position:
   ...:     lon: float = 0.0
   ...:     lat: float = 0.0
   ...:

In [2]: %timeit for _ in range(1000): Position(12.5, 345)
326 µs ± 34.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Normal class:

In [1]: class Position:
   ...:
   ...:     def __init__(self, lon=0.0, lat=0.0):
   ...:         self.lon = lon
   ...:         self.lat = lat
   ...:

In [2]: %timeit for _ in range(1000): Position(12.5, 345)
248 µs ± 2.89 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Namedtuple:

In [1]: from collections import namedtuple

In [2]: Position = namedtuple("Position", ["lon", "lat"], defaults=[0.0, 0.0])

In [3]: %timeit for _ in range(1000): Position(12.5, 345)
286 µs ± 13.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Python Env - Python 3.7.3  
OS - MacOS Mojave

All implementations have the same object attributes and the same default values.

  1. Why is the trend time(data_classes) > time(named_tuple) > time(normal_class)?
  2. Where does each implementation spend its time?
  3. Which implementation performs best in which scenario?

Here, "time" means the time taken to create the objects.
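To reproduce the comparison outside IPython, the three variants can be timed side by side with the standard timeit module. This is a sketch, assuming the same two fields and defaults as above; the class names PlainPosition, DataPosition and TuplePosition, the helper per_object_seconds, and the iteration and repeat counts are illustrative choices, not taken from the question.

```python
# Sketch: time object creation for a plain class, a dataclass and a namedtuple.
# Assumptions: class names, the helper name, 100_000 iterations and best-of-3
# repeats are illustrative choices, not taken from the question.
from collections import namedtuple
from dataclasses import dataclass
import timeit


class PlainPosition:
    def __init__(self, lon=0.0, lat=0.0):
        self.lon = lon
        self.lat = lat


@dataclass
class DataPosition:
    lon: float = 0.0
    lat: float = 0.0


# The `defaults=` argument of namedtuple requires Python 3.7+.
TuplePosition = namedtuple("TuplePosition", ["lon", "lat"], defaults=[0.0, 0.0])


def per_object_seconds(cls, number=100_000, repeat=3):
    """Best-of-`repeat` average time, in seconds, to construct one `cls`."""
    timer = timeit.Timer(lambda: cls(12.5, 345))
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    for cls in (PlainPosition, DataPosition, TuplePosition):
        print(f"{cls.__name__:>14}: {per_object_seconds(cls) * 1e9:.0f} ns per object")
```

Taking the minimum over several repeats reduces noise from other processes. The lambda adds roughly the same constant call overhead to each variant, and absolute numbers will differ between machines and Python versions.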

Mic*_*ski 3

In Python, everything is a dictionary. A dataclass has more entries in that dictionary, so it takes more time to put them there.
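A small sketch of what the per-instance storage looks like for the question's three variants, assuming the same two fields (the class names are illustrative). It checks which objects carry an instance __dict__: the plain-class and dataclass instances each hold one with the two fields, while a namedtuple instance is a tuple and keeps its values in tuple slots without a __dict__.

```python
# Sketch: compare where each variant keeps its attribute values.
# Assumption: class names are illustrative; the fields mirror the question.
from collections import namedtuple
from dataclasses import dataclass


class PlainPosition:
    def __init__(self, lon=0.0, lat=0.0):
        self.lon = lon
        self.lat = lat


@dataclass
class DataPosition:
    lon: float = 0.0
    lat: float = 0.0


TuplePosition = namedtuple("TuplePosition", ["lon", "lat"], defaults=[0.0, 0.0])

plain = PlainPosition(12.5, 345)
data = DataPosition(12.5, 345)
tup = TuplePosition(12.5, 345)

# Plain-class and dataclass instances each carry a per-instance __dict__.
print(vars(plain))  # {'lon': 12.5, 'lat': 345}
print(vars(data))   # {'lon': 12.5, 'lat': 345}

# A namedtuple instance is a tuple subclass with __slots__ = (): no __dict__.
print(hasattr(tup, "__dict__"))  # False
print(tup == (12.5, 345))        # True
```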

So where does the difference come from? @Arne's comment showed that I had missed something here, so I wrote some sample code:

from dataclasses import dataclass
import time

@dataclass
class Position:
    lon: float = 0.0
    lat: float = 0.0


# Time 100,000 dataclass constructions. perf_counter() is a monotonic,
# high-resolution clock and is better suited to benchmarks than time.time().
start_time = time.perf_counter()
for i in range(100000):
    p = Position(lon=1.0, lat=1.0)
elapsed = time.perf_counter() - start_time
print(f"dataclass {elapsed}")
print(dir(p))  # shows the extra methods generated by @dataclass


class Position2:
    lon: float = 0.0
    lat: float = 0.0

    def __init__(self, lon, lat):
        self.lon = lon
        self.lat = lat


# Same loop for the hand-written class.
start_time = time.perf_counter()
for i in range(100000):
    p = Position2(lon=1.0, lat=1.0)
elapsed = time.perf_counter() - start_time
print(f"just class {elapsed}")
print(dir(p))

# Plain dict literal as a baseline.
start_time = time.perf_counter()
for i in range(100000):
    p = {"lon": 1.0, "lat": 1.0}
elapsed = time.perf_counter() - start_time
print(f"dict {elapsed}")


The dict example is included for reference.

Looking at the dataclasses module, this function:

def _init_fn(fields, frozen, has_post_init, self_name, globals):  # dataclasses.py, line 489

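is responsible for generating the constructor. As Arne found, the __post_init__ code is optional and is not generated when it isn't needed. I had another idea, since there is some extra work done around fields, but: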

In [5]: p = Position(lat=1.1, lon=2.2)

In [7]: p.lat.__class__
Out[7]: float

So there is no extra wrapping or code here. From all of this, the only extra thing I can see is more methods.
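One way to check the "no extra code, only more methods" observation, sketched under the assumption that comparing the STORE_ATTR targets in the bytecode is enough (the full dis output differs between Python versions, so it is not compared wholesale). The class names and the helper stored_attrs are illustrative.

```python
# Sketch: compare the attribute stores done by a hand-written __init__ and a
# dataclass-generated __init__, then list what @dataclass adds to the class.
# Assumption: class names and the helper name are illustrative.
import dis
from dataclasses import dataclass


class PlainPosition:
    def __init__(self, lon=0.0, lat=0.0):
        self.lon = lon
        self.lat = lat


@dataclass
class DataPosition:
    lon: float = 0.0
    lat: float = 0.0


def stored_attrs(func):
    """Names of the attributes assigned via STORE_ATTR in func's bytecode."""
    return [ins.argval for ins in dis.get_instructions(func)
            if ins.opname == "STORE_ATTR"]


print(stored_attrs(PlainPosition.__init__))  # ['lon', 'lat']
print(stored_attrs(DataPosition.__init__))   # ['lon', 'lat']

# Names present in the dataclass namespace but not in the plain class one.
extra = sorted(set(vars(DataPosition)) - set(vars(PlainPosition)))
print(extra)
```

The extra entries (for example __repr__, __eq__ and __dataclass_fields__) are added once, when the decorator processes the class; the exact list varies with the Python version, so no expected output is given for it.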