为什么使用参数使这个函数变得如此慢？

Question

为什么使用参数使这个函数变得如此慢？

Mar*_*rco 15 python algorithm math performance

我最近一直在努力创建一个素数查找程序.但是,我注意到一个函数在使用参数时比使用预设值时慢得多.

在3个不同的版本中,很明显变量显着减慢了程序,我想知道原因.

版本1:7.5秒

这是原始的(这个问题有点简化)功能:

def version1(n, p):
  return ((n*n - 2) & ((1 << p) - 1)) + ((n*n - 2) >> p)

Run Code Online (Sandbox Code Playgroud)

使用timeit模块运行100次时:

timeit.timeit("version1(200, 500000000)", "from __main__ import version1", number=100)

Run Code Online (Sandbox Code Playgroud)

这需要7.5几秒钟.

版本2:0.0001秒

但是,这是第二个版本,其中没有参数,并且数字直接放在返回值中.该方程与第1版完全相同:

def version2():
  return ((200*200 - 2) & ((1 << 500000000) - 1)) + ((200*200 - 2) >> 500000000)

Run Code Online (Sandbox Code Playgroud)

使用timeit模块运行100次时:

timeit.timeit("version2()", "from __main__ import version2", number=100

Run Code Online (Sandbox Code Playgroud)

在这只需要0.00001几秒钟!

版本3:6.3秒

最后,为了完整性,我尝试了一个没有参数但仍将其值保存为变量的版本:

def version3():
  n = 200
  p = 500000000
  return ((n*n - 2) & ((1 << p) - 1)) + ((n*n - 2) >> p)

Run Code Online (Sandbox Code Playgroud)

运行时timeit:

timeit.timeit("version3()", "from __main__ import version3", number = 100)

Run Code Online (Sandbox Code Playgroud)

花了6.3几秒钟,相对接近第1版.

为什么当涉及变量时,相同的函数可以花费更长的时间,以及如何使版本1更加有效？

Answer 1

Mar*_*ers 26

当编译为所谓的窥视孔优化时,Python预先计算计算:

>>> import dis
>>> def version2():
...   return ((200*200 - 2) & ((1 << 500000000) - 1)) + ((200*200 - 2) >> 500000000)
...
>>> dis.dis(version2)
  2           0 LOAD_CONST              13 (39998)
              2 RETURN_VALUE

Run Code Online (Sandbox Code Playgroud)

version2()返回已经计算的值,并且没有实际工作.返回常数当然要比每次计算值快得多.

请参阅fold_binops_on_constants功能中peephole.c的编译器如何做这个细节Python源文件.

因此,编译version2需要(很多)时间比version1:

>>> import timeit
>>> version1_text = '''\
... def version1(n, p):
...   return ((n*n - 2) & ((1 << p) - 1)) + ((n*n - 2) >> p)
... '''
>>> version2_text = '''\
... def version2():
...   return ((200*200 - 2) & ((1 << 500000000) - 1)) + ((200*200 - 2) >> 500000000)
... '''
>>> timeit.timeit("compile(t, '', 'exec')", 'from __main__ import version1_text as t', number=10)
0.00028649598243646324
>>> timeit.timeit("compile(t, '', 'exec')", 'from __main__ import version2_text as t', number=10)
2.2103765579813626

Run Code Online (Sandbox Code Playgroud)

好的东西Python缓存编译的字节码结果!

每个子表达式的中间结果也存储在co_consts代码对象的属性中,其中一些相当大:

>>> import sys
>>> consts = version2.__code__.co_consts
>>> for obj in consts:
...     size = sys.getsizeof(obj)
...     print(f'{type(obj)!s:<18} {size:<8} {"<too large to print>" if size > 100 else obj}')
...
<class 'NoneType'> 16       None
<class 'int'>      28       200
<class 'int'>      28       2
<class 'int'>      28       1
<class 'int'>      28       500000000
<class 'int'>      28       40000
<class 'int'>      28       39998
<class 'int'>      66666692 <too large to print>
<class 'int'>      66666692 <too large to print>
<class 'int'>      28       39998
<class 'int'>      28       40000
<class 'int'>      28       39998
<class 'int'>      24       0
<class 'int'>      28       39998

Run Code Online (Sandbox Code Playgroud)

所以这确实使字节码缓存更大一些:

>>> import marshal
>>> len(marshal.dumps(version1.__code__))
129
>>> len(marshal.dumps(version2.__code__))
133333481

Run Code Online (Sandbox Code Playgroud)

.pyc对于包含非参数版本的模块,该文件的最小值为127MB !

我已经提交了http://bugs.python.org/issue30293来查看优化器是否应该避免创建如此大的折叠结果. (3认同)

归档时间：	8 年，5 月前
查看次数：	799 次
最近记录：	8 年，5 月前