Ars*_*sen 4 python assembly clock timeit
我想测量在Python 3中进行加法运算所需的时钟周期数.
我写了一个程序来计算加法运算的平均值:
from timeit import timeit
def test(n):
for i in range(n):
1 + 1
if __name__ == '__main__':
times = {}
for i in [2 ** n for n in range(10)]:
t = timeit.timeit("test(%d)" % i, setup="from __main__ import test", number=100000)
times[i] = t
print("%d additions takes %f" % (i, t))
keys = sorted(list(times.keys()))
for i in range(len(keys) - 2):
print("1 addition takes %f" % ((times[keys[i+1]] - times[keys[i]]) / (keys[i+1] - keys[i])))
Run Code Online (Sandbox Code Playgroud)
输出:
16 additions takes 0.288647
32 additions takes 0.422229
64 additions takes 0.712617
128 additions takes 1.275438
256 additions takes 2.415222
512 additions takes 5.050155
1024 additions takes 10.381530
2048 additions takes 21.185604
4096 additions takes 43.122559
8192 additions takes 88.323853
16384 additions takes 194.353927
1 addition takes 0.008292
1 addition takes 0.010068
1 addition takes 0.008654
1 addition takes 0.010318
1 addition takes 0.008349
1 addition takes 0.009075
1 addition takes 0.008794
1 addition takes 0.008905
1 addition takes 0.010293
1 addition takes 0.010413
1 addition takes 0.010551
1 addition takes 0.010711
1 addition takes 0.011035
Run Code Online (Sandbox Code Playgroud)
因此,根据此输出,一次添加大约需要0.0095个usecs.按照本页说明,我计算出一次加法需要25个CPU周期.这是正常值,为什么?因为汇编指令ADD只需要1-2个CPU周期.
您正在计时函数调用(test()),for循环和调用range().增加的时间根本没有.
def test(n):
for i in range(n):
1 + 1
import dis
dis.dis(test)
Run Code Online (Sandbox Code Playgroud)
这是测试函数的字节代码(不包括调用test()):
2 0 SETUP_LOOP 24 (to 27)
3 LOAD_GLOBAL 0 (range)
6 LOAD_FAST 0 (n)
9 CALL_FUNCTION 1
12 GET_ITER
>> 13 FOR_ITER 10 (to 26)
16 STORE_FAST 1 (i)
3 19 LOAD_CONST 2 (2) ****
22 POP_TOP
23 JUMP_ABSOLUTE 13
>> 26 POP_BLOCK
>> 27 LOAD_CONST 0 (None)
30 RETURN_VALUE
Run Code Online (Sandbox Code Playgroud)
****注意,添加是在编译时完成的.相当多的其他语言及其编译器将会这样做,包括C.但是,标准很少定义何时1 + 1实际完成,因此它通常依赖于实现.
编辑:
你的timeit函数调用可能是这样的:
t = timeit("x += 1", setup="x = 1", number=100000)
Run Code Online (Sandbox Code Playgroud)
我们可以创建一个虚函数来检查操作:
def myfunc(x):
x += 1
import dis
dis.dis(myfunc)
Run Code Online (Sandbox Code Playgroud)
做出改变会给出:
1 additions takes 0.008976
2 additions takes 0.007419
4 additions takes 0.007282
8 additions takes 0.007693
16 additions takes 0.007026
32 additions takes 0.007793
64 additions takes 0.010168
128 additions takes 0.008124
256 additions takes 0.009064
512 additions takes 0.007256
1 addition takes -0.001557
1 addition takes -0.000068
1 addition takes 0.000103
1 addition takes -0.000083
1 addition takes 0.000048
1 addition takes 0.000074
1 addition takes -0.000032
1 addition takes 0.000007
26 0 LOAD_FAST 0 (x)
3 LOAD_CONST 1 (1)
6 INPLACE_ADD
7 STORE_FAST 0 (x)
10 LOAD_CONST 0 (None)
13 RETURN_VALUE
Run Code Online (Sandbox Code Playgroud)
注意,这x += 1是一个INPLACE_ADD,不同的x = x + 1是BINARY_ADD.因此,您需要确定要测量的OPCode.
通过使用该dis模块,您可以更深入地了解幕后发生的事情.
具体来说,该dis.dis函数获取一段已编译的Python代码,并返回所述片段被解释为的字节代码.在1 + 1的情况下:
In [1]: import dis
In [2]: def add1and1():
return 1 + 1
In [3]: dis.dis(add1and1)
2 0 LOAD_CONST 2 (2)
3 RETURN_VALUE
Run Code Online (Sandbox Code Playgroud)
因此,在这种情况下,当源代码被编译为字节代码时,操作1 + 1仅执行一次,然后将结果存储为常量.我们可以通过返回传递给函数的参数总和来解决这个问题:
In [1]: import dis
In [2]: def add(x, y):
return x + y
In [3]: dis.dis(add)
2 0 LOAD_FAST 0 (x)
3 LOAD_FAST 1 (y)
6 BINARY_ADD
7 RETURN_VALUE
Run Code Online (Sandbox Code Playgroud)
所以你真正感兴趣的字节码指令是BINARY_ADD.如果您想了解更多相关信息,可以在CPython解释器ceval.c文件(此处)中找到相关部分:
TARGET(BINARY_ADD) {
PyObject *right = POP();
PyObject *left = TOP();
PyObject *sum;
if (PyUnicode_CheckExact(left) &&
PyUnicode_CheckExact(right)) {
sum = unicode_concatenate(left, right, f, next_instr);
/* unicode_concatenate consumed the ref to v */
}
else {
sum = PyNumber_Add(left, right);
Py_DECREF(left);
}
Py_DECREF(right);
SET_TOP(sum);
if (sum == NULL)
goto error;
DISPATCH();
}
Run Code Online (Sandbox Code Playgroud)
所以这里发生的事情比你原先预期的要多.我们有:
一个条件来确定我们是BINARY_ADD用于字符串连接还是用于添加数字类型
实际调用PyNumber_Add人们可能期望更多的东西left + right
Python的动态特性解释了这两点; 由于Python不知道实际调用的类型x或类型,因此类型检查是在运行时而不是编译时完成的.可以在动态语言中进行巧妙的优化来解决这个问题(参见V8 for JavaScript或PyPy for Python),但一般来说,这是您为解释的动态类型语言的灵活性付出的代价.yadd