Function call overhead - why do builtin Python functions appear to be faster than my own builtins?

Pau*_*zer 5 c python overhead function-call

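I was interested in the overhead, so I wrote a minimal C extension exporting two functions, `nop` and `starnop`, that do more or less nothing. They just pass their input through (the two relevant functions are at the top; the rest is just tedious boilerplate):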

amanmodule.c:

    #include <Python.h>

    static PyObject* aman_nop(PyObject *self, PyObject *args)
    {
      PyObject *obj;

      if (!PyArg_UnpackTuple(args, "arg", 1, 1, &obj))
        return NULL;
      Py_INCREF(obj);
      return obj;
    }

    static PyObject* aman_starnop(PyObject *self, PyObject *args)
    {
      Py_INCREF(args);
      return args;
    }

    static PyMethodDef AmanMethods[] = {
      {"nop",  (PyCFunction)aman_nop, METH_VARARGS,
       PyDoc_STR("nop(arg) -> arg\n\nReturn arg unchanged.")},
      {"starnop", (PyCFunction)aman_starnop, METH_VARARGS,
       PyDoc_STR("starnop(*args) -> args\n\nReturn tuple of args unchanged")},
      {NULL, NULL}
    };

    static struct PyModuleDef amanmodule = {
        PyModuleDef_HEAD_INIT,
        "aman",
        "aman - a module about nothing.\n\n"
        "Provides functions 'nop' and 'starnop' which do nothing:\n"
        "nop(arg) -> arg; starnop(*args) -> args\n",
        -1,
        AmanMethods
    };

    PyMODINIT_FUNC
    PyInit_aman(void)
    {
        return PyModule_Create(&amanmodule);
    }

setup.py:

    from setuptools import setup, extension

    setup(name='aman', version='1.0',
          ext_modules=[extension.Extension('aman', ['amanmodule.c'])],
          author='n.n.',
          description="""aman - a module about nothing

          Provides functions 'nop' and 'starnop' which do nothing:
          nop(arg) -> arg; starnop(*args) -> args
          """,
          license='public domain',
          keywords='nop pass-through identity')

Next, I timed them against pure Python implementations and a couple of builtins that do next to nothing:

    import numpy as np
    from aman import nop, starnop
    from timeit import timeit

    def mnsd(x): return '{:8.6f} \u00b1 {:8.6f} \u00b5s'.format(np.mean(x), np.std(x))

    def pnp(x): x

    globals={}
    for globals['nop'] in (int, bool, (0).__add__, hash, starnop, nop, pnp, lambda x: x):
        print('{:60s}'.format(repr(globals['nop'])),
              mnsd([timeit('nop(1)', globals=globals) for i in range(10)]),
              '  ',
              mnsd([timeit('nop(True)',globals=globals) for i in range(10)]))

First question: am I doing anything methodologically unsound here?
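One way to cross-check the methodology, sketched here with only the standard library: `timeit.repeat` runs several blocks, and the `timeit` documentation notes that the minimum is usually the most informative figure, because higher values tend to reflect interference from other processes rather than the code itself. The helper name `per_call_us` and the reduced call counts are illustrative assumptions and are not part of the original benchmark.

```python
from timeit import repeat

def per_call_us(stmt, namespace, number=100_000, blocks=5):
    """Return (best, worst) time per call in microseconds.

    Hypothetical helper: `number` and `blocks` are illustrative defaults,
    smaller than the 1,000,000 calls x 10 blocks used in the question.
    """
    totals = repeat(stmt, globals=namespace, number=number, repeat=blocks)
    # Each total is the time in seconds for `number` calls;
    # convert it to microseconds per single call.
    per_call = [t / number * 1e6 for t in totals]
    return min(per_call), max(per_call)

if __name__ == "__main__":
    for func in (hash, int, lambda x: x):
        best, worst = per_call_us("nop(1)", {"nop": func})
        print(f"{func!r:45} best {best:.4f} µs  worst {worst:.4f} µs")
```

With `timeit`'s default of 1,000,000 calls, the total in seconds is numerically equal to microseconds per call, which is why the script in the question can label its raw `timeit` results as µs.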

Results for 10 blocks of 1,000,000 calls each:

    <class 'int'>                                                0.099754 ± 0.003917 µs    0.103933 ± 0.000585 µs
    <class 'bool'>                                               0.097711 ± 0.000661 µs    0.094412 ± 0.000612 µs
    <method-wrapper '__add__' of int object at 0x8c7000>         0.065146 ± 0.000728 µs    0.064976 ± 0.000605 µs
    <built-in function hash>                                     0.039546 ± 0.000671 µs    0.039566 ± 0.000452 µs
    <built-in function starnop>                                  0.056490 ± 0.000873 µs    0.056234 ± 0.000181 µs
    <built-in function nop>                                      0.060094 ± 0.000799 µs    0.059959 ± 0.000170 µs
    <function pnp at 0x7fa31c0512f0>                             0.090452 ± 0.001077 µs    0.098479 ± 0.003314 µs
    <function <lambda> at 0x7fa31c051378>                        0.086387 ± 0.000817 µs    0.086536 ± 0.000714 µs

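Now my actual question: even though my nops are written in C and do nothing (`starnop` doesn't even parse its arguments), the builtin `hash` function is consistently faster. I know that ints are their own hash values in Python, so `hash` is also a nop here, but it is no more of a nop than my nops, so why the difference in speed?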

Update: Totally forgot: I'm on a fairly standard x86_64 machine running Linux with gcc 4.8.5. I install the extension with `python3 setup.py install --user`.

cas*_*evh 4

Much (most?) of the overhead in a Python function call is the creation of the `args` tuple. Argument parsing also adds some overhead.
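As a rough pure-Python analogy (it does not exercise the C-level `METH_VARARGS` path itself), a function that collects its arguments with `*args` also has to build a tuple on every call, while a function with a plain positional parameter does not. The function names `single` and `packed` are made up for this sketch.

```python
from timeit import timeit

def single(x):
    # One positional parameter: no argument tuple is needed.
    return x

def packed(*args):
    # The interpreter builds a fresh tuple for *args on every call.
    return args

if __name__ == "__main__":
    for func in (single, packed):
        # 1,000,000 calls, so total seconds equals µs per call numerically.
        seconds = timeit("f(1)", globals={"f": func}, number=1_000_000)
        print(f"{func.__name__:8s} {seconds:.6f} µs per call")
```

The absolute numbers depend on the interpreter version and machine, so only the relative difference between the two lines is of interest.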

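Functions defined with the `METH_VARARGS` calling convention require a tuple to be created to hold all the arguments. If you only need a single argument, you can use the `METH_O` calling convention instead. With `METH_O`, no tuple is created; the single argument is passed directly. I've added `nop1` to your example, which uses `METH_O`.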

Functions that take no arguments can be defined with `METH_NOARGS`. See `nop2` for the least possible overhead.

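When using `METH_VARARGS`, you can reduce the overhead slightly by parsing the `args` tuple directly instead of calling `PyArg_UnpackTuple` or the related `PyArg_` functions. This is slightly faster. See `nop3`.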

The builtin `hash()` function uses the `METH_O` calling convention.
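For the curious, the calling convention of a builtin can be inspected from Python with `ctypes`. This is an unsupported, CPython-only sketch that relies on implementation details: it assumes `id()` returns the object's address, that `m_ml` is the first field after the object header in `PyCFunctionObject`, and that the flag values match `Include/methodobject.h`. The helper name `method_flags` is hypothetical.

```python
import ctypes

# Flag values as defined in CPython's Include/methodobject.h.
METH_VARARGS = 0x0001
METH_NOARGS = 0x0004
METH_O = 0x0008

class PyMethodDef(ctypes.Structure):
    # Mirrors the fields of CPython's `struct PyMethodDef`.
    _fields_ = [
        ("ml_name", ctypes.c_char_p),
        ("ml_meth", ctypes.c_void_p),
        ("ml_flags", ctypes.c_int),
        ("ml_doc", ctypes.c_char_p),
    ]

def method_flags(func):
    """Read ml_flags of a builtin function (CPython only, unsupported)."""
    # Assumption: the PyMethodDef pointer `m_ml` sits directly after the
    # object header, whose size is object.__basicsize__.
    header_size = object.__basicsize__
    ml_address = ctypes.c_void_p.from_address(id(func) + header_size).value
    return PyMethodDef.from_address(ml_address).ml_flags

if __name__ == "__main__":
    for func in (hash, len, globals):
        flags = method_flags(func)
        print(f"{func.__name__:8s} "
              f"METH_VARARGS={bool(flags & METH_VARARGS)} "
              f"METH_NOARGS={bool(flags & METH_NOARGS)} "
              f"METH_O={bool(flags & METH_O)}")
```

The same helper could be pointed at the extension functions from this thread after building the module, to confirm which convention each one was registered with.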

Modified amanmodule.c

    #include <Python.h>

    /* METH_VARARGS: receives a tuple and unpacks it with PyArg_UnpackTuple. */
    static PyObject* aman_nop(PyObject *self, PyObject *args)
    {
      PyObject *obj;

      if (!PyArg_UnpackTuple(args, "arg", 1, 1, &obj))
        return NULL;
      Py_INCREF(obj);
      return obj;
    }

    /* METH_O: the single argument is passed directly, no tuple is built. */
    static PyObject* aman_nop1(PyObject *self, PyObject *other)
    {
      Py_INCREF(other);
      return other;
    }

    /* METH_NOARGS: the second parameter is always NULL and is ignored. */
    static PyObject* aman_nop2(PyObject *self, PyObject *Py_UNUSED(ignored))
    {
      Py_RETURN_NONE;
    }

    /* METH_VARARGS: reads the tuple directly instead of using PyArg_*. */
    static PyObject* aman_nop3(PyObject *self, PyObject *args)
    {
      PyObject *obj;

      if (PyTuple_GET_SIZE(args) == 1) {
        obj = PyTuple_GET_ITEM(args, 0);
        Py_INCREF(obj);
        return obj;
      }
      else {
        PyErr_SetString(PyExc_TypeError, "nop3 requires 1 argument");
        return NULL;
      }
    }

    static PyObject* aman_starnop(PyObject *self, PyObject *args)
    {
      Py_INCREF(args);
      return args;
    }

    static PyMethodDef AmanMethods[] = {
      {"nop",  (PyCFunction)aman_nop, METH_VARARGS,
       PyDoc_STR("nop(arg) -> arg\n\nReturn arg unchanged.")},
      {"nop1",  (PyCFunction)aman_nop1, METH_O,
       PyDoc_STR("nop1(arg) -> arg\n\nReturn arg unchanged.")},
      {"nop2",  (PyCFunction)aman_nop2, METH_NOARGS,
       PyDoc_STR("nop2() -> None\n\nTake no arguments and return None.")},
      {"nop3",  (PyCFunction)aman_nop3, METH_VARARGS,
       PyDoc_STR("nop3(arg) -> arg\n\nReturn arg unchanged.")},
      {"starnop", (PyCFunction)aman_starnop, METH_VARARGS,
       PyDoc_STR("starnop(*args) -> args\n\nReturn tuple of args unchanged")},
      {NULL, NULL}
    };

    static struct PyModuleDef amanmodule = {
        PyModuleDef_HEAD_INIT,
        "aman",
        "aman - a module about nothing.\n\n"
        "Provides functions 'nop' and 'starnop' which do nothing:\n"
        "nop(arg) -> arg; starnop(*args) -> args\n",
        -1,
        AmanMethods
    };

    PyMODINIT_FUNC
    PyInit_aman(void)
    {
        return PyModule_Create(&amanmodule);
    }

Modified test.py

    import numpy as np
    from aman import nop, nop1, nop2, nop3, starnop
    from timeit import timeit

    def mnsd(x): return '{:8.6f} \u00b1 {:8.6f} \u00b5s'.format(np.mean(x), np.std(x))

    def pnp(x): x

    globals={}
    for globals['nop'] in (int, bool, (0).__add__, hash, starnop, nop, nop1, nop3, pnp, lambda x: x):
        print('{:60s}'.format(repr(globals['nop'])),
              mnsd([timeit('nop(1)', globals=globals) for i in range(10)]),
              '  ',
              mnsd([timeit('nop(True)',globals=globals) for i in range(10)]))

    # To test with no arguments
    for globals['nop'] in (nop2,):
        print('{:60s}'.format(repr(globals['nop'])),
              mnsd([timeit('nop()', globals=globals) for i in range(10)]),
              '  ',
              mnsd([timeit('nop()',globals=globals) for i in range(10)]))

Results

    $ python3 test.py
    <class 'int'>                                                0.080414 ± 0.004360 µs    0.086166 ± 0.003216 µs
    <class 'bool'>                                               0.080501 ± 0.008929 µs    0.075601 ± 0.000598 µs
    <method-wrapper '__add__' of int object at 0xa6dca0>         0.045652 ± 0.004229 µs    0.044146 ± 0.000114 µs
    <built-in function hash>                                     0.035122 ± 0.003317 µs    0.033419 ± 0.000136 µs
    <built-in function starnop>                                  0.044056 ± 0.001300 µs    0.044280 ± 0.001629 µs
    <built-in function nop>                                      0.047297 ± 0.000777 µs    0.049536 ± 0.007577 µs
    <built-in function nop1>                                     0.030402 ± 0.001423 µs    0.031249 ± 0.002352 µs
    <built-in function nop3>                                     0.044673 ± 0.004041 µs    0.042936 ± 0.000177 µs
    <function pnp at 0x7f946342d840>                             0.071846 ± 0.005377 µs    0.071085 ± 0.003314 µs
    <function <lambda> at 0x7f946342d8c8>                        0.066621 ± 0.001499 µs    0.067163 ± 0.002962 µs
    <built-in function nop2>                                     0.027736 ± 0.001487 µs    0.027035 ± 0.000397 µs
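To put the differences in perspective, the mean times from the `nop(1)` column above can be expressed relative to `nop1`. This is plain arithmetic on the reported numbers; the helper name `relative_to` is made up for this sketch, and `nop2` is left out because it was timed with a different call (`nop()`).

```python
# Mean times in µs per call, copied from the nop(1) column of the results.
timings = {
    "hash": 0.035122,
    "starnop": 0.044056,
    "nop": 0.047297,
    "nop1": 0.030402,
    "nop3": 0.044673,
}

def relative_to(baseline, table):
    """Return each timing divided by the baseline entry's timing."""
    base = table[baseline]
    return {name: value / base for name, value in table.items()}

if __name__ == "__main__":
    ratios = relative_to("nop1", timings)
    # Print from fastest to slowest.
    for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
        print(f"{name:8s} {ratio:.2f}x")
```

On these figures the three `METH_VARARGS` functions cluster together at roughly one and a half times the cost of `nop1`, while `hash` sits much closer to it, which fits the explanation that the argument tuple accounts for most of the gap.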