How to profile Cython functions line by line

Question

I often struggle to find the bottlenecks in my Cython code. How can I profile Cython functions line by line?

Answer 1

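Robert Bradshaw helped me get Robert Kern's line_profiler tool working for cdef functions, and I thought I'd share the results on Stack Overflow.

In short, set up a regular .pyx file and build script, and add the following before the call to cythonize.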

from Cython.Compiler.Options import directive_defaults

directive_defaults['linetrace'] = True
directive_defaults['binding'] = True


In addition, you need to define the C macro CYTHON_TRACE=1 by modifying your extensions setup:

extensions = [
    Extension("test", ["test.pyx"], define_macros=[('CYTHON_TRACE', '1')])
]
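The two steps above can also be combined in one build script. The following is a minimal sketch, not the answer's original script: it assumes a file named setup.py next to test.pyx, and it passes the directives through the compiler_directives argument of cythonize instead of mutating directive_defaults, which is an alternative route rather than the approach shown above.

```python
# setup.py (hypothetical file name), build with: python setup.py build_ext --inplace
from setuptools import setup, Extension
from Cython.Build import cythonize

extensions = [
    # CYTHON_TRACE=1 keeps the tracing hooks when the C compiler runs.
    Extension("test", ["test.pyx"], define_macros=[("CYTHON_TRACE", "1")]),
]

setup(
    ext_modules=cythonize(
        extensions,
        # Same directives as above, passed per build instead of via global state.
        compiler_directives={"linetrace": True, "binding": True},
    ),
)
```

Passing the directives per build avoids relying on the import path of directive_defaults, which the comments below report as deprecated.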


A working example using the %%cython magic in an iPython notebook is here: http://nbviewer.ipython.org/gist/tillahoffmann/296501acea231cbdf5e7

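directive_defaults = Cython.Compiler.Options.get_directive_defaults() # because "from Cython.Compiler.Options import directive_defaults" appears to be deprecated (3 upvotes)
As of 2023, this doesn't seem to work at all. I made a new notebook that is essentially the same as the old one, apart from the get_directive_defaults() update. It all runs, but the actual output is just "Timer unit: 1e-09 s". Is there still a way forward? https://nbviewer.org/gist/battaglia01/f138f6b85235a530f7f62f5af5a002f0?flush_cache=true (3 upvotes)
Has anyone tried this solution without using a notebook? When I tried it, it simply ignored the cythonized code. Also, if I try to decorate the function with ``@profile``, I can't compile the file with distutils, which returns ``undeclared name not builtin: profile`` (2 upvotes)
Note the changes in recent versions: https://github.com/cython/cython/issues/1497#issuecomment-256400972 (2 upvotes)
With the %%cython magic in an iPython notebook, use the following code, because "from Cython.Compiler.Options import directive_defaults" is deprecated: import Cython; directive_defaults = Cython.Compiler.Options.get_directive_defaults() (2 upvotes)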

Answer 2

Bar*_*art 7

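Although I wouldn't really call it profiling, there is another option for analyzing your Cython code: run cython with -a (annotate), which creates a web page in which the main bottlenecks are highlighted. For example, when I forgot to declare some variables:

After declaring them correctly (cdef double dudz, dvdz):

Right, leaving variables untyped slows the code down. But `-a` does not give you any information about actual runtime, only about whether you are making `python` calls. (5 upvotes)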

Answer 3

ead*_*ead 6

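While @Till's answer shows how to profile Cython code using the setup.py approach, this answer is about ad-hoc profiling in an IPython/Jupyter notebook and is more or less a "translation" of the Cython documentation to IPython/Jupyter.

%prun magic:

If the %prun magic is to be used, it is enough to set Cython's compiler directive profile to True (shown here with the example from the Cython documentation):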

%%cython
# cython: profile=True

def recip_square(i):
    return 1. / i ** 2

def approx_pi(n=10000000):
    val = 0.
    for k in range(1, n + 1):
        val += recip_square(k)
    return (6 * val) ** .5


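Using the global directive (i.e. # cython: profile=True) is better than modifying the global Cython state, because changing the directive causes the extension to be recompiled. That is not the case when only the global Cython state changes: the old cached version, compiled with the old global state, would be reloaded and reused.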

Now

%prun -s cumulative approx_pi(1000000)


yields:

        1000005 function calls in 1.860 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.860    1.860 {built-in method builtins.exec}
        1    0.000    0.000    1.860    1.860 <string>:1(<module>)
        1    0.000    0.000    1.860    1.860 {_cython_magic_404d18ea6452e5ffa4c993f6a6e15b22.approx_pi}
        1    0.612    0.612    1.860    1.860 _cython_magic_404d18ea6452e5ffa4c993f6a6e15b22.pyx:7(approx_pi)
  1000000    1.248    0.000    1.248    0.000 _cython_magic_404d18ea6452e5ffa4c993f6a6e15b22.pyx:4(recip_square)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
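Outside a notebook, a comparable report can be produced with the standard library's cProfile and pstats modules. The sketch below is an illustration under assumptions, not part of the original answer: it uses pure-Python stand-ins for recip_square and approx_pi so that it runs without a compiler. With a module compiled with profile=True, the same calls would be made on the functions imported from that module.

```python
import cProfile
import io
import pstats


def recip_square(i):
    return 1.0 / i ** 2


def approx_pi(n=10000000):
    val = 0.0
    for k in range(1, n + 1):
        val += recip_square(k)
    return (6 * val) ** 0.5


# Profile a single call, similar in spirit to `%prun -s cumulative approx_pi(...)`.
profiler = cProfile.Profile()
result = profiler.runcall(approx_pi, 100000)

# Collect the report in a string, sorted by cumulative time.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

The absolute timings will differ from the output above, since the stand-in functions are interpreted rather than compiled.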


%lprun magic:

If the line profiler (i.e. the %lprun magic) is to be used, the Cython module should be compiled with different directives:

%%cython
# cython: linetrace=True
# cython: binding=True
# distutils: define_macros=CYTHON_TRACE_NOGIL=1
...


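linetrace=True triggers the creation of tracing code in the generated C code and implies profile=True, so profile does not need to be set separately. Without binding=True, line_profiler does not have the necessary code information. CYTHON_TRACE_NOGIL=1 is needed so that line profiling is also activated when the C compiler runs (rather than being thrown away by the C preprocessor). CYTHON_TRACE=1 can be used instead if nogil blocks should not be profiled on a per-line basis.

Now it can be used as follows, passing the functions that should be line-profiled via the -f option (use %lprun? to get information about the possible options):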

%load_ext line_profiler
%lprun -f approx_pi -f recip_square approx_pi(1000000)


yields:

Timer unit: 1e-06 s

Total time: 1.9098 s
File: /XXXX.pyx
Function: recip_square at line 5

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     5                                           def recip_square(i):
     6   1000000    1909802.0      1.9    100.0      return 1. / i ** 2

Total time: 6.54676 s
File: /XXXX.pyx
Function: approx_pi at line 8

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     8                                           def approx_pi(n=10000000):
     9         1          3.0      3.0      0.0      val = 0.
    10   1000001    1155778.0      1.2     17.7      for k in range(1, n + 1):
    11   1000000    5390972.0      5.4     82.3          val += recip_square(k)
    12         1          9.0      9.0      0.0      return (6 * val) ** .5


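line_profiler does, however, have a minor issue with cpdef functions: it does not detect the function body correctly. A possible workaround is shown in this SO post.

One should be aware that profiling (line profiling above all) changes the execution time and its distribution compared to a "normal" run. Here we see that the same function needs different times depending on the type of profiling: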

Method (N=10^6):        Running Time:       Build with:
%timeit                 1 second
%prun                   2 seconds           profile=True
%lprun                  6.5 seconds         linetrace=True,binding=True,CYTHON_TRACE_NOGIL=1
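The overhead can be checked on any machine by timing the same call with and without a profiler attached. The sketch below is illustrative only: it uses pure-Python stand-ins and the standard library's cProfile rather than the compiled module, and the helper name timed is hypothetical. Absolute numbers will not match the table above.

```python
import cProfile
import time


def recip_square(i):
    return 1.0 / i ** 2


def approx_pi(n=10000000):
    val = 0.0
    for k in range(1, n + 1):
        val += recip_square(k)
    return (6 * val) ** 0.5


def timed(call):
    """Run call() once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    value = call()
    return value, time.perf_counter() - start


n = 200000

# Plain run, no profiler attached.
plain_value, plain_seconds = timed(lambda: approx_pi(n))

# Same call under cProfile, which adds bookkeeping for every function call.
profiler = cProfile.Profile()
profiled_value, profiled_seconds = timed(lambda: profiler.runcall(approx_pi, n))

print(f"plain: {plain_seconds:.3f} s, under cProfile: {profiled_seconds:.3f} s")
```

The profiled run would normally be expected to take longer, which is why the per-function percentages from a profiler are more trustworthy than its absolute times.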


Asked:	11 years ago
Viewed:	7357 times
Last active:	6 years, 6 months ago