How to find out which parts of my code are inefficient in Python

Question


Rul*_*rld 16 python performance profiling processing-efficiency python-3.x

In a previous question I asked about multiprocessing, i.e. using multiple cores to make a program run faster, and someone told me:

Mostly, you can get a 100x+ optimization with better code, compared to a 4x improvement and additional complexity with multiprocessing.

They then recommended that I should:

Use a profiler to understand what is slow, then focus on optimizing that.

So I went to this question: How can you profile a script?

There I found cProfile and implemented it in some test code to see how it works.

This is my code:

import cProfile

def foo():
    for i in range(10000):
        a = i**i
        if i % 1000 == 0:
            print(i)

cProfile.run('foo()')


However, after running it, this is what I got:

0
1000
2000
3000
4000
5000
6000
7000
8000
9000
         1018 function calls in 20.773 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   20.773   20.773 <string>:1(<module>)
      147    0.000    0.000    0.000    0.000 rpc.py:150(debug)
       21    0.000    0.000    0.050    0.002 rpc.py:213(remotecall)
       21    0.000    0.000    0.002    0.000 rpc.py:223(asynccall)
       21    0.000    0.000    0.048    0.002 rpc.py:243(asyncreturn)
       21    0.000    0.000    0.000    0.000 rpc.py:249(decoderesponse)
       21    0.000    0.000    0.048    0.002 rpc.py:287(getresponse)
       21    0.000    0.000    0.000    0.000 rpc.py:295(_proxify)
       21    0.001    0.000    0.048    0.002 rpc.py:303(_getresponse)
       21    0.000    0.000    0.000    0.000 rpc.py:325(newseq)
       21    0.000    0.000    0.002    0.000 rpc.py:329(putmessage)
       21    0.000    0.000    0.000    0.000 rpc.py:55(dumps)
       20    0.000    0.000    0.001    0.000 rpc.py:556(__getattr__)
        1    0.000    0.000    0.001    0.001 rpc.py:574(__getmethods)
       20    0.000    0.000    0.000    0.000 rpc.py:598(__init__)
       20    0.000    0.000    0.050    0.002 rpc.py:603(__call__)
       20    0.000    0.000    0.051    0.003 run.py:340(write)
        1   20.722   20.722   20.773   20.773 test.py:3(foo)
       42    0.000    0.000    0.000    0.000 threading.py:1226(current_thread)
       21    0.000    0.000    0.000    0.000 threading.py:215(__init__)
       21    0.000    0.000    0.047    0.002 threading.py:263(wait)
       21    0.000    0.000    0.000    0.000 threading.py:74(RLock)
       21    0.000    0.000    0.000    0.000 {built-in method _struct.pack}
       21    0.000    0.000    0.000    0.000 {built-in method _thread.allocate_lock}
       42    0.000    0.000    0.000    0.000 {built-in method _thread.get_ident}
        1    0.000    0.000   20.773   20.773 {built-in method builtins.exec}
       42    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
       63    0.000    0.000    0.000    0.000 {built-in method builtins.len}
       10    0.000    0.000    0.051    0.005 {built-in method builtins.print}
       21    0.000    0.000    0.000    0.000 {built-in method select.select}
       21    0.000    0.000    0.000    0.000 {method '_acquire_restore' of '_thread.RLock' objects}
       21    0.000    0.000    0.000    0.000 {method '_is_owned' of '_thread.RLock' objects}
       21    0.000    0.000    0.000    0.000 {method '_release_save' of '_thread.RLock' objects}
       21    0.000    0.000    0.000    0.000 {method 'acquire' of '_thread.RLock' objects}
       42    0.047    0.001    0.047    0.001 {method 'acquire' of '_thread.lock' objects}
       21    0.000    0.000    0.000    0.000 {method 'append' of 'collections.deque' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
       21    0.000    0.000    0.000    0.000 {method 'dump' of '_pickle.Pickler' objects}
       20    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
       21    0.000    0.000    0.000    0.000 {method 'getvalue' of '_io.BytesIO' objects}
       21    0.000    0.000    0.000    0.000 {method 'release' of '_thread.RLock' objects}
       21    0.001    0.000    0.001    0.000 {method 'send' of '_socket.socket' objects}

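The rows from rpc.py, run.py and threading.py look like they come from the IDE redirecting `print` output (IDLE, for example, routes output through its own rpc module), not from `foo` itself. A minimal sketch using only the standard library's `cProfile` and `pstats` to sort the table by self time so the expensive function comes first; the `n` parameter on `foo` is a hypothetical addition so the example runs quickly.

```python
import cProfile
import pstats

def foo(n=10000):
    for i in range(n):
        a = i**i
        if i % 1000 == 0:
            print(i)

profiler = cProfile.Profile()
profiler.runcall(foo, 3000)  # smaller n than the question's 10000, to keep the run short

stats = pstats.Stats(profiler)
stats.strip_dirs()           # drop long directory prefixes from file names
stats.sort_stats("tottime")  # most self-time first, instead of "standard name"
stats.print_stats(5)         # show only the top 5 rows
```

This still only reports whole functions, which is exactly the limitation described next.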

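I was expecting it to tell me which parts of my code take the longest, for example that `a = i**i` takes the longest to compute. But all I can gather from what it tells me is that the `foo()` function takes the longest; from the data I can't tell what inside that function takes so long.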

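Also, when I apply it to my actual code, it does the same thing. Everything is in functions, and it only tells me which functions take the longest, not what inside a function takes so long.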
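Because cProfile works at function granularity, one workaround is to move the pieces of a function body into small helper functions so each gets its own row. A minimal sketch, assuming hypothetical helpers `power` and `report` carved out of the question's `foo`; note the extra function calls add some overhead of their own.

```python
import cProfile
import pstats

# Hypothetical helpers: each piece of foo's loop body is moved into its
# own function so cProfile can report it as a separate row.
def power(i):
    return i**i

def report(i):
    if i % 1000 == 0:
        print(i)

def foo(n=10000):
    for i in range(n):
        a = power(i)
        report(i)

profiler = cProfile.Profile()
profiler.runcall(foo, 3000)  # smaller n than the question's, to keep the run short
stats = pstats.Stats(profiler).sort_stats("tottime")
stats.print_stats(5)         # power should now appear as the top row
```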

So here are my main questions:

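How do I see what inside a function is making the code take so long (should I even be using cProfile?)
Once I know what is using the most CPU, what is the best way to set about optimizing my code?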

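Note: my RAM, disk, etc. are absolutely fine; it's only the CPU that is maxed out (the CPU shows 12% because the program only runs on a single core).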

Answer 1

MSe*_*ert 17

How do I see what inside a function is making the code take so long (should I even be using cProfile?)

Yes, you can use cProfile, but the way you asked the question makes me wonder whether line_profiler (a third-party module you need to install) would be a better tool.
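To illustrate what "line-level" profiling means, here is a toy built only on the standard library's `sys.settrace` hook. `line_times` is a hypothetical helper, not line_profiler's actual implementation, and it is far slower and less accurate; it just charges the wall time between consecutive line events to the line that was running.

```python
import sys
import time
from collections import defaultdict

def line_times(func, *args):
    """Toy line timer: run func(*args) and charge the wall time between
    consecutive 'line' events to the line that was executing."""
    code = func.__code__
    hits = defaultdict(int)      # line number -> times executed
    totals = defaultdict(float)  # line number -> seconds spent
    state = {"line": None, "t": 0.0}

    def local_trace(frame, event, arg):
        now = time.perf_counter()
        if state["line"] is not None:
            totals[state["line"]] += now - state["t"]
        if event == "line":
            hits[frame.f_lineno] += 1
            state["line"] = frame.f_lineno
        else:  # 'return' or 'exception': stop charging time to a line
            state["line"] = None
        state["t"] = time.perf_counter()
        return local_trace

    def global_trace(frame, event, arg):
        # Only trace frames that run func's own code object.
        return local_trace if frame.f_code is code else None

    sys.settrace(global_trace)
    try:
        func(*args)
    finally:
        sys.settrace(None)
    return hits, totals


def foo(n):
    for i in range(n):
        a = i**i
        if i % 1000 == 0:
            print(i)

hits, totals = line_times(foo, 2000)
first = foo.__code__.co_firstlineno  # line number of "def foo"
for lineno in sorted(hits):
    print(f"line +{lineno - first}: {hits[lineno]:5d} hits  {totals[lineno]:.4f} s")
```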

When I want to profile a function, I use line_profiler's IPython/Jupyter bindings:

%load_ext line_profiler


To actually profile the function:

%lprun -f foo foo()
#             ^^^^^---- this call will be profiled
#         ^^^-----------function to profile


Which produces this output:

Timer unit: 5.58547e-07 s

Total time: 17.1189 s
File: <ipython-input-1-21b5a5f52f66>
Function: foo at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           def foo():
     2     10001        31906      3.2      0.1      for i in range(10000):
     3     10000     30534065   3053.4     99.6          a = i**i
     4     10000        75998      7.6      0.2          if i % 1000 == 0:
     5        10         6953    695.3      0.0              print(i)


This includes some potentially interesting details. For example, 99.6% of the time is spent on the `i**i` line.
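One reason that single line can dominate: `i**i` produces an ever larger integer, so each iteration is big-integer arithmetic whose cost grows with `i`, not a single CPU operation. A small standard-library `timeit` sketch (timings will vary by machine):

```python
import timeit

# i ** i grows very fast: for i = 9999 the result has over 130,000 bits.
for i in (10, 100, 1000, 9999):
    bits = (i ** i).bit_length()
    secs = timeit.timeit(lambda: i ** i, number=10) / 10
    print(f"i={i:5d}  bits={bits:7d}  time per i**i: {secs:.2e} s")
```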

Once I know what is using the most CPU, what is the best way to set about optimizing my code?

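That depends. Sometimes you need to use different functions, data structures, or algorithms; sometimes there is nothing you can do. But at least you know where your bottleneck is, and you can estimate how much impact a change at the bottleneck (or elsewhere) would have.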
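Purely as an illustration of "using a different function", and assuming hypothetically that the program only needed `i**i` modulo some number `M` rather than the full value: the built-in three-argument `pow` reduces at every step, so the numbers never get large. Checking that both versions agree before comparing timings is the important part.

```python
import timeit

M = 10**9 + 7  # hypothetical modulus, purely for illustration

def slow(n):
    # Builds the full big integer i**i, then reduces it.
    return [i ** i % M for i in range(n)]

def fast(n):
    # Three-argument pow reduces modulo M as it goes, so numbers stay small.
    return [pow(i, i, M) for i in range(n)]

assert slow(500) == fast(500)  # same results before comparing speed
print("slow:", timeit.timeit(lambda: slow(2000), number=1))
print("fast:", timeit.timeit(lambda: fast(2000), number=1))
```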

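line_profiler doesn't depend on IPython or Jupyter. You can put the `@profile` decorator on the function you want to profile, copy/paste the kernprof.py and line_profiler.py files into the folder your script is in, and then run `python kernprof.py -l -v my_script.py > output_file.txt`. This produces a text file with the timings. I haven't managed to do this without copying the source files over, but copying them is easy. (3 upvotes)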
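A minimal script in the shape the comment describes, using the hypothetical file name my_script.py. When kernprof runs a script with `-l`, it injects a name called `profile` into builtins; the `try`/`except NameError` fallback below is a common idiom (not something line_profiler requires) that lets the same file also run under plain `python`.

```python
# my_script.py (hypothetical file name)
try:
    profile  # provided as a builtin by kernprof when run with -l
except NameError:
    def profile(func):
        """No-op stand-in used when the script is not run via kernprof."""
        return func

@profile
def foo(n=10000):
    for i in range(n):
        a = i**i
        if i % 1000 == 0:
            print(i)

if __name__ == "__main__":
    foo(2000)  # smaller than the question's 10000, to keep the run short
```

With line_profiler installed via pip, recent releases usually also provide a `kernprof` command, so `kernprof -l -v my_script.py` may work without copying files; it is worth checking against the version you have.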

Archived: 8 years, 2 months ago
Views: 5564
Last activity: 8 years, 2 months ago