Ale*_*ing 8 python memory-leaks numpy
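One of my students showed me the following test case, which shows an apparent memory leak in NumPy. I would like to know whether the memory profiler is correct here, or what is going on. Here is the test case: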
from memory_profiler import profile
import numpy as np
import gc

@profile
def test():
    arr = np.ones((10000, 6912))
    for i in range(2000):
        arr[0:75,:] = np.ones((75, 6912))
    del arr
    gc.collect()
    pass

test()
This produces the following output:
Filename: test.py
Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
     5     32.9 MiB     32.9 MiB           1   @profile
     6                                         def test():
     7    560.3 MiB    527.4 MiB           1       arr = np.ones((10000, 6912))
     8    564.2 MiB      0.0 MiB        2001       for i in range(2000):
     9    564.2 MiB      3.9 MiB        2000           arr[0:75,:] = np.ones((75, 6912))
    10     37.0 MiB   -527.3 MiB           1       del arr
    11     37.0 MiB     -0.0 MiB           1       gc.collect()
    12     37.0 MiB      0.0 MiB           1       pass
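It looks like the line np.ones((75, 6912)) is slowly leaking memory (about 4 MB here). If we replace this expression with just 1, the apparent leak disappears.
I have tested this on Python 3.8.10 and 3.9.5 with NumPy versions 1.21.3 (the latest at the time of writing) and 1.20.3, and with memory_profiler version 0.58.0 (the latest at the time of writing). My operating system is Ubuntu Linux 20.04 LTS; my student demonstrated this on macOS (I am not sure which version).
What is going on here?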
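The short answer (which does not yet add anything new to the conversation) is that @hpaulj is right: there is no significant leak of anything close to 4.1 MiB per call to test(). What is happening is that not all of the memory that gets allocated is returned to the operating system. The reason is that both the python arena-based allocator and libc malloc request memory from the operating system in ranges, then subdivide those ranges into smaller regions to satisfy allocation requests. A larger range generally cannot be released while at least part of it is still in use. For example, a python arena cannot be freed as long as any allocation from that arena has not been freed.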
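To make the last point concrete, here is a minimal toy model of a range that is carved into blocks. The `Arena` class and its method names are invented for illustration only; this is not how CPython's allocator or libc malloc is implemented, just a sketch of the "one live block pins the whole range" behavior described above.

```python
class Arena:
    """Toy model of one large range obtained from the operating system."""

    def __init__(self, n_blocks):
        self.n_blocks = n_blocks
        self.in_use = set()  # indices of blocks currently handed out

    def allocate(self):
        """Hand out the lowest-numbered free block, or None if the arena is full."""
        for index in range(self.n_blocks):
            if index not in self.in_use:
                self.in_use.add(index)
                return index
        return None

    def free(self, index):
        """Mark one block as free again."""
        self.in_use.discard(index)

    def can_release_to_os(self):
        """The whole range can be returned only when no block is still in use."""
        return not self.in_use


arena = Arena(n_blocks=64)
blocks = [arena.allocate() for _ in range(64)]

# Free everything except the last block: the arena is 63/64 empty,
# yet it still cannot be handed back as a whole.
for index in blocks[:-1]:
    arena.free(index)
print(len(arena.in_use), arena.can_release_to_os())

# Only after the last block is freed can the range be released.
arena.free(blocks[-1])
print(len(arena.in_use), arena.can_release_to_os())
```

In this model the process size stays at the full arena size until the very last block is freed, even though almost all of the arena is unused.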
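You can make some trivial modifications to your program to see that test() does not leak 4.1 MiB per call. For example, suppose you change the last line to these 2 lines: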
while True:
    test()
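If you then run the program and check the virtual address space it uses (for example, with top or ps), you will see that it stops growing almost immediately after the first run of test().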
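If you would rather have the program report this itself instead of watching top or ps, one option on Linux is to read /proc/self/status. The sketch below is an assumption-laden alternative, not part of the original test: the helper names `vm_kib` and `current_vm_kib` are made up here, and the approach does not work on macOS, which has no /proc.

```python
import re


def vm_kib(status_text, field):
    """Return the value in kB of a field such as 'VmSize' or 'VmRSS'."""
    pattern = rf"^{re.escape(field)}:\s+(\d+)\s+kB$"
    match = re.search(pattern, status_text, re.MULTILINE)
    if match is None:
        raise KeyError(field)
    return int(match.group(1))


def current_vm_kib(field="VmSize"):
    """Linux only: read the field for the running process."""
    with open("/proc/self/status") as status_file:
        return vm_kib(status_file.read(), field)


# Hypothetical use inside the loop from the text:
#     while True:
#         test()
#         print(current_vm_kib("VmSize"), current_vm_kib("VmRSS"))
```

If the explanation above is right, the printed values should level off after the first iteration rather than climbing by about 4 MiB each time.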
Even with the metric that memory_profiler provides, you can see this by changing the original program so that it simply calls test() twice, like this:
test()
test()
If you then run the program, you will see that the reported growth occurs only during the first call:
tim@tim-OptiPlex-3020:~$ python3 so3.py
Filename: so3.py
Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
     5     32.9 MiB     32.9 MiB           1   @profile
     6                                         def test():
     7    560.0 MiB    527.1 MiB           1       arr = np.ones((10000, 6912))
     8    564.1 MiB      0.0 MiB        2001       for i in range(2000):
     9    564.1 MiB      4.1 MiB        2000           arr[0:75,:] = np.ones((75, 6912))
    10     36.9 MiB   -527.3 MiB           1       del arr
    11     36.8 MiB     -0.0 MiB           1       gc.collect()
    12     36.8 MiB      0.0 MiB           1       pass
Filename: so3.py
Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
     5     36.8 MiB     36.8 MiB           1   @profile
     6                                         def test():
     7    564.1 MiB    527.3 MiB           1       arr = np.ones((10000, 6912))
     8    564.1 MiB      0.0 MiB        2001       for i in range(2000):
     9    564.1 MiB      0.0 MiB        2000           arr[0:75,:] = np.ones((75, 6912))
    10     36.8 MiB   -527.3 MiB           1       del arr
    11     36.8 MiB      0.0 MiB           1       gc.collect()
    12     36.8 MiB      0.0 MiB           1       pass
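So the next question you might ask is why memory grew during the first call to test() but apparently not during the second. To answer that, we can use https://github.com/vmware/chap, which is open source and can be compiled by your student on Linux.
As input, chap generally needs only a core file. In this particular case we need at least 2 core files, because we want to know which allocations were made during the first call to test() but never freed.
To do that, we can modify the program to sleep between calls to test(), which gives us time to gather core files. After this small change, the revised program looks like this: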
from time import sleep
from memory_profiler import profile
import numpy as np
import gc

@profile
def test():
    arr = np.ones((10000, 6912))
    for i in range(2000):
        arr[0:75,:] = np.ones((75, 6912))
    del arr
    gc.collect()
    pass

print('sleep before first test()')
sleep(120)
test()
print('sleep before second test()')
sleep(120)
test()
print('sleep after second test()')
sleep(120)
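With these modifications, we can run the program in the background and gather one core before the first call to test(), one before the second call to test(), and one after the second call to test().
First, as an administrative detail, we set the coredump_filter used by the shell to 0x37. When we run python, the process inherits this coredump_filter value, so the cores we create will include information about file-backed memory.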
tim@tim-OptiPlex-3020:~$ cat /proc/self/coredump_filter
00000033
tim@tim-OptiPlex-3020:~$ echo 0x37 >/proc/self/coredump_filter
tim@tim-OptiPlex-3020:~$ cat /proc/self/coredump_filter
00000037
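To see what changing the filter from 0x33 to 0x37 actually turns on, the value can be decoded bit by bit. The bit meanings in the table below follow the Linux core(5) manual page; the function names are made up for this sketch.

```python
# Bit meanings of /proc/PID/coredump_filter, as listed in core(5).
COREDUMP_FILTER_BITS = {
    0: "anonymous private mappings",
    1: "anonymous shared mappings",
    2: "file-backed private mappings",
    3: "file-backed shared mappings",
    4: "ELF headers",
    5: "private huge pages",
    6: "shared huge pages",
    7: "private DAX pages",
    8: "shared DAX pages",
}


def decode_coredump_filter(value):
    """List the kinds of memory a given filter value causes to be dumped."""
    return [name for bit, name in COREDUMP_FILTER_BITS.items() if value & (1 << bit)]


def added_by(new, old):
    """List what is dumped under `new` that was not dumped under `old`."""
    return decode_coredump_filter(new & ~old)


print(decode_coredump_filter(0x33))
print(added_by(0x37, 0x33))  # ['file-backed private mappings']
```

The only difference between the two values is bit 2, which is the file-backed private mappings mentioned in the text.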
Now we are ready to start the program in the background and gather the first core while the program is in its first sleep().
tim@tim-OptiPlex-3020:~$ python3 so4.py &
[2] 125315
tim@tim-OptiPlex-3020:~$ sleep before first test()
sudo gcore -o beforeFirst 125315
[sudo] password for tim:
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f25bbbb012b in select () from /lib/x86_64-linux-gnu/libc.so.6
warning: target file /proc/125315/cmdline contained unexpected null characters
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ad7d8000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ada0c000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25adc23000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ade39000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae051000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae2a7000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae522000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae74b000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae9d2000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25aec50000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25aef3c000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25af145000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25af41b000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b708b000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b7494000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b9358000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b99e3000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b9cc4000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b9eca000.
warning: Memory read failed for corefile section, 4096 bytes at 0xffffffffff600000.
Saved corefile beforeFirst.125315
[Inferior 1 (process 125315) detached]
Then we wait until the first call to test() has finished and gather another core while the program is in its second sleep().
sleep before second test()
sudo gcore -o beforeSecond 125315
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f25bbbb012b in select () from /lib/x86_64-linux-gnu/libc.so.6
warning: target file /proc/125315/cmdline contained unexpected null characters
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ad7d8000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ada0c000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25adc23000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ade39000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae051000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae2a7000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae522000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae74b000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae9d2000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25aec50000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25aef3c000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25af145000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25af41b000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b708b000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b7494000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b9358000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b99e3000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b9cc4000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b9eca000.
warning: Memory read failed for corefile section, 4096 bytes at 0xffffffffff600000.
Saved corefile beforeSecond.125315
[Inferior 1 (process 125315) detached]
Then we wait for the second call to test() to finish and gather a third core while the program is in its third sleep().
sleep after second test()
sudo gcore -o afterSecond 125315
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f25bbbb012b in select () from /lib/x86_64-linux-gnu/libc.so.6
warning: target file /proc/125315/cmdline contained unexpected null characters
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ad7d8000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ada0c000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25adc23000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ade39000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae051000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae2a7000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae522000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae74b000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae9d2000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25aec50000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25aef3c000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25af145000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25af41b000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b708b000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b7494000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b9358000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b99e3000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b9cc4000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b9eca000.
warning: Memory read failed for corefile section, 4096 bytes at 0xffffffffff600000.
Saved corefile afterSecond.125315
[Inferior 1 (process 125315) detached]
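The three gcore invocations above differ only in the output prefix, so they are easy to generate from a script if you want to repeat the experiment. This is just a convenience sketch: `gcore_command` is a hypothetical helper, and the PID 125315 is the one from the session above, which will be different on your machine.

```python
import shlex


def gcore_command(prefix, pid, use_sudo=True):
    """Build the argument list for 'gcore -o PREFIX PID'."""
    command = ["gcore", "-o", prefix, str(pid)]
    return ["sudo"] + command if use_sudo else command


# Print the three commands used in the session above.
for prefix in ("beforeFirst", "beforeSecond", "afterSecond"):
    print(shlex.join(gcore_command(prefix, 125315)))
```

Each list can be passed to subprocess.run(..., check=True) at the right moment, that is, while the target program is in the corresponding sleep().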
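Now we are ready to analyze the cores with chap, which takes a core file as input. The memory that matters most for process size is the writable ranges, and we can get some details about them by running chap on the core from before the first call to test().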
tim@tim-OptiPlex-3020:~$ chap beforeFirst.125315
chap> summarize writable
12 ranges take 0x603e000 bytes for use: unknown
3 ranges take 0x1800000 bytes for use: cached pthread stack
41 ranges take 0xa40000 bytes for use: python arena
1 ranges take 0x51c000 bytes for use: libc malloc main arena pages
47 ranges take 0x1e5000 bytes for use: used by module
1 ranges take 0x91000 bytes for use: libc malloc mmapped allocation
1 ranges take 0x21000 bytes for use: main stack
106 writable ranges use 0x8a31000 (144,904,192) bytes.
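Note the line above with "python arena". Those ranges belong to python's arena-based allocator. Also note the lines with "libc malloc main arena pages" and "libc malloc mmapped allocation". Not surprisingly, these belong to libc malloc, which is used by native libraries and, in some cases, by python, such as when an allocation exceeds a certain size.
As I mentioned earlier, these large ranges are carved up into smaller allocations. We can get counts of the used allocations (those not yet freed) and the free allocations (which occupy space that has not been returned to the operating system and is available for future allocations).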
chap> count used
114423 allocations use 0xf4df58 (16,047,960) bytes.
chap> count free
730 allocations use 0x5fb30 (391,984) bytes.
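Now we can use the same 3 commands in chap on the second core and compare. What we see is that all of the growth is in the ranges summarized as used for "libc malloc main arena pages", which grew from 0x51c000 bytes to 0x926000 bytes, or slightly more than 4 MiB.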
tim@tim-OptiPlex-3020:~$ chap beforeSecond.125315
chap> summarize writable
12 ranges take 0x603e000 bytes for use: unknown
3 ranges take 0x1800000 bytes for use: cached pthread stack
41 ranges take 0xa40000 bytes for use: python arena
1 ranges take 0x926000 bytes for use: libc malloc main arena pages
47 ranges take 0x1e5000 bytes for use: used by module
1 ranges take 0x91000 bytes for use: libc malloc mmapped allocation
1 ranges take 0x21000 bytes for use: main stack
106 writable ranges use 0x8e3b000 (149,139,456) bytes.
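The comparison between the two "summarize writable" outputs can also be done mechanically. The sketch below pastes in the two summaries from this session and reports which categories changed; the helper names are invented, and the regular expression is only known to match the output shown here, not every chap version.

```python
import re

# "summarize writable" output for the core taken before the first test().
BEFORE_FIRST = """\
12 ranges take 0x603e000 bytes for use: unknown
3 ranges take 0x1800000 bytes for use: cached pthread stack
41 ranges take 0xa40000 bytes for use: python arena
1 ranges take 0x51c000 bytes for use: libc malloc main arena pages
47 ranges take 0x1e5000 bytes for use: used by module
1 ranges take 0x91000 bytes for use: libc malloc mmapped allocation
1 ranges take 0x21000 bytes for use: main stack
"""
# In the second core, only the main arena line differs.
BEFORE_SECOND = BEFORE_FIRST.replace("0x51c000", "0x926000")

LINE = re.compile(r"^(\d+) ranges take (0x[0-9a-f]+) bytes for use: (.+)$", re.MULTILINE)


def bytes_by_use(summary):
    """Map each 'use' label to the number of bytes its ranges take."""
    return {use: int(size, 16) for _count, size, use in LINE.findall(summary)}


def growth(before, after):
    """Return only the categories whose size changed, with the change in bytes."""
    old, new = bytes_by_use(before), bytes_by_use(after)
    return {use: new[use] - old.get(use, 0) for use in new if new[use] != old.get(use, 0)}


delta = growth(BEFORE_FIRST, BEFORE_SECOND)
for use, n in delta.items():
    print(f"{use}: {n:+,} bytes ({n / 2**20:.2f} MiB)")
# libc malloc main arena pages: +4,235,264 bytes (4.04 MiB)
```

The 4,235,264 bytes also match the difference between the two totals reported by chap (149,139,456 minus 144,904,192).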
Drilling down further, we can see that used allocations grew by slightly less than 100,000 bytes, while free allocations grew by about 4 MiB.
chap> count used
114686 allocations use 0xf64ac8 (16,141,000) bytes.
chap> count free
1312 allocations use 0x4522e8 (4,530,920) bytes.
chap>
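For readers who want the exact figures behind "slightly less than 100,000 bytes" and "about 4 MiB", here is the arithmetic on the chap output lines from the first two cores. The parser is a sketch written against the lines shown in this answer; `parse_count` is a made-up name.

```python
import re

COUNT_LINE = re.compile(r"(\d+) allocations use 0x([0-9a-f]+) \(([\d,]+)\) bytes\.")


def parse_count(line):
    """Return (number of allocations, total bytes) from a chap 'count' line."""
    match = COUNT_LINE.search(line)
    if match is None:
        raise ValueError(f"not a chap count line: {line!r}")
    count, hex_bytes, dec_bytes = match.groups()
    size = int(hex_bytes, 16)
    # The hex and decimal renderings on the line should agree.
    if size != int(dec_bytes.replace(",", "")):
        raise ValueError(f"inconsistent sizes in: {line!r}")
    return int(count), size


used_before = parse_count("114423 allocations use 0xf4df58 (16,047,960) bytes.")
free_before = parse_count("730 allocations use 0x5fb30 (391,984) bytes.")
used_after = parse_count("114686 allocations use 0xf64ac8 (16,141,000) bytes.")
free_after = parse_count("1312 allocations use 0x4522e8 (4,530,920) bytes.")

used_growth = used_after[1] - used_before[1]
free_growth = free_after[1] - free_before[1]
print(f"used: {used_after[0] - used_before[0]:+} allocations, {used_growth:+,} bytes")
print(f"free: {free_after[0] - free_before[0]:+} allocations, {free_growth:+,} bytes "
      f"({free_growth / 2**20:.2f} MiB)")
# used: +263 allocations, +93,040 bytes
# free: +582 allocations, +4,138,936 bytes (3.95 MiB)
```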
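This basically proves @hpaulj's theory, except that there is a small amount of growth in used allocations during the first run of test(). It might be interesting to understand that, but for now I will just note that most of the growth is explained by free allocations. That is not a bad thing, because those areas of memory are available for reuse.
So now we check what happened during the second run of test(), and we see that the process did not get any larger, but there is one more used allocation and slightly less memory in free allocations.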
tim@tim-OptiPlex-3020:~$ chap afterSecond.125315
chap> summarize writable
12 ranges take 0x603e000 bytes for use: unknown
3 ranges take 0x1800000 bytes for use: cached pthread stack
41 ranges take 0xa40000 bytes for use: python arena
1 ranges take 0x926000 bytes for use: libc malloc main arena pages
47 ranges take 0x1e5000 bytes for use: used by module
1 ranges take 0x91000 bytes for use: libc malloc mmapped allocation
1 ranges take 0x21000 bytes for use: main stack
106 writable ranges use 0x8e3b000 (149,139,456) bytes.
chap> count used
114687 allocations use 0xf64ca8 (16,141,480) bytes.
chap> count free
1249 allocations use 0x452148 (4,530,504) bytes.
chap>
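So the second run of test() used allocations that were free after the first run, then freed most of them again when they were no longer needed. This is working as expected.
One might still ask for an explanation of the extra used allocations after the first call to test() and the one extra used allocation after the second call to test(). It is possible to do that with the existing core files, but I will stop here, because it would take more time and I have already shown the following: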
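The reuse pattern can be imitated with a very small model. `ToyHeap` below is purely illustrative and makes no claim about how glibc actually places or coalesces blocks; it only shows why a heap that keeps freed blocks grows on the first run and not on the second. The block size used is simply that of one temporary np.ones((75, 6912)) array of 8-byte floats.

```python
class ToyHeap:
    """Toy allocator: grows only when no freed block of the requested size exists."""

    def __init__(self):
        self.size = 0          # bytes obtained from the "operating system"
        self.free_blocks = {}  # block size -> number of freed blocks of that size

    def malloc(self, nbytes):
        if self.free_blocks.get(nbytes, 0) > 0:
            self.free_blocks[nbytes] -= 1  # reuse a freed block; no growth
        else:
            self.size += nbytes            # nothing to reuse; extend the heap
        return nbytes

    def free(self, nbytes):
        # Keep the block for later reuse instead of returning it to the OS.
        self.free_blocks[nbytes] = self.free_blocks.get(nbytes, 0) + 1


def run_once(heap, temp_size=75 * 6912 * 8, iterations=2000):
    """Mimic the loop in test(): allocate a temporary, then free it."""
    for _ in range(iterations):
        block = heap.malloc(temp_size)
        heap.free(block)


heap = ToyHeap()
sizes = []
for _ in range(2):
    run_once(heap)
    sizes.append(heap.size)
print(sizes)  # [4147200, 4147200]
```

In the model the heap reaches its final size during the first run and the second run is served entirely from the freed block, which is the same shape as the chap results above.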