Python subprocess.Popen "OSError: [Errno 12] Cannot allocate memory"

Dav*_*idM 108 python linux memory

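Note: This question was originally asked here, but the bounty time expired even though an acceptable answer was not found. I am re-asking it, including all the details provided in the original question.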

A Python script runs a set of class functions every 60 seconds using the sched module:

# sc is a sched.scheduler instance
sc.enter(60, 1, self.doChecks, (sc, False))
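For readers unfamiliar with the pattern, here is a minimal sketch of how such a self-rescheduling sched loop is typically wired up. It assumes (the question does not show it) that doChecks re-enters itself at the end of each run. The Agent class and the max_runs stop condition are hypothetical, added only so the example terminates, and the interval is shortened from 60 seconds for demonstration. Python 3:

```python
import sched
import time


class Agent:
    """Hypothetical stand-in for the script's checks class."""

    def __init__(self, interval=60, max_runs=None):
        self.interval = interval  # seconds between runs (60 in the question)
        self.max_runs = max_runs  # hypothetical stop condition; None = run forever
        self.runs = 0

    def doChecks(self, sc, first_run):
        self.runs += 1
        # ... gather system statistics here (e.g. the `ps aux` call) ...
        # Re-enter ourselves so the checks repeat every `interval` seconds.
        if self.max_runs is None or self.runs < self.max_runs:
            sc.enter(self.interval, 1, self.doChecks, (sc, False))


sc = sched.scheduler(time.time, time.sleep)
agent = Agent(interval=0.01, max_runs=3)  # short interval for demonstration only
sc.enter(0, 1, agent.doChecks, (sc, True))
sc.run()  # blocks until no events remain in the queue
print(agent.runs)  # → 3
```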

The script runs as a daemon using the code here.

Many of the class methods called as part of doChecks use the subprocess module to call system utilities in order to get system statistics:

ps = subprocess.Popen(['ps', 'aux'], stdout=subprocess.PIPE).communicate()[0]

This runs fine for a period of time before the entire script crashes with the following error:

File "/home/admin/sd-agent/checks.py", line 436, in getProcesses
File "/usr/lib/python2.4/subprocess.py", line 533, in __init__
File "/usr/lib/python2.4/subprocess.py", line 835, in _get_handles
OSError: [Errno 12] Cannot allocate memory

After the script crashes, the output of free -m on the server is:

$ free -m
                  total       used       free     shared     buffers    cached
Mem:                894        345        549          0          0          0
-/+ buffers/cache:  345        549
Swap:                 0          0          0

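The server is running CentOS 5.3. I am unable to reproduce the problem on my own CentOS boxes, and none of the other users reporting the same problem have been able to either.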

I have tried a number of things to debug this, as suggested in the original question:

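  1. Logging the output of free -m before and after the Popen call. There is no significant change in memory usage, i.e. memory is not gradually being used up as the script runs.

  2. I added close_fds=True to the Popen call, but this made no difference; the script still crashes with the same error. This was suggested here and here.

  3. I checked the rlimits, which showed (-1, -1) for both RLIMIT_DATA and RLIMIT_AS, as suggested here.

  4. An article suggested that having no swap space might be the cause, but swap is actually available on demand (according to the web host), and this has also been suggested as a bogus cause here.

  5. The processes are being closed, because that is the behaviour of using .communicate(), as backed up by the Python source code and comments here.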
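Debugging steps 1 and 3 can be sketched in Python 3 roughly as follows. The helper names (rlimit_report, meminfo_kb, run_logged) are hypothetical; only resource.getrlimit, /proc/meminfo and subprocess.Popen are real interfaces. The original call would become run_logged(["ps", "aux"]).

```python
import resource
import subprocess


def rlimit_report():
    """Return the (soft, hard) limits checked in step 3.
    resource.RLIM_INFINITY means "unlimited" (it shows up as -1 on Linux)."""
    return {
        "RLIMIT_DATA": resource.getrlimit(resource.RLIMIT_DATA),
        "RLIMIT_AS": resource.getrlimit(resource.RLIMIT_AS),
    }


def meminfo_kb(field):
    """Read a single field, in kB, from /proc/meminfo (e.g. "MemFree")."""
    with open("/proc/meminfo") as f:
        for line in f:
            name, value = line.split(":", 1)
            if name == field:
                return int(value.split()[0])
    raise KeyError(field)


def run_logged(args, log=print):
    """Run `args` like getProcesses() does, logging free memory around the call (step 1)."""
    before = meminfo_kb("MemFree")
    out = subprocess.Popen(args, stdout=subprocess.PIPE,
                           close_fds=True).communicate()[0]
    after = meminfo_kb("MemFree")
    log("MemFree before=%d kB, after=%d kB" % (before, after))
    return out
```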

The entire checks module can be found on GitHub, with the getProcesses function defined from line 442. It is called by doChecks() starting at line 520.

The script was run under strace, which produced the following output before the crash:

recv(4, "Total Accesses: 516662\nTotal kBy"..., 234, 0) = 234
gettimeofday({1250893252, 887805}, NULL) = 0
write(3, "2009-08-21 17:20:52,887 - checks"..., 91) = 91
gettimeofday({1250893252, 888362}, NULL) = 0
write(3, "2009-08-21 17:20:52,888 - checks"..., 74) = 74
gettimeofday({1250893252, 888897}, NULL) = 0
write(3, "2009-08-21 17:20:52,888 - checks"..., 67) = 67
gettimeofday({1250893252, 889184}, NULL) = 0
write(3, "2009-08-21 17:20:52,889 - checks"..., 81) = 81
close(4)                                = 0
gettimeofday({1250893252, 889591}, NULL) = 0
write(3, "2009-08-21 17:20:52,889 - checks"..., 63) = 63
pipe([4, 5])                            = 0
pipe([6, 7])                            = 0
fcntl64(7, F_GETFD)                     = 0
fcntl64(7, F_SETFD, FD_CLOEXEC)         = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7f12708) = -1 ENOMEM (Cannot allocate memory)
write(2, "Traceback (most recent call last"..., 35) = 35
open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python24.zip/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/plat-linux2/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python2.4/lib-tk/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/lib-dynload/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/site-packages/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, "  File \"/usr/bin/sd-agent/agent."..., 52) = 52
open("/home/admin/sd-agent/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/bin/sd-agent/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python24.zip/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/plat-linux2/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python2.4/lib-tk/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/lib-dynload/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/site-packages/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, "  File \"/home/admin/sd-agent/dae"..., 60) = 60
open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python24.zip/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/plat-linux2/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python2.4/lib-tk/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/lib-dynload/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/site-packages/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, "  File \"/usr/bin/sd-agent/agent."..., 54) = 54
open("/usr/lib/python2.4/sched.py", O_RDONLY|O_LARGEFILE) = 8
write(2, "  File \"/usr/lib/python2.4/sched"..., 55) = 55
fstat64(8, {st_mode=S_IFREG|0644, st_size=4054, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7d28000
read(8, "\"\"\"A generally useful event sche"..., 4096) = 4054
write(2, "    ", 4)                     = 4
write(2, "void = action(*argument)\n", 25) = 25
close(8)                                = 0
munmap(0xb7d28000, 4096)                = 0
open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python24.zip/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/plat-linux2/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python2.4/lib-tk/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/lib-dynload/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/site-packages/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, "  File \"/usr/bin/sd-agent/checks"..., 60) = 60
open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python24.zip/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/plat-linux2/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python2.4/lib-tk/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/lib-dynload/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/site-packages/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, "  File \"/usr/bin/sd-agent/checks"..., 64) = 64
open("/usr/lib/python2.4/subprocess.py", O_RDONLY|O_LARGEFILE) = 8
write(2, "  File \"/usr/lib/python2.4/subpr"..., 65) = 65
fstat64(8, {st_mode=S_IFREG|0644, st_size=39931, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7d28000
read(8, "# subprocess - Subprocesses with"..., 4096) = 4096
read(8, "lso, the newlines attribute of t"..., 4096) = 4096
read(8, "code < 0:\n        print >>sys.st"..., 4096) = 4096
read(8, "alse does not exist on 2.2.0\ntry"..., 4096) = 4096
read(8, " p2cread\n        # c2pread    <-"..., 4096) = 4096
write(2, "    ", 4)                     = 4
write(2, "errread, errwrite)\n", 19)    = 19
close(8)                                = 0
munmap(0xb7d28000, 4096)                = 0
open("/usr/lib/python2.4/subprocess.py", O_RDONLY|O_LARGEFILE) = 8
write(2, "  File \"/usr/lib/python2.4/subpr"..., 71) = 71
fstat64(8, {st_mode=S_IFREG|0644, st_size=39931, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7d28000
read(8, "# subprocess - Subprocesses with"..., 4096) = 4096
read(8, "lso, the newlines attribute of t"..., 4096) = 4096
read(8, "code < 0:\n        print >>sys.st"..., 4096) = 4096
read(8, "alse does not exist on 2.2.0\ntry"..., 4096) = 4096
read(8, " p2cread\n        # c2pread    <-"..., 4096) = 4096
read(8, "table(self, handle):\n           "..., 4096) = 4096
read(8, "rrno using _sys_errlist (or siml"..., 4096) = 4096
read(8, " p2cwrite = None, None\n         "..., 4096) = 4096
write(2, "    ", 4)                     = 4
write(2, "self.pid = os.fork()\n", 21)  = 21
close(8)                                = 0
munmap(0xb7d28000, 4096)                = 0
write(2, "OSError", 7)                  = 7
write(2, ": ", 2)                       = 2
write(2, "[Errno 12] Cannot allocate memor"..., 33) = 33
write(2, "\n", 1)                       = 1
unlink("/var/run/sd-agent.pid")         = 0
close(3)                                = 0
munmap(0xb7e0d000, 4096)                = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x589978}, {0xb89a60, [], SA_RESTORER, 0x589978}, 8) = 0
brk(0xa022000)                          = 0xa022000
exit_group(1)                           = ?

vla*_*adr 81

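As a general rule (i.e. in vanilla kernels), fork/clone failures with ENOMEM occur specifically because of either an honest-to-God out-of-memory condition (dup_mm, dup_task_struct, alloc_pid, mpol_dup, mm_init etc. croak), or because security_vm_enough_memory_mm failed you while enforcing the overcommit policy.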

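Start by checking the vmsize of the process that failed to fork, at the time of the fork attempt, and then compare it to the amount of free memory (physical and swap) as it relates to the overcommit policy (plug the numbers in).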
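A minimal Python 3 sketch for gathering those numbers from procfs at the moment of the failure (the function names are hypothetical). As a rough guide: in overcommit mode 2, a fork() fails once the new commit charge would push Committed_AS past CommitLimit, while mode 0 only refuses obvious overcommits. Inside a Virtuozzo/OpenVZ container, the per-container counters in /proc/user_beancounters, where present, are likely more telling than these host-style figures.

```python
def proc_kb(path, field):
    """Return the integer (kB) value of `field` from a "Name:  value kB" /proc file."""
    with open(path) as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    raise KeyError(field)


def overcommit_snapshot(pid="self"):
    """Collect the numbers needed to reason about a failed fork()."""
    with open("/proc/sys/vm/overcommit_memory") as f:
        mode = int(f.read())  # 0 = heuristic, 1 = always overcommit, 2 = strict
    return {
        "overcommit_memory": mode,
        "VmSize_kB": proc_kb("/proc/%s/status" % pid, "VmSize"),
        "MemFree_kB": proc_kb("/proc/meminfo", "MemFree"),
        "SwapFree_kB": proc_kb("/proc/meminfo", "SwapFree"),
        # Reported in every mode, but only enforced in mode 2:
        "CommitLimit_kB": proc_kb("/proc/meminfo", "CommitLimit"),
        "Committed_AS_kB": proc_kb("/proc/meminfo", "Committed_AS"),
    }
```

Logging overcommit_snapshot() in the except branch around Popen captures the values at the moment fork() fails rather than after the fact.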

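In your particular case, note that Virtuozzo has additional checks in overcommit enforcement. Moreover, I'm not sure how much control you truly have, from within your container, over swap and overcommit configuration (in order to influence the outcome of the enforcement).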

Now, in order to actually move forward, I'd say you are left with two options:

  • switch to a larger instance, or
  • put some coding effort into controlling your script's memory footprint more effectively

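NOTE that the coding effort may be all for naught if it turns out that it's not you, but some other guy collocated in a different instance on the same server as you, running amok.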

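Memory-wise, we already know that subprocess.Popen uses fork/clone under the hood, meaning that every time you call it you are requesting once more as much memory as Python is already eating up, i.e. hundreds of additional MB, all in order to then exec a puny 10kB executable such as free or ps. With an unfavourable overcommit policy, you will soon see ENOMEM.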

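Alternatives to fork that do not have this copy-the-parent's-page-tables problem are vfork and posix_spawn. But if you do not feel like rewriting chunks of subprocess.Popen in terms of vfork/posix_spawn, consider using subprocess.Popen only once, at the beginning of your script (when Python's memory footprint is minimal), to spawn a shell script that then runs free/ps/sleep and whatever else in a loop parallel to your script. Poll the script's output or read it synchronously, possibly from a separate thread if you have other things to handle asynchronously: do your data crunching in Python, but leave the forking to the subordinate process.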
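The "spawn a helper once" approach could look roughly like this in Python 3 (3.3+ for daemon=True). It is a sketch, not the answerer's code: the StatsHelper class, the SENTINEL marker and the use of sh are all assumptions. Python forks exactly once, in the constructor; every later ps/free/sleep is forked by the small shell process instead.

```python
import subprocess
import threading

SENTINEL = "---END-OF-SAMPLE---"  # hypothetical marker separating samples


class StatsHelper:
    """Spawn one helper shell at startup, while Python's footprint is small;
    the shell forks ps/free/sleep itself and Python only reads its output."""

    def __init__(self, commands="ps aux; free -m", interval=60):
        script = "while true; do %s; echo %s; sleep %d; done" % (
            commands, SENTINEL, interval)
        self.proc = subprocess.Popen(["sh", "-c", script], stdout=subprocess.PIPE,
                                     universal_newlines=True, close_fds=True)
        self._lock = threading.Lock()
        self._latest = None
        # Read the pipe from a background thread so the scheduler never blocks on it.
        threading.Thread(target=self._reader, daemon=True).start()

    def _reader(self):
        sample = []
        for line in self.proc.stdout:
            if line.rstrip("\n") == SENTINEL:
                with self._lock:
                    self._latest = "".join(sample)  # one complete sample
                sample = []
            else:
                sample.append(line)

    def latest(self):
        """Return the most recent complete sample, or None if none has arrived yet."""
        with self._lock:
            return self._latest

    def close(self):
        self.proc.terminate()
        self.proc.wait()
```

In the scheduler callback you would then call helper.latest() instead of Popen. A production version would also need to detect and restart a helper that has died.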

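HOWEVER, in your particular case you can skip invoking ps and free altogether; that information is readily available to you in Python directly from procfs, whether you choose to access it yourself or via existing libraries and/or packages. If ps and free are the only utilities you are running, then you can do away with subprocess.Popen completely.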
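As a sketch of reading procfs directly (Python 3; the function names are hypothetical), the following replaces free with a parse of /proc/meminfo, and a minimal ps with a scan of /proc/<pid>/status. The third-party psutil package offers the same data (psutil.virtual_memory(), psutil.process_iter()) if adding a dependency is acceptable.

```python
import os


def meminfo():
    """Parse /proc/meminfo into {field: kB}, roughly the data `free` reports."""
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            name, rest = line.split(":", 1)
            info[name] = int(rest.split()[0])
    return info


def processes():
    """Yield (pid, command name, RSS in kB) for each process, like a tiny `ps`."""
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open("/proc/%s/status" % entry) as f:
                fields = dict(line.split(":", 1) for line in f if ":" in line)
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the process exited (or is hidden) while we were scanning
        name = fields.get("Name", "").strip()
        # Kernel threads have no VmRSS line; report 0 for them.
        rss = int(fields["VmRSS"].split()[0]) if "VmRSS" in fields else 0
        yield int(entry), name, rss
```

Neither function forks, so neither can fail with ENOMEM from clone().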

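Finally, whatever you do about subprocess.Popen, if your script leaks memory you will still hit the wall eventually. Keep an eye on it, and check for memory leaks.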

  • I found that running `gc.collect()` before `subprocess.Popen` helped in a situation where the garbage collector had not run for a while. (7 upvotes)
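A minimal sketch of that commenter's suggestion (the wrapper name run_after_gc is hypothetical). Note that gc.collect() only frees unreachable reference cycles, and whether that memory is returned to the OS depends on the allocator, so the effect on fork() may be small or nil.

```python
import gc
import subprocess


def run_after_gc(args):
    """Collect unreachable reference cycles, then run `args` as getProcesses() does."""
    gc.collect()  # returns the number of unreachable objects found; ignored here
    return subprocess.Popen(args, stdout=subprocess.PIPE).communicate()[0]
```

Usage would be run_after_gc(["ps", "aux"]) in place of the direct Popen call.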

Nim*_*ima 17

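Looking at the output of free -m, it seems to me that you actually do not have any swap memory available. I am not sure whether swap is always automatically available on demand in Linux, but I was having the same problem and none of the answers here really helped me. Adding some swap memory, however, fixed my problem, so since this might help other people facing the same problem, I am posting my answer on how to add 1GB of swap (on Ubuntu 12.04, but it should work similarly on other distributions).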

First, you can check whether any swap memory is enabled:

$ sudo swapon -s
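swapon -s displays the contents of /proc/swaps, so the same check can be done from inside a Python 3 script. A minimal sketch (active_swaps is a hypothetical helper name):

```python
def active_swaps(path="/proc/swaps"):
    """Return the swap areas listed in /proc/swaps; an empty list means no swap is enabled."""
    with open(path) as f:
        lines = f.read().splitlines()
    swaps = []
    for line in lines[1:]:  # the first line is the column header
        parts = line.split()
        if parts:
            swaps.append({"filename": parts[0], "type": parts[1],
                          "size_kB": int(parts[2]), "used_kB": int(parts[3])})
    return swaps
```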

If the output is empty, you do not have any swap enabled. To add 1GB of swap:

$ sudo dd if=/dev/zero of=/swapfile bs=1024 count=1024k
$ sudo chmod 600 /swapfile
$ sudo mkswap /swapfile
$ sudo swapon /swapfile

Add the following line to /etc/fstab to make the swap permanent:

$ sudo vim /etc/fstab

     /swapfile       none    swap    sw      0       0 

The source and more information can be found here.


pil*_*row 8

Swap may not be the red herring previously suggested. How big is the python process in question just before the ENOMEM?

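Under kernel 2.6, /proc/sys/vm/swappiness controls how aggressively the kernel will turn to swap, and the overcommit* files control how much and how precisely the kernel may apportion memory with a wink and a nod. Like your Facebook relationship status, it's complicated.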

...but swap is actually available on demand (according to the web host)...

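But not according to your free(1) output, which shows no swap space recognized by your server instance. Now, your web host may certainly know much more about this topic than I do, but the virtual RHEL/CentOS systems I have used have reported swap as available to the guest OS.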

Adapting Red Hat KB Article 15252:

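A Red Hat Enterprise Linux 5 system will run just fine with no swap space at all as long as the sum of anonymous memory and System V shared memory is less than about 3/4 of the amount of RAM. ... Systems with 4GB of RAM or less [are recommended to have] a minimum of 2GB of swap space.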

Compare your /proc/sys/vm settings to a plain CentOS 5.3 installation. Add a swap file. Ratchet down swappiness and see whether you live any longer.
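A Python 3 sketch for the "compare your /proc/sys/vm settings" step: snapshot every readable tunable on each machine and diff the two dicts. The function names are hypothetical, and how you move a snapshot between machines (e.g. as JSON) is left open.

```python
import os


def vm_settings(root="/proc/sys/vm"):
    """Snapshot every readable tunable under `root` into {name: value string}."""
    settings = {}
    for name in sorted(os.listdir(root)):
        try:
            with open(os.path.join(root, name)) as f:
                settings[name] = f.read().strip()
        except OSError:
            continue  # some entries are write-only or otherwise unreadable
    return settings


def diff_settings(a, b):
    """Return {name: (a_value, b_value)} for tunables that differ or exist on one side only."""
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}
```

Running vm_settings() on the failing server and on a stock CentOS 5.3 box, then passing both to diff_settings, shows at a glance whether swappiness or the overcommit_* values have been changed.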


Jim*_*nis 5

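I continue to suspect that your customer/user has some kernel module or driver loaded which is interfering with the clone() system call (perhaps some obscure security enhancement, something like LIDS but more obscure?), or which is somehow filling up some of the kernel data structures that fork()/clone() needs in order to operate (process table, page tables, file descriptor table, etc.).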

Here is the relevant portion of the fork(2) man page:

ERRORS
       EAGAIN fork() cannot allocate sufficient memory to copy the parent's page tables and allocate a task  structure  for  the
              child.

       EAGAIN It  was not possible to create a new process because the caller's RLIMIT_NPROC resource limit was encountered.  To
              exceed this limit, the process must have either the CAP_SYS_ADMIN or the CAP_SYS_RESOURCE capability.

       ENOMEM fork() failed to allocate the necessary kernel structures because memory is tight.

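I suggest having the user try this after booting into a stock, generic kernel with only a minimal set of modules and drivers loaded (the minimum necessary to run your application/script). From there, assuming it works in that configuration, they can perform a binary search between that configuration and the one which exhibits the issue. This is standard sysadmin troubleshooting 101.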
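To make that comparison concrete, the loaded modules can be read from /proc/modules (the same source lsmod uses) and diffed against a known-good baseline. A minimal Python 3 sketch with hypothetical helper names:

```python
def loaded_modules(path="/proc/modules"):
    """Return the sorted names of loaded kernel modules (first column of /proc/modules)."""
    with open(path) as f:
        return sorted(line.split()[0] for line in f if line.strip())


def module_diff(baseline, current):
    """Return (extra, missing): modules loaded only in `current`, and only in `baseline`."""
    base, cur = set(baseline), set(current)
    return sorted(cur - base), sorted(base - cur)
```

The "extra" list from the failing machine is the natural starting set for the binary search.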

The relevant line in your strace output is:

clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7f12708) = -1 ENOMEM (Cannot allocate memory)

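...I know others have talked about swap and memory availability (and I would recommend that you set up at least a small swap partition, ironically even if it is on a RAM disk). The code paths through the Linux kernel when it has even a tiny bit of swap available have been exercised far more extensively than those (exception-handling paths) in which there is zero swap available.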

But I suspect that this is still a red herring.

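The fact that free reports 0 (ZERO) memory in use by the cache and buffers is very disturbing. I suspect that the free output, and possibly your application issue here, are caused by some proprietary kernel module which is interfering with memory allocation in some way.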

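According to the man pages for fork()/clone(), the fork() system call should return EAGAIN if your call would cause a resource limit violation (RLIMIT_NPROC); however, they don't say whether EAGAIN is returned for other RLIMIT* violations. In any event, if your target/host has some sort of weird Vormetric or other security settings (or even if your process is running under some weird SELinux policy), that might be causing this -ENOMEM failure.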
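To tell these cases apart in the agent's log, rather than only seeing a traceback, the spawn can be wrapped so that the errno is classified before the exception propagates. This is a Python 3 sketch: the logger name and helper names are hypothetical, and the explanations simply paraphrase the fork(2) excerpt quoted above.

```python
import errno
import logging
import subprocess

log = logging.getLogger("sd-agent.checks")  # hypothetical logger name


def explain_spawn_errno(err):
    """Paraphrase what fork(2) says a given errno means for a failed fork/clone."""
    if err == errno.EAGAIN:
        return ("EAGAIN: RLIMIT_NPROC reached, or page tables/task structure "
                "could not be allocated")
    if err == errno.ENOMEM:
        return "ENOMEM: kernel structures could not be allocated (memory is tight)"
    return errno.errorcode.get(err, str(err))


def run_checked(args):
    """Run `args` and return stdout; log a classified reason if spawning fails."""
    try:
        return subprocess.Popen(args, stdout=subprocess.PIPE).communicate()[0]
    except OSError as e:
        log.error("spawning %r failed: %s", args, explain_spawn_errno(e.errno))
        raise  # keep the original OSError for the caller
```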

It is pretty unlikely to be a normal, run-of-the-mill Linux/UNIX issue. You have something non-standard going on there.


ser*_*inc 5

For an easy fix, you could run (as root)

echo 1 > /proc/sys/vm/overcommit_memory

if you are sure that your system has enough memory. See Linux overcommit heuristic.