Roo*_*oke 43 c linux gcc swap ld
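Sometimes it's handy to mock something up with a small C program that uses a big chunk of static memory. I noticed that after switching to Fedora 15, such a program takes a very long time to compile: we're talking 30 s versus 0.1 s. Even weirder, ld (the linker) was maxing out the CPU and slowly started eating all available memory. After some fiddling, I managed to find a correlation between this new problem and the size of my swap file. Here is an example program for the purposes of this discussion: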
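#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#define M 1000000
#define GIANT_SIZE (200*M)
/* ~1.6 GB of zero-initialized static storage (200*M elements of 8-byte size_t) */
size_t g_arr[GIANT_SIZE];
int main( int argc, char **argv){
    int i;
    for(i = 0; i<10; i++){
        /* %zu is the correct conversion for size_t (%d is undefined behavior here) */
        printf("This should be zero: %zu\n",g_arr[i]);
    }
    exit(1);
}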
This program has a giant array whose declared size is about 200*8 MB = 1.6 GB of static memory. Compiling it takes an inordinate amount of time:
[me@bleh]$ time gcc HugeTest.c
real 0m12.954s
user 0m6.995s
sys 0m3.890s
[me@bleh]$
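13 s for a ~13-line C program!? That's not right. The key number is the size of the static storage. As soon as it is larger than the total swap space, the program compiles quickly again. For example, I have 5.3 GB of swap, so changing GIANT_SIZE to (1000*M) gives the following timing: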
[me@bleh]$ time gcc HugeTest.c
real 0m0.087s
user 0m0.026s
sys 0m0.027s
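Ahh, that's more like it! To further convince myself (and you, if you're trying this at home) that swap space really was the magic number, I increased the available swap to a truly massive 19 GB and tried compiling the (1000*M) version again: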
[me@bleh]$ ls -ali /extraswap
5986 -rw-r--r-- 1 root root 14680064000 Jul 26 15:01 /extraswap
[me@bleh]$ sudo swapon /extraswap
[me@bleh]$ time gcc HugeTest.c
real 4m28.089s
user 0m0.016s
sys 0m0.010s
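It didn't even finish after 4.5 minutes!
Clearly the linker is doing something wrong here, but I don't know how to work around it other than rewriting the program or messing with the swap space. I'd love to know whether there's a solution, or whether I've stumbled onto some arcane bug.
By the way, all of the programs compile and run correctly, independent of all the swap business.
For reference, here is some possibly relevant information: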
[]$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 27027
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
[]$ uname -r
2.6.40.6-0.fc15.x86_64
[]$ ld --version
GNU ld version 2.21.51.0.6-6.fc15 20110118
Copyright 2011 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.
[]$ gcc --version
gcc (GCC) 4.6.1 20110908 (Red Hat 4.6.1-9)
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
[]$ cat /proc/meminfo
MemTotal: 3478272 kB
MemFree: 1749388 kB
Buffers: 16680 kB
Cached: 212028 kB
SwapCached: 368056 kB
Active: 489688 kB
Inactive: 942820 kB
Active(anon): 401340 kB
Inactive(anon): 803436 kB
Active(file): 88348 kB
Inactive(file): 139384 kB
Unevictable: 32 kB
Mlocked: 32 kB
SwapTotal: 19906552 kB
SwapFree: 17505120 kB
Dirty: 172 kB
Writeback: 0 kB
AnonPages: 914972 kB
Mapped: 60916 kB
Shmem: 1008 kB
Slab: 55248 kB
SReclaimable: 26720 kB
SUnreclaim: 28528 kB
KernelStack: 3608 kB
PageTables: 63344 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 21645688 kB
Committed_AS: 11208980 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 139336 kB
VmallocChunk: 34359520516 kB
HardwareCorrupted: 0 kB
AnonHugePages: 151552 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 730752 kB
DirectMap2M: 2807808 kB
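TL;DR: When the (large) static storage of a C program is slightly smaller than the available swap space, the linker takes forever to link the program. However, it's quite snappy when the static storage is slightly larger than the available swap space. What's up with that!?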
Soa*_*Box 25
I was able to reproduce this on an Ubuntu 10.10 system (GNU ld (GNU Binutils for Ubuntu) 2.20.51-system.20100908), and I think I have your answer. First, some methodology.
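After confirming that this happens on a small VM (512 MB RAM, 2 GB swap), I decided the easiest thing to do was to strace gcc and see exactly what was going on when everything went to hell: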
~# strace -f gcc swap.c
It turned up the following:
vfork() = 3589
[pid 3589] execve("/usr/lib/gcc/x86_64-linux-gnu/4.4.5/collect2", ["/usr/lib/gcc/x86_64-linux-gnu/4."..., "--build-id", "--eh-frame-hdr", "-m", "elf_x86_64", "--hash-style=gnu", "-dynamic-linker", "/lib64/ld-linux-x86-64.so.2", "-o", "swap", "-z", "relro", "/usr/lib/gcc/x86_64-linux-gnu/4."..., "/usr/lib/gcc/x86_64-linux-gnu/4."..., "/usr/lib/gcc/x86_64-linux-gnu/4."..., "-L/usr/lib/gcc/x86_64-linux-gnu/"..., ...], [/* 26 vars */]) = 0
...
[pid 3589] vfork() = 3590
...
[pid 3590] execve("/usr/bin/ld", ["/usr/bin/ld", "--build-id", "--eh-frame-hdr", "-m", "elf_x86_64", "--hash-style=gnu", "-dynamic-linker", "/lib64/ld-linux-x86-64.so.2", "-o", "swap", "-z", "relro", "/usr/lib/gcc/x86_64-linux-gnu/4."..., "/usr/lib/gcc/x86_64-linux-gnu/4."..., "/usr/lib/gcc/x86_64-linux-gnu/4."..., "-L/usr/lib/gcc/x86_64-linux-gnu/"..., ...], [/* 27 vars */]) = 0
...
[pid 3590] lseek(13, 4096, SEEK_SET) = 4096
[pid 3590] read(13, ".\4@\0\0\0\0\0>\4@\0\0\0\0\0N\4@\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[pid 3590] mmap(NULL, 1600004096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1771931000
<system comes to screeching halt>
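It appears, as we might have suspected, that ld is actually trying to anonymously mmap the entire static memory space of this array (or possibly the entire program; it's hard to tell, since the rest of the program is so small that it might all fit in the extra 4096 bytes).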
So this is all well and good, but why does it work when we exceed the available swap on the system? Let's turn swap off with swapoff and run strace -f again...
[pid 3618] lseek(13, 4096, SEEK_SET) = 4096
[pid 3618] read(13, ".\4@\0\0\0\0\0>\4@\0\0\0\0\0N\4@\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[pid 3618] mmap(NULL, 1600004096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
[pid 3618] brk(0x60638000) = 0x1046000
[pid 3618] mmap(NULL, 1600135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
[pid 3618] mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7fd011864000
...
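Unsurprisingly, ld appears to do the same thing it tried last time: mmap the entire space. But the system can no longer do that, and the call fails! ld tries again, fails again, and then does something unexpected... it carries on with less memory.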
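Weird. I guess we'd better take a look at the ld code, then. Drat, it doesn't even make an explicit mmap call. This must be coming from inside a plain old malloc. We'll have to build ld with debug symbols to track this down. Unfortunately, when I built binutils 2.21.1, the problem disappeared. Perhaps it has been fixed in newer versions of binutils?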