为什么x86-64 GCC函数序言分配的堆栈少于局部变量?

css*_*233 11 assembly stack gcc x86-64

考虑以下简单程序:

int main(int argc, char **argv)
{
        char buffer[256];

        buffer[0] = 0x41;
        buffer[128] = 0x41;
        buffer[255] = 0x41;

        return 0;
}
Run Code Online (Sandbox Code Playgroud)

在x86-64机器上使用GCC 4.7.0编译.用GDB反汇编main()给出:

0x00000000004004cc <+0>:     push   rbp
0x00000000004004cd <+1>:     mov    rbp,rsp
0x00000000004004d0 <+4>:     sub    rsp,0x98
0x00000000004004d7 <+11>:    mov    DWORD PTR [rbp-0x104],edi
0x00000000004004dd <+17>:    mov    QWORD PTR [rbp-0x110],rsi
0x00000000004004e4 <+24>:    mov    BYTE PTR [rbp-0x100],0x41
0x00000000004004eb <+31>:    mov    BYTE PTR [rbp-0x80],0x41
0x00000000004004ef <+35>:    mov    BYTE PTR [rbp-0x1],0x41
0x00000000004004f3 <+39>:    mov    eax,0x0
0x00000000004004f8 <+44>:    leave  
0x00000000004004f9 <+45>:    ret    
Run Code Online (Sandbox Code Playgroud)

当缓冲区为256字节时,为什么sub rsp只有0x98 = 152d?当我将数据移入缓冲区[0]时,它似乎只是使用分配的堆栈帧之外的数据并使用rbp来引用,那么甚至sub rsp的点是什么,0x98?

另一个问题,这些线路做了什么?

0x00000000004004d7 <+11>:    mov    DWORD PTR [rbp-0x104],edi
0x00000000004004dd <+17>:    mov    QWORD PTR [rbp-0x110],rsi
Run Code Online (Sandbox Code Playgroud)

为什么需要保存EDI而不是RDI?我看到它将它移动到C代码中分配的缓冲区的最大范围之外.同样令人感兴趣的是为什么两个变量之间的差异如此之大.由于EDI只有4个字节,为什么两个变量需要12个字节的间隔?

Mat*_*ery 21

The x86-64 ABI used by Linux (and some other OSes, although notably not Windows, which has its own different ABI) defines a "red zone" of 128 bytes below the stack pointer, which is guaranteed not to be touched by signal or interrupt handlers. (See figure 3.3 and §3.2.2.)

A leaf function (i.e. one which does not call anything else) may therefore use this area for whatever it wants - it isn't doing anything like a call which would place data at the stack pointer; and any signal or interrupt handler will follow the ABI and drop the stack pointer by at least an additional 128 bytes before storing anything.

(Shorter instruction encodings are available for signed 8-bit displacements, so the point of the red zone is that it increases the amount of local data that a leaf function can access using these shorter instructions.)

That's what's happening here.

But... this code isn't making use of those shorter encodings (it's using offsets from rbp rather than rsp). Why not? It's also saving edi and rsi completely unnecessarily - you ask why it's saving edi instead of rdi, but why is it saving it at all?

The answer is that the compiler is generating really crummy code, because no optimisations are enabled. If you enable any optimisation, your entire function is likely to collapse down to:

mov eax, 0
ret
Run Code Online (Sandbox Code Playgroud)

because that's really all it needs to do: buffer[] is local, so the changes made to it will never be visible to anything else, so can be optimised away; beyond that, all the function needs to do is return 0.


所以,这是一个更好的例子.这个函数是完全无意义的,但是使用了类似的数组,同时做了足够的事情来确保事情不会全部被优化掉:

$ cat test.c
int foo(char *bar)
{
    char tmp[256];
    int i;

    for (i = 0; bar[i] != 0; i++)
      tmp[i] = bar[i] + i;

    return tmp[1] + tmp[200];
}
Run Code Online (Sandbox Code Playgroud)

通过一些优化编译,你可以看到红色区域的类似用法,除了这次它真的使用偏移rsp:

$ gcc -m64 -O1 -c test.c
$ objdump -Mintel -d test.o

test.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <foo>:
   0:   53                      push   rbx
   1:   48 81 ec 88 00 00 00    sub    rsp,0x88
   8:   0f b6 17                movzx  edx,BYTE PTR [rdi]
   b:   84 d2                   test   dl,dl
   d:   74 26                   je     35 <foo+0x35>
   f:   4c 8d 44 24 88          lea    r8,[rsp-0x78]
  14:   48 8d 4f 01             lea    rcx,[rdi+0x1]
  18:   4c 89 c0                mov    rax,r8
  1b:   89 c3                   mov    ebx,eax
  1d:   44 28 c3                sub    bl,r8b
  20:   89 de                   mov    esi,ebx
  22:   01 f2                   add    edx,esi
  24:   88 10                   mov    BYTE PTR [rax],dl
  26:   0f b6 11                movzx  edx,BYTE PTR [rcx]
  29:   48 83 c0 01             add    rax,0x1
  2d:   48 83 c1 01             add    rcx,0x1
  31:   84 d2                   test   dl,dl
  33:   75 e6                   jne    1b <foo+0x1b>
  35:   0f be 54 24 50          movsx  edx,BYTE PTR [rsp+0x50]
  3a:   0f be 44 24 89          movsx  eax,BYTE PTR [rsp-0x77]
  3f:   8d 04 02                lea    eax,[rdx+rax*1]
  42:   48 81 c4 88 00 00 00    add    rsp,0x88
  49:   5b                      pop    rbx
  4a:   c3                      ret    
Run Code Online (Sandbox Code Playgroud)

现在让我们通过插入对另一个函数的调用来稍微调整一下,这样foo()就不再是叶子函数了:

$ cat test.c
extern void dummy(void);  /* ADDED */

int foo(char *bar)
{
    char tmp[256];
    int i;

    for (i = 0; bar[i] != 0; i++)
      tmp[i] = bar[i] + i;

    dummy();  /* ADDED */

    return tmp[1] + tmp[200];
}
Run Code Online (Sandbox Code Playgroud)

现在无法使用红色区域,因此您会看到更像您原先预期的内容:

$ gcc -m64 -O1 -c test.c
$ objdump -Mintel -d test.o

test.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <foo>:
   0:   53                      push   rbx
   1:   48 81 ec 00 01 00 00    sub    rsp,0x100
   8:   0f b6 17                movzx  edx,BYTE PTR [rdi]
   b:   84 d2                   test   dl,dl
   d:   74 24                   je     33 <foo+0x33>
   f:   49 89 e0                mov    r8,rsp
  12:   48 8d 4f 01             lea    rcx,[rdi+0x1]
  16:   48 89 e0                mov    rax,rsp
  19:   89 c3                   mov    ebx,eax
  1b:   44 28 c3                sub    bl,r8b
  1e:   89 de                   mov    esi,ebx
  20:   01 f2                   add    edx,esi
  22:   88 10                   mov    BYTE PTR [rax],dl
  24:   0f b6 11                movzx  edx,BYTE PTR [rcx]
  27:   48 83 c0 01             add    rax,0x1
  2b:   48 83 c1 01             add    rcx,0x1
  2f:   84 d2                   test   dl,dl
  31:   75 e6                   jne    19 <foo+0x19>
  33:   e8 00 00 00 00          call   38 <foo+0x38>
  38:   0f be 94 24 c8 00 00    movsx  edx,BYTE PTR [rsp+0xc8]
  3f:   00 
  40:   0f be 44 24 01          movsx  eax,BYTE PTR [rsp+0x1]
  45:   8d 04 02                lea    eax,[rdx+rax*1]
  48:   48 81 c4 00 01 00 00    add    rsp,0x100
  4f:   5b                      pop    rbx
  50:   c3                      ret    
Run Code Online (Sandbox Code Playgroud)

(注意,tmp[200]在第一种情况下,在有符号的8位位移范围内,但不在此范围内.)