Why are global variables in x86-64 accessed relative to the instruction pointer?

Question

roy*_*uly 4 c compiler-construction assembly x86-64

I compiled some C code to assembly using gcc -S -fasm foo.c. The C code declares a global variable and a local variable in main, as shown below:

int y=6;
int main()
{
        int x=4;
        x=x+y;
        return 0;
}


Looking at the assembly generated from this C code, I saw that the global variable y is accessed using the value of the rip instruction pointer.

I thought that only const global variables were stored in the text segment, but this example seems to show that regular global variables are stored in the text segment too, which is very weird.

I guess one of my assumptions is wrong, so can someone please explain it to me?

The assembly code generated by the C compiler:

        .file   "foo.c"
        .text
        .globl  y
        .data
        .align 4
        .type   y, @object
        .size   y, 4
y:
        .long   6
        .text
        .globl  main
        .type   main, @function

main:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        movl    $4, -4(%rbp)
        movl    y(%rip), %eax
        addl    %eax, -4(%rbp)
        movl    $0, %eax
        popq    %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE0:


Answer 1

Pet*_*des 8

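The offsets between different sections of your executable are link-time constants, so RIP-relative addressing is usable for any section (including .data, where your non-const globals are). Note the .data in your asm output.

This applies even in a PIE executable or shared library, where the absolute addresses are not known until runtime (ASLR).

Runtime ASLR for position-independent executables (PIE) randomizes one base address for the entire program, not the start addresses of individual segments relative to each other.

All access to static variables uses RIP-relative addressing because that is the most efficient choice, even in a position-dependent executable where absolute addressing is an option (because static addresses are link-time constants).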
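As a rough illustration of the "link-time constant offset" point, the following C sketch measures the distance between a function in .text and a global in .data. The names read_y, text_to_data_distance and print_layout are hypothetical helpers invented for this example, and converting a function pointer to an integer is implementation-defined (it behaves as expected with gcc on x86-64 Linux, which is the assumption here).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

int y = 6;                       /* non-const global: placed in .data */

/* Hypothetical helper: any function in .text can serve as a reference point. */
int read_y(void)
{
    return y;                    /* gcc typically emits: movl y(%rip), %eax */
}

/* Distance in bytes from read_y (in .text) to y (in .data).
 * Assumes a flat address space where pointers fit in uintptr_t. */
intptr_t text_to_data_distance(void)
{
    assert(sizeof(uintptr_t) >= sizeof(void *));
    return (intptr_t)((uintptr_t)&y - (uintptr_t)&read_y);
}

/* Print both absolute addresses and their difference. */
void print_layout(void)
{
    printf("read_y   at %p\n", (void *)(uintptr_t)&read_y);
    printf("y        at %p\n", (void *)&y);
    printf("distance = %lld bytes\n", (long long)text_to_data_distance());
}
```

If print_layout() is called from main in a PIE build (for example gcc -fPIE -pie) and the program is run several times with ASLR enabled, the two absolute addresses should change from run to run while the printed distance stays the same.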

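In 32-bit x86, there are two redundant ways to encode an addressing mode with no registers and a disp32 absolute address (with and without a SIB byte). x86-64 repurposed the shorter one as RIP+rel32, so mov foo, %eax is 1 byte longer than mov foo(%rip), %eax.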
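To make the one-byte difference concrete, here is a small C sketch that spells out both encodings of a 32-bit load into EAX and computes the address a RIP-relative operand refers to. The displacement value 0x2f00 and the instruction address used in the usage note are made-up numbers for illustration, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* movl disp32(%rip), %eax
 * opcode 8B, ModRM 05 (mod=00, reg=eax, rm=101 -> RIP+disp32), disp32 */
static const uint8_t mov_rip_rel[] = { 0x8b, 0x05, 0x00, 0x2f, 0x00, 0x00 };

/* movl abs32, %eax
 * opcode 8B, ModRM 04 (rm=100 -> SIB follows), SIB 25 (no base, no index), disp32 */
static const uint8_t mov_abs32[] = { 0x8b, 0x04, 0x25, 0x00, 0x2f, 0x00, 0x00 };

size_t rip_rel_length(void) { return sizeof mov_rip_rel; }
size_t abs32_length(void)   { return sizeof mov_abs32; }

/* Decode a little-endian signed 32-bit displacement. */
int32_t read_disp32(const uint8_t *p)
{
    uint32_t u = (uint32_t)p[0]
               | (uint32_t)p[1] << 8
               | (uint32_t)p[2] << 16
               | (uint32_t)p[3] << 24;
    return (int32_t)u;
}

/* A RIP-relative operand is relative to the address of the NEXT instruction,
 * i.e. the current instruction's address plus its length. */
uint64_t rip_relative_target(uint64_t insn_addr, size_t insn_len, int32_t disp32)
{
    assert(insn_len > 0 && insn_len <= 15);   /* x86 instructions are at most 15 bytes */
    return insn_addr + insn_len + (uint64_t)(int64_t)disp32;
}
```

For example, if the 6-byte RIP-relative load sat at the hypothetical address 0x401000, rip_relative_target(0x401000, rip_rel_length(), read_disp32(mov_rip_rel + 2)) would give the address of the variable being loaded.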

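64-bit absolute addressing would take even more space, and is only available for mov to/from RAX / EAX / AX / AL, unless you use a separate instruction to get the address into a register first.

(In x86-64 Linux PIE/PIC, 64-bit absolute addressing is allowed, and is handled via load-time fixups that put the right address into the code, a jump table, or a statically initialized function pointer. So code does not technically have to be position-independent, but it is normally more efficient if it is. 32-bit absolute addressing is not allowed, because ASLR is not limited to the low 31 bits of virtual address space.)

Note that in a non-PIE Linux executable, gcc will use 32-bit absolute addressing to put the address of static data into a register. For example, puts("hello"); will typically compile to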

mov   $.LC0, %edi     # mov r32, imm32
call  puts


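In the default non-PIE memory model, static code and data are linked into the low 32 bits of virtual address space (in practice the low 2 GiB), so 32-bit absolute addresses work whether they are zero-extended or sign-extended to 64 bits. This is also handy for indexing static arrays, for example mov array(%rax), %edx ; add $4, %eax.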
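A minimal C sketch of the static-array case follows. The array contents and the function name load_element are made up for illustration. The assumption is that, in a non-PIE build (for example gcc -O2 -fno-pie -no-pie), the compiler can fold the array's 32-bit absolute address into the addressing mode of a single indexed load, whereas a PIE build first needs something like lea array(%rip), %rdx to get the base address into a register. Exact output depends on compiler version and flags.

```c
#include <assert.h>
#include <stddef.h>

/* Static storage: the address of array is a link-time constant. */
static int array[4] = { 10, 20, 30, 40 };

/* Indexed load from a static array.
 * Non-PIE builds may use the symbol directly as a disp32 in the addressing mode;
 * PIE builds cannot, because a RIP-relative operand cannot also have an index register. */
int load_element(size_t i)
{
    assert(i < sizeof array / sizeof array[0]);   /* bounds check for the sketch */
    return array[i];
}
```

Comparing the output of gcc -S for this function with and without -fPIE is one way to see the difference described above.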

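See "32-bit absolute addresses no longer allowed in x86-64 Linux?" for more about PIE executables, which use position-independent code for everything, including RIP-relative LEA (for example the 7-byte lea .LC0(%rip), %rdi instead of the 5-byte mov $.LC0, %edi).

I mention Linux because it looks from the .cfi directives like you are compiling for a non-Windows platform.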

Answer 2

pho*_*ger 7

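Although the .data and .text segments are independent of each other, once linked, their offsets relative to one another are fixed (at least in the gcc x86-64 -mcmodel=small code model, which is the default code model and works for all programs whose code + data is less than 2GB).

So wherever the system loads an executable in the process's address space, the instructions and the data they reference will have fixed offsets relative to one another.

For these reasons, x86-64 programs compiled for the (default) small code model use RIP-relative addressing for both code and global data. Doing so means the compiler does not need to dedicate a register to point to wherever the system loaded the executable's .data section; the program already knows its own RIP value and the offset between that and the global data it wants to access, so the most efficient way to access it is via a fixed 32-bit offset from RIP.

(Absolute 32-bit addressing modes would take more space, and 64-bit absolute addressing modes are even less efficient and only available for RAX/EAX/AX/AL.)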
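The 2GB limit of the small code model follows from the displacement being a signed 32-bit value. The C sketch below checks whether a target is reachable from a given instruction with a rel32 displacement; the function names fits_rel32 and encode_rel32 and the addresses in the usage note are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if target_addr can be reached from next_insn_addr (the address of the
 * instruction following the access) with a signed 32-bit displacement. */
bool fits_rel32(uint64_t next_insn_addr, uint64_t target_addr)
{
    int64_t delta = (int64_t)(target_addr - next_insn_addr);
    return delta >= INT32_MIN && delta <= INT32_MAX;
}

/* Compute the displacement that would be stored in the instruction.
 * The caller must make sure the target is in range. */
int32_t encode_rel32(uint64_t next_insn_addr, uint64_t target_addr)
{
    assert(fits_rel32(next_insn_addr, target_addr));
    return (int32_t)(int64_t)(target_addr - next_insn_addr);
}
```

With code ending at the hypothetical address 0x401006 and a variable at 0x403f06, encode_rel32 yields 0x2f00. A target 2 GiB or more beyond the instruction fails the fits_rel32 check, which is the situation the larger code models exist to handle.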

You can find more information about this on Eli Bendersky's website: Understanding the x64 code models
