Why does mov %ax, %ds assemble and then disassemble as mov %eax,%ds, which differs from the original?

Question

Mar*_*ity 3 x86 assembly gnu-assembler objdump memory-segmentation

test.S

.text
.global _start
    _start:
        xor %ax, %ax
        mov %ax, %ds
        mov %ax, %ss
        mov %ax, %es
        mov %ax, %fs
        mov %ax, %gs

I produced the disassembly file with the following commands:

$ x86_64-elf-gcc -g -c -O0 -m32 -fno-pie -fno-stack-protector -fno-asynchronous-unwind-tables .\test.S
$ x86_64-elf-ld .\test.o -m elf_i386  -Ttext=0x7c00 -o test.elf
$ x86_64-elf-objdump -x -d -S -m i386 ./test.elf > test_dis.txt

test_dis.txt


./test.elf:     file format elf32-i386
./test.elf
architecture: i386, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x00007c00

Program Header:
    LOAD off    0x00000000 vaddr 0x00007000 paddr 0x00007000 align 2**12
         filesz 0x00000c0d memsz 0x00000c0d flags r-x

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         0000000d  00007c00  00007c00  00000c00  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .debug_aranges 00000020  00000000  00000000  00000c10  2**3
                  CONTENTS, READONLY, DEBUGGING
  2 .debug_info   00000049  00000000  00000000  00000c30  2**0
                  CONTENTS, READONLY, DEBUGGING
  3 .debug_abbrev 00000014  00000000  00000000  00000c79  2**0
                  CONTENTS, READONLY, DEBUGGING
  4 .debug_line   0000003b  00000000  00000000  00000c8d  2**0
                  CONTENTS, READONLY, DEBUGGING
SYMBOL TABLE:
00007c00 l    d  .text  00000000 .text
00000000 l    d  .debug_aranges 00000000 .debug_aranges
00000000 l    d  .debug_info    00000000 .debug_info
00000000 l    d  .debug_abbrev  00000000 .debug_abbrev
00000000 l    d  .debug_line    00000000 .debug_line
00007c00 g       .text  00000000 _start
00008c0d g       .text  00000000 __bss_start
00008c0d g       .text  00000000 _edata
00008c10 g       .text  00000000 _end



Disassembly of section .text:

00007c00 <_start>:
.text
.global _start
    _start:
        xor %ax, %ax
    7c00:   66 31 c0                xor    %ax,%ax
        mov %ax, %ds
    7c03:   8e d8                   mov    %eax,%ds
        mov %ax, %ss
    7c05:   8e d0                   mov    %eax,%ss
        mov %ax, %es
    7c07:   8e c0                   mov    %eax,%es
        mov %ax, %fs
    7c09:   8e e0                   mov    %eax,%fs
    7c0b:   8e e8                   mov    %eax,%gs

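My question

Why do I get code like mov %eax,%ds, which does not match my original assembly source? Why does objdump output a contradictory result?

My expectation

I expected mov %eax,%ds to be shown as mov %ax,%ds, and I thought that %eax (32 bits) does not fit into %ds (16 bits).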

Answer 1

fuz*_*fuz 5

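The instructions mov %eax, %ds and mov %ax, %ds do exactly the same thing (you could say they are in fact the same instruction); the former merely has a shorter encoding because it lacks the 66 prefix byte. The assembler picks the shorter encoding for you, and the disassembler artificially distinguishes the two by showing different register sizes.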
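
As a rough illustration of the point above, here is a minimal Python sketch that decodes the register-to-segment-register form of opcode 8E from the bytes in the listing. The function name `decode_mov_to_sreg` and the lookup tables are hypothetical helpers written for this example, not part of objdump or any library. The sketch assumes 32-bit code, handles only the register form (ModRM.mod == 3), and treats an optional leading 66 byte as the operand-size prefix.

```python
# Register numbering used by the ModRM byte of opcode 8E (mov r/m16 -> Sreg).
SEGMENT_REGS = ["es", "cs", "ss", "ds", "fs", "gs"]
GPR16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]


def decode_mov_to_sreg(code: bytes):
    """Decode `[66] 8E /r` with a register source (ModRM.mod == 3).

    Returns (has_66_prefix, source_register, segment_register).
    The source is reported by its 16-bit name because only the low
    16 bits of the general-purpose register are written to the
    segment register.
    """
    has_prefix = code[0] == 0x66
    body = code[1:] if has_prefix else code
    if len(body) != 2 or body[0] != 0x8E:
        raise ValueError("not a 2-byte 8E /r instruction")
    modrm = body[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 3 or reg >= len(SEGMENT_REGS):
        raise ValueError("unsupported ModRM byte")
    return has_prefix, GPR16[rm], SEGMENT_REGS[reg]


# The five encodings from the listing, plus one with an explicit 66 prefix.
for hex_bytes in ["8ed8", "8ed0", "8ec0", "8ee0", "8ee8", "668ed8"]:
    print(hex_bytes, decode_mov_to_sreg(bytes.fromhex(hex_bytes)))
```

With and without the prefix, the decoded source and destination registers are the same; only the prefix flag differs, which is the distinction the disassembler renders as %ax versus %eax.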

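https://www.felixcloutier.com/x86/mov documents that `mov %ds, %eax` is only guaranteed to zero the upper 16 bits of EAX on Pentium Pro and later Intel CPUs. (@Markity). At least I think that is what "MOV Reg, Sreg" (Intel syntax) means in this context, with a 32-bit operand size instead of 16. I tested in 64-bit mode on a Skylake CPU: `66 8c d8 mov ax, ds` leaves the upper 16 bits unchanged, and `8c d8 mov eax, ds` zeroes them as expected. Apparently, on P5 Pentium and earlier, `mov %ds, %eax` (AT&T) would not necessarily zero-extend. (2 upvotes)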
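@PeterCordes https://hg.pushbx.org/ecm/ldebug/rev/912356bc7941 quotes the footnote "In 32-bit mode, the assembler may insert the 16-bit operand-size prefix with this instruction", which implies the prefix is a no-op, although it does not explicitly state that the memory operand is always "m16". The NASM version I am using (close to 2.16rc10) rejects `mov dword [100h], es` in both `bits 16` and `bits 32` modes; `mov [100h], es` and `mov word [100h], es` are both accepted, but neither emits an `osize` prefix in either bitness mode. ndisasm will emit the `o32` or `o16` prefix keywords. (2 upvotes)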
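
The read direction discussed in the comments (segment register to general-purpose register) is where operand size does make a visible difference. The following is a small Python model of the behavior reported in the first comment for a Skylake CPU. It is only a sketch: `mov_from_sreg` is a hypothetical helper, and it does not model P5 Pentium and earlier CPUs, where the upper bits are not guaranteed to be zeroed.

```python
def mov_from_sreg(eax: int, sreg: int, operand_size: int) -> int:
    """Return the new EAX value after moving a segment register into it.

    operand_size == 16 models `66 8c d8  mov ax, ds` in 32/64-bit code;
    operand_size == 32 models `8c d8     mov eax, ds`.
    """
    sreg &= 0xFFFF  # segment registers hold a 16-bit selector
    if operand_size == 16:
        # Only AX is written; the upper 16 bits of EAX are preserved.
        return (eax & 0xFFFF0000) | sreg
    if operand_size == 32:
        # The selector is zero-extended into the full 32-bit register
        # (assumed behavior for Pentium Pro and later, per the comment).
        return sreg
    raise ValueError("operand_size must be 16 or 32")


old_eax = 0xDEADBEEF
selector = 0x002B  # arbitrary example selector value
print(hex(mov_from_sreg(old_eax, selector, 16)))
print(hex(mov_from_sreg(old_eax, selector, 32)))
```

In the write direction (`mov %ax, %ds` versus `mov %eax, %ds`) no such difference exists, because the destination is only 16 bits wide either way.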
