Porting from 32-bit to 64-bit by changing all register names from eXX to rXX makes factorial return 0?

Question


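Fortunately, everyone who studies the art of computer programming has access to communities like Stack Overflow! I've taken on the task of learning to program computers, and I'm doing it with an e-book called "Programming From the Ground Up", which teaches the reader how to write assembly-language programs in a GNU/Linux environment.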

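I've progressed in the book to the point of writing a program that uses a function to compute the factorial of the integer 4. I've finished it, and it produces no errors, either from GCC's assembler or when the program runs. However, the function in my program doesn't return the correct answer! The factorial of 4 is 24, but the program returns 0! To be honest, I have no idea why!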

Here is the code for your consideration:

.section .data

.section .text

.globl _start

.globl factorial

_start:

push $4                    #this is the function argument
call factorial             #the function is called
add $4, %rsp               #the stack is restored to its original 
                           #state before the function was called
mov %rax, %rbx             #this instruction will move the result 
                           #computed by the function into the rbx 
                           #register and will serve as the return 
                           #value 
mov $1, %rax               #1 must be placed inside this register for 
                           #the exit system call
int $0x80                  #exit interrupt

.type factorial, @function #defines the code below as being a function

factorial:                 #function label
push %rbp                  #saves the base-pointer
mov %rsp, %rbp             #moves the stack-pointer into the base-
                           #pointer register so that data in the stack 
                           #can be referenced as indexes of the base-
                           #pointer
mov $1, %rax               #the rax register will contain the product 
                           #of the factorial
mov 8(%rbp), %rcx          #moves the function argument into %rcx
start_loop:                #the process loop begins
cmp $1, %rcx               #this is the exit condition for the loop
je loop_exit               #if the value in %rcx reaches 1, exit loop
imul %rcx, %rax            #multiply the current integer of the 
                           #factorial by the value stored in %rax
dec %rcx                   #reduce the factorial integer by 1
jmp start_loop             #unconditional jump to the start of loop
loop_exit:                 #the loop exit begins
mov %rbp, %rsp             #restore the stack-pointer
pop %rbp                   #remove the saved base-pointer from stack
ret                        #return


Answer 1

Pet*_*des 5

TL:DR: The factorial of the return address overflowed %rax, leaving 0, because you ported wrong.

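Porting 32-bit code to 64-bit isn't as simple as changing all the register names. That might get it to assemble, but as you found, even this simple program behaves differently. In x86-64, push %reg and call both push 64-bit values and modify rsp by 8. You would see this if you single-stepped your code in a debugger. (See the bottom of the x86 tag wiki for info on using gdb for asm.)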
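If you want to see the 8-byte stack slots without a debugger, here is a minimal sketch. It assumes GNU as/ld on x86-64 Linux, and the label `after_call` is made up for illustration. The program records %rsp, does one `push` and one `call`, and exits with the number of bytes those two instructions moved the stack pointer. It uses the 64-bit `syscall` ABI for the exit.

```asm
# Build (assumed): as --64 -o pushsize.o pushsize.s && ld -o pushsize pushsize.o
    .section .text
    .globl _start
_start:
    mov  %rsp, %rbx          # remember the initial stack pointer
    push $4                  # 64-bit mode: pushes a qword, rsp -= 8
    call after_call          # pushes an 8-byte return address, rsp -= 8
after_call:                  # hypothetical label: call just falls through to here
    sub  %rsp, %rbx          # rbx = old rsp - new rsp
    mov  %rbx, %rdi          # exit status = bytes pushed by push + call
    mov  $60, %eax           # __NR_exit from asm/unistd_64.h
    syscall
```

Running it and checking `echo $?` should show 16 (8 + 8). The same two instructions in a 32-bit build move %esp by 4 each.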

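You're following a book that uses 32-bit examples, so you should probably just build them as 32-bit executables instead of trying to port them to 64-bit before you know how.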
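For comparison, here is a sketch of the 32-bit form of the program, built as a 32-bit executable. It was reconstructed by reverting your rXX names to eXX, not copied from the book. The build commands in the header comment assume GNU binutils on an x86-64 Linux box. In this mode `push $4` and `call` each move %esp by 4, so `8(%ebp)` really is the argument and `add $4, %esp` really pops it.

```asm
# Build as 32-bit (assumed GNU binutils):
#   as --32 -o factorial32.o factorial32.s
#   ld -m elf_i386 -o factorial32 factorial32.o
    .section .text
    .globl _start
    .globl factorial
_start:
    push $4                  # pushes a dword: esp -= 4
    call factorial           # pushes a 4-byte return address
    add  $4, %esp            # pop the 4-byte argument
    mov  %eax, %ebx          # exit status = return value
    mov  $1, %eax            # __NR_exit from asm/unistd_32.h
    int  $0x80

    .type factorial, @function
factorial:
    push %ebp                # 0(%ebp) = saved ebp
    mov  %esp, %ebp          # 4(%ebp) = return address, 8(%ebp) = argument
    mov  $1, %eax            # product = 1
    mov  8(%ebp), %ecx       # load the argument
start_loop:
    cmp  $1, %ecx
    je   loop_exit
    imul %ecx, %eax          # product *= n
    dec  %ecx
    jmp  start_loop
loop_exit:
    mov  %ebp, %esp
    pop  %ebp
    ret
```

`echo $?` after running it should print 24.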

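Your sys_exit() using the 32-bit int 0x80 ABI still works (see What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?), but you'll run into trouble with system calls if you try to pass 64-bit pointers. Use the 64-bit ABI.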
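A minimal sketch of the same exit done with the 64-bit ABI, assuming x86-64 Linux. The `mov $24, %eax` is a hypothetical stand-in for whatever result your code left in %rax. The call number moves from 1 to 60, the first argument moves from %ebx to %rdi, and `int $0x80` becomes `syscall`.

```asm
    .section .text
    .globl _start
_start:
    mov  $24, %eax           # hypothetical stand-in for a result in %rax
    # 32-bit ABI: eax = 1 (__NR_exit, unistd_32.h), ebx = status, int $0x80
    # 64-bit ABI: rax = 60 (__NR_exit, unistd_64.h), rdi = status, syscall
    mov  %rax, %rdi          # first syscall argument goes in %rdi
    mov  $60, %eax           # __NR_exit
    syscall                  # note: the kernel clobbers %rcx and %r11
```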

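You'll also run into problems if you want to call any library functions, because the standard function-calling convention is different too. See Why parameters stored in registers and not on the stack in x86-64 Assembly?, plus the 64-bit ABI link and the other calling-convention docs in the x86 tag wiki.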
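As a rough sketch of what the standard (x86-64 System V) convention looks like for this function: the first integer argument arrives in %rdi and the result goes back in %rax, so no stack arguments or stack frame are needed. This assumes x86-64 Linux with GNU as/ld. The loop is your original one, except that it uses `jbe` so an argument of 0 also terminates.

```asm
    .section .text
    .globl _start
    .globl factorial
_start:
    mov  $4, %edi            # first integer arg in %rdi (writing %edi zero-extends)
    call factorial           # result comes back in %rax
    mov  %rax, %rdi          # exit status = factorial(4)
    mov  $60, %eax           # __NR_exit (64-bit ABI)
    syscall

    .type factorial, @function
# factorial(n): n in %rdi, result in %rax. Leaf function, so no frame needed.
factorial:
    mov  $1, %eax            # result = 1 (zero-extends into %rax)
start_loop:
    cmp  $1, %rdi
    jbe  loop_exit           # stop when n <= 1 (unsigned), so factorial(0) == 1
    imul %rdi, %rax          # result *= n
    dec  %rdi                # %rdi is call-clobbered, so modifying it is allowed
    jmp  start_loop
loop_exit:
    ret
```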

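But you're not doing any of that, so the problem with your program simply comes down to not accounting for the doubled "stack width" in x86-64. Your factorial function is reading the return address as its argument.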
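Keeping your own stack-args convention, the minimal fix is to account for 8-byte slots. The argument is at `16(%rbp)`, and the caller pops 8 bytes instead of 4. The sketch below assumes x86-64 Linux and GNU as/ld. It also switches the exit to the 64-bit `syscall` ABI; that change is not needed for the fix itself.

```asm
    .section .text
    .globl _start
    .globl factorial
_start:
    push $4                  # pushes a qword: rsp -= 8
    call factorial
    add  $8, %rsp            # pop the 8-byte argument (not 4)
    mov  %rax, %rdi          # exit status = factorial(4)
    mov  $60, %eax           # __NR_exit (64-bit ABI)
    syscall

    .type factorial, @function
factorial:
    push %rbp                # frame layout after these two instructions:
    mov  %rsp, %rbp          #   0(%rbp) saved rbp, 8(%rbp) return address, 16(%rbp) argument
    mov  $1, %eax            # product = 1 (zero-extends into %rax)
    mov  16(%rbp), %rcx      # load the argument, not the return address at 8(%rbp)
start_loop:
    cmp  $1, %rcx
    je   loop_exit
    imul %rcx, %rax          # product *= n
    dec  %rcx
    jmp  start_loop
loop_exit:
    mov  %rbp, %rsp
    pop  %rbp
    ret
```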

Here's your code, with comments explaining what it actually does:

push $4                    # rsp-=8.  (rsp) = qword 4
                           # non-standard calling convention with args on the stack.
call factorial             # rsp-=8.  (rsp) = return address.  RIP=factorial
add $4, %rsp               # misalign the stack, so it's pointing to the top half of the 4 you pushed earlier.
# if this was in a function that wanted to return, you'd be screwed.

mov %rax, %rbx             # copy return value to first arg of system call
mov $1, %rax               #eax = __NR_exit from asm/unistd_32.h, wasting 2 bytes vs. mov $1, %eax
int $0x80                  # 32-bit ABI system call, eax=call number, ebx=first arg.  sys_exit(factorial(4))


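So the caller is sort of fine (for the non-standard 64-bit calling convention you've invented, which passes all args on the stack). You might as well omit the add to %rsp entirely, since you're about to exit without touching the stack again.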

.type factorial, @function #defines the code below as being a function

factorial:                 #function label
push %rbp                  #rsp-=8, (rsp) = rbp
mov %rsp, %rbp             # make a traditional stack frame

mov $1, %rax               #retval = 1.  (Wasting 2 bytes vs. the exactly equivalent mov $1, %eax)

mov 8(%rbp), %rcx          #load the return address into %rcx

# ... and calculate the factorial


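For static executables (and dynamically linked executables that aren't ASLR-enabled via PIE), _start is normally at 0x4000c0. Your program will still run nearly instantly on a modern CPU, because 0x4000c0 iterations times 3 cycles of imul latency is still only about 12.6 million core clock cycles. On a 4 GHz CPU, that's about 3 milliseconds of CPU time.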
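The arithmetic behind that estimate, assuming one loop iteration per integer and the loop's speed bounded by a 3-cycle imul latency chain:

$$\mathtt{0x4000c0} = 4194496,\qquad 4194496 \times 3\ \text{cycles} = 12583488\ \text{cycles},\qquad \frac{12583488}{4\times10^{9}\ \text{Hz}} \approx 3.1\ \text{ms}$$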

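If you had made a position-independent executable by linking with gcc foo.o on a recent distro, _start would have an address like 0x5555555545a0, and your function would take about 70368 seconds to run on a 4 GHz CPU with 3-cycle imul latency.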
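The same estimate for the PIE address (same assumptions: one 3-cycle imul per iteration, 4 GHz):

$$\mathtt{0x5555555545a0} = 93824992232864,\qquad \frac{93824992232864 \times 3\ \text{cycles}}{4\times10^{9}\ \text{Hz}} \approx 70368.7\ \text{s} \approx 19.5\ \text{hours}$$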

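4194496! includes many even numbers, so its binary representation has many trailing zeros. The whole of %rax will be zero long before you're done multiplying by every number from 0x4000c0 down to 1.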
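To make "many trailing zeros" concrete, count the factors of 2 in $n!$ with Legendre's formula. imul keeps the low 64 bits, which are exact modulo $2^{64}$, so %rax ends up holding $n! \bmod 2^{64}$:

$$v_2(n!) = \sum_{k \ge 1} \left\lfloor \frac{n}{2^k} \right\rfloor = n - s_2(n),\qquad v_2(66!) = 33+16+8+4+2+1 = 64$$

Here $s_2(n)$ is the number of 1 bits in $n$. So $n! \equiv 0 \pmod{2^{64}}$ for every $n \ge 66$ (while $v_2(65!) = 63$). For this program, $v_2(4194496!) = 4194496 - 3 = 4194493$, far more than 64.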

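The exit status of a Linux process is only the low 8 bits of the integer you pass to sys_exit() (because wstatus is only a 32-bit int, and it also holds other information, such as which signal ended the process; see wait4(2)). So even with small args, it doesn't take much to get a 0.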
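For example, even a correctly working factorial exits with status 0 for an argument of 10, because $10!$ already contains eight factors of 2:

$$v_2(10!) = 5 + 2 + 1 = 8,\qquad 10! = 3628800 = 256 \times 14175 \;\Rightarrow\; 10! \bmod 256 = 0$$

Smaller arguments get truncated too: $6! = 720$ shows up as $720 \bmod 256 = 208$.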

Archived: 8 years ago
Views: 138
Last activity: 8 years ago