How to disassemble the main function of a stripped application?

kar*_*lip 32 c linux gdb strip disassembly

Let's say I compiled the application below and stripped its symbols.

#include <stdio.h>

int main()
{
    printf("Hello\n");
}

Build procedure:

gcc -o hello hello.c
strip --strip-unneeded hello

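If the application were not stripped, disassembling the main function would be easy. Since it is stripped, however, I have no idea how to disassemble its main function.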

(gdb) disas main
No symbol table is loaded.  Use the "file" command.

(gdb) info line main
Function "main" not defined.

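How could I do this? Is it even possible?

Note: this must be done using GDB only. Forget objdump. Assume I don't have access to the source code.

A step-by-step example would be greatly appreciated.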

Dr *_*eco 42

OK, here is a big edit of my previous answer. I think I have found a way now.

You (still :) have this specific problem:

(gdb) disas main
No symbol table is loaded.  Use the "file" command.

Now, if you compile the code with gcc -S (I added a return 0 at the end), you get:

    pushq   %rbp
    movq    %rsp, %rbp
    movl    $.LC0, %edi
    call    puts
    movl    $0, %eax
    leave
    ret

Now, you can see that your binary gives you some information:

Stripped:

(gdb) info files
Symbols from "/home/beco/Documents/fontes/cpp/teste/stackoverflow/distrip".
Local exec file:
    `/home/beco/Documents/fontes/cpp/teste/stackoverflow/distrip', file type elf64-x86-64.
    Entry point: 0x400440
    0x0000000000400238 - 0x0000000000400254 is .interp
    ...
    0x00000000004003a8 - 0x00000000004003c0 is .rela.dyn
    0x00000000004003c0 - 0x00000000004003f0 is .rela.plt
    0x00000000004003f0 - 0x0000000000400408 is .init
    0x0000000000400408 - 0x0000000000400438 is .plt
    0x0000000000400440 - 0x0000000000400618 is .text
    ...
    0x0000000000601010 - 0x0000000000601020 is .data
    0x0000000000601020 - 0x0000000000601030 is .bss

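The most important entry here is .text. It is the usual name of the section where the program's code starts (note that the entry point, 0x400440, is the start of .text), and from the disassembly of main below and from the size of the section you can see that it includes main. If you disassemble it, you will see a call to __libc_start_main. Most importantly, you are disassembling real code (you are not mistaking DATA for CODE).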

disas 0x0000000000400440,0x0000000000400618
Dump of assembler code from 0x400440 to 0x400618:
   0x0000000000400440:  xor    %ebp,%ebp
   0x0000000000400442:  mov    %rdx,%r9
   0x0000000000400445:  pop    %rsi
   0x0000000000400446:  mov    %rsp,%rdx
   0x0000000000400449:  and    $0xfffffffffffffff0,%rsp
   0x000000000040044d:  push   %rax
   0x000000000040044e:  push   %rsp
   0x000000000040044f:  mov    $0x400540,%r8
   0x0000000000400456:  mov    $0x400550,%rcx
   0x000000000040045d:  mov    $0x400524,%rdi
   0x0000000000400464:  callq  0x400428 <__libc_start_main@plt>
   0x0000000000400469:  hlt
   ...

   0x000000000040046c:  sub    $0x8,%rsp
   ...
   0x0000000000400482:  retq   
   0x0000000000400483:  nop
   ...
   0x0000000000400490:  push   %rbp
   ..
   0x00000000004004f2:  leaveq 
   0x00000000004004f3:  retq   
   0x00000000004004f4:  data32 data32 nopw %cs:0x0(%rax,%rax,1)
   ...
   0x000000000040051d:  leaveq 
   0x000000000040051e:  jmpq   *%rax
   ...
   0x0000000000400520:  leaveq 
   0x0000000000400521:  retq   
   0x0000000000400522:  nop
   0x0000000000400523:  nop
   0x0000000000400524:  push   %rbp
   0x0000000000400525:  mov    %rsp,%rbp
   0x0000000000400528:  mov    $0x40062c,%edi
   0x000000000040052d:  callq  0x400418 <puts@plt>
   0x0000000000400532:  mov    $0x0,%eax
   0x0000000000400537:  leaveq 
   0x0000000000400538:  retq   

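The call to __libc_start_main takes a pointer to main() as its first argument. On x86-64 the first argument is passed in the %rdi register, so the value loaded into %rdi just before the call is the address of main().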

   0x000000000040045d:  mov    $0x400524,%rdi
   0x0000000000400464:  callq  0x400428 <__libc_start_main@plt>

That is 0x400524 (as we already know). Now set a breakpoint there and try this:

(gdb) break *0x400524
Breakpoint 1 at 0x400524
(gdb) run
Starting program: /home/beco/Documents/fontes/cpp/teste/stackoverflow/disassembly/d2 

Breakpoint 1, 0x0000000000400524 in main ()
(gdb) n
Single stepping until exit from function main, 
which has no line number information.
hello 1
__libc_start_main (main=<value optimized out>, argc=<value optimized out>, ubp_av=<value optimized out>, 
    init=<value optimized out>, fini=<value optimized out>, rtld_fini=<value optimized out>, 
    stack_end=0x7fffffffdc38) at libc-start.c:258
258 libc-start.c: No such file or directory.
    in libc-start.c
(gdb) n

Program exited normally.
(gdb) 

Now you can disassemble it using:

(gdb) disas 0x0000000000400524,0x0000000000400600
Dump of assembler code from 0x400524 to 0x400600:
   0x0000000000400524:  push   %rbp
   0x0000000000400525:  mov    %rsp,%rbp
   0x0000000000400528:  sub    $0x10,%rsp
   0x000000000040052c:  movl   $0x1,-0x4(%rbp)
   0x0000000000400533:  mov    $0x40064c,%eax
   0x0000000000400538:  mov    -0x4(%rbp),%edx
   0x000000000040053b:  mov    %edx,%esi
   0x000000000040053d:  mov    %rax,%rdi
   0x0000000000400540:  mov    $0x0,%eax
   0x0000000000400545:  callq  0x400418 <printf@plt>
   0x000000000040054a:  mov    $0x0,%eax
   0x000000000040054f:  leaveq 
   0x0000000000400550:  retq   
   0x0000000000400551:  nop
   0x0000000000400552:  nop
   0x0000000000400553:  nop
   0x0000000000400554:  nop
   0x0000000000400555:  nop
   ...

That is mostly the solution.
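The address lookup used above (read the value loaded into %rdi just before the call to __libc_start_main) can also be done mechanically on text copied out of GDB. The following is a minimal Python sketch, not part of the original answer: the name find_main_address is hypothetical, and it assumes AT&T-syntax output from a non-PIE x86-64 binary like the one shown here. Position-independent executables usually load the address with a %rip-relative lea instead, which this sketch does not handle.

```python
import re

# One line of GDB disassembly, e.g.
#    0x000000000040045d:  mov    $0x400524,%rdi
INSN = re.compile(r"^\s*(?:=>\s*)?(0x[0-9a-f]+)(?:\s+<[^>]*>)?:\s+(.*?)\s*$")
# A mov of an immediate value into %rdi or %edi.
LOAD_RDI = re.compile(r"^mov\S*\s+\$(0x[0-9a-f]+),%[re]di$")


def find_main_address(disassembly):
    """Return the last immediate loaded into %rdi/%edi before the call to
    __libc_start_main, or None if that pattern is not present."""
    candidate = None
    for line in disassembly.splitlines():
        match = INSN.match(line)
        if not match:
            continue  # prompts, "..." elisions, blank lines
        insn = match.group(2)
        load = LOAD_RDI.match(insn)
        if load:
            candidate = int(load.group(1), 16)
        elif insn.startswith("call") and "__libc_start_main" in insn:
            return candidate
    return None


# Excerpt of the _start listing shown earlier in this answer.
listing = """
   0x000000000040044f:  mov    $0x400540,%r8
   0x0000000000400456:  mov    $0x400550,%rcx
   0x000000000040045d:  mov    $0x400524,%rdi
   0x0000000000400464:  callq  0x400428 <__libc_start_main@plt>
   0x0000000000400469:  hlt
"""
print(hex(find_main_address(listing)))  # prints 0x400524
```

The returned value is what you would then pass to break *ADDRESS or use as the start of a disas range.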

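By the way, this run used a different program, to check that the method works. That is why the disassembly of main shown above is a bit different. It comes from this C file: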

#include <stdio.h>

int main(void)
{
    int i=1;
    printf("hello %d\n", i);
    return 0;
}

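But!

If that does not work, you still have some hints:

From now on, you should set breakpoints at the beginning of every function. They come right after a ret or leave. The first entry point is .text itself. This is where the assembly starts, but it is not main.

The problem is that a breakpoint will not always let your program run. Take this one, at the very start of .text: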

(gdb) break *0x0000000000400440
Breakpoint 2 at 0x400440
(gdb) run
Starting program: /home/beco/Documents/fontes/cpp/teste/stackoverflow/disassembly/d2 

Breakpoint 2, 0x0000000000400440 in _start ()
(gdb) n
Single stepping until exit from function _start, 
which has no line number information.
0x0000000000400428 in __libc_start_main@plt ()
(gdb) n
Single stepping until exit from function __libc_start_main@plt, 
which has no line number information.
0x0000000000400408 in ?? ()
(gdb) n
Cannot find bounds of current function

So you need to keep trying until you find your way, setting breakpoints at:

0x400440
0x40046c
0x400490
0x4004f4
0x40051e
0x400524
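A list of candidates like this one can be approximated from the disassembly text itself. The sketch below is only a heuristic and not something the original answer provides: the name candidate_function_starts is hypothetical, it treats the first non-padding instruction after a ret, jmp or hlt as a possible function start, and it will report false positives for unconditional jumps inside a function body. It expects a listing without elided lines between a function's end and the next function's start.

```python
import re

# One line of GDB disassembly: address, optional <+offset>, instruction.
INSN = re.compile(r"^\s*(0x[0-9a-f]+)(?:\s+<[^>]*>)?:\s+(.*?)\s*$")
# Alignment padding such as "nop" or "data32 data32 nopw %cs:0x0(%rax,%rax,1)".
PADDING = re.compile(r"^(?:data(?:16|32)\s+)*nop")
# Instructions after which execution does not fall through.
ENDS_FUNCTION = re.compile(r"^(?:ret|jmp|hlt)")


def candidate_function_starts(disassembly):
    """Return addresses that look like the first instruction of a function."""
    starts = []
    expecting_start = True  # the first listed instruction is a candidate
    for line in disassembly.splitlines():
        match = INSN.match(line)
        if not match:
            continue
        address, insn = int(match.group(1), 16), match.group(2)
        if expecting_start and not PADDING.match(insn):
            starts.append(address)
            expecting_start = False
        if ENDS_FUNCTION.match(insn):
            expecting_start = True
    return starts


# Abridged excerpt of the .text listing shown earlier in this answer.
listing = """
   0x0000000000400440:  xor    %ebp,%ebp
   0x0000000000400464:  callq  0x400428 <__libc_start_main@plt>
   0x0000000000400469:  hlt
   0x000000000040046c:  sub    $0x8,%rsp
   0x0000000000400482:  retq
   0x0000000000400483:  nop
   0x0000000000400490:  push   %rbp
   0x0000000000400520:  leaveq
   0x0000000000400521:  retq
   0x0000000000400522:  nop
   0x0000000000400523:  nop
   0x0000000000400524:  push   %rbp
   0x0000000000400538:  retq
"""
for address in candidate_function_starts(listing):
    print("break *{:#x}".format(address))
```

Each printed line is a breakpoint command you could paste back into GDB and try in turn.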

From the other answer, we should keep this information:

In the non-stripped version of the file, we see:

(gdb) disas main
Dump of assembler code for function main:
   0x0000000000400524 <+0>: push   %rbp
   0x0000000000400525 <+1>: mov    %rsp,%rbp
   0x0000000000400528 <+4>: mov    $0x40062c,%edi
   0x000000000040052d <+9>: callq  0x400418 <puts@plt>
   0x0000000000400532 <+14>:    mov    $0x0,%eax
   0x0000000000400537 <+19>:    leaveq 
   0x0000000000400538 <+20>:    retq   
End of assembler dump.

Now we know that main spans 0x0000000000400524,0x0000000000400539. If we look at the same addresses in the stripped binary, we get the same result:

(gdb) disas 0x0000000000400524,0x0000000000400539
Dump of assembler code from 0x400524 to 0x400539:
   0x0000000000400524:  push   %rbp
   0x0000000000400525:  mov    %rsp,%rbp
   0x0000000000400528:  mov    $0x40062c,%edi
   0x000000000040052d:  callq  0x400418 <puts@plt>
   0x0000000000400532:  mov    $0x0,%eax
   0x0000000000400537:  leaveq 
   0x0000000000400538:  retq   
End of assembler dump.

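So, unless you can get some hint about where main starts (for example from another build that has symbols), another approach is to find out what its first assembly instructions are, then disassemble at specific places and see whether they match. If you have no access to the code at all, you can still read the ELF definition to learn how many sections the file should contain and try to calculate an address. Even so, you need information about the sections in the code!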
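Matching on known first instructions can be sketched as a byte search. The example below is an illustration under assumptions, not part of the original answer: the name find_prologues is hypothetical, it looks only for the prologue seen in the listings above (push %rbp, encoded as 55, followed by mov %rsp,%rbp, encoded as 48 89 e5), and it expects the raw bytes of .text, for example saved with GDB's dump binary memory command. Optimized builds often omit this prologue, and the same byte sequence can occur by chance inside other instructions, so every hit still has to be checked with disas.

```python
# push %rbp ; mov %rsp,%rbp as they are encoded on x86-64
PROLOGUE = bytes([0x55, 0x48, 0x89, 0xE5])


def find_prologues(code, base_address):
    """Return the virtual addresses at which PROLOGUE occurs in `code`.

    `code` holds the raw bytes of a code section and `base_address` is the
    address of its first byte (0x400440 for .text in the example above).
    """
    hits = []
    offset = code.find(PROLOGUE)
    while offset != -1:
        hits.append(base_address + offset)
        offset = code.find(PROLOGUE, offset + 1)
    return hits


# Hypothetical buffer for illustration: two nops, the prologue, leave, ret.
sample = bytes([0x90, 0x90]) + PROLOGUE + bytes([0xC9, 0xC3])
print([hex(a) for a in find_prologues(sample, 0x400522)])  # prints ['0x400524']
```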

This is hard work, my friend! Good luck!

Beco


Mat*_*Mat 8

How about using info files to get the list of sections (with addresses), and going from there?

Example:

(gdb) info files

Symbols from "/home/bob/tmp/t".
Local exec file:
`/home/bob/tmp/t', file type elf64-x86-64.
Entry point: 0x400490
0x0000000000400270 - 0x000000000040028c is .interp
0x000000000040028c - 0x00000000004002ac is .note.ABI-tag
    ....

0x0000000000400448 - 0x0000000000400460 is .init
    ....

Disassemble .init:

(gdb) disas 0x0000000000400448,0x0000000000400460
Dump of assembler code from 0x400448 to 0x400460:
   0x0000000000400448:  sub    $0x8,%rsp
   0x000000000040044c:  callq  0x4004bc
   0x0000000000400451:  callq  0x400550
   0x0000000000400456:  callq  0x400650
   0x000000000040045b:  add    $0x8,%rsp
   0x000000000040045f:  retq   

Then go on disassembling the rest.
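Turning each section line into a disas command by hand gets tedious, so here is a minimal Python sketch that does it on text copied from info files. It is an illustration only: the names parse_sections and disas_command are hypothetical, and the line format it expects is simply the one shown in the output above.

```python
import re

# Matches lines such as
#   0x0000000000400448 - 0x0000000000400460 is .init
# Lines for shared-library sections ("... is .text in /lib/...") carry extra
# text after the section name and are deliberately not matched.
SECTION = re.compile(r"^\s*(0x[0-9a-f]+)\s+-\s+(0x[0-9a-f]+)\s+is\s+(\S+)\s*$")


def parse_sections(info_files_output):
    """Map each section name to its (start, end) address pair."""
    sections = {}
    for line in info_files_output.splitlines():
        match = SECTION.match(line)
        if match:
            start, end, name = match.groups()
            sections[name] = (int(start, 16), int(end, 16))
    return sections


def disas_command(sections, name):
    """Build the GDB command that disassembles the named section."""
    start, end = sections[name]
    return "disas {:#x},{:#x}".format(start, end)


# Lines taken from the info files output above.
output = """
Entry point: 0x400490
0x0000000000400270 - 0x000000000040028c is .interp
0x000000000040028c - 0x00000000004002ac is .note.ABI-tag
0x0000000000400448 - 0x0000000000400460 is .init
"""
print(disas_command(parse_sections(output), ".init"))  # prints disas 0x400448,0x400460
```

Only sections that actually hold code, such as .init, .plt and .text, are worth disassembling; feeding a data section to disas produces meaningless instructions.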

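If I were you, and I had the same GCC version your executable was built with, I would examine the sequence of functions called in a dummy non-stripped executable. The call sequence is probably similar in most cases, so comparing the two might help you work through the startup sequence up to main. Optimizations will probably get in the way, though.

If your binary is stripped and optimized, main might not exist as an "entity" in the binary; chances are you can't do much better than this kind of procedure.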