Cir*_*四事件 37
Version of this answer with a nice TOC and more content: http://www.cirosantilli.com/elf-hello-world (hitting the 30k char limit here)
ELF is specified by the LSB:
The LSB basically links to other standards with minor extensions, in particular:
generic (both by SCO):
architecture specific:
A handy summary can be found at:
man elf
Its structure can be examined in a human readable way via utilities like readelf and objdump.
Let's break down a minimal runnable Linux x86-64 example:
section .data
hello_world db "Hello world!", 10
hello_world_len equ $ - hello_world
section .text
global _start
_start:
mov rax, 1
mov rdi, 1
mov rsi, hello_world
mov rdx, hello_world_len
syscall
mov rax, 60
mov rdi, 0
syscall
Compile with:
nasm -w+all -f elf64 -o 'hello_world.o' 'hello_world.asm'
ld -o 'hello_world.out' 'hello_world.o'
Versions:
We don't use a C program, as that would complicate the analysis; that will be level 2 :-)
hd hello_world.o
hd hello_world.out
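Output at: https://gist.github.com/cirosantilli/7b03f6df2d404c0862c6
An ELF file contains the following parts:
ELF header. Points to the position of the section header table and the program header table.
Section header table (optional on executable). It has e_shnum section headers, each pointing to the position of a section.
N sections, with N <= e_shnum (optional on executable)
Program header table (only on executable). It has e_phnum program headers, each pointing to the position of a segment.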
N segments, with N <= e_phnum (optional on executable)
The order of those parts is not fixed: the only fixed thing is the ELF header that must be the first thing on the file: Generic docs say:
The easiest way to observe the header is:
readelf -h hello_world.o
readelf -h hello_world.out
Output at: https://gist.github.com/cirosantilli/7b03f6df2d404c0862c6
Bytes in the object file:
00000000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010 01 00 3e 00 01 00 00 00 00 00 00 00 00 00 00 00 |..>.............|
00000020 00 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 |........@.......|
00000030 00 00 00 00 40 00 00 00 00 00 40 00 07 00 03 00 |....@.....@.....|
Executable:
00000000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010 02 00 3e 00 01 00 00 00 b0 00 40 00 00 00 00 00 |..>.......@.....|
00000020 40 00 00 00 00 00 00 00 10 01 00 00 00 00 00 00 |@...............|
00000030 00 00 00 00 40 00 38 00 02 00 40 00 06 00 03 00 |....@.8...@.....|
Structure represented:
typedef struct {
unsigned char e_ident[EI_NIDENT];
Elf64_Half e_type;
Elf64_Half e_machine;
Elf64_Word e_version;
Elf64_Addr e_entry;
Elf64_Off e_phoff;
Elf64_Off e_shoff;
Elf64_Word e_flags;
Elf64_Half e_ehsize;
Elf64_Half e_phentsize;
Elf64_Half e_phnum;
Elf64_Half e_shentsize;
Elf64_Half e_shnum;
Elf64_Half e_shstrndx;
} Elf64_Ehdr;
Manual breakdown:
0 0: EI_MAG = 7f 45 4c 46 = 0x7f 'E', 'L', 'F': ELF magic number
0 4: EI_CLASS = 02 = ELFCLASS64: 64 bit elf
0 5: EI_DATA = 01 = ELFDATA2LSB: little endian data
0 6: EI_VERSION = 01: format version
0 7: EI_OSABI (only in 2003 Update) = 00 = ELFOSABI_NONE: no extensions.
0 8: EI_PAD = 8x 00: reserved bytes. Must be set to 0.
1 0: e_type = 01 00 = 1 (little endian) = ET_REL: relocatable format
On the executable it is 02 00 for ET_EXEC.
1 2: e_machine = 3e 00 = 62 = EM_X86_64: AMD64 architecture
1 4: e_version = 01 00 00 00: must be 1
1 8: e_entry = 8x 00: execution address entry point, or 0 if not applicable like for the object file since there is no entry point.
On the executable, it is b0 00 40 00 00 00 00 00. TODO: what else can we set this to? The kernel seems to put the IP directly on that value, it is not hardcoded.
2 0: e_phoff = 8x 00: program header table offset, 0 if not present.
40 00 00 00 on the executable, i.e. it starts immediately after the ELF header.
2 8: e_shoff = 40 7x 00 = 0x40: section header table file offset, 0 if not present.
3 0: e_flags = 00 00 00 00 TODO. Arch specific.
3 4: e_ehsize = 40 00: size of this elf header. TODO why this field? How can it vary?
3 6: e_phentsize = 00 00: size of each program header, 0 if not present.
38 00 on executable: it is 56 bytes long
3 8: e_phnum = 00 00: number of program header entries, 0 if not present.
02 00 on executable: there are 2 entries.
3 A: e_shentsize and e_shnum = 40 00 07 00: section header size and number of entries
3 E: e_shstrndx (Section Header STRing iNDeX) = 03 00: index of the .shstrtab section.
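The manual breakdown above can be cross-checked mechanically. The following is a minimal Python sketch, not part of the original answer: it assumes a little-endian ELF64 header and hard-codes the 64 object-file bytes from the hexdump above. The name parse_elf64_header is made up for this illustration.

```python
import struct

# First 64 bytes of hello_world.o, copied from the hexdump above.
HEADER_HEX = """
7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
01 00 3e 00 01 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00
00 00 00 00 40 00 00 00 00 00 40 00 07 00 03 00
"""

# Elf64_Ehdr members that follow the 16-byte e_ident array, in order.
FIELDS = ("e_type", "e_machine", "e_version", "e_entry", "e_phoff",
          "e_shoff", "e_flags", "e_ehsize", "e_phentsize", "e_phnum",
          "e_shentsize", "e_shnum", "e_shstrndx")


def parse_elf64_header(data):
    """Decode a little-endian Elf64_Ehdr into a dict (illustrative helper)."""
    e_ident = data[:16]
    if e_ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if e_ident[4] != 2 or e_ident[5] != 1:
        raise ValueError("this sketch only handles ELFCLASS64 + ELFDATA2LSB")
    # Half = H (2 bytes), Word = I (4 bytes), Addr/Off = Q (8 bytes).
    values = struct.unpack_from("<HHIQQQIHHHHHH", data, 16)
    return dict(zip(FIELDS, values))


header = parse_elf64_header(bytes.fromhex("".join(HEADER_HEX.split())))
for name, value in header.items():
    print(f"{name:12} = {value:#x}")
```

Swapping in the 64 bytes of the executable should report the non-zero e_entry, e_phoff, e_phentsize and e_phnum values discussed above.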
Array of Elf64_Shdr structs.
Each entry contains metadata about a given section.
e_shoff of the ELF header gives the starting position, 0x40 here.
e_shentsize and e_shnum from the ELF header say that we have 7 entries, each 0x40 bytes long.
So the table takes bytes from 0x40 to 0x40 + 7 * 0x40 - 1 = 0x1FF.
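As a quick check of that arithmetic: the table spans e_shnum * e_shentsize bytes starting at e_shoff. A small Python sketch, with the three values hard-coded from the object file's ELF header (the variable names other than the ELF field names are illustrative):

```python
# Values read from the ELF header of hello_world.o.
e_shoff = 0x40      # file offset of the section header table
e_shentsize = 0x40  # size of one section header
e_shnum = 7         # number of section headers

first_byte = e_shoff
last_byte = e_shoff + e_shnum * e_shentsize - 1
# File offset at which each section header entry starts.
entry_offsets = [e_shoff + i * e_shentsize for i in range(e_shnum)]

print(hex(first_byte), hex(last_byte))
print([hex(offset) for offset in entry_offsets])
```

Entry 1 starts at 0x80, which is where the .data header dump further down begins.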
Some section names are reserved for certain section types: http://www.sco.com/developers/gabi/2003-12-17/ch4.sheader.html#special_sections e.g. .text requires a SHT_PROGBITS type and SHF_ALLOC + SHF_EXECINSTR
readelf -S hello_world.o:
There are 7 section headers, starting at offset 0x40:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .data PROGBITS 0000000000000000 00000200
000000000000000d 0000000000000000 WA 0 0 4
[ 2] .text PROGBITS 0000000000000000 00000210
0000000000000027 0000000000000000 AX 0 0 16
[ 3] .shstrtab STRTAB 0000000000000000 00000240
0000000000000032 0000000000000000 0 0 1
[ 4] .symtab SYMTAB 0000000000000000 00000280
00000000000000a8 0000000000000018 5 6 4
[ 5] .strtab STRTAB 0000000000000000 00000330
0000000000000034 0000000000000000 0 0 1
[ 6] .rela.text RELA 0000000000000000 00000370
0000000000000018 0000000000000018 4 2 4
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
struct represented by each entry:
typedef struct {
Elf64_Word sh_name;
Elf64_Word sh_type;
Elf64_Xword sh_flags;
Elf64_Addr sh_addr;
Elf64_Off sh_offset;
Elf64_Xword sh_size;
Elf64_Word sh_link;
Elf64_Word sh_info;
Elf64_Xword sh_addralign;
Elf64_Xword sh_entsize;
} Elf64_Shdr;
Contained in bytes 0x40 to 0x7F.
The first section is always magic: http://www.sco.com/developers/gabi/2003-12-17/ch4.sheader.html says:
If the number of sections is greater than or equal to SHN_LORESERVE (0xff00), e_shnum has the value SHN_UNDEF (0) and the actual number of section header table entries is contained in the sh_size field of the section header at index 0 (otherwise, the sh_size member of the initial entry contains 0).
There are also other magic sections detailed in Figure 4-7: Special Section Indexes.
In index 0, SHT_NULL is mandatory. Are there any other uses for it: What is the use of the SHT_NULL section in ELF? ?
.data is section 1:
00000080 01 00 00 00 01 00 00 00 03 00 00 00 00 00 00 00 |................|
00000090 00 00 00 00 00 00 00 00 00 02 00 00 00 00 00 00 |................|
000000a0 0d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000b0 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
80 0: sh_name = 01 00 00 00: index 1 in the .shstrtab string table
Here, 1 says the name of this section starts at byte offset 1 of that section (the first character after the leading NUL), and ends at the first NUL character, making up the string .data.
.data is one of the section names which has a predefined meaning http://www.sco.com/developers/gabi/2003-12-17/ch4.strtab.html
These sections hold initialized data that contribute to the program's memory image.
80 4: sh_type = 01 00 00 00: SHT_PROGBITS: the section content is not specified by ELF, only by how the program interprets it. Normal since a .data section.
80 8: sh_flags = 03 7x 00: SHF_WRITE and SHF_ALLOC (shown as WA by readelf): http://www.sco.com/developers/gabi/2003-12-17/ch4.sheader.html#sh_flags, as required from a .data section
90 0: sh_addr = 8x 00: in what virtual address the section will be placed during execution, 0 if not placed
90 8: sh_offset = 00 02 00 00 00 00 00 00 = 0x200: number of bytes from the start of the file to the first byte in this section
a0 0: sh_size = 0d 00 00 00 00 00 00 00
If we take 0xD bytes starting at sh_offset 200, we see:
00000200 48 65 6c 6c 6f 20 77 6f 72 6c 64 21 0a 00 |Hello world!.. |
AHA! So our "Hello world!" string is in the data section like we told it to be on the NASM.
Once we graduate from hd, we will look this up like:
readelf -x .data hello_world.o
which outputs:
Hex dump of section '.data':
0x00000000 48656c6c 6f20776f 726c6421 0a Hello world!.
NASM sets decent properties for that section because it treats .data magically: http://www.nasm.us/doc/nasmdoc7.html#section-7.9.2
Also note that this was a bad section choice: a good C compiler would put the string in .rodata instead, because it is read-only and it would allow for further OS optimizations.
a0 8: sh_link and sh_info = 8x 0: do not apply to this section type. http://www.sco.com/developers/gabi/2003-12-17/ch4.sheader.html#special_sections
b0 0: sh_addralign = 04 = TODO: why is this alignment necessary? Is it only for sh_addr, or also for symbols inside sh_addr?
b0 8: sh_entsize = 00 = the section does not contain a table. If != 0, it means that the section contains a table of fixed size entries. In this file, we see from the readelf output that this is the case for the .symtab and .rela.text sections.
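The field-by-field reading of the .data section header can be reproduced with a few lines of Python. This is a sketch under stated assumptions: it hard-codes the 64 bytes at 0x80 from the dump above, assumes little-endian ELF64, and only knows the three flag bits SHF_WRITE (0x1), SHF_ALLOC (0x2) and SHF_EXECINSTR (0x4). The helper names are made up for illustration.

```python
import struct

# Section header 1 (.data): bytes 0x80 to 0xbf of hello_world.o.
SHDR_HEX = """
01 00 00 00 01 00 00 00 03 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 02 00 00 00 00 00 00
0d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
"""

FIELDS = ("sh_name", "sh_type", "sh_flags", "sh_addr", "sh_offset",
          "sh_size", "sh_link", "sh_info", "sh_addralign", "sh_entsize")

# Subset of the sh_flags bits defined by the generic ABI.
FLAG_BITS = {0x1: "SHF_WRITE", 0x2: "SHF_ALLOC", 0x4: "SHF_EXECINSTR"}


def parse_elf64_shdr(data):
    """Decode one little-endian Elf64_Shdr (exactly 64 bytes) into a dict."""
    # Word = I (4 bytes), Xword/Addr/Off = Q (8 bytes).
    return dict(zip(FIELDS, struct.unpack("<IIQQQQIIQQ", data)))


def flag_names(sh_flags):
    """Return the names of the known flag bits set in sh_flags."""
    return [name for bit, name in FLAG_BITS.items() if sh_flags & bit]


shdr = parse_elf64_shdr(bytes.fromhex("".join(SHDR_HEX.split())))
for name, value in shdr.items():
    print(f"{name:12} = {value:#x}")
print("flags:", flag_names(shdr["sh_flags"]))
```

The decoded flags are SHF_WRITE and SHF_ALLOC, matching the WA that readelf -S prints for .data.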
Now that we've done one section manually, let's graduate and use the readelf -S of the other sections.
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 2] .text PROGBITS 0000000000000000 00000210
0000000000000027 0000000000000000 AX 0 0 16
.text is executable but not writable: if we try to write to it Linux segfaults. Let's see if we really have some code there:
objdump -d hello_world.o
gives:
hello_world.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <_start>:
0: b8 01 00 00 00 mov $0x1,%eax
5: bf 01 00 00 00 mov $0x1,%edi
a: 48 be 00 00 00 00 00 movabs $0x0,%rsi
11: 00 00 00
14: ba 0d 00 00 00 mov $0xd,%edx
19: 0f 05 syscall
1b: b8 3c 00 00 00 mov $0x3c,%eax
20: bf 00 00 00 00 mov $0x0,%edi
25: 0f 05 syscall
If we grep b8 01 00 00 on the hd, we see that this only occurs at 00000210, which is what the section says. And the Size is 27, which matches as well. So we must be talking about the right section.
This looks like the right code: a write followed by an exit.
The most interesting part is line a which does:
movabs $0x0,%rsi
to pass the address of the string to the system call. Currently, the 0x0 is just a placeholder. After linking happens, it will be modified to contain:
4000ba: 48 be d8 00 60 00 00 movabs $0x6000d8,%rsi
This modification is possible because of the data of the .rela.text section.
Sections with sh_type == SHT_STRTAB are called string tables.
They hold a null separated array of strings.
Such sections are used by other sections when string names are to be used. The using section says:
So for example, we could have a string table containing: TODO: does it have to start with \0?
Data: \0 a b c \0 d e f \0
Index: 0 1 2 3 4 5 6 7 8
And if another section wants to use the string d e f, they have to point to index 5 of this section (letter d).
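The lookup rule for string tables is simple enough to sketch in Python. The helper name get_string is an assumption made for this illustration; the table is the \0 a b c \0 d e f \0 example from above.

```python
def get_string(table, index):
    """Return the NUL-terminated string that starts at `index` in `table`."""
    end = table.index(b"\0", index)  # first NUL at or after index
    return table[index:end].decode("ascii")


# The example table: \0 a b c \0 d e f \0
table = b"\0abc\0def\0"

print(repr(get_string(table, 5)))  # 'def'
print(repr(get_string(table, 1)))  # 'abc'
print(repr(get_string(table, 6)))  # 'ef'
print(repr(get_string(table, 0)))  # ''
```

Index 6 shows that an index may point into the middle of a string, and index 0 yields the empty string when the table starts with \0.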
Notable string table sections:
.shstrtab
.strtab
.shstrtab
Section type: sh_type == SHT_STRTAB.
Common name: section header string table.
The section name .shstrtab is reserved. The standard says:
This section holds section names.
This section gets pointed to by the e_shstrndx field of the ELF header itself.
String indexes of this section are pointed to by the sh_name field of section headers, which denote strings.
This section does not have SHF_ALLOC marked, so it will not appear on the executing program.
readelf -x .shstrtab hello_world.o
Gives:
Hex dump of section '.shstrtab':
0x00000000 002e6461 7461002e 74657874 002e7368 ..data..text..sh
0x00000010 73747274 6162002e 73796d74 6162002e strtab..symtab..
0x00000020 73747274 6162002e 72656c61 2e746578 strtab..rela.tex
0x00000030 7400 t.
The data in this section has a fixed format: http://www.sco.com/developers/gabi/2003-12-17/ch4.strtab.html
If we look at the sh_name fields of the other section headers, we see that they all contain numbers, e.g. the .text section has sh_name = 7.
Each string then ends at the first NUL character found, e.g. character 12 is the \0 just after .text.
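To see which sh_name value selects which name, the .shstrtab hex dump above can be split at its NUL bytes. A minimal Python sketch, with the dump hard-coded and the helper name names_by_offset invented for this illustration:

```python
# Bytes of .shstrtab, copied from the readelf -x output above.
SHSTRTAB_HEX = (
    "002e6461 7461002e 74657874 002e7368"
    "73747274 6162002e 73796d74 6162002e"
    "73747274 6162002e 72656c61 2e746578"
    "7400"
)
shstrtab = bytes.fromhex(SHSTRTAB_HEX.replace(" ", ""))


def names_by_offset(table):
    """Map the start offset of every string in the table to the string."""
    names = {}
    start = 1  # offset 0 holds the leading NUL (the empty name)
    while start < len(table):
        end = table.index(b"\0", start)
        names[start] = table[start:end].decode("ascii")
        start = end + 1
    return names


names = names_by_offset(shstrtab)
for offset, name in names.items():
    print(offset, name)
```

The table is 0x32 bytes long, matching the Size column for .shstrtab, and .text starts at offset 7 with its terminating NUL at offset 12.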
Section type: sh_type == SHT_SYMTAB.
Common name: symbol table.
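First we note that:
sh_link = 5
sh_info = 6
For SHT_SYMTAB sections, those numbers mean that:
sh_link is the index of the section holding the strings that give symbol names, here section 5, .strtab
sh_info is one greater than the symbol table index of the last local symbol, here 6, the index of the first global symbol, _start. That it equals the section index of .rela.text is a coincidence.
A good high level tool to disassemble that section is: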
nm hello_world.o
which gives:
0000000000000000 T _start
0000000000000000 d hello_world
000000000000000d a hello_world_len
This is however a high level view that omits some types of symbols and abbreviates the symbol types to single letters. A more detailed disassembly can be obtained with:
readelf -s hello_world.o
which gives:
Symbol table '.symtab' contains 7 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS hello_world.asm
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1
3: 0000000000000000 0 SECTION LOCAL DEFAULT 2
4: 0000000000000000 0 NOTYPE LOCAL DEFAULT 1 hello_world
5: 000000000000000d 0 NOTYPE LOCAL DEFAULT ABS hello_world_len
6: 0000000000000000 0 NOTYPE GLOBAL DEFAULT 2 _start
The binary format of the table is documented at http://www.sco.com/developers/gabi/2003-12-17/ch4.symtab.html
The data is:
readelf -x .symtab hello_world.o
Which gives:
Hex dump of section '.symtab':
0x00000000 00000000 00000000 00000000 00000000 ................
0x00000010 00000000 00000000 01000000 0400f1ff ................
0x00000020 00000000 00000000 00000000 00000000 ................
0x00000030 00000000 03000100 00000000 00000000 ................
0x00000040 00000000 00000000 00000000 03000200 ................
0x00000050 00000000 00000000 00000000 00000000 ................
0x00000060 11000000 00000100 00000000 00000000 ................
0x00000070 00000000 00000000 1d000000 0000f1ff ................
0x00000080 0d000000 00000000 00000000 00000000 ................
0x00000090 2d000000 10000200 00000000 00000000 -...............
0x000000a0 00000000 00000000 ........
The entries are of type:
typedef struct {
Elf64_Word st_name;
unsigned char st_info;
unsigned char st_other;
Elf64_Half st_shndx;
Elf64_Addr st_value;
Elf64_Xword st_size;
} Elf64_Sym;
Like in the section table, the first entry is magical and set to fixed meaningless values.
STT_FILE
Entry 1 has ELF64_ST_TYPE == STT_FILE. ELF64_ST_TYPE is contained inside of st_info.
Byte analysis:
10 8: st_name = 01000000 = character 1 in the .strtab, which until the following \0 makes hello_world.asm
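The linker may use this file information to decide which segment each section goes into.
10 12: st_info = 04
Bits 0-3 = ELF64_ST_TYPE = Type = 4 = STT_FILE: the main purpose of this entry is to use st_name to indicate the name of the file which generated this object file.
Bits 4-7 = ELF64_ST_BIND = Binding = 0 = STB_LOCAL. Required value for STT_FILE.
10 14: st_shndx = Symbol Table Section header Index = f1 ff = 0xFFF1 (little endian) = SHN_ABS. Required for STT_FILE. It follows the one-byte st_other = 00 at 10 13.
20 0: st_value = 8x 00: required value for STT_FILE
20 8: st_size = 8x 00: no allocated size
Now, from the readelf output, we interpret the others quickly.
There are two STT_SECTION entries, one pointing to .data and the other to .text (section indexes 1 and 2).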
Num: Value Size Type Bind Vis Ndx Name
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1
3: 0000000000000000 0 SECTION LOCAL DEFAULT 2
TODO what is their purpose?
STT_NOTYPE
Then come the most important symbols:
Num: Value Size Type Bind Vis Ndx Name
4: 0000000000000000 0 NOTYPE LOCAL DEFAULT 1 hello_world
5: 000000000000000d 0 NOTYPE LOCAL DEFAULT ABS hello_world_len
6: 0000000000000000 0 NOTYPE GLOBAL DEFAULT 2 _start
The hello_world string is in the .data section (index 1). Its value is 0: it points to the first byte of that section.
_start is marked with GLOBAL binding since we wrote:
global _start
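in NASM. This is necessary since it must be seen as the entry point. Unlike in C, NASM labels are local by default.
SHN_ABS
hello_world_len points to the special st_shndx == SHN_ABS == 0xFFF1 (stored as the little endian bytes f1 ff).
0xFFF1 is chosen so as not to conflict with other sections.
st_value == 0xD == 13 which is the value we have stored there in the assembly: the length of the string Hello world! including the trailing newline.
This means that relocation will not affect this value: it is a constant.
This is a small optimization that our assembler does for us and which has ELF support.
If we had used the address of hello_world_len anywhere, the assembler would not have been able to mark it as SHN_ABS, and the linker would have extra relocation work to do on it later.
By default, NASM places a .symtab on the executable as well.
This is only used for debugging. Without the symbols, we are completely blind, and must reverse engineer everything.
You can strip it with objcopy, and the executable will still run. Such executables are called stripped executables.
.strtab
Holds strings for the symbol table.
This section has sh_type == SHT_STRTAB.
It is pointed to by sh_link == 5 of the .symtab section.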
readelf -x .strtab hello_world.o
Gives:
Hex dump of section '.strtab':
0x00000000 0068656c 6c6f5f77 6f726c64 2e61736d .hello_world.asm
0x00000010 0068656c 6c6f5f77 6f726c64 0068656c .hello_world.hel
0x00000020 6c6f5f77 6f726c64 5f6c656e 005f7374 lo_world_len._st
0x00000030 61727400 art.
This implies that it is an ELF level limitation that global variables cannot contain NUL characters.
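With both the .symtab and .strtab dumps available, the readelf -s table can be rebuilt by hand. The following Python sketch assumes little-endian Elf64_Sym entries of 24 bytes each, hard-codes the two hex dumps from above, and only names the symbol types and bindings that occur in this file. All helper names are illustrative.

```python
import struct

# readelf -x .symtab hello_world.o (7 entries of 24 bytes each)
SYMTAB_HEX = """
00000000 00000000 00000000 00000000
00000000 00000000 01000000 0400f1ff
00000000 00000000 00000000 00000000
00000000 03000100 00000000 00000000
00000000 00000000 00000000 03000200
00000000 00000000 00000000 00000000
11000000 00000100 00000000 00000000
00000000 00000000 1d000000 0000f1ff
0d000000 00000000 00000000 00000000
2d000000 10000200 00000000 00000000
00000000 00000000
"""

# readelf -x .strtab hello_world.o
STRTAB_HEX = """
0068656c 6c6f5f77 6f726c64 2e61736d
0068656c 6c6f5f77 6f726c64 0068656c
6c6f5f77 6f726c64 5f6c656e 005f7374
61727400
"""

TYPES = {0: "NOTYPE", 3: "SECTION", 4: "FILE"}  # subset used in this file
BINDS = {0: "LOCAL", 1: "GLOBAL"}               # subset used in this file


def from_dump(text):
    """Turn whitespace-separated hex digits into bytes."""
    return bytes.fromhex("".join(text.split()))


def get_string(table, index):
    """Return the NUL-terminated string starting at index."""
    return table[index:table.index(b"\0", index)].decode("ascii")


def parse_symtab(symtab, strtab):
    """Decode every Elf64_Sym entry and resolve its name through strtab."""
    symbols = []
    # Word, two unsigned chars, Half, Addr, Xword = 24 bytes per entry.
    for name, info, other, shndx, value, size in struct.iter_unpack("<IBBHQQ", symtab):
        symbols.append({
            "name": get_string(strtab, name),
            "type": TYPES[info & 0xF],   # low 4 bits of st_info
            "bind": BINDS[info >> 4],    # high 4 bits of st_info
            "shndx": shndx,
            "value": value,
            "size": size,
        })
    return symbols


symbols = parse_symtab(from_dump(SYMTAB_HEX), from_dump(STRTAB_HEX))
for number, sym in enumerate(symbols):
    print(number, sym["type"], sym["bind"], hex(sym["shndx"]),
          hex(sym["value"]), sym["name"])
```

The two entries with shndx 0xfff1 are the SHN_ABS symbols: the FILE entry and hello_world_len.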
Section type: sh_type == SHT_RELA.
Common name: relocation section.
.rela.text holds relocation data which says how the address should be modified when the final executable is linked. This points to bytes of the text area that must be modified when linking happens to point to the correct memory locations.
Basically, it translates the object text containing the placeholder 0x0 address:
a: 48 be 00 00 00 00 00 movabs $0x0,%rsi
11: 00 00 00
to the actual executable code containing the final 0x6000d8:
4000ba: 48 be d8 00 60 00 00 movabs $0x6000d8,%rsi
4000c1: 00 00 00
Its own section header fields say what it relates to: sh_link = 4 is the symbol table it uses (.symtab), and sh_info = 2 is the section the relocations apply to (.text).
readelf -r hello_world.o gives:
Relocation section '.rela.text' at offset 0x3b0 contains 1 entries:
Offset Info Type Sym. Value Sym. Name + Addend
00000000000c 000200000001 R_X86_64_64 0000000000000000 .data + 0
The section does not exist in the executable.
The actual bytes are:
00000370 0c 00 00 00 00 00 00 00 01 00 00 00 02 00 00 00 |................|
00000380 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
The struct represented is:
typedef struct {
Elf64_Addr r_offset;
Elf64_Xword r_info;
Elf64_Sxword r_addend;
} Elf64_Rela;
So:
370 0: r_offset = 0xC: address into the .text whose address this relocation will modify
370 8: r_info = 0x200000001. Contains 2 fields:
ELF64_R_TYPE = 0x1: meaning depends on the exact architecture.
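To make the relocation concrete, here is a hedged Python sketch of what the linker does with this one entry. It is an illustration, not how ld is implemented. Assumptions: the entry bytes come from the dump above, the .text bytes from the objdump listing, r_info is split with the generic ABI rules (symbol index in the high 32 bits, type in the low 32 bits), type 1 is R_X86_64_64, which stores symbol value plus addend as a 64-bit little-endian integer, and the final address of .data is the 0x6000d8 seen in the linked executable above.

```python
import struct

# The single Elf64_Rela entry of .rela.text (first 24 bytes of the dump).
RELA_HEX = ("0c 00 00 00 00 00 00 00" "01 00 00 00 02 00 00 00"
            "00 00 00 00 00 00 00 00")

# .text of hello_world.o, transcribed from the objdump -d listing.
TEXT_HEX = ("b8 01 00 00 00" "bf 01 00 00 00"
            "48 be 00 00 00 00 00 00 00 00"
            "ba 0d 00 00 00" "0f 05"
            "b8 3c 00 00 00" "bf 00 00 00 00" "0f 05")

R_X86_64_64 = 1
DATA_ADDRESS = 0x6000D8  # assumed: address of .data taken from the linked output

rela = bytes.fromhex(RELA_HEX.replace(" ", ""))
text = bytes.fromhex(TEXT_HEX.replace(" ", ""))

# Addr = Q, Xword = Q, Sxword = q (signed).
r_offset, r_info, r_addend = struct.unpack("<QQq", rela)
r_sym = r_info >> 32           # ELF64_R_SYM
r_type = r_info & 0xFFFFFFFF   # ELF64_R_TYPE


def apply_r_x86_64_64(section, offset, symbol_address, addend):
    """Write symbol_address + addend as 8 little-endian bytes at offset."""
    patched = bytearray(section)
    struct.pack_into("<Q", patched, offset, symbol_address + addend)
    return bytes(patched)


assert r_type == R_X86_64_64
patched = apply_r_x86_64_64(text, r_offset, DATA_ADDRESS, r_addend)

print("sym", r_sym, "type", r_type, "offset", hex(r_offset), "addend", r_addend)
print(patched[0xA:0x14].hex(" "))
```

Symbol 2 is the SECTION symbol for .data in the readelf -s output, and the patched bytes at 0xa are 48 be d8 00 60 00 00 00 00 00, the movabs seen in the executable.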
Dav*_*ica 12
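As mentioned in my comment, you will essentially be writing your own elf-header for the executable, eliminating the unneeded sections. There are still several required sections. The documentation at Muppetlabs-TinyPrograms does a fair job of explaining this process. For fun, here are a couple of examples:
The equivalent of /bin/true (45 bytes):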
00000000 7F 45 4C 46 01 00 00 00 00 00 00 00 00 00 49 25 |.ELF..........I%|
00000010 02 00 03 00 1A 00 49 25 1A 00 49 25 04 00 00 00 |......I%..I%....|
00000020 5B 5F F2 AE 40 22 5F FB CD 80 20 00 01 |[_..@"_... ..|
0000002d
Your classic 'Hello World!' (160 bytes):
00000000 7f 45 4c 46 01 01 01 03 00 00 00 00 00 00 00 00 |.ELF............|
00000010 02 00 03 00 01 00 00 00 74 80 04 08 34 00 00 00 |........t...4...|
00000020 00 00 00 00 00 00 00 00 34 00 20 00 02 00 28 00 |........4. ...(.|
00000030 00 00 00 00 01 00 00 00 74 00 00 00 74 80 04 08 |........t...t...|
00000040 74 80 04 08 1f 00 00 00 1f 00 00 00 05 00 00 00 |t...............|
00000050 00 10 00 00 01 00 00 00 93 00 00 00 93 90 04 08 |................|
00000060 93 90 04 08 0d 00 00 00 0d 00 00 00 06 00 00 00 |................|
00000070 00 10 00 00 b8 04 00 00 00 bb 01 00 00 00 b9 93 |................|
00000080 90 04 08 ba 0d 00 00 00 cd 80 b8 01 00 00 00 31 |...............1|
00000090 db cd 80 48 65 6c 6c 6f 20 77 6f 72 6c 64 21 0a |...Hello world!.|
000000a0
Don't forget to make them executable...
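One way to turn such a hexdump back into a file and set the executable bits is sketched below in Python. This is only an illustration: the bytes are copied from the 160-byte dump above, the output goes to a fresh temporary directory, and the names write_executable and hello are arbitrary.

```python
import os
import stat
import tempfile

# The 160-byte 'Hello World!' binary from the hexdump above.
HELLO_HEX = """
7f 45 4c 46 01 01 01 03 00 00 00 00 00 00 00 00
02 00 03 00 01 00 00 00 74 80 04 08 34 00 00 00
00 00 00 00 00 00 00 00 34 00 20 00 02 00 28 00
00 00 00 00 01 00 00 00 74 00 00 00 74 80 04 08
74 80 04 08 1f 00 00 00 1f 00 00 00 05 00 00 00
00 10 00 00 01 00 00 00 93 00 00 00 93 90 04 08
93 90 04 08 0d 00 00 00 0d 00 00 00 06 00 00 00
00 10 00 00 b8 04 00 00 00 bb 01 00 00 00 b9 93
90 04 08 ba 0d 00 00 00 cd 80 b8 01 00 00 00 31
db cd 80 48 65 6c 6c 6f 20 77 6f 72 6c 64 21 0a
"""


def write_executable(path, data):
    """Write data to path and add the execute bits, like chmod +x."""
    with open(path, "wb") as f:
        f.write(data)
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


data = bytes.fromhex("".join(HELLO_HEX.split()))
path = os.path.join(tempfile.mkdtemp(), "hello")
write_executable(path, data)
print(path, len(data), "bytes")
```

The header declares a 32-bit (ELFCLASS32) x86 binary that uses int 0x80 system calls, so actually running the file presumably needs a Linux system that can execute 32-bit x86 programs.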