本文简要介绍了动态链接库中地址无关代码(Position independent code)的实现原理,并利用GDB等工具对此过程进行了验证。
/*add.c*/ int global_extern_int = 2; void foo() { } int add(int a_, int b_) { foo(); return global_extern_int+a_+ b_; }
/*main.c*/ int add(int a_, int b_); extern int global_extern_int; int global_int = 3; int main() { static int a = 19; global_int = 5; int rtv = 0; rtv = add(global_int, global_extern_int); return rtv; }
代码的编译和链接,
gcc -g -fno-pie -no-pie -m32 -fPIC -c add.c gcc -g -fno-pie -no-pie -m32 -shared add.o -o libadd.so gcc -g -fno-pie -no-pie -m32 -fPIC -c main.c gcc -g -fno-pie -no-pie -m32 -o main main.o -L . -ladd
动态链接是相对于静态链接提出来的,这里仅简单分析两者的区别,
Position-independent code (PIC) is code that uses no hard-coded addresses for either code or data. by using relative addressing
PIC不使用绝对地址对data或者fun进行寻址,而是利用一些相对地址的手段进行,PIC一般而言是针对共享库的。
对于一个共享库,以代码中的add.c为例,它对外提供了,
PIC的目的是让我们的代码段对所有的使用者来说都是一样的(不能对代码段进行修改,例如使用前面文章中介绍的符号重定位的方法,这种方法其实修改了最终的代码段),其具体的实现比较巧妙,核心的思想是通过增加一个间接层,也就是GOT(Global Offset Table)和PLT(Procedure Linkage Table),来完成对数据和函数的间接寻址。
PIC的一个核心思想是借助数据段和代码段之间的确定的偏移量。因为对于链接器而言,在将许多的目标文件进行合并的时候,它明确的知道所有段的大小和他们之间的偏移量。如上图所示,代码段紧跟着数据段,因此任何代码段中的某条指令到数据段起始的偏移量都可以很容易的计算出来,用代码段的大小减去该指令距离代码段起始点的偏移量。假设距离代码段起始点偏移0x80的指令想取获取数据段中的数据,此时链接器知道相对偏移量(0xEF80),可以用相对偏移量去进行相对寻址。
如上图所示,GOT是一个保存了各种地址的一个数组,前三个元素都是动态链接器的一些信息,后面的元素都是一些符号的地址,也就是变量和函数的地址。
GOT保存在数据段,数据段是可以修改的,这个很重要,如果代码段中有指令要去获取变量的地址,代码段中的指令不是通过直接去获取绝对地址,而是指向了一个确定的GOT中的位置,动态链接器可以在找到变量或者函数的地址后,然后修改对应的地址值。这就保证了代码段的地址无关,保证了代码段的稳定不变。
GOT在ELF文件中有两个对应的段,分别为.got和.got.plt,其实是一个,只不过根据功能的不同,分成了两个:
下面分析一下,
lyf@liuyifei:~/test$ objdump -d -Mintel ./libadd.so ./libadd.so: file format elf32-i386 ...... 0000116d <add>: 116d: 55 push ebp 116e: 89 e5 mov ebp,esp 1170: 53 push ebx 1171: 83 ec 04 sub esp,0x4 1174: e8 e7 fe ff ff call 1060 <__x86.get_pc_thunk.bx> 1179: 81 c3 87 2e 00 00 add ebx,0x2e87 117f: e8 bc fe ff ff call 1040 <foo@plt> 1184: 8b 83 f8 ff ff ff mov eax,DWORD PTR [ebx-0x8] 118a: 8b 10 mov edx,DWORD PTR [eax] 118c: 8b 45 08 mov eax,DWORD PTR [ebp+0x8] 118f: 01 c2 add edx,eax 1191: 8b 45 0c mov eax,DWORD PTR [ebp+0xc] 1194: 01 d0 add eax,edx 1196: 8b 5d fc mov ebx,DWORD PTR [ebp-0x4] 1199: c9 leave 119a: c3 ret 0000119b <__x86.get_pc_thunk.ax>: 119b: 8b 04 24 mov eax,DWORD PTR [esp] 119e: c3 ret
call 116e <__x86.get_pc_thunk.ax>的作用是获取到下一条指令的地址(32位特有的,64位上有专门的寄存器),即0x1179。
eax加上0x2e87为0x4000,那么这个地址对应的是什么地方呢?这个0x2e87就是链接器在链接时候知道的当前代码地址距离GOT的OFFSET。
lyf@liuyifei:~/test$ readelf -S ./libadd.so There are 31 section headers, starting at offset 0x3710: Section Headers: [Nr] Name Type Addr Off Size ES Flg Lk Inf Al [ 0] NULL 00000000 000000 000000 00 0 0 0 [ 1] .note.gnu.bu[...] NOTE 00000154 000154 000024 00 A 0 0 4 [ 2] .gnu.hash GNU_HASH 00000178 000178 00002c 04 A 3 0 4 [ 3] .dynsym DYNSYM 000001a4 0001a4 000080 10 A 4 1 4 [ 4] .dynstr STRTAB 00000224 000224 00006f 00 A 0 0 1 [ 5] .rel.dyn REL 00000294 000294 000040 08 A 3 0 4 [ 6] .rel.plt REL 000002d4 0002d4 000008 08 AI 3 18 4 [ 7] .init PROGBITS 00001000 001000 000024 00 AX 0 0 4 [ 8] .plt PROGBITS 00001030 001030 000020 04 AX 0 0 16 [ 9] .plt.got PROGBITS 00001050 001050 000008 08 AX 0 0 8 [10] .text PROGBITS 00001060 001060 00013f 00 AX 0 0 16 [11] .fini PROGBITS 000011a0 0011a0 000018 00 AX 0 0 4 [12] .eh_frame_hdr PROGBITS 00002000 002000 000034 00 A 0 0 4 [13] .eh_frame PROGBITS 00002034 002034 0000ac 00 A 0 0 4 [14] .init_array INIT_ARRAY 00003f24 002f24 000004 04 WA 0 0 4 [15] .fini_array FINI_ARRAY 00003f28 002f28 000004 04 WA 0 0 4 [16] .dynamic DYNAMIC 00003f2c 002f2c 0000c0 08 WA 4 0 4 [17] .got PROGBITS 00003fec 002fec 000014 04 WA 0 0 4 [18] .got.plt PROGBITS 00004000 003000 000010 04 WA 0 0 4 [19] .data PROGBITS 00004010 003010 000008 00 WA 0 0 4 [20] .bss NOBITS 00004018 003018 000004 00 WA 0 0 1 [21] .comment PROGBITS 00000000 003018 00002b 01 MS 0 0 1 [22] .debug_aranges PROGBITS 00000000 003043 000020 00 0 0 1 [23] .debug_info PROGBITS 00000000 003063 000085 00 0 0 1 [24] .debug_abbrev PROGBITS 00000000 0030e8 000079 00 0 0 1 [25] .debug_line PROGBITS 00000000 003161 000054 00 0 0 1 [26] .debug_str PROGBITS 00000000 0031b5 000099 01 MS 0 0 1 [27] .debug_line_str PROGBITS 00000000 00324e 000015 01 MS 0 0 1 [28] .symtab SYMTAB 00000000 003264 0001e0 10 29 23 4 [29] .strtab STRTAB 00000000 003444 0001b0 00 0 0 1 [30] .shstrtab STRTAB 00000000 0035f4 00011b 00 0 0 1 Key to Flags: W (write), A (alloc), X (execute), M (merge), S (strings), I (info), L (link order), O (extra OS processing required), G (group), T (TLS), C (compressed), x (unknown), o (OS specific), E (exclude), D (mbind), p (processor specific)
从libadd.so的段表中,我们能够看到0x4000对应的是段.got.plt。
1184: 8b 83 f8 ff ff ff mov eax,DWORD PTR [ebx-0x8]
下面是实际运行时gdb的例子,
lyf@liuyifei:~/test$ gdb main ... Reading symbols from main... (gdb) set environment LD_LIBRARY_PATH=. (gdb) break add Breakpoint 1 at 0x8049050 (gdb) run Starting program: /home/lyf/test/main [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Breakpoint 1, add (a_=5, b_=2) at add.c:10 10 foo(); (gdb) set disassembly-flavor intel (gdb) disas add Dump of assembler code for function add: 0xf7fba16d <+0>: push ebp 0xf7fba16e <+1>: mov ebp,esp 0xf7fba170 <+3>: push ebx 0xf7fba171 <+4>: sub esp,0x4 0xf7fba174 <+7>: call 0xf7fba060 <__x86.get_pc_thunk.bx> 0xf7fba179 <+12>: add ebx,0x2e87 => 0xf7fba17f <+18>: call 0xf7fba040 <foo@plt> 0xf7fba184 <+23>: mov eax,DWORD PTR [ebx-0x8] 0xf7fba18a <+29>: mov edx,DWORD PTR [eax] 0xf7fba18c <+31>: mov eax,DWORD PTR [ebp+0x8] 0xf7fba18f <+34>: add edx,eax 0xf7fba191 <+36>: mov eax,DWORD PTR [ebp+0xc] 0xf7fba194 <+39>: add eax,edx 0xf7fba196 <+41>: mov ebx,DWORD PTR [ebp-0x4] 0xf7fba199 <+44>: leave 0xf7fba19a <+45>: ret End of assembler dump. (gdb) i registers eax 0x804c000 134529024 ecx 0x2 2 edx 0x5 5 ebx 0xf7fbd000 -134492160 esp 0xffffd170 0xffffd170 ebp 0xffffd178 0xffffd178 esi 0xffffd274 -11660 edi 0xf7ffcb80 -134231168 eip 0xf7fba17f 0xf7fba17f <add+18> eflags 0x296 [ PF AF SF IF ] cs 0x23 35 ss 0x2b 43 ds 0x2b 43 es 0x2b 43 fs 0x0 0 gs 0x63 99 (gdb) print/x *(void**)0xF7FBCFF8 = 0xf7fbd014 (gdb) print/x &global_extern_int = 0xf7fbd014
如上述介绍了PIC中对变量地址的获取,对于函数调用,我们可以用同样的方法来实现,即将动态库中函数的加载在内存中的真实地址放在GOT中,然后利用间接引用的方式去获取。但是这里有个问题,目前很多C/C++库都包含了大量的函数,而这些库函数的真实地址直到动态链接库解析后才能获取到,如果采用上述GOT的做法,在动态库加载时,将所有函数的地址都先解析后,填充在GOT表中,可以想象GOT表得有多大。此外我们调用动态库,可能有时候只调用了一个printf函数,其他的函数根本就没有被调用。因此编译系统采用了名叫延时绑定(Lazy binding)的技术,只有在真正调用某个函数时,才去解析其内存中真实的地址,其核心原理依然是通过添加一个间接层,也就是PLT(Procedure Linkage Table),来完成对函数的间接寻址以及延时绑定。
PLT位于代码段,也是类似于GOT一样的一个数组结构,如下图所示,
当第一次调用printf函数时,
当本进程中再次调用printf函数时,
通过上面的步骤,我们可以看到
main链接了PIC的动态库libadd.so,查看其包含的段,
lyf@liuyifei:~/test$ readelf -S ./main There are 35 section headers, starting at offset 0x3874: Section Headers: [Nr] Name Type Addr Off Size ES Flg Lk Inf Al [ 0] NULL 00000000 000000 000000 00 0 0 0 [ 1] .interp PROGBITS 08048194 000194 000013 00 A 0 0 1 [ 2] .note.gnu.bu[...] NOTE 080481a8 0001a8 000024 00 A 0 0 4 [ 3] .note.ABI-tag NOTE 080481cc 0001cc 000020 00 A 0 0 4 [ 4] .gnu.hash GNU_HASH 080481ec 0001ec 000020 04 A 5 0 4 [ 5] .dynsym DYNSYM 0804820c 00020c 000060 10 A 6 1 4 [ 6] .dynstr STRTAB 0804826c 00026c 000066 00 A 0 0 1 [ 7] .gnu.version VERSYM 080482d2 0002d2 00000c 02 A 5 0 2 [ 8] .gnu.version_r VERNEED 080482e0 0002e0 000020 00 A 6 1 4 [ 9] .rel.dyn REL 08048300 000300 000010 08 A 5 0 4 [10] .rel.plt REL 08048310 000310 000010 08 AI 5 22 4 [11] .init PROGBITS 08049000 001000 000024 00 AX 0 0 4 [12] .plt PROGBITS 08049030 001030 000030 04 AX 0 0 16 [13] .text PROGBITS 08049060 001060 000178 00 AX 0 0 16 [14] .fini PROGBITS 080491d8 0011d8 000018 00 AX 0 0 4 [15] .rodata PROGBITS 0804a000 002000 000008 00 A 0 0 4 [16] .eh_frame_hdr PROGBITS 0804a008 002008 000034 00 A 0 0 4 [17] .eh_frame PROGBITS 0804a03c 00203c 0000b0 00 A 0 0 4 [18] .init_array INIT_ARRAY 0804bf00 002f00 000004 04 WA 0 0 4 [19] .fini_array FINI_ARRAY 0804bf04 002f04 000004 04 WA 0 0 4 [20] .dynamic DYNAMIC 0804bf08 002f08 0000f0 08 WA 6 0 4 [21] .got PROGBITS 0804bff8 002ff8 000008 04 WA 0 0 4 [22] .got.plt PROGBITS 0804c000 003000 000014 04 WA 0 0 4 [23] .data PROGBITS 0804c014 003014 000010 00 WA 0 0 4 [24] .bss NOBITS 0804c024 003024 000004 00 WA 0 0 1 [25] .comment PROGBITS 00000000 003024 00002b 01 MS 0 0 1 [26] .debug_aranges PROGBITS 00000000 00304f 000020 00 0 0 1 [27] .debug_info PROGBITS 00000000 00306f 000098 00 0 0 1 [28] .debug_abbrev PROGBITS 00000000 003107 00009e 00 0 0 1 [29] .debug_line PROGBITS 00000000 0031a5 000057 00 0 0 1 [30] .debug_str PROGBITS 00000000 0031fc 0000a9 01 MS 0 0 1 [31] .debug_line_str PROGBITS 00000000 0032a5 000016 01 MS 0 0 1 [32] .symtab SYMTAB 00000000 0032bc 000280 10 33 19 4 [33] .strtab STRTAB 00000000 00353c 0001e7 00 0 0 1 [34] .shstrtab STRTAB 00000000 003723 000151 00 0 0 1 Key to Flags: W (write), A (alloc), X (execute), M (merge), S (strings), I (info), L (link order), O (extra OS processing required), G (group), T (TLS), C (compressed), x (unknown), o (OS specific), E (exclude), D (mbind), p (processor specific)
反汇编main,
lyf@liuyifei:~/test$ objdump -d -Mintel ./main ./main: file format elf32-i386 Disassembly of section .init: 08049000 <_init>: 8049000: f3 0f 1e fb endbr32 8049004: 53 push ebx 8049005: 83 ec 08 sub esp,0x8 8049008: e8 a3 00 00 00 call 80490b0 <__x86.get_pc_thunk.bx> 804900d: 81 c3 f3 2f 00 00 add ebx,0x2ff3 8049013: 8b 83 fc ff ff ff mov eax,DWORD PTR [ebx-0x4] 8049019: 85 c0 test eax,eax 804901b: 74 02 je 804901f <_init+0x1f> 804901d: ff d0 call eax 804901f: 83 c4 08 add esp,0x8 8049022: 5b pop ebx 8049023: c3 ret Disassembly of section .plt: 08049030 <__libc_start_main@plt-0x10>: 8049030: ff 35 04 c0 04 08 push DWORD PTR ds:0x804c004 8049036: ff 25 08 c0 04 08 jmp DWORD PTR ds:0x804c008 804903c: 00 00 add BYTE PTR [eax],al ... 08049040 <__libc_start_main@plt>: 8049040: ff 25 0c c0 04 08 jmp DWORD PTR ds:0x804c00c 8049046: 68 00 00 00 00 push 0x0 804904b: e9 e0 ff ff ff jmp 8049030 <_init+0x30> 08049050 <add@plt>: 8049050: ff 25 10 c0 04 08 jmp DWORD PTR ds:0x804c010 8049056: 68 08 00 00 00 push 0x8 804905b: e9 d0 ff ff ff jmp 8049030 <_init+0x30> 08049176 <main>: 8049176: 8d 4c 24 04 lea ecx,[esp+0x4] 804917a: 83 e4 f0 and esp,0xfffffff0 804917d: ff 71 fc push DWORD PTR [ecx-0x4] 8049180: 55 push ebp 8049181: 89 e5 mov ebp,esp 8049183: 53 push ebx 8049184: 51 push ecx 8049185: 83 ec 10 sub esp,0x10 8049188: e8 47 00 00 00 call 80491d4 <__x86.get_pc_thunk.ax> 804918d: 05 73 2e 00 00 add eax,0x2e73 8049192: c7 c2 1c c0 04 08 mov edx,0x804c01c 8049198: c7 02 05 00 00 00 mov DWORD PTR [edx],0x5 804919e: c7 45 f4 00 00 00 00 mov DWORD PTR [ebp-0xc],0x0 80491a5: 8b 90 f8 ff ff ff mov edx,DWORD PTR [eax-0x8] 80491ab: 8b 0a mov ecx,DWORD PTR [edx] 80491ad: c7 c2 1c c0 04 08 mov edx,0x804c01c 80491b3: 8b 12 mov edx,DWORD PTR [edx] 80491b5: 83 ec 08 sub esp,0x8 80491b8: 51 push ecx 80491b9: 52 push edx 80491ba: 89 c3 mov ebx,eax 80491bc: e8 8f fe ff ff call 8049050 <add@plt> 80491c1: 83 c4 10 add esp,0x10 80491c4: 89 45 f4 mov DWORD PTR [ebp-0xc],eax 80491c7: 8b 45 f4 mov eax,DWORD PTR [ebp-0xc] 80491ca: 8d 65 f8 lea esp,[ebp-0x8] 80491cd: 59 pop ecx 80491ce: 5b pop ebx 80491cf: 5d pop ebp 80491d0: 8d 61 fc lea esp,[ecx-0x4] 80491d3: c3 ret 080491d4 <__x86.get_pc_thunk.ax>: 80491d4: 8b 04 24 mov eax,DWORD PTR [esp] 80491d7: c3 ret
08049050 <add@plt>: 8049050: ff 25 10 c0 04 08 jmp DWORD PTR ds:0x804c010 8049056: 68 08 00 00 00 push 0x8 804905b: e9 d0 ff ff ff jmp 8049030 <_init+0x30>
看看0x804c010地址处的值,
lyf@liuyifei:~/test$ readelf -x .got.plt ./mainHex dump of section '.got.plt': NOTE: This section has relocations against it, but these have NOT been applied to this dump. 0x0804c000 08bf0408 00000000 00000000 46900408 ............F... 0x0804c010 56900408 V...
以上分析的结果和上一节的结果吻合,下面通过gdb实际运行,看看lazy binding的效果,
lyf@liuyifei:~/test$ gdb ./main...Reading symbols from ./main...(gdb) set environment LD_LIBRARY_PATH=.(gdb) set disassembly-flavor intel(gdb) break main.c:9Breakpoint 1 at 0x804919e: file main.c, line 9.(gdb) runStarting program: /home/lyf/test/main[Thread debugging using libthread_db enabled]Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".Breakpoint 1, main () at main.c:99 int rtv = 0;(gdb) disas mainDump of assembler code for function main: 0x08049176 <+0>: lea ecx,[esp+0x4] 0x0804917a <+4>: and esp,0xfffffff0 0x0804917d <+7>: push DWORD PTR [ecx-0x4] 0x08049180 <+10>: push ebp 0x08049181 <+11>: mov ebp,esp 0x08049183 <+13>: push ebx 0x08049184 <+14>: push ecx 0x08049185 <+15>: sub esp,0x10 0x08049188 <+18>: call 0x80491d4 <__x86.get_pc_thunk.ax> 0x0804918d <+23>: add eax,0x2e73 0x08049192 <+28>: mov edx,0x804c01c 0x08049198 <+34>: mov DWORD PTR [edx],0x5=> 0x0804919e <+40>: mov DWORD PTR [ebp-0xc],0x0 0x080491a5 <+47>: mov edx,DWORD PTR [eax-0x8] 0x080491ab <+53>: mov ecx,DWORD PTR [edx] 0x080491ad <+55>: mov edx,0x804c01c 0x080491b3 <+61>: mov edx,DWORD PTR [edx] 0x080491b5 <+63>: sub esp,0x8 0x080491b8 <+66>: push ecx 0x080491b9 <+67>: push edx 0x080491ba <+68>: mov ebx,eax 0x080491bc <+70>: call 0x8049050 <add@plt> 0x080491c1 <+75>: add esp,0x10 0x080491c4 <+78>: mov DWORD PTR [ebp-0xc],eax 0x080491c7 <+81>: mov eax,DWORD PTR [ebp-0xc] 0x080491ca <+84>: lea esp,[ebp-0x8] 0x080491cd <+87>: pop ecx 0x080491ce <+88>: pop ebx 0x080491cf <+89>: pop ebp 0x080491d0 <+90>: lea esp,[ecx-0x4] 0x080491d3 <+93>: retEnd of assembler dump.(gdb) print/x *(void**)0x804c010 = 0x8049056(gdb) print &add = (int (*)(int, int)) 0xf7fba16d <add>(gdb) step10 rtv = add(global_int, global_extern_int);(gdb) x/w 0x804c0100x804c010 <add@got.plt>: 0x08049056(gdb) stepadd (a_=5, b_=2) at add.c:1010 foo();(gdb) x/w 0x804c0100x804c010 <add@got.plt>: 0xf7fba16d(gdb) print &add = (int (*)(int, int)) 0xf7fba16d <add>