Thoughts on Buffer Overflow protections
#1
Thoughts on Buffer Overflow protections

Just going through old posts on my old home forum where I used to hang out before GreySec, and I found some interesting thoughts a friend posted there. Sharing his two cents here for anyone wanting to read more about bypassing different protections in binary exploitation:

Sky Wrote:ASLR = Address Space Layout Randomization
- Randomizes memory addresses that programs load data and code in.

DEP = Data Execution Prevention
- Hardware layer feature, NX (AMD) and XD (Intel) bit provide assembly instructions jno (jump if no overflow) and jo (jump if overflow)

Bypassing ASLR & DEP can prove to be quite difficult, you would have to use polymorphic ASCII shellcode and return oriented programming.
The reason for return oriented programming is to bypass the NX bit and XD bit, which are types of hardware DEP implemented in the processor.

How to bypass ASLR with msvcr71.dll and mona.py
Corelan article: https://www.corelan.be/index.php/2011/07...d-mona-py/

Bypass DEP with return oriented programming.
https://m0rph-1.github.io/Intro-ROP-DEP-Bypass/
https://medium.com/4ndr3w/linux-x86-bypa...4768363c9a
https://medium.com/cybersecurityservices...b3361e50ce
Reply
#2
Quote:DEP = Data Execution Prevention
- Hardware layer feature, NX (AMD) and XD (Intel) bit provide assembly instructions jno (jump if no overflow) and jo (jump if overflow)

jno/jo have been around at least as long as the i386 architecture, probably longer, but I can only confirm that the instructions are present in the Intel 80386 Programmer's Reference Manual from 1986. While they can be used in a security-related way, their introduction was not security focused and had nothing to do with the introduction of NX.

NX was a way to tell the CPU what pages of memory should not be executed. This way even if the software was tricked into executing user provided data the CPU could check the NX bit and raise an error. The common tactic of the day was to inject machine code (often called shellcode) into the memory of a running process and then cause the program to jmp to it and execute it. By marking pages as non-executable the CPU could prevent the attack when the software failed to.

Quote:Bypassing ASLR & DEP can prove to be quite difficult, you would have to use polymorphic ASCII shellcode and return oriented programming.

This isn't true. Polymorphic ASCII shellcode is a means to get around input restrictions on which characters you can use, a restriction wholly unrelated to DEP and ASLR. ASLR prevents you from knowing what memory address to move program execution to, and DEP, as I said above, just marks pages as non-executable, regardless of whether they contain ASCII or not.

----

It is true, though, that DEP plus ASLR together have made software a fair bit more difficult to exploit, but there are well-known techniques for dealing with both.

While NX first came around in the 80s and 90s (though not seeing real adoption until it came to the x86 architecture), ASLR came about in the early 00s. I want to start by talking about ASLR because it's the easiest to understand how to break.

What exactly is ASLR? Quoting from above: "Randomizes memory addresses that programs load data and code in." That is accurate, but it's important to consider how it randomizes addresses. Notably, it randomizes logical chunks of memory, not the details inside them.

Imagine software as a book with multiple chapters. Your main piece of software would be one chapter, and each library you use would be another chapter. So you use the Standard C Library, that's one chapter; include a networking library, that's another chapter; include some graphics library, that's another. What ASLR does is randomize the order of the chapters, but not the actual content of the chapters: Standard Library, you start at page 150; networking, start at page 50; graphics, page 300. Notice how the numbers are all divisible by 50; ASLR implementations usually 'align' the data to start at some nice round number like that instead of being totally random.
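To make the "sliding chapters" idea concrete, here's a toy Python simulation (not a real ASLR implementation; the base range, page size, and entropy amount are all illustrative assumptions):

```python
import random

PAGE = 0x1000          # bases are aligned to the page size, not totally random
ENTROPY_BITS = 28      # illustrative amount of randomness; varies by OS/arch

def random_library_base():
    """Pick a page-aligned base address, the way ASLR slides a whole library."""
    slot = random.getrandbits(ENTROPY_BITS)
    return 0x7f0000000000 + slot * PAGE

# The whole "chapter" slides together: offsets inside the library stay fixed.
PUTS_OFFSET = 0x1fc0   # hypothetical offset of puts() into the library
base = random_library_base()
puts_addr = base + PUTS_OFFSET

print(hex(base), hex(puts_addr))
```

Run it a few times and the base changes every run, but puts() is always at the same fixed offset from whatever base was chosen.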

So with some high-level details laid out, how do you break ASLR? There are quite a few basic approaches.

ASLR Bypass 1 - Data Oriented Attacks, or don't even try

Sounds a bit weird, but you don't always even need to deal with ASLR. A great example of this is the sudo vulnerability from a few months back, CVE-2019-18634. This was an overflow in static data, meaning other static data allocated after the particular buffer could be overwritten. Notably, the user_details struct could be overwritten, which included the gid and uid used to execute the askpass program (which it gets from the environment variable SUDO_ASKPASS). So, you could overflow user_details to make it execute whatever program you want as uid=0. No ASLR bypass needed; not even DEP would stop this. These are called data-oriented attacks.
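Here's a toy model of that idea in Python (this is not sudo's actual memory layout, just a minimal sketch of a buffer sitting next to a uid field):

```python
import struct

# Toy memory region: a 16-byte input buffer followed by a "user details"
# struct whose first field is the uid (4 bytes, little-endian).
memory = bytearray(16) + bytearray(struct.pack("<I", 1000))  # uid starts as 1000

def vulnerable_copy(mem, data):
    """memcpy-style copy with no bounds check on the 16-byte buffer."""
    mem[0:len(data)] = data

# 16 bytes of filler, then 4 bytes that land on the adjacent uid field.
payload = b"A" * 16 + struct.pack("<I", 0)
vulnerable_copy(memory, payload)

uid = struct.unpack_from("<I", memory, 16)[0]
print(uid)  # the adjacent uid field is now 0 (root); no control flow hijacked
```

Note that no return address or function pointer was touched: only data changed, which is exactly why neither ASLR nor DEP gets in the way.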

ASLR Bypass 2 - Partial Overwrites

The idea here is that if you can gain partial control of another pointer, just overwriting the least-significant bytes might be sufficient to bypass ASLR. For example, let's say you have some code with a function pointer to puts(), which is part of glibc (the GNU C Library, pretty common on Linux). The location where glibc is loaded will be randomized, but puts() might always start at, say, 0x1fc0 into glibc, and a more interesting function like system() will similarly always start at a particular offset, say 0x2f0a.

In these cases the whole address for puts() might be something like 0x23a6ce398fbe1fc0 and system() at 0x23a6ce398fbe2f0a. While the '23a6ce398fbe' part might be completely unknowable, if you can overwrite just the least-significant bytes you can replace those last two bytes (1fc0) with 2f0a and get a pointer to system() without needing to break ASLR.

This is mostly used with overflows on little-endian systems (like x86/x86-64), because as you overflow, the first bytes you overwrite are the least-significant bytes of a pointer. With big-endian you'd be overwriting the most-significant bytes first, which is not useful. You could also use this with a write-what-(relatively)-where style exploit, where the 'where' is a relative address and doesn't require knowing the ASLR'd address; you could just target your write so it only touches the least-significant bytes.
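The byte math is easy to see in Python. Using the hypothetical addresses and offsets from above, a two-byte overwrite of a little-endian pointer looks like this:

```python
import struct

LIBC_BASE = 0x23a6ce398fbe0000               # unknown to the attacker (ASLR)
PUTS_OFFSET, SYSTEM_OFFSET = 0x1fc0, 0x2f0a  # fixed per-build offsets

# A function pointer to puts() sitting in memory, stored little-endian,
# so its least-significant bytes come first.
mem = bytearray(struct.pack("<Q", LIBC_BASE + PUTS_OFFSET))

# An overflow that only reaches the first two bytes of the pointer:
mem[0:2] = struct.pack("<H", SYSTEM_OFFSET)

forged = struct.unpack("<Q", mem)[0]
print(hex(forged))  # now points at system(), base bits untouched
```

The randomized high bits were never known and never needed; only the two low bytes changed.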

Sometimes you don't have control over all the bytes being written; the last byte, for example, might have a null byte appended, or some other formatting might limit your control. This is still exploitable, it just takes more luck and messing with memory to get the instructions you want into the right place (see bypass #5).

It's also worth noting that while the offsets are not randomized, you do need to know them, and they are not consistent across versions of a library; each recompilation (of the library, not the software) can result in different internal offsets.


ASLR Bypass 3 - Opt-Outs

Not everything is randomized by default. Some libraries, for whatever reason, decide they are incompatible with randomization and don't opt in to being randomized; these are increasingly rare. Additionally, the executable file itself is usually not randomized by default (randomizing it is an additional option, often called PIE, for Position-Independent Executable). Depending on the application, there may be useful gadgets within the core executable too. Either way, there can be gadgets you can use that are always loaded at the same address.
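On Linux you can tell whether an ELF binary is even PIE-capable from its header: the e_type field at offset 16 is ET_EXEC (2) for fixed-address executables and ET_DYN (3) for position-independent ones. A minimal sketch (run here against a fabricated header rather than a real file):

```python
import io

# e_type lives at byte offset 16 of the ELF header, little-endian u16.
ET_EXEC, ET_DYN = 2, 3

def elf_type(f):
    header = f.read(18)
    assert header[:4] == b"\x7fELF", "not an ELF file"
    return int.from_bytes(header[16:18], "little")

# Fabricated 18-byte header for demonstration: magic, class/endian/version
# bytes, padding, then e_type = ET_DYN.
fake = b"\x7fELF" + b"\x02\x01\x01" + b"\x00" * 9 + ET_DYN.to_bytes(2, "little")
kind = elf_type(io.BytesIO(fake))
print("PIE-capable" if kind == ET_DYN else "fixed-address")
```

Against a real target you'd pass an open binary instead of the fabricated bytes (tools like `file` and checksec do the same check for you).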

ASLR Bypass 4 - Bruteforce

On some systems the randomization may be fairly weak. On 32-bit machines, for example, ASLR only had at most 20 bits of randomness, down to 12 bits of actual randomness in some cases, meaning just trying the same address over and over again can be effective, albeit loud. This wouldn't be useful for a browser exploit where you only get one shot, but against a network service that restarts, bruteforce can be viable; it just needs to land once.
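A tiny simulation shows why 12 bits of entropy is so weak. Each "crash and restart" re-randomizes the base, and we just keep firing the same fixed guess (the slot count is the only assumption here):

```python
import random

SLOTS = 2 ** 12   # only 12 bits of effective entropy on some 32-bit setups

def service_round(guess):
    """Each crash/restart re-randomizes the base; we keep one fixed guess."""
    actual = random.randrange(SLOTS)
    return actual == guess

guess = 0
attempts = 0
while not service_round(guess):
    attempts += 1
print("landed after", attempts, "restarts")
```

On average this lands after about 4096 tries, which a script chews through in seconds against a forking or auto-restarting service. Loud, but effective.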

ASLR Bypass 5 - Spraying

This is somewhat related to bruteforce, but it's a means to increase your chances of a hit. Heap-spraying is basically where you're able to cause many allocations to occur, ideally large ones, so you end up taking over a very large chunk of memory. You fill it mostly with a nop-sled leading into your useful instructions, so if you allocate, say, an entire GB of memory, you have a pretty high chance of hitting the sled with an intelligent guess.
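A scaled-down sketch of the layout (the chunk size and payload bytes are just placeholders; real sprays are shaped by the target's allocator):

```python
NOP = b"\x90"                      # x86 no-op
shellcode = b"\xcc" * 32           # stand-in payload (int3 breakpoints)

# Spray: many large allocations, each mostly nop-sled, payload at the end.
CHUNK = NOP * (2 ** 20 - len(shellcode)) + shellcode   # 1 MB per chunk
heap = [CHUNK for _ in range(64)]  # ~64 MB sprayed (scaled down from 1 GB)

# A guess that lands anywhere in a sled "slides" forward into the payload.
def lands_in_sled(chunk, offset):
    return chunk[offset:offset + 1] == NOP

hit = lands_in_sled(heap[10], 123456)
print(hit)
```

Because all but the last 32 bytes of every chunk are sled, almost any address inside the sprayed region is a winning guess.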

ASLR Bypass 6 - Memory Leaks/Information Disclosure

This is the most common tactic in modern exploits against hardened software. Basically, it's a secondary exploit in addition to whatever exploit actually results in code execution. This secondary exploit, often called an information disclosure, is used just to leak memory information from the process, specifically pointers. Once you have a pointer leaked, you can use it to figure out where related objects are in memory. The best-case scenario is to leak a function pointer; then you also know the address of every other function in the same library relative to the one you leaked. There are many different types of attacks that result in information disclosure. Uninitialized memory is a big one (a wrapper allocating an object but not initializing it completely, for example), and use-after-free can sometimes be leveraged as a read gadget.

The gist is just once you've leaked an address you can calculate more useful gadgets based off it.
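The arithmetic is the easy part. Reusing the hypothetical puts()/system() offsets from the partial-overwrite example, and a made-up leaked pointer:

```python
# Offsets are fixed per libc build; only the base is randomized by ASLR.
PUTS_OFFSET   = 0x1fc0
SYSTEM_OFFSET = 0x2f0a

def libc_base_from_leak(leaked_puts):
    """Subtract the known offset from a leaked puts() pointer."""
    return leaked_puts - PUTS_OFFSET

leaked = 0x7f3a12d45fc0            # hypothetical pointer from an info leak
base = libc_base_from_leak(leaked)
system_addr = base + SYSTEM_OFFSET # every other symbol falls out the same way

print(hex(base), hex(system_addr))
```

One leak defeats the randomization for the entire library, which is why info-leak primitives are treated as half the exploit in modern work.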

---

Whew, with all that out of the way, let's talk DEP/NX.

DEP is really beaten in one of two ways:

DEP Bypass 1 - Data Oriented Programming

This is the same idea as the first ASLR bypass: if you can avoid hijacking control flow, you can avoid triggering DEP. It's not super common, but in a lot of places it's still a really powerful vector to abuse a write gadget against non-control data.

- https://www.cs.purdue.edu/homes/xyzhang/...attack.pdf
- https://huhong-nus.github.io/advanced-DOP/

DEP Bypass 2 - Code-Reuse Attacks

This is a catch-all term for the family of attacks spawned by the original "ret2libc" attack.

In the beginning, we had ret2libc. The basic idea was that rather than overwriting the return address on the stack with the address of your shellcode (living in user-data regions protected by DEP), you just structure the stack so the return looks like a call to a libc function, like system("/bin/sh").

One of the first good writeups about ret2libc that I'm aware of is this email from 1997 - https://web.archive.org/web/200210181146...ive/1/7480
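For the classic 32-bit case, the overwritten stack has a very recognizable shape. A sketch of building such a payload (every address and the filler offset here is hypothetical; in practice you'd pull them from the target's libc):

```python
import struct

# Hypothetical 32-bit addresses, for illustration only.
SYSTEM_ADDR = 0xb7e42da0    # system() in libc
EXIT_ADDR   = 0xb7e36c30    # exit(), used as system()'s fake return address
BINSH_ADDR  = 0xb7f63a24    # address of a "/bin/sh" string inside libc

OFFSET_TO_RET = 76          # bytes of filler up to the saved return address

payload  = b"A" * OFFSET_TO_RET
payload += struct.pack("<I", SYSTEM_ADDR)  # overwrites the saved return address
payload += struct.pack("<I", EXIT_ADDR)    # where system() "returns" to
payload += struct.pack("<I", BINSH_ADDR)   # system()'s argument, read off the stack

print(len(payload))
```

When the vulnerable function executes `ret`, it pops SYSTEM_ADDR, and from system()'s point of view the stack looks exactly like it was called normally with "/bin/sh" as its argument. No injected code ever executes, so DEP has nothing to object to.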

This eventually got generalized into what we now call return-oriented programming (ROP). ROP is based on a couple of ideas: first, that you can chain these calls on the stack, so it doesn't have to be a single call to a function like system(); and second, that you don't actually have to call real functions at all, since any sequence of instructions ending with a ret can be used. As far as I'm aware it was first described in detail in 2008 at Blackhat - https://hovav.net/ucsd/talks/blackhat08.html

Very quickly the ROP technique was adapted to more than just returns, leading to the more general categorization of 'code-reuse attacks', all taking advantage of the idea that small chains of instructions already exist within a program and can be reused.

- Return oriented Programming Without Returns was published in 2010 - https://hovav.net/ucsd/dist/noret-ccs.pdf
- Jump-Oriented Programming in 2011 - https://www.comp.nus.edu.sg/~liangzk/pap...accs11.pdf
- Also Call-Oriented-Programming but I don't have a link for that

Basically, if you want to bypass DEP, you reuse existing code in the application, either to spawn a shell outright or to create a page of executable memory you can write to.
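That second option, making memory executable again, is a common x86-64 ROP pattern: chain gadgets to set up arguments and "return" into mprotect(). A sketch of laying out such a chain (all the gadget and function addresses below are made up; you'd find real ones with a gadget-hunting tool):

```python
import struct

# Hypothetical addresses, as found with a gadget-search tool.
POP_RDI_RET    = 0x401234   # pop rdi ; ret
POP_RSI_RET    = 0x401236   # pop rsi ; ret
POP_RDX_RET    = 0x401238   # pop rdx ; ret
MPROTECT_ADDR  = 0x401050   # mprotect@plt
SHELLCODE_PAGE = 0x602000   # page where attacker-controlled data already sits

q = lambda v: struct.pack("<Q", v)

# Chain: mprotect(SHELLCODE_PAGE, 0x1000, PROT_READ|WRITE|EXEC), then jump in.
chain  = q(POP_RDI_RET) + q(SHELLCODE_PAGE)  # 1st arg (rdi): page address
chain += q(POP_RSI_RET) + q(0x1000)          # 2nd arg (rsi): length
chain += q(POP_RDX_RET) + q(0x7)             # 3rd arg (rdx): rwx permissions
chain += q(MPROTECT_ADDR)                    # "return" into mprotect
chain += q(SHELLCODE_PAGE)                   # then into the now-executable page

print(len(chain))
```

Each `ret` pops the next qword off the stack, so the stack itself becomes the program: pop gadgets load registers, and the final addresses steer execution.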


Modern Mitigations

DEP and ASLR, while extremely important and responsible for pushing the difficulty of modern exploitation up, are not the new kids on the block anymore. There are well-established techniques against them that are generally applicable to any reasonably complex piece of software, though they do render some issues in smaller programs impossible to exploit.


When talking about modern mitigations there are a handful worth discussing that you are likely to run into.

Fine-grained ASLR

Starting with ASLR again: there is ongoing research into not just randomizing the offset a library is loaded at, but randomizing the order of functions within a library, so that knowing one address doesn't help you find where anything else is located. This is even potentially making its way into the Linux kernel, with a patch series introduced a couple months ago at the start of February 2020, https://lore.kernel.org/kernel-hardening...intel.com/

It still has some issues (maybe resolved by now), but it is an attempt to bring fine-grained ASLR out of theory and into the real world.

Memory Tagging

This actually covers a number of different techniques, all coming down to one feature: memory tagging. The idea behind memory tagging is that every word of memory carries an additional few bits (designs vary) holding extra information about that memory. For example, the Morpheus chip proposed by the University of Michigan tags all memory with a few bits (iirc) indicating whether it is data, code, a pointer to data, or a pointer to code.

This way, when the CPU receives an instruction to load memory, if the address it is given is not tagged as a pointer to data it can reject it; likewise, if it gets a jmp to something that is not a pointer to code, or the destination isn't tagged as code, it can reject that too.
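A toy model of that check in Python (real tagging happens in hardware on every access; the tag values and addresses here are invented for illustration):

```python
# Toy model: every word of memory carries a small tag; the "CPU" checks it.
DATA, CODE, PTR_DATA, PTR_CODE = range(4)

tags = {0x1000: CODE, 0x2000: DATA, 0x3000: PTR_CODE}

def checked_jmp(addr):
    """Reject a jump unless the target word is tagged as code."""
    if tags.get(addr) != CODE:
        raise RuntimeError("tag violation: jmp to non-code")
    return addr

ok = checked_jmp(0x1000)           # jumping to real code is allowed
try:
    checked_jmp(0x2000)            # jumping into plain data is rejected
    blocked = False
except RuntimeError:
    blocked = True
print(ok, blocked)
```

The point is that the check is a property of the memory itself, not of the program logic, so even a fully hijacked control flow hits the wall.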

Pointer authentication is another use of memory tagging that is actually deployed today on iOS devices, this time utilizing some of the wasted space in 64-bit addresses, since most systems don't actually provide a full 64-bit address space.

- https://googleprojectzero.blogspot.com/2...on-on.html
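The scheme can be sketched in software: pack a short MAC of the pointer into the unused top bits, and verify it before use. (In the real thing the key lives in hardware and the MAC is a dedicated cipher; SHA-256 here is just a stand-in, and all values are illustrative.)

```python
import hashlib

SECRET = b"per-process-key"   # in real hardware the key is not software-readable
ADDR_BITS = 48                # most 64-bit systems only use 48 address bits

def sign(ptr):
    """Pack a 16-bit MAC of the pointer into its unused top bits."""
    mac = hashlib.sha256(SECRET + ptr.to_bytes(8, "little")).digest()
    tag = int.from_bytes(mac[:2], "little")
    return (tag << ADDR_BITS) | ptr

def authenticate(signed_ptr):
    """Strip and verify the tag; a corrupted pointer almost always fails."""
    ptr = signed_ptr & ((1 << ADDR_BITS) - 1)
    if sign(ptr) != signed_ptr:
        raise RuntimeError("pointer authentication failure")
    return ptr

p = sign(0x7f0012345678)
restored = authenticate(p)        # legitimate use round-trips fine
try:
    authenticate(p ^ 0x1000)      # a tampered pointer is (almost always) caught
except RuntimeError:
    pass
print(hex(p), hex(restored))
```

An attacker who overwrites a signed pointer without knowing the key produces a tag mismatch, so the forged pointer is rejected before it is ever dereferenced.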

Shadow Stacks

This is supported on Windows Insider releases if you have a CPU supporting Intel Control-flow Enforcement Technology (CET).

If you're familiar with the basic idea of calling conventions, you'll know that it's common to push the address to return to onto the stack when making a call. This is how the 'ret' instruction knows where to return execution to when a function ends, and overwriting that stored address is what smashing the stack is all about. Stack canaries were one attempt to solve this, but they can often be bypassed by leaking the canary.

The idea of a shadow stack has been passed around for quite a while: you still put the return address on the stack as normal, but you also maintain a secondary stack, possibly in protected memory (current research is looking at isolated memory), that contains only the return addresses. It lives somewhere else in memory, usually with unallocated pages around it, so you cannot smash it. When you return, the address on the shadow stack is compared to the return address on the normal stack, and if they don't match the return fails.
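A toy call/return model makes the mechanism obvious (in real CET the shadow stack is maintained by the CPU, not by code like this):

```python
class Machine:
    """Toy call/return model with a separate shadow stack for return addresses."""
    def __init__(self):
        self.stack = []     # normal stack: data + return addresses, corruptible
        self.shadow = []    # shadow stack: return addresses only, protected

    def call(self, return_addr):
        self.stack.append(return_addr)
        self.shadow.append(return_addr)

    def ret(self):
        addr = self.stack.pop()
        expected = self.shadow.pop()
        if addr != expected:
            raise RuntimeError("shadow stack mismatch: stack was smashed")
        return addr

m = Machine()
m.call(0x401000)
ok = m.ret()                    # a normal return works fine

m.call(0x401000)
m.stack[-1] = 0xdeadbeef        # attacker overwrites the saved return address
try:
    m.ret()
    caught = False
except RuntimeError:
    caught = True
print(hex(ok), caught)
```

The overwrite on the normal stack is detected at return time because the shadow copy never changed.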

With Intel CET this is finally seeing hardware support, potentially coming to consumer-level products, though software-only implementations have existed for a while.

While implementations do attempt to protect the shadow stack's address, it can be leaked at times, and the shadow stack does need to be manipulated for things like stack unwinding during exception handling, so there is likely room for abuse. Still, it is a new mitigation that is likely to heavily impact the future of exploit development.

Control-Flow Guard/Integrity

This is the other side of the coin, while Shadow Stacks protect 'backward edge' control flow compromises (returning being a step backwards in the control flow), CFG is about forward edge protection, protecting calls and jmps from going to bad destinations.

Ultimately, the goal is to detect whether a call or jmp destination is legitimate, or has been compromised.

There are several approaches to this, and it is still a very active area of research.

Memory Tagging has been used for this. While it has other uses too, its use for guarding control flow is certainly noteworthy.

Jmp/call instrumentation is fairly common: valid destinations are instrumented with a special sequence of instructions that marks them as legitimate targets. This may also be broken up into several types of sequences to limit the number of possible gadgets reachable from any particular indirect jump.
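A toy version of that marker check (similar in spirit to x86 ENDBR landing pads, though the marker byte and addresses here are invented):

```python
# Toy model of call-target instrumentation: every legitimate indirect-call
# destination starts with a marker byte that the instrumented call checks.
MARKER = 0xF3

code = {
    0x1000: [MARKER, 0x55, 0x48],  # real function entry: starts with the marker
    0x2f0a: [0x0f, 0x05, 0xc3],    # mid-function gadget: no marker
}

def indirect_call(target):
    """The instrumented call verifies the marker before transferring control."""
    if code.get(target, [None])[0] != MARKER:
        raise RuntimeError("control-flow violation")
    return target

ok = indirect_call(0x1000)         # calling a real entry point is fine
try:
    indirect_call(0x2f0a)          # a ROP-style mid-function gadget is rejected
    blocked = False
except RuntimeError:
    blocked = True
print(hex(ok), blocked)
```

Hijacked pointers can then only redirect execution to marked entry points, which is exactly why gadget hunting under CFG shifts toward whole-function gadgets.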

There are also some implementations taking advantage of Intel Processor Trace to detect when control flow has entered an invalid state. I don't believe that approach has very good performance, but iirc it was pretty effective.

That said, CFG can still be defeated with some thought about the process itself. There are techniques like Counterfeit Object-Oriented Programming (https://www.syssec.ruhr-uni-bochum.de/me...land15.pdf) that take advantage of the difficulty of actually determining what valid control flow is possible when it comes to C++ objects. Memory tagging is susceptible to memory leaks that expose the tags. And instrumentation is often fairly permissive and can leave a lot of usable gadgets; you just need to be careful about clobbering registers, as the gadgets will be larger than what you'd usually use with something like ROP.
Reply
#3
Very good input Dropzone! I stand corrected, thanks for giving your thoughts. Still trying to get into this topic myself, but you explained it pretty well :)
Reply