Yesterday, Aproova gave a talk on the details of the Heap-Spray exploit that was used in Aurora.
During the presentation a couple of questions came up and so we will try to address them here.
1. Why was 0x0D0D0D0D used as the NOP instead of the standard 0x90?
In x86, 0x90 is a single-byte instruction that does nothing, i.e. No-Operation = NOP. The 1-byte instruction is perfect for use as a NOP slide (or sled) because you don’t have to worry about aligning the shellcode properly. As long as the Instruction Pointer (IP) points somewhere in the NOP sled, the sled will lead to the proper execution of the shellcode.
This particular vulnerability is a dangling-pointer vulnerability: an object on the heap that contains a function pointer was freed, but the pointer to the object still exists. The heap-spray reuses the freed memory and so overwrites the function pointer. When that particular function is called, control is transferred to whatever address the function pointer now contains.
If 0x90’s were used, then it is very likely that 0x90909090 is now the contents of that function pointer. But in (32-bit) Windows, 0x80000000 is the beginning of kernel memory. As such, 0x90909090 will cause a jump into kernel space from userspace, which will fail, thus defeating the purpose of the exploit.
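The address argument above can be checked with a few lines of Python. This is only a sketch: the helper names are made up for illustration, and the 0x80000000 boundary is the one quoted in the text, not something the script discovers.

```python
# Kernel space boundary as quoted in the text (an assumption of this sketch).
KERNEL_BASE = 0x80000000

def pointer_from_fill(fill_byte):
    """Return the 32-bit value you get when one fill byte is repeated four times."""
    return int.from_bytes(bytes([fill_byte]) * 4, "little")

def is_user_space(addr):
    """True if the address lies below the kernel boundary."""
    return addr < KERNEL_BASE

for fill in (0x90, 0x0C, 0x0D):
    addr = pointer_from_fill(fill)
    print(f"fill 0x{fill:02X} -> pointer 0x{addr:08X}, user space: {is_user_space(addr)}")
```

Only the 0x90 fill produces a pointer above the boundary; the 0x0C and 0x0D fills stay well inside userspace.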
Now that we know 0x90 can’t be used, we need an alternative. In addition, we also need an alternative where the resulting address points to the heap (because that is where the NOP sled is). It was mentioned that 0x0C0C0C0C or 0x0D0D0D0D could be used, which would imply that 0x0C0C0C0C and 0x0D0D0D0D are addresses of the heap.
In this way, when the function is called, the IP is updated with 0x0D0D0D0D, and so the contents of memory location 0x0D0D0D0D are now treated as an instruction. If things were done correctly, then the contents of 0x0D0D0D0D should be part of the NOP sled, which means the 1-byte contents of address 0x0D0D0D0D is also 0x0D. (Normally, if we could use 0x90, then the contents would have been 0x90 instead of 0x0D – this is interesting because the NOP sled is ALSO a function pointer address. It is dual-use.)
Now 0x0C and 0x0D are 1-byte opcodes for OR. 0x0C is OR AL with an immediate8; 0x0D is OR AX with an immediate16 or OR EAX with an immediate32. Before we continue, let’s point out that from <http://www.c-jump.com/CIS77/CPU/x86/lecture.html#X77_0230_operands_16> we know that if 0x0D is used with an immediate16 in 32-bit code, the instruction needs the 0x66 operand-size prefix, so the ACTUAL encoding becomes 0x66 0x0D immediate16.
Given our two choices of 0x0C0C0C0C and 0x0D0D0D0D, let’s see what happens.
0x0C0C0C0C will get decoded as 0x0C (OR) 0x0C (immediate8), which = OR AL, 0x0C, twice over. Each 0x0C0C pair is a 2-byte instruction.
0x0D0D0D0D0D will get decoded as 0x0D (OR) 0x0D0D0D0D (immediate32), which = OR EAX, 0x0D0D0D0D. This is a 5-byte instruction! (Because 0x66 is not seen, we know that this uses immediate32 instead of immediate16, as mentioned in the link above.)
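To make the two decodings concrete, here is a toy decoder. It is a minimal sketch that only knows the three opcodes discussed in this post (0x90, 0x0C, 0x0D), assumes 32-bit mode with no prefixes, and is nowhere near a real disassembler; the function name is made up for illustration.

```python
def decode_sled(buf):
    """Decode a buffer made only of 0x90, 0x0C and 0x0D instructions (32-bit mode)."""
    out = []
    i = 0
    while i < len(buf):
        op = buf[i]
        if op == 0x90:
            # NOP: 1 byte
            out.append("NOP")
            i += 1
        elif op == 0x0C and i + 2 <= len(buf):
            # OR AL, imm8: opcode + 1 immediate byte
            out.append(f"OR AL, 0x{buf[i + 1]:02X}")
            i += 2
        elif op == 0x0D and i + 5 <= len(buf):
            # OR EAX, imm32: opcode + 4 immediate bytes (little-endian)
            imm = int.from_bytes(buf[i + 1:i + 5], "little")
            out.append(f"OR EAX, 0x{imm:08X}")
            i += 5
        else:
            # Unknown opcode or truncated instruction: stop decoding
            out.append(f"<stop at offset {i}>")
            break
    return out

print(decode_sled(bytes([0x0C] * 4)))  # two 2-byte instructions
print(decode_sled(bytes([0x0D] * 5)))  # one 5-byte instruction
```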
In this case, as long as there isn't any important information in EAX or the FLAGS registers, then the OR operation won't affect the operation of the shellcode. So the side-effects of running these instructions are inconsequential.
Now that we know why 0x0D0D0D0D0D or 0x0C0C is used as the NOP, let’s get back to the alignment problem.
As mentioned before, 0x90 is normally used because it’s only 1 byte, so there aren’t any alignment issues. Because the 5-byte 0x0D0D0D0D0D is used as the NOP now, we have an alignment problem. Take, for example, an object that looks like 0x0D0D0D0D0D0D0D0D0D0D[SHELLCODE].
If this object starts at address 0x0D0D0D0D, then the exploit will work, since the bytes will decode as two OR EAX, 0x0D0D0D0D instructions and then the shellcode will get processed.
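If the object starts at address 0x0D0D0D0C instead, then the exploit will not work. Execution begins one byte into the object, so the bytes get decoded as OR EAX, 0x0D0D0D0D and then an OR EAX whose four immediate bytes are 0D 0D 0D SH, where “SH” is the first byte of the shellcode. The first shellcode byte is swallowed as part of the immediate, and thus the shellcode doesn’t work.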
There are a couple of ways to get around this issue. The first is to pad the SHELLCODE with enough 0x90 NOPs at the front. If there is an alignment problem, the misaligned OR only swallows some of the padding, the next instruction will be a 0x90 NOP anyway, and that will fix the alignment problem for us. Also, because only a maximum of four 0x90 NOPs are needed, it doesn’t increase the size of the SHELLCODE by much, which is a good thing (since it means the probability of overwriting the function pointer with the SHELLCODE instead of 0x0D0D0D0D is still low).
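The alignment argument and the padding fix can be simulated without touching any real memory. The sketch below is a simplified model, not the actual exploit: the layout is `sled_len` bytes of 0x0D, then `pad_len` bytes of 0x90, then the shellcode, and `entry` is the offset into the object where execution begins. All names are hypothetical.

```python
def lands_on_shellcode(sled_len, pad_len, entry):
    """Return True if execution starting at offset `entry` reaches the first
    shellcode byte as an instruction boundary.

    Modelled layout: [0x0D * sled_len][0x90 * pad_len][shellcode...]
    """
    pad_start = sled_len
    code_start = sled_len + pad_len
    ip = entry
    # Every 0x0D opcode consumes itself plus 4 immediate bytes.
    while ip < pad_start:
        ip += 5
    # If ip is now inside the padding, the 1-byte NOPs walk it to code_start.
    # If ip is already past code_start, shellcode bytes were eaten as an immediate.
    return ip <= code_start

# Ten 0x0D bytes, as in the example object above.
for pad in (0, 4):
    results = [lands_on_shellcode(10, pad, entry) for entry in range(5)]
    print(f"padding={pad}: {results}")
```

With no padding only the perfectly aligned entry works; with four bytes of 0x90 padding every entry offset ends up at the start of the shellcode, which matches the "maximum of four" claim.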
2. Why not just use one HUGE NOP sled with one shellcode instead of many many different NOP sleds with shellcodes?
This is a very interesting question since theoretically it works fine. Here are some of the answers we came up with – but this is mostly speculation, please contact me if anyone has the actual answer: