TippingPoint Digital Vaccine Laboratories

Shellcode Detection Using Python


DVLabs has collected a large number of documents and files flagged as malicious, and we are trying to reduce how many of them need a full manual analysis. One method we use is shellcode detection: if shellcode is detected inside a document, we can narrow down the data we have to examine inside the file to find the attack. Most of our code is in Python, so a Python module for shellcode detection is preferable. The two I'll look at in this post are pylibemu and pylibscizzle.


Pylibemu

I prefer pylibemu over the standard libemu Python bindings because it provides more functionality without being any harder to use. The feature that interests me most is the test() function, which profiles the execution against the win32 API and reports the attempted win32 API calls, giving you a better idea of what the shellcode is trying to do once it gains control. If the shellcode attempts any downloads, pylibemu will try to retrieve them. Here's an example of its usage:

 

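     import pylibemu

     # shellcode is a byte string holding the data to scan
     emulator = pylibemu.Emulator()
     # locate a candidate starting offset, then emulate from it
     offset = emulator.shellcode_getpc_test(shellcode)
     emulator.prepare(shellcode, offset)
     emulator.test()
     # the profile lists the win32 API calls the shellcode attempted
     print(emulator.emu_profile_output)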

Pylibemu is a viable option for detecting shellcode, but I found it to be hit and miss. I started testing Georg Wicherski’s libscizzle when it was first released and found that it gave better results on the shellcode samples I was testing.


Pylibscizzle

The one issue I had with libscizzle was that it is a C++ library and had no Python bindings. That led me to write and release Cython-based Python bindings for it. The one challenge in creating pylibscizzle was the constantly evolving C++ support in Cython. A few functions are still not accessible from Python because Cython has no native std::string wrapping, but the core functionality is implemented. Using pylibscizzle is fairly simple: create a pyDetector instance. Inside the initialization function of the pyDetector class, a pyScanner instance is created to retrieve candidate shellcode offsets. These offsets are then used in the detectShellcode function, which returns a value. If the value is 0xffffffff, the detection was not successful; if the value is less than zero, there was an error in detection. Here’s a code snippet showing example usage:

 

     from pylibscizzle import pyDetector, pyScanner

     detector = pyDetector(shellcode)
     offset = detector.detectShellcode()
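
The return-value rules described above can be wrapped in a small helper so callers do not have to repeat the sentinel checks. This is only a sketch based on the description in this post: interpret_offset and NOT_FOUND are hypothetical names, not part of pylibscizzle.

```python
# Sentinel value described in this post for "no shellcode detected".
NOT_FOUND = 0xFFFFFFFF


def interpret_offset(value):
    """Classify a detectShellcode() return value (hypothetical helper)."""
    if value < 0:
        # Negative values signal an error during detection.
        return ("error", None)
    if value == NOT_FOUND:
        return ("not_found", None)
    return ("found", value)


for value in (0x70, NOT_FOUND, -1):
    print(interpret_offset(value))
```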

The scanner module can also be accessed directly to retrieve the candidate offsets and use them for any other analysis.

 

     scanner = pyScanner(shellcode)
     offsets = scanner.findCandidates()
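
As one example of "other analysis", the candidate offsets could be used to preview the bytes at each location before deciding what to disassemble. The sketch below assumes the offsets are an iterable of integers, which this post does not confirm for findCandidates(); preview_candidates is a hypothetical helper, and the offsets in the demo are made up.

```python
import binascii


def preview_candidates(data, offsets, width=16):
    """Return (offset, hex string) pairs for the bytes at each candidate."""
    previews = []
    for off in offsets:
        # Skip offsets that fall outside the buffer.
        if 0 <= off < len(data):
            chunk = data[off:off + width]
            previews.append((off, binascii.hexlify(chunk).decode("ascii")))
    return previews


# Demo data: the first 16 bytes of the sample used later in this post.
data = binascii.unhexlify("eb6e33c0648b403085c0780d568b400c")
for off, hexbytes in preview_candidates(data, [0, 2, 99], width=4):
    print("0x%04x  %s" % (off, hexbytes))
```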

Once the offset from pylibscizzle has been obtained, the prepare() and test() functions from pylibemu can be used to look at the emu_profile_output and see libemu’s execution profile of the shellcode. This does not always work, but it can be useful if you are having trouble wrapping your head around what the shellcode is attempting to accomplish. There are also many Python-based disassemblers that can be used.
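
The hand-off from pylibscizzle to pylibemu might look like the following sketch. It only uses the calls already shown in this post; the function name and the make_detector and make_emulator parameters are hypothetical, added so the flow can be exercised without either library installed.

```python
def profile_at_scizzle_offset(shellcode, make_detector, make_emulator):
    """Find an offset with libscizzle, then profile from it with libemu.

    make_detector and make_emulator are hypothetical factory parameters,
    for example pyDetector and pylibemu.Emulator.
    """
    offset = make_detector(shellcode).detectShellcode()
    # Per this post: 0xffffffff means no detection, negative means an error.
    if offset < 0 or offset == 0xFFFFFFFF:
        return None
    emulator = make_emulator()
    emulator.prepare(shellcode, offset)
    emulator.test()
    return emulator.emu_profile_output


# With both libraries installed, the call would presumably be:
#   import pylibemu
#   from pylibscizzle import pyDetector
#   print(profile_at_scizzle_offset(sc, pyDetector, pylibemu.Emulator))
```

Returning None when nothing is detected keeps the caller from emulating at a meaningless offset.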


Example

This example uses the sc1.bin file provided in the libscizzle distribution, but I will only show the first 48 bytes for brevity:

 

     sc  = "eb6e33c0648b403085c0780d568b400c"
     sc += "8b701cad8b40085ec38b403483c07c8b"
     sc += "403cc3608b6c24248b453c8b7c057803"
     ....
     sc = sc.decode('hex')

     detector = pyDetector(sc)
     offset = detector.detectShellcode()
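
Note that str.decode('hex') exists only in Python 2. A minimal sketch of the same decoding step that should run on both Python 2 and 3 is shown here, using only the 48 bytes listed above (the rest of sc1.bin is not reproduced); sc_hex is a name introduced for illustration.

```python
import binascii

# Only the first 48 bytes of sc1.bin, as shown in this post.
sc_hex = (
    "eb6e33c0648b403085c0780d568b400c"
    "8b701cad8b40085ec38b403483c07c8b"
    "403cc3608b6c24248b453c8b7c057803"
)

# unhexlify turns each pair of hex digits into one byte.
sc = binascii.unhexlify(sc_hex)
print(len(sc))
```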

 

For this file the offset returned by pylibscizzle is 0x70, while running it through pylibemu yields an offset of 0x0. Both are technically correct, and the reason becomes clear from the disassembly of the first 2 bytes:

 

     
     0x00000000     eb 6e        jmp 0x70
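
The jump target in this disassembly can be checked by hand: opcode 0xEB is an x86 short jump whose one-byte signed displacement is relative to the address of the next instruction, so 0x0 + 2 + 0x6e = 0x70. A small sketch of that arithmetic follows; short_jmp_target is a hypothetical helper, not part of either library.

```python
import struct


def short_jmp_target(code, addr=0):
    """Return the destination of an x86 short jump (opcode 0xEB) at addr."""
    # "<Bb" reads the opcode as unsigned and the displacement as signed.
    opcode, rel8 = struct.unpack_from("<Bb", code, addr)
    if opcode != 0xEB:
        raise ValueError("not a short jmp at offset %d" % addr)
    # The displacement is relative to the next instruction (addr + 2).
    return addr + 2 + rel8


print(hex(short_jmp_target(b"\xeb\x6e")))
```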

 

It’s clear that they ultimately agree on where the shellcode exists but differ on where it starts: pylibemu reports the jmp instruction itself, while pylibscizzle reports the jump target. Further analysis of the disassembly can be done with a disassembler such as distorm, or you can use radare2 to inspect the instructions and trace the shellcode manually in a more interactive fashion. Radare2 can also output a graphviz DOT file that illustrates the call structure.


Conclusion

I am currently building a custom JavaScript deobfuscator that targets strings deemed suspicious during the deobfuscation process, in an attempt to find shellcode embedded in the code. Once deobfuscation is done, the plan is to run the decoded strings through pylibscizzle and pylibemu to verify that the data has been extracted correctly. I plan to use both because I have had instances where they detect the shellcode at different offsets, as well as instances where one detects shellcode while the other does not. This will allow us to build a large database of shellcode that can be further analyzed for interesting techniques, and to more easily identify which of the files we collect through our harvesting procedures are malicious.
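
Cross-checking the two detectors could be as simple as the sketch below. It assumes each library's result has already been normalised so that None means "nothing detected"; compare_detections and its labels are hypothetical and not part of the planned tooling.

```python
def compare_detections(scizzle_offset, emu_offset):
    """Summarise two detector results; None means nothing was detected."""
    if scizzle_offset is None and emu_offset is None:
        return "neither"
    if scizzle_offset is None:
        return "libemu only"
    if emu_offset is None:
        return "libscizzle only"
    if scizzle_offset == emu_offset:
        return "agree"
    return "both, different offsets"


# The sc1.bin case from the example: 0x70 from libscizzle, 0x0 from libemu.
print(compare_detections(0x70, 0x0))
```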

Published On: 2011-12-05 05:49:54

Comments

  1. Anonymous commented on 2011-12-05 @ 09:43

    Did you compare the detection of libscizzle vs. libemu? Which one scores better? Any stats on that side?

  2. Anonymous commented on 2011-12-05 @ 09:46

    Also, do you intend to share your libscizzle Python bindings?

    PS: for admin: clicking "Post Comment" causes an error in redirect.php, (on chrome).

  3. Jason Jones commented on 2011-12-05 @ 11:09

    I didn't do any real analysis on which one scores better, but my overall experience was that libscizzle seemed to detect more and produce fewer false positives than libemu.

    I did publish the libscizzle python bindings, but completely forgot to add the links in for all that and other projects. I just updated the post with those links, here's the direct link for pylibscizzle:
    http://code.mwcollect.org/projects/pylibscizzle

  4. Saam commented on 2012-01-09 @ 09:58

    Where does one download libscizzle from? I created account and hit: http://code.mwcollect.org/projects/libscizzle only to get a 403. Please help as this research is very interesting.

