TippingPoint Digital Vaccine Laboratories

MindshaRE: Identifying Encryption Functions

Welcome back to another installment of MindshaRE.  This week we will cover identifying a common pattern seen in encryption and compression functions.  The purpose is to quickly identify locations of interest in a binary that may handle this type of activity.

MindshaRE is our weekly look at some simple reverse engineering tips and tricks.  The goal is to keep things small and discuss everyday aspects of reversing.  You can view previous entries by going through our blog history.

When analyzing a binary, looking for patterns can help quickly identify what purpose a function may serve.  By doing this we can gain insight into how a binary works.  There are plenty of patterns you can identify.  In this case we will be discussing functions that handle encryption or compression.

There are hundreds of instructions in Intel assembly language.  Most are rarely used.  In fact, counting the instructions that actually appear in a binary suggests that fewer than 100 are used in most cases.  We can use this to our advantage when identifying encryption/compression routines.  These functions in almost every case do bit shifting and flipping.  Doing so requires the usage of a few key instructions such as xor, shl, shr, and ror.
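
The instruction count mentioned above can be checked with a simple mnemonic histogram.  What follows is only a minimal sketch: it assumes a plain-text listing with one "address mnemonic operands" instruction per line, and the listing and the name mnemonic_histogram are made up for illustration.

```python
from collections import Counter

def mnemonic_histogram(listing):
    """Count how often each mnemonic appears in a plain-text listing."""
    counts = Counter()
    for line in listing.splitlines():
        fields = line.split()
        # Expect "address mnemonic [operands]"; skip anything shorter.
        if len(fields) >= 2:
            counts[fields[1].lower()] += 1
    return counts

# Made-up listing for illustration only; not taken from any real binary.
LISTING = """\
00401000   push    ebp
00401001   mov     ebp, esp
00401003   mov     eax, [ebp+8]
00401006   xor     eax, ecx
00401008   shl     eax, 4
0040100B   pop     ebp
0040100C   retn
"""

histogram = mnemonic_histogram(LISTING)
print(len(histogram), "distinct mnemonics")
for mnemonic, count in histogram.most_common():
    print(f"{mnemonic:6} {count}")
```

Run over the disassembly of a whole binary, the number of distinct mnemonics gives a rough idea of how small the working set of instructions really is.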

Obviously these instructions can be used for many things.  However, in encryption/compression functions they occur in an easily identifiable pattern.  Let's look at a sample from the Kraken bot:
    001AF08F   shl     eax, 4
    001AF092   add     eax, [ebp+var_8]
    001AF095   mov     edi, edx
    001AF097   shr     edi, 5
    001AF09A   add     edi, [ebp+var_C]
    001AF09D   xor     eax, edi
    001AF09F   lea     edi, [esi+edx]
    001AF0A2   xor     eax, edi
One of our hints is the xor.  An xor of a register with itself is just the common idiom for zeroing that register, but an xor of two different registers is a tell-tale sign of encryption or compression.  If we can identify a few of these we might be able to automate the identification of such routines.
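
As a rough sketch of how that check might be automated, the snippet below parses lines in the same layout as the sample above and flags any xor whose two operands are different registers.  The register set, the regular expression, and the function names are assumptions made for illustration; they are not taken from the script described in this post.

```python
import re

# 32-bit general-purpose registers; an assumption that is enough for the sample.
REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}

# Matches "address mnemonic operands", the layout used in the listing above.
LINE_RE = re.compile(r"^\s*[0-9A-Fa-f]+\s+(\w+)\s*(.*)$")

def parse_line(line):
    """Return (mnemonic, [operands]) or None if the line does not parse."""
    match = LINE_RE.match(line)
    if not match:
        return None
    mnemonic = match.group(1).lower()
    operands = [op.strip().lower() for op in match.group(2).split(",") if op.strip()]
    return mnemonic, operands

def is_interesting_xor(mnemonic, operands):
    """True for an xor of two different registers (xor eax, eax is ignored)."""
    return (mnemonic == "xor" and len(operands) == 2
            and operands[0] in REGISTERS and operands[1] in REGISTERS
            and operands[0] != operands[1])

KRAKEN_SAMPLE = """\
001AF08F   shl     eax, 4
001AF092   add     eax, [ebp+var_8]
001AF095   mov     edi, edx
001AF097   shr     edi, 5
001AF09A   add     edi, [ebp+var_C]
001AF09D   xor     eax, edi
001AF09F   lea     edi, [esi+edx]
001AF0A2   xor     eax, edi
"""

hits = []
for line in KRAKEN_SAMPLE.splitlines():
    parsed = parse_line(line)
    if parsed and is_interesting_xor(*parsed):
        hits.append(line.split()[0])  # keep the address of the match

print(hits)
```

On the Kraken sample this flags the two xor eax, edi instructions, while something like xor eax, eax would be skipped.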

I have come up with a few metrics to do this.  I give each rule a weight.  My script runs through each function in a binary and calculates a score.  If a function scores high enough, the script prints out its location.  This has proved fairly effective at quickly identifying interesting functions.  Here are my rules:
  1. xor of different registers is weighted the highest
  2. shl, shr, ror, rol, and cdq are counted as well, all having a lower score than xor since they also occur in ordinary code
  3. If any of these instructions occur in a loop, the score increases
  4. If several of these instructions occur in the same basic block, the score increases
I use this weighting system for lots of different purposes, but it seems to work best in the cases of encryption and compression routines.  This is due to the xor.  As I stated, it's rare to see xor'ing of different registers outside of such routines, and a false positive can be ruled out by manually checking the function.
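
The four rules above could be sketched roughly as follows.  This is not the script described in this post: the weights, the threshold, the BasicBlock type, and the assumption that the Kraken sample sits in a single basic block are all hypothetical, and a real tool would take its basic blocks and loop information from the disassembler.

```python
from dataclasses import dataclass

REGISTERS = frozenset({"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"})
SHIFT_LIKE = frozenset({"shl", "shr", "ror", "rol", "cdq"})

# Hypothetical weights; the post does not give concrete numbers.
XOR_WEIGHT = 5      # rule 1: xor of different registers scores highest
SHIFT_WEIGHT = 1    # rule 2: shifts, rotates and cdq score lower
LOOP_BONUS = 2      # rule 3: added per interesting instruction inside a loop
CLUSTER_BONUS = 3   # rule 4: added once per block with two or more of them
THRESHOLD = 15      # report functions that score at least this much

@dataclass
class BasicBlock:
    instructions: list   # list of (mnemonic, [operands]) tuples
    in_loop: bool = False

def instruction_weight(mnemonic, operands):
    """Weight of a single instruction under rules 1 and 2."""
    if (mnemonic == "xor" and len(operands) == 2
            and operands[0] != operands[1]
            and all(op in REGISTERS for op in operands)):
        return XOR_WEIGHT
    if mnemonic in SHIFT_LIKE:
        return SHIFT_WEIGHT
    return 0

def score_function(blocks):
    """Sum the weights over all blocks, applying the loop and cluster bonuses."""
    total = 0
    for block in blocks:
        weights = [instruction_weight(m, ops) for m, ops in block.instructions]
        weights = [w for w in weights if w]
        total += sum(weights)
        if block.in_loop:
            total += LOOP_BONUS * len(weights)
        if len(weights) >= 2:
            total += CLUSTER_BONUS
    return total

# The Kraken sample, treated here as one basic block (an assumption).
kraken = [
    ("shl", ["eax", "4"]), ("add", ["eax", "[ebp+var_8]"]),
    ("mov", ["edi", "edx"]), ("shr", ["edi", "5"]),
    ("add", ["edi", "[ebp+var_C]"]), ("xor", ["eax", "edi"]),
    ("lea", ["edi", "[esi+edx]"]), ("xor", ["eax", "edi"]),
]

score = score_function([BasicBlock(kraken, in_loop=True)])
if score >= THRESHOLD:
    print("interesting function, score", score)
```

Keeping the weights as named constants makes it easy to tune them against binaries where the encryption routines are already known.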

We are always looking for ways to better understand functions in a binary.  Using patterns is a good way to do this quickly.  Try putting this in a script and running it on various binaries.

-Cody
Published On: 2008-07-03 13:30:54

Comments

  1. Anonymous commented on 2008-08-06 @ 12:04

    why dont u share the script ? . Its a good one..
    Thanks.


Links To This Post

  1. Identifiering av krypteringskod | Information och nyheter om krypto
    linked on 2008-07-04 @ 07:56

    Cody Pierce, who works at TippingPoint Digital Vaccine Laboratories, has written a blog post about how you can identify encryption and compression code by analyzing which instructions are used heavily.

