TippingPoint Digital Vaccine Laboratories

MindshaRE: Command Line Binary Analysis

I spend a lot of time in the Windows environment. This wasn't always the case, so I take every chance I can to drop back into a UNIX shell and use commands that do one thing very well. I believe these commands, when chained together, can be very powerful, even for reverse engineering. So I developed a novel approach to binary analysis on the UNIX command line (Cygwin included).

MindshaRE is our weekly look at some simple reverse engineering tips and tricks. The goal is to keep things small and discuss everyday aspects of reversing. You can view previous entries by going through our blog history.

Commands such as find, grep, awk, etc. are all well designed for locating and processing data. Their power really shines when processing flat text files. The idea, as it pertains to reverse engineering, is that all binaries can be effectively broken down into functions, basic blocks, and instructions. If we take this breakdown and create a directory structure from it, we can use familiar command line tools to dig into the assembly.

To do this, I wrote a script that takes a PaiMei PIDA file and writes it out as a directory structure. The script first creates a directory for each function, then a subdirectory for each basic block in that function. Within each basic block directory is one file per instruction, containing its disassembly. Each directory and file is named by its address. For instance:
$ find -L .
./0x401000                    <--- Function
./0x401000/0x401000           <--- Basic Block
./0x401000/0x401000/0x401000  <--- Instructions
./0x401000/0x401000/0x401002  <
./0x401000/0x401000/0x401008  <
./0x401000/0x401000/0x401009  <
./0x401000/0x401000/0x40100e  <
./0x401000/0x401000/0x401010  <
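With a layout like this, even basic module statistics fall out of find. The following is a minimal sketch, not part of the original script: count_levels is a hypothetical helper name, and it assumes the three-level function/basic block/instruction layout shown above and a find that supports -mindepth and -maxdepth (GNU and BSD find both do).

```shell
#!/bin/sh
# Sketch: summarize a dumped module tree.
# Assumed layout: <root>/<function>/<basic block>/<instruction file>.
# count_levels is a hypothetical helper, not part of the original script.
count_levels() {
    root=${1:-.}
    # Depth 1 = function directories, depth 2 = basic block directories,
    # depth 3 = instruction files. -L is deliberately omitted, so any
    # symlinks inside the tree are neither followed nor counted.
    funcs=$(find "$root" -mindepth 1 -maxdepth 1 -type d | wc -l)
    blocks=$(find "$root" -mindepth 2 -maxdepth 2 -type d | wc -l)
    insns=$(find "$root" -mindepth 3 -maxdepth 3 -type f | wc -l)
    # $((...)) strips the padding some wc implementations emit.
    echo "functions=$((funcs)) blocks=$((blocks)) instructions=$((insns))"
}
```

Running count_levels . from the top of the tree prints a single summary line.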
Once the directory tree is created, you can easily reach any part of the module by function or basic block and read its assembly with the commands of your choosing. Let's print the contents of a basic block:
$ for i in *; do printf '%s: ' "$i"; cat "$i"; echo; done
0x401050: push ebp
0x401051: mov ebp [esp+arg_0]
0x401055: push esi
0x401056: mov eax ebp
0x401058: push edi
0x401059: lea edx [eax+1]
0x40105c: lea esp [esp+0]
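The same loop extends naturally to a whole function. Here is a rough sketch that walks every basic block directory under a function and prints its instructions in address order. The name dump_function is hypothetical, and the sketch assumes addresses are written with a consistent width so that the shell's glob ordering matches address ordering.

```shell
#!/bin/sh
# Sketch: print every basic block of one function directory.
# dump_function is a hypothetical helper, not part of the original script.
dump_function() {
    func_dir=$1
    for block in "$func_dir"/*/; do
        block=${block%/}
        # Guard against an unmatched glob when the function has no blocks.
        [ -d "$block" ] || continue
        echo "; block ${block##*/}"
        for insn in "$block"/*; do
            # Only regular files hold disassembly; anything else
            # (for example a symlink to another directory) is skipped.
            [ -f "$insn" ] || continue
            printf '%s: %s\n' "${insn##*/}" "$(cat "$insn")"
        done
    done
}
```

For example, dump_function ./0x401050 prints a "; block" header for each basic block followed by its address-prefixed instructions.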
Taking this one step further, the script creates a symlink for every call inside a basic block, linking it to the destination function's directory. In the example below, a directory containing a "_" is a symlink to the callee's directory.
$ find -L . -type d
./0x402140/0x4023b9/0x4023c0_0x401050
./0x402140/0x4023b9/0x4023c0_0x401050/0x401050
./0x402140/0x4023b9/0x4023c0_0x401050/0x401060
./0x402140/0x4023b9/0x4023c0_0x401050/0x401067
./0x402140/0x4023b9/0x4023c0_0x401050/0x401070
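Because the call targets are encoded in the symlink names, a function's callees can be listed without reading any disassembly. This is a sketch under an assumption: judging by the listing above, each symlink appears to be named with the call site address, an underscore, and the callee address. The name list_callees is hypothetical.

```shell
#!/bin/sh
# Sketch: list the unique callees of one function directory.
# Assumes symlinks are named <call site>_<callee>, as the listing suggests.
# list_callees is a hypothetical helper, not part of the original script.
list_callees() {
    func_dir=$1
    # -type l matches the symlinks themselves (find is run without -L),
    # and depth 2 under the function is the basic block contents.
    find "$func_dir" -mindepth 2 -maxdepth 2 -type l -name '*_*' |
        sed 's/.*_//' |   # keep only the text after the last underscore
        sort -u
}
```

For example, list_callees ./0x402140 prints one callee address per line, which is a cheap way to sketch a call graph one function at a time.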
Since we have all this information at our fingertips, we can also use it for a little generic auditing. Here is an example that looks for interesting library calls:
$ find . -type f -exec grep -EH 'sprintf|sscanf|recv|bind|accept' {} \;
./0x401050/0x401094/0x4010aa:call ds:_imp__sprintf
./0x401700/0x401700/0x401716:call ds:_imp__sprintf
./0x404850/0x4048e6/0x4048f9:call ds:_imp__accept
./0x406090/0x4061d9/0x4061e6:call ds:_imp__sscanf
./0x406090/0x40624d/0x40625a:call ds:_imp__sscanf
./0x4062e0/0x406340/0x406353:call ds:_imp__sscanf
./0x406c90/0x406d64/0x406d7a:call ds:_imp__sprintf
./0x404060/0x4040a7/0x4040b1:call ds:_imp__recv
./0x40dd20/0x40dd62/0x40dd6b:call ds:_imp__bind
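Hits like these can also be grouped by function to see where the interesting calls cluster. A minimal sketch, assuming the same function/basic block/instruction layout; rank_functions is a hypothetical name, and its default pattern simply reuses the one from the search above.

```shell
#!/bin/sh
# Sketch: count matching instructions per function, busiest first.
# rank_functions is a hypothetical helper, not part of the original script.
rank_functions() {
    root=${1:-.}
    pattern=${2:-}
    [ -n "$pattern" ] || pattern='sprintf|sscanf|recv|bind|accept'
    # Run find from inside the tree so every path looks like
    # ./<function>/<basic block>/<instruction>; field 2 is the function.
    (cd "$root" && find . -type f -exec grep -lE "$pattern" {} +) |
        awk -F/ '{ hits[$2]++ } END { for (f in hits) print hits[f], f }' |
        sort -k1,1nr -k2,2
}
```

Each output line is a hit count followed by the function address, so the functions at the top are reasonable places to start reading.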
This approach is a little tongue-in-cheek, but fun. Maybe some other ideas will sprout from this. Please share any ideas or comments of your own. I'd love to hear them.

The script is available from the link below, along with a link to PaiMei for generating the PIDA file.
-Cody
Published On: 2009-02-12 12:37:20
