TippingPoint Digital Vaccine Laboratories
DID YOU KNOW... A raisin dropped in a fresh glass of soda will bounce up and down continually from the bottom of the glass to the top.

MindshaRE: Path Finding

How many times have you been at an address in IDA and wanted to know all the ways you could reach that point? A million? Probably not. But its pretty useful to be able to find all the paths leading to a particular location in a binary. That's why today we are going to cover path finding.

MindshaRE is our weekly look at some simple reverse engineering tips and tricks. The goal is to keep things small and discuss every day aspects of reversing. You can view previous entries here by going through our blog history.

Path finding is a term we use at TippingPoint to describe a way to get from point A to point B in a binary. Aaron and Cameron discussed this subject a little at ToorCon Seattle 2008. Their slides can be found here. Let's break the process of path finding down into its components and provide some automation for your use.

When trying to find a path you must first get all cross references from point B. If point A occurs in one of those cross references we can pull out that list of cross references for further inspection. Most of the time this works, however keep in mind that indirect calls will not be discovered using merely cross references. For that I believe a combination of static and live analysis may be needed.

Once we locate point A in the cross references of point B we must walk down each basic block looking for the path to point B that showed up in the cross references. To achieve this we will use IDAPython to check one instruction at a time for a branch to the next function in the path we made note of while walking the cross references.

Lets take a look at the following example. We have located a function in calc.exe that we want a path to.
.text:01011F1D _mulnumx        proc near
Now we want to see how to get to _mulnumx from a function higher up.
.text:0101239E numpowlongx     proc near
So we want to know how to get from numpowlongx to _mulnumx. Lets look at numpowlongx:
.text:0101239E numpowlongx     proc near
.text:0101239E
.text:0101239E var_4 = dword ptr -4
.text:0101239E arg_0 = dword ptr  8
.text:0101239E arg_4 = dword ptr  0Ch
.text:0101239E       
.text:0101239E       push    ebp
.text:0101239F       mov     ebp, esp
.text:010123A1       push    ecx
...
.text:010123BB loc_10123BB:
.text:010123BB       test    bl, 1
.text:010123BE       jz      short loc_10123CB
.text:010123C0       push    dword ptr [esi]
.text:010123C2       lea     eax, [ebp+var_4]
.text:010123C5       push    eax
.text:010123C6       call    mulnumx
...
A little common sense tells us that following mulnumx well most likely get us to _mulnumx:
.text:01012314 mulnumx         proc near
.text:01012314
.text:01012314 arg_0 = dword ptr  8
.text:01012314 arg_4 = dword ptr  0Ch
.text:01012314
.text:01012314       push    ebp
.text:01012315       mov     ebp, esp
...
.text:01012390 loc_1012390:
.text:01012390       push    esi
.text:01012391       push    ebx
.text:01012392       call    _mulnumx
...
This particular example is easy, but lets write a script to do these steps for us. It will allow us to find a path quickly. If you remember a while back we used a script called get_recursive_xrefs.py to find all xrefs to point B. Lets add a function to the Node class in that script to search for a particular function in the cross references.
def path(self, name, path=None):
    if path == None:
        path = []
    
    path.append(self.name)
        
    if name == self.name:
        return True

    for c in self.children:
        if c.path(name, path):
            return path
        else:
            path.pop()
    
    return False
Now we just need some points to find.
pointa = 0x0101239E
pointb = 0x01011F1D
pointa_name = GetFunctionName(pointa)
pointb_name = GetFunctionName(pointb)

print "[*] Looking for the path from %s to %s" % (pointa_name, pointb_name)
t = get_xref_args(pointb)
path = t.path(pointa_name)
print path
The output gives us something like so:
[*] Looking for the path from numpowlongx to _mulnumx
['_mulnumx', 'mulnumx', 'numpowlongx']
So the call chain is numpowlongx -> mulnumx -> _mulnumx. Alone, this can be very helpful if you want to find your way in smaller functions, or just want some hints. The point of path finding however is to find the specific locations the code executes to get from point A to point B. If we fix up this script a little bit we can get closer. Now we want to walk each function in this list, and look for calls to the subsequent function in the path.
def find_branch(source, dest):
    source_ea = get_function_by_name(source)
    if not source_ea:
        return False
        
    source_start, source_end = get_function_limits(source_ea)
    curea = source_start
    for curea in Heads(source_start, source_end):
        mnem = GetMnem(curea)
        if 'call' in mnem:
            location = GetOpnd(curea, 0)
            if location == dest:
                print "%s:%x" % (source, curea)

    return True
Putting this together with our previous changes delivers the following results.
[*] Looking for the path from numpowlongx to _mulnumx
['_mulnumx', 'mulnumx', 'numpowlongx']
[*] Need to find the branch from numpowlongx to mulnumx
numpowlongx:10123c6
numpowlongx:10123ce
[*] Need to find the branch from mulnumx to _mulnumx
mulnumx:1012392
The output is telling us that at address 0x10123c6 in function numpowlongx a call to mulnumx is made. We can corroborate all of these addresses easily.

numpowlongx:
.text:010123C6       call    mulnumx
.text:010123CE       call    mulnumx
mulnumx:
.text:01012392       call    _mulnumx
That helps us a little bit more. We can now get the specific address of each paths call to the next link in the chain. The last big step is making this information even more usable. If we colored all paths to these calls in the current function, we can easily see how the binary gets from point A to B during execution. Any loops, or conditional branches will be easily identified once we color everything. This is the most important step in my opinion. When you want to find a path you want to see every instruction and basic block that can get there. For instance if we want to go from a recv call to a possible vulnerability we want to determine which conditions must be met to get there.

After some more changes to our script we can see the path colored in the mulnumx function outlining how to get to _mulnumx.



That is a lot more useful in my opinion. At a glance we know what must be done, and every possible way, to get to _mulnumx.

Path finding is an invaluable tactic, especially for tasks such as vulnerability analysis, or exploit development. It takes a lot of the manual work out of tracking down a function by reversing line by line from point A without knowing where point B is. We can take this even futher but color coding conditional branches and their condition check. Imagine being able to quickly tell that the first 4 bytes of a packet must be 0x10203040 to get to a vulnerable function.

The script used in this example can be found here get_path.py but I must warn you. I have not fully tested it, and there could be bugs. I wrote it as an example for this entry. For a better script take a look at Aaron's code which allows you to find paths, and much more. His code will be released some time in the future as he's currently working on cleaning it up.

If you have some additional ideas for path finding please leave a comment.

-Cody
Tags:
Published On: 2008-10-23 13:39:27

Comments post a comment

  1. Mark commented on 2008-10-26 @ 16:55

    Thanks for the really useful post :)


Trackback