TippingPoint Digital Vaccine Laboratories

MindshaRE: Finding Object Constructors

In a previous MindshaRE we touched on the power of searching in IDA. This time I want to revisit that subject with an example: searching for constructors in a binary. This is a simple trick I use from time to time when I do not have symbols. Labeling objects before you start reverse engineering can be a huge help.

MindshaRE is our weekly look at some simple reverse engineering tips and tricks. The goal is to keep things small and discuss everyday aspects of reversing. You can view previous entries by going through our blog history.

When I speak of constructors I am describing the code responsible for creating an object in a high level language like C++. When an object is instantiated it must first set up all data necessary to access that object, including properties and methods.

To identify these in a binary we first need to look at an example. Using symbols I have found a test case that is fairly typical.
.text:75464778 Domain__Domain  proc near
.text:75464778    mov     edi, edi
.text:7546477A    push    esi
.text:7546477B    mov     esi, ecx
.text:7546477D    push    edi
.text:7546477E    xor     edi, edi
.text:75464780    lea     ecx, [esi+3Ch]
.text:75464783    mov     [esi], edi
.text:75464785    mov     [esi+4], di
.text:75464789    mov     [esi+8], edi
.text:7546478C    mov     [esi+0Ch], edi
.text:7546478F    mov     [esi+10h], edi
.text:75464792    mov     [esi+38h], edi
.text:75464795    call    sub_7548A2BC
.text:7546479A    push    4
.text:7546479C    lea     ecx, [esi+5Ch]
.text:7546479F    call    CQueue__CQueue
.text:754647A4    push    4
.text:754647A6    lea     ecx, [esi+7Ch]
.text:754647A9    call    CQueue__CQueue
.text:754647AE    push    4
.text:754647B0    lea     ecx, [esi+9Ch]
.text:754647B6    call    sub_754616E7
.text:754647BB    push    4
.text:754647BD    lea     ecx, [esi+0BCh]
.text:754647C3    call    sub_754616E7
.text:754647C8    push    4
.text:754647CA    lea     ecx, [esi+0E0h]
.text:754647D0    mov     [esi+0DCh], edi
.text:754647D6    call    sub_754616E7
.text:754647DB    lea     ecx, [esi+100h]
.text:754647E1    call    RandomChannelGenerator__RandomChannelGenerator
.text:754647E6    push    edi
.text:754647E7    push    edi
.text:754647E8    mov     ecx, esi
.text:754647EA    call    Domain__LockDomainParameters
.text:754647EF    pop     edi
.text:754647F0    mov     eax, esi
.text:754647F2    pop     esi
.text:754647F3    retn
.text:754647F3 Domain__Domain  endp
When looking for constructors we want to see a structure being built. At the top of this function a structure is initialized by zeroing out members of the object being created. The tip-off is the following assembly.
.text:7546477E    xor     edi, edi
.text:75464780    lea     ecx, [esi+3Ch]
.text:75464783    mov     [esi], edi
.text:75464785    mov     [esi+4], di
.text:75464789    mov     [esi+8], edi
.text:7546478C    mov     [esi+0Ch], edi
.text:7546478F    mov     [esi+10h], edi
.text:75464792    mov     [esi+38h], edi
edi becomes a zero register and is then used to zero out members of the structure being created, with esi as the base pointer (esi was loaded from ecx, which carries the this pointer in this calling convention). To double-check that the object being initialized is new, we can look at this function's caller.
.text:75461900 Controller__ApplicationCreateDomain proc near
...
.text:7546191C    push    108h
.text:75461921    call    operator_new
.text:75461926    test    eax, eax
.text:75461928    pop     ecx
.text:75461929    jz      short loc_75461936
.text:7546192B    mov     ecx, eax
.text:7546192D    call    Domain__Domain
The call to operator_new() allocates 0x108 bytes for our new Domain object, and the returned pointer is passed in ecx to Domain__Domain. At this point we are certain this is an object constructor.
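The caller check above can also be done mechanically. Here is a rough sketch of the idea in plain Python, working on listing text copied out of IDA rather than on the database itself. The helper names (parse, imm, find_new_then_call) are made up for this illustration, and it assumes operator_new has already been identified and named, which you would have to do by hand in a stripped binary.

```python
import re

# Hypothetical helpers: these work on listing text copied out of IDA,
# not on the IDA database itself.
LINE_RE = re.compile(r"^\S+\s+(\w+)\s*(.*)$")
IMM_RE = re.compile(r"^[0-9][0-9A-Fa-f]*h$|^[0-9]+$")

def parse(line):
    """Split '.text:7546191C    push    108h' into ('push', ['108h'])."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    ops = [o.strip() for o in m.group(2).split(",")] if m.group(2) else []
    return m.group(1), ops

def imm(text):
    """Parse an IDA-style immediate such as '108h' or '4'; None otherwise."""
    if not IMM_RE.match(text):
        return None
    return int(text[:-1], 16) if text.endswith("h") else int(text)

def find_new_then_call(lines, new_name="operator_new", window=6):
    """Return (allocation size, callee) pairs for the pattern
    push <size> / call operator_new / ... / mov ecx, eax / call <callee>."""
    insns = [p for p in map(parse, lines) if p]
    hits = []
    for i, (mnem, ops) in enumerate(insns):
        if i == 0 or mnem != "call" or ops != [new_name]:
            continue
        prev_mnem, prev_ops = insns[i - 1]
        size = imm(prev_ops[0]) if prev_mnem == "push" and prev_ops else None
        if size is None:
            continue
        # Look a few instructions ahead for the new pointer moving into ecx.
        following = insns[i + 1:i + 1 + window]
        for j, (m2, o2) in enumerate(following):
            if m2 == "mov" and o2 == ["ecx", "eax"]:
                nxt = following[j + 1] if j + 1 < len(following) else None
                if nxt and nxt[0] == "call" and nxt[1]:
                    hits.append((size, nxt[1][0]))
                break
    return hits

# The caller listing from this post.
CALLER_LISTING = [
    ".text:7546191C    push    108h",
    ".text:75461921    call    operator_new",
    ".text:75461926    test    eax, eax",
    ".text:75461928    pop     ecx",
    ".text:75461929    jz      short loc_75461936",
    ".text:7546192B    mov     ecx, eax",
    ".text:7546192D    call    Domain__Domain",
]

for size, callee in find_new_then_call(CALLER_LISTING):
    print("%s is called on a fresh 0x%x byte allocation" % (callee, size))
```

The window of six instructions is an arbitrary choice to allow for the null check between the allocation and the constructor call; a compiler is free to lay this out differently.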

The previous example is easy to find with symbols. IDA will label constructors ClassName::ClassName which turns into ClassName__ClassName in the UI. But what happens when we are stripped of our precious symbolic information? We will have to resort to pattern recognition and IDAPython.

In this sample script we will have a few basic requirements. First we need a function that uses a zero register. It will also have to use that zero register to initialize structure variables. You can see in the example above edi will be our zero register and the mov to structure offsets will be our initialization. The code will look like this:
# Uses the legacy (pre-7.0) IDAPython names; run from inside IDA.
from idautils import *
from idc import *

def instruction_match(ea, mnem=None, op1=None, op2=None, op3=None):
    # Compare the instruction at ea against the given mnemonic and operand
    # text. Arguments left as None act as wildcards.
    if mnem and mnem != GetMnem(ea):
        return False

    if op1 and op1 != GetOpnd(ea, 0): return False
    if op2 and op2 != GetOpnd(ea, 1): return False
    if op3 and op3 != GetOpnd(ea, 2): return False

    return True

segbeg = SegByName(".text")
segend = SegEnd(segbeg)

for ea in Functions(segbeg, segend):
    beg   = ea
    end   = FindFuncEnd(beg)
    zero  = False   # have we seen "xor edi, edi" yet?
    count = 0       # stores of the zero register into [reg+offset]
    curea = beg

    while curea < end and curea != BADADDR:
        mnem = GetMnem(curea)

        if instruction_match(curea, "xor", "edi", "edi"):
            zero = True
        elif zero and "mov" in mnem:
            # mov     [esi+4], edi
            # operand type 4 is a base register plus displacement
            if GetOpType(curea, 0) == 4 and GetOpnd(curea, 1) in ["edi", "di"]:
                count += 1

        curea = NextHead(curea, end)

    # report functions with more than four zeroing stores
    if count > 4:
        print("%x" % beg)
This script essentially implements our requirements. It loops through all functions in the binary, looking in each one for a zero register and for stores of that register into structure offsets. Running this script gives us a slew of addresses to investigate, one being our original example at 0x75464778.

There are a couple of issues with this script. First, the compiler decides which register to use for the zero register and which for the structure. For instance, look at the following example.
.text:754748D2    xor     ebx, ebx
.text:754748D4    lea     ecx, [edi+20h]
.text:754748D7    mov     dword ptr [edi], offset CConfDescriptorListContainer___vftable_
.text:754748DD    mov     [edi+10h], ebx
.text:754748E0    mov     [edi+14h], ebx
.text:754748E3    mov     [edi+18h], ebx
.text:754748E6    mov     [edi+1Ch], ebx
.text:754748E9    call    sub_7548A2BC
It is very similar to the first example, and our pattern still applies. However, the compiler has chosen to use ebx for the zero register and edi for the object structure. To make our script better we would want to add these other possible registers.
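As a rough sketch of what "any register" could look like, here is the same heuristic in plain Python, run on listing text copied out of IDA instead of through the IDAPython API. The names parse and zero_stores_by_base are hypothetical, and the sketch makes no attempt to notice when the zero register is later overwritten. It only recognizes 32-bit xor reg, reg, and counts 32-bit or 16-bit stores of that register.

```python
import re
from collections import Counter

# Hypothetical helpers for listing text copied out of IDA.
LINE_RE = re.compile(r"^\S+\s+(\w+)\s*(.*)$")
# Memory operand with a 32-bit base register, e.g. [edi], [edi+10h],
# dword ptr [edi]. Captures the base register.
MEM_RE = re.compile(r"^(?:\w+ ptr )?\[(e[a-z]{2})(?:[+-][^\]]+)?\]$")
REG32 = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}
# 16-bit names mapped to the 32-bit register that contains them.
WIDEN = {"ax": "eax", "bx": "ebx", "cx": "ecx",
         "dx": "edx", "si": "esi", "di": "edi"}

def parse(line):
    """Split '.text:754748D2    xor     ebx, ebx' into ('xor', ['ebx', 'ebx'])."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    ops = [o.strip() for o in m.group(2).split(",")] if m.group(2) else []
    return m.group(1), ops

def zero_stores_by_base(lines):
    """Count stores of a zeroed register to memory, grouped by base register."""
    zero_regs = set()
    stores = Counter()
    for insn in map(parse, lines):
        if insn is None or len(insn[1]) != 2:
            continue
        mnem, (dst, src) = insn
        if mnem == "xor" and dst == src and dst in REG32:
            zero_regs.add(dst)              # any xor reg, reg makes a zero register
        elif mnem == "mov":
            mem = MEM_RE.match(dst)
            if mem and WIDEN.get(src, src) in zero_regs:
                stores[mem.group(1)] += 1   # e.g. mov [edi+10h], ebx
    return stores

# The second example from this post.
SECOND_EXAMPLE = [
    ".text:754748D2    xor     ebx, ebx",
    ".text:754748D4    lea     ecx, [edi+20h]",
    ".text:754748D7    mov     dword ptr [edi], offset CConfDescriptorListContainer___vftable_",
    ".text:754748DD    mov     [edi+10h], ebx",
    ".text:754748E0    mov     [edi+14h], ebx",
    ".text:754748E3    mov     [edi+18h], ebx",
    ".text:754748E6    mov     [edi+1Ch], ebx",
    ".text:754748E9    call    sub_7548A2BC",
]

print(dict(zero_stores_by_base(SECOND_EXAMPLE)))
```

Grouping the counts by base register keeps the choice of threshold with the caller, which is useful because this example has fewer zeroing stores than the first one.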

A second problem is the lack of robustness in the script. For instance, we do not track the zero register: if it is overwritten later in the function, our script will still believe it is being used for initialization. We also do not filter out local variables being initialized to zero. In IDA, stack locals and structure offsets share the same operand type (4, a base register plus displacement), so the OpType check alone cannot tell them apart.
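One possible way to approach both problems, sketched in plain Python on listing text copied out of IDA: forget a zero register as soon as something writes to it, and ignore stores whose base register is ebp or esp. The function name count_object_zero_stores, the list of destination-writing mnemonics, and the assumption that a call clobbers eax, ecx and edx are all simplifications of mine, not a complete model of x86. The sample lines and their addresses are made up for the illustration.

```python
import re

# Hypothetical helpers for listing text copied out of IDA.
LINE_RE = re.compile(r"^\S+\s+(\w+)\s*(.*)$")
MEM_RE = re.compile(r"^(?:\w+ ptr )?\[(e[a-z]{2})(?:[+-][^\]]+)?\]$")
REG32 = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}
WIDEN = {"ax": "eax", "bx": "ebx", "cx": "ecx",
         "dx": "edx", "si": "esi", "di": "edi"}
STACK_REGS = {"ebp", "esp"}              # stores through these are treated as locals
CALL_CLOBBERS = {"eax", "ecx", "edx"}    # assumed volatile across calls
# Simplified set of instructions that overwrite their first operand.
WRITES_DEST = {"mov", "lea", "pop", "add", "sub", "inc", "dec", "or",
               "and", "xor", "movzx", "movsx", "imul", "shl", "shr"}

def parse(line):
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    ops = [o.strip() for o in m.group(2).split(",")] if m.group(2) else []
    return m.group(1), ops

def count_object_zero_stores(lines):
    """Count stores of a still-valid zero register to non-stack memory."""
    zero_regs, count = set(), 0
    for insn in map(parse, lines):
        if insn is None:
            continue
        mnem, ops = insn
        dest = WIDEN.get(ops[0], ops[0]) if ops else None
        if mnem == "xor" and len(ops) == 2 and ops[0] == ops[1] and ops[0] in REG32:
            zero_regs.add(ops[0])
        elif mnem == "call":
            zero_regs -= CALL_CLOBBERS       # callee may have changed these
        elif mnem == "mov" and len(ops) == 2 and MEM_RE.match(ops[0]):
            base = MEM_RE.match(ops[0]).group(1)
            src = WIDEN.get(ops[1], ops[1])
            if src in zero_regs and base not in STACK_REGS:
                count += 1
        elif mnem in WRITES_DEST and dest in REG32:
            zero_regs.discard(dest)          # register no longer known to be zero
    return count

# Made-up sample lines (addresses are illustrative only).
SAMPLE = [
    ".text:00401000    xor     edi, edi",
    ".text:00401002    mov     [esi+4], edi",      # counted
    ".text:00401005    mov     [ebp+var_4], edi",  # stack local, skipped
    ".text:00401008    mov     edi, eax",          # edi is no longer zero
    ".text:0040100A    mov     [esi+8], edi",      # not counted
]

print(count_object_zero_stores(SAMPLE))
```

Filtering on ebp and esp is a blunt instrument: it would also miss an object that lives on the stack, and it would not catch a local addressed through another register after a lea.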

Solving these problems would not be extremely difficult. With some additional text processing and more robust requirements, our script could provide a reverse engineer with a handy tool for locating those crucial constructors in a binary. I hope this can be of some use in the future. As always, if you have some additional ideas or contributions, please leave a comment.

-Cody
Published On: 2008-12-18 14:27:27
