TippingPoint Digital Vaccine Laboratories

MindshaRE: Fixing Functions


IDA's function identification has always frustrated me. I could never understand why functions that seemed obvious were left undefined during analysis. Recently, while attending RECon, I got my answer. During his talk, Ilfak, the creator of IDA, explained why: he always errs on the side of caution. Unless he is 100% positive about a function, he will leave it up to the user to fix. This isn't such a bad philosophy, unless you are dealing with hundred-megabyte binaries. Regardless, it's necessary to fix functions in IDA if you want your cross references, graphs, scripts, and various other actions to work. So today we will talk about fixing functions, from defining functions IDA missed entirely to changing their beginning and end, and more.

MindshaRE is our weekly look at some simple reverse engineering tips and tricks. The goal is to keep things small and discuss everyday aspects of reversing. You can view previous entries by going through our blog history.

A function in IDA is defined as a block of code that is the destination of a call instruction. IDA uses various heuristics and cross references to identify these. One of these heuristics is the typical function prologue emitted by MSVC on Intel x86:

.text:238BA1DD     push    ebp
.text:238BA1DE     mov     ebp, esp

If IDA can find the prologue, and it has a cross reference, the function will /almost/ always be defined. When IDA defines a function it sets a beginning and end, labels all the arguments and local variables, and gives it a "sub_" name. Once the function is defined we can use the graph view or the IDC GetFunction* functions. All of this helps when browsing through a binary.
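To make the prologue heuristic concrete, here is a minimal Python sketch that scans a raw byte buffer for the bytes of push ebp / mov ebp, esp (55 8B EC). It only illustrates the idea and is not how IDA implements its analysis. The helper name find_prologues and the sample buffer are made up for this example.

```python
# Hypothetical helper for illustration; not part of IDA or IDAPython.
PROLOGUE = bytes([0x55, 0x8B, 0xEC])  # push ebp / mov ebp, esp


def find_prologues(code, base):
    """Return every address in `code` (loaded at `base`) where PROLOGUE occurs."""
    hits = []
    pos = code.find(PROLOGUE)
    while pos != -1:
        hits.append(base + pos)
        pos = code.find(PROLOGUE, pos + 1)
    return hits


# Made-up buffer: two int3 padding bytes, the prologue, "sub esp, 34h", "retn".
sample = bytes([0xCC, 0xCC, 0x55, 0x8B, 0xEC, 0x83, 0xEC, 0x34, 0xC3])
candidates = find_prologues(sample, 0x238BA1DB)
print([hex(addr) for addr in candidates])
```

A plain byte match like this will also fire on data, or in the middle of another instruction, that happens to contain the same bytes. That is presumably one reason a cautious tool wants a cross reference as well before it commits to defining a function.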

The problem is, IDA can't always define a function. I wish I could tell you why in all cases, but in general it comes down to IDA either not being able to identify a contiguous chunk as code, or not being able to find proper references to a function. Either way we need to clean everything up the best we can.

One way to begin this search is via the analysis toolbar. Undefined code will be represented as a red stripe.



Clicking on one of those red bars will bring us to the associated address. You will notice the addresses in this code are red as well.



An alternative way to do this is to hit Alt-U. This will search for the next "not function", meaning code that isn't in a function.
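As a rough mental model of what a "not function" search does, the Python sketch below walks a list of instruction addresses and returns the first one past the cursor that no function range covers. This is a toy model under stated assumptions: the name next_not_function, the list-based inputs, and the addresses are all invented for illustration, and IDA's real search works on its own database.

```python
# Toy model only; names and data layout are assumptions, not IDA's internals.
def next_not_function(code_addrs, functions, cursor):
    """Return the first code address after `cursor` that lies outside every function.

    code_addrs: sorted addresses of instructions already marked as code
    functions:  list of (start, end) address ranges, end exclusive
    """
    for addr in code_addrs:
        if addr <= cursor:
            continue  # only look forward from the cursor
        if not any(start <= addr < end for start, end in functions):
            return addr
    return None  # everything after the cursor belongs to some function


# Made-up layout: one defined function, then three orphaned instructions.
code_addrs = [0x1000, 0x1001, 0x1003, 0x1010, 0x1011, 0x1013]
functions = [(0x1000, 0x1004)]
found = next_not_function(code_addrs, functions, 0x1000)
print(hex(found))
```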

Since we have seen a prologue, and we have identified a return, we can be pretty sure this should be defined as a procedure. I say procedure because to do this we move our cursor to the first instruction in the prologue and hit "p". Doing this turns our function from a raw disassembled dead listing into the following:

.text:238BA1DD sub_238BA1DD    proc near
.text:238BA1DD
.text:238BA1DD var_4C    = qword ptr -4Ch
.text:238BA1DD var_34    = qword ptr -34h
.text:238BA1DD var_2C    = dword ptr -2Ch
.text:238BA1DD var_28    = dword ptr -28h
.text:238BA1DD var_24    = dword ptr -24h
.text:238BA1DD var_20    = dword ptr -20h
.text:238BA1DD var_1C    = dword ptr -1Ch
.text:238BA1DD var_18    = dword ptr -18h
.text:238BA1DD var_14    = dword ptr -14h
.text:238BA1DD var_10    = dword ptr -10h
.text:238BA1DD var_C     = dword ptr -0Ch
.text:238BA1DD var_8     = dword ptr -8
.text:238BA1DD var_4     = dword ptr -4
.text:238BA1DD arg_0     = dword ptr  8
.text:238BA1DD arg_4     = dword ptr  0Ch
.text:238BA1DD arg_8     = dword ptr  10h
.text:238BA1DD arg_C     = dword ptr  14h
.text:238BA1DD arg_10    = dword ptr  18h
.text:238BA1DD
.text:238BA1DD     push    ebp
.text:238BA1DE     mov     ebp, esp
.text:238BA1E0     sub     esp, 34h

And its cross reference now shows it as a function.

.data:23932437    db    0
.data:23932438    dd offset aSplit        ; "split"
.data:2393243C    dd offset sub_238BA1DD

In some cases we might even have to dig a little deeper. In this case IDA didn't even flag it as "not function".

.text:238B9FB1 byte_238B9FB1   db 55h, 8Bh, 0ECh
.text:238B9FB4     dd 5318EC83h, 56085D8Bh, 530C75FFh, 0F55CB7E8h, 85F08BFFh
.text:238B9FB4     dd 0F5959F6h, 12A84h, 7D8B5700h, 4C88314h, 107D83h, 0FFC4789h
.text:238B9FB4     dd 10A84h, 0E8458D00h, 5337FF50h, 0F5C0FAE8h, 0CC483FFh
.text:238B9FB4     dd 840FC085h, 0EFh, 0C1F70E8Bh, 40000000h, 1574C18Bh, 25h
.text:238B9FB4     dd 1BD8F780h, 800025C0h, 0FF053FFFh, 2300007Fh, 89C085C1h
.text:238B9FB4     dd 45DBFC45h, 0DC067DFCh
.text:238BA028     db 5

That's because it isn't even discovered as code!  Oh well, I happen to know that 0x55 0x8b 0xec is our prologue. Hitting "c" on this will turn it into code.

.text:238B9FB1     push    ebp
.text:238B9FB2     mov     ebp, esp
.text:238B9FB4     sub     esp, 18h
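It may not be obvious that the dd rows in the raw dump hold these same instructions, because IDA displays each dword as a little-endian value. The short Python sketch below repacks the first four dwords from the listing at .text:238B9FB4 into the byte order in which they sit in the file. The variable names are my own.

```python
import struct

# First four dwords of the dump at .text:238B9FB4, copied from the listing.
dwords = [0x5318EC83, 0x56085D8B, 0x530C75FF, 0xF55CB7E8]

# "<I" packs each value as a little-endian unsigned 32-bit integer,
# which restores the original byte order.
raw = b"".join(struct.pack("<I", d) for d in dwords)
print(" ".join("%02X" % b for b in raw))
```

The first three bytes come out as 83 EC 18, which is the encoding of sub esp, 18h, the instruction that follows the prologue in the listing above.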

And then we can hit "p" to define it as a function...but it doesn't work. Why not? Because when I tried to define those bytes as code, IDA couldn't convert all of them. So the function has no return, and has a large chunk of raw bytes in the middle of it. Again, this is IDA erring on the side of caution.

.text:238BA027                 fadd    ds:dbl_238FD5F0
.text:238BA027 ; ---------------------------------------------------------------------------
.text:238BA02D byte_238BA02D   db 0DDh, 5Dh, 0F0h
.text:238BA030                 dd 0E845DD51h, 241CDD51h, 0B06E8h, 0F85DDD00h, 59EED959h
.text:238BA030                 dd 0DFF855DCh, 41C4F6E0h, 0D8DD1274h, 0DCF045DDh, 0E0DFF85Dh
.text:238BA030                 dd 7A05C4F6h, 0F045DD06h, 83F85DDDh, 7501107Dh, 0F045DD05h
.text:238BA030                 dd 458D59EBh, 77FF50E8h, 70E85304h, 83FFF5C0h, 0C0850CC4h

Hitting "c" again on the data will define the rest of our function. Going back to the top and pressing "p" to define the procedure finally works, giving us this:

.text:238B9FB1 sub_238B9FB1    proc near
.text:238B9FB1
.text:238B9FB1 var_2C          = qword ptr -2Ch
.text:238B9FB1 var_18          = qword ptr -18h
.text:238B9FB1 var_10          = qword ptr -10h
.text:238B9FB1 var_4           = dword ptr -4
.text:238B9FB1 arg_0           = dword ptr  8
.text:238B9FB1 arg_4           = dword ptr  0Ch
.text:238B9FB1 arg_8           = dword ptr  10h
.text:238B9FB1 arg_C           = dword ptr  14h
.text:238B9FB1 arg_10          = dword ptr  18h
.text:238B9FB1
.text:238B9FB1                 push    ebp
.text:238B9FB2                 mov     ebp, esp

One of the ways to prevent IDA from being so stubborn when defining code is to turn the auto analysis off by going to Options->General->Analysis and unchecking "Enabled" in the "Analysis" box. Obviously you want to do this only after the initial analysis has been run.

Great, the IDB is starting to shape up. But we can run into another problem. IDA may have defined a function, but left out some code.



We can easily fix this by locating what we believe to be the end and pressing "e", or navigating to the menu Edit->Functions->Set Function End.



Take a look around the options in Edit->Functions. Two of the entries, Append and Prepend function tail, allow us to add code to a function. To use one, highlight the code you want to add, select the appropriate action, and choose which function it belongs to.

.text:238013E6     retn    10h
.text:238013E6 PlugInMain      endp
.text:238013E6
.text:238013E9 ; -----------------------------------
.text:238013E9
.text:238013E9 loc_238013E9:
.text:238013E9     cmp     dword ptr [esp+4], 20000h
.text:238013F1     jnz     loc_23801485
.text:238013F7     mov     eax, dword_23934560
.text:238013FC     push    esi
.text:238013FD     push    edi

After "Append function tail" this becomes:

.text:238013E5     pop     ebp
.text:238013E6     retn    10h
.text:238013E6 PlugInMain      endp
.text:238013E6
.text:238013E9 ; --------------------------------------
.text:238013E9 ; START OF FUNCTION CHUNK FOR PlugInMain
.text:238013E9
.text:238013E9 loc_238013E9:
.text:238013E9     cmp     [esp-4+arg_0], 20000h
.text:238013F1     jnz     loc_23801485
.text:238013F7     mov     eax, dword_23934560
.text:238013FC     push    esi
.text:238013FD     push    edi
.text:238013FE     push    offset aEscript ; "EScript"

This creates a "function chunk" for the associated function. A function chunk is just an associated block of code that may not be contiguous with, or defined within, the parent function.

Finally you have the menu item "Edit Function (Alt-P)". This allows us to set everything about a function. Using this dialog we can change the start, end, frame, color, and variables. This may be your one-stop shop for fixing functions. Honestly, I rarely use "Edit Function" because in most cases I can just set the function end or append a function tail. Combine that with our other keyboard shortcuts and we can handle most cases.

Undefined functions may not concern you. If you are looking for something very specific, or tracing with a debugger for only the interesting functions, you might not even need to repair IDA's analysis. However, it's always nice when diving into a reversing project to have everything defined and pretty. Spending some time doing this may save you a lot of headaches in the future.

There are some plugins that attempt to fix IDA's auto analysis. One I have used is the ExtraPass plugin by Jim Lacy. It can do a pretty good job of defining missed functions but, like everything, it also misses some.

As an aside, some may remember that during the era of IDA 4.8 Microsoft made an update to their system DLL function prologues that badly busted IDA's function detection. Typical prologues which may have resembled the following:
    push    ebp
    mov     ebp, esp
    sub     esp, 34h
Were replaced with:
    mov     edi, edi
    push    ebp
    mov     ebp, esp
    sub     esp, 34h
The addition of the mov edi, edi broke IDA's function prologue analysis, which was subsequently fixed in IDA 4.9. Why did Microsoft make this change? The additional 2 bytes of code have no side effects; they are essentially a two-byte NOP. The answer is hotpatching. In order to hook a routine Detours-style you must replace the first 5 bytes of code with a JMP ADDRESS. With the old-style function prologues a disassembler was required to determine what the third instruction (the one following the push/mov) was, so that it could be reimplemented correctly. The addition of the mov edi, edi ensures that all functions start with the same static 5 bytes, which can be safely replaced and reimplemented. This is why, since the advent of Windows XP SP2, you no longer have to reboot 100% of the time when installing an update, only 98% of the time ;-) Why a mov edi, edi as opposed to two NOPs? I suppose it's to save a clock cycle.
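To see why exactly 5 bytes matter, here is a small Python sketch of the arithmetic behind a Detours-style hook: a near JMP on x86 is the opcode E9 followed by a 32-bit displacement measured from the end of the jump instruction. The helper name make_jmp_patch and both addresses are invented for illustration. This only builds the patch bytes and is not a description of how Microsoft's hotpatching is actually applied.

```python
import struct


def make_jmp_patch(patch_addr, target_addr):
    """Encode a 5-byte x86 near jump (E9 rel32) to be written at patch_addr."""
    # rel32 is relative to the address of the next instruction (patch_addr + 5).
    rel32 = (target_addr - (patch_addr + 5)) & 0xFFFFFFFF
    return b"\xE9" + struct.pack("<I", rel32)


# The newer prologue from the listing above occupies exactly 5 bytes.
hot_prologue = bytes([
    0x8B, 0xFF,  # mov edi, edi
    0x55,        # push ebp
    0x8B, 0xEC,  # mov ebp, esp
])

# Made-up addresses: hooked function at 0x10001000, replacement at 0x20002000.
patch = make_jmp_patch(0x10001000, 0x20002000)
print(" ".join("%02X" % b for b in patch))
```

Because the patch and the fixed prologue are the same length, the jump overwrites whole instructions only, and the displaced bytes are always the same known sequence.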

I hope you enjoyed this week's MindshaRE.

-Cody
 
Published On: 2008-08-21 13:04:12

Comments

  1. Rolf Rolles commented on 2008-08-28 @ 11:20

    The "code/data separation" problem is formally undecidable, so no disassembler will ever get it correct. http://odobs.cs.uni-dortmund.de:8080/odobs/publication;jsessionid=0EE7ED365AE3A84B53CF67F964C18F78?id=5267704

    Edit function can be useful to correct mistakes with IDA's stack tracking, or when you wish to manually mark a function as being non-returning or a library.

    skape and skywing address the question in the penultimate sentence about mov edi, edi vs. nop/nop: http://www.uninformed.org/?v=8&a=2&t=txt search for "atomic"

  2. Cody Pierce commented on 2008-08-28 @ 13:22

    @Rolf Rolles: Always appreciate the added info, thanks Rolf!

