TippingPoint Digital Vaccine Laboratories

MindshaRE: Arithmetic in Assembly

In a previous MindshaRE we looked at loops in assembly.  I feel writing chunks of code in c and then disassembling them is an important bridge between higher level languages and assembly when you first start reversing.  This allows you to quickly recognize patterns and translate those back into their higher level representation.  Doing this is imperative to understanding a binary when reversing.  So today we will do a short followup, and look at arithmetic in assembly.  Covering the easy ones down to differences in optimization and a few, more complex, examples.

MindshaRE is our weekly look at some simple reverse engineering tips and tricks.  The goal is to keep things small and discuss every day aspects of reversing.  You can view previous entries here by going through our blog history.

Obviously basic arithmetic is an integral part of any program, and programming language.  It should become second nature when looking at assembly to visualize the higher level tokens being used to produce the assembly you are looking at.  Lets break them down.  While some are obvious, optimizations can make things slightly more complex.  The first group of examples will be compiled with the Microsoft Visual C++ Compiler cl.exe using the following command line.

cl /Fearith.exe /Fdarith.pdb /Zi arith.c

In the subsequent examples we will compile with the full optimization flags.

cl /Fearith_optimized.exe /Fdarith_optimized.pdb /Zi /Ox /Ot arith.c

When adding the compiler will use the "add" mnemonic.

int add1(int a)
{
    return(a + 1);
}

.text:00401090 add1            proc near
.text:00401090
.text:00401090 a               = dword ptr  8
.text:00401090
.text:00401090                 push    ebp
.text:00401091                 mov     ebp, esp
.text:00401093                 mov     eax, [ebp+a]
.text:00401096                 add     eax, 1
.text:00401099                 pop     ebp
.text:0040109A                 retn
.text:0040109A add1            endp

int add(int a, int b)
{
    return(a + b);
}

.text:004010A0 add             proc near
.text:004010A0
.text:004010A0 a               = dword ptr  8
.text:004010A0 b               = dword ptr  0Ch
.text:004010A0
.text:004010A0                 push    ebp
.text:004010A1                 mov     ebp, esp
.text:004010A3                 mov     eax, [ebp+a]
.text:004010A6                 add     eax, [ebp+b]
.text:004010A9                 pop     ebp
.text:004010AA                 retn
.text:004010AA add             endp

Seems simple enough.  Keep in mind that this is unoptimized code.

Subtraction, as you could guess, is almost identical.

int sub1(int a)
{
    return(a - 1);
}

int sub(int a, int b)
{
    return(a - b);
}

.text:004010B0 sub1            proc near
.text:004010B0
.text:004010B0 a               = dword ptr  8
.text:004010B0
.text:004010B0                 push    ebp
.text:004010B1                 mov     ebp, esp
.text:004010B3                 mov     eax, [ebp+a]
.text:004010B6                 sub     eax, 1
.text:004010B9                 pop     ebp
.text:004010BA                 retn
.text:004010BA sub1            endp

.text:004010C0 sub             proc near
.text:004010C0
.text:004010C0 a               = dword ptr  8
.text:004010C0 b               = dword ptr  0Ch
.text:004010C0
.text:004010C0                 push    ebp
.text:004010C1                 mov     ebp, esp
.text:004010C3                 mov     eax, [ebp+a]
.text:004010C6                 sub     eax, [ebp+b]
.text:004010C9                 pop     ebp
.text:004010CA                 retn
.text:004010CA sub             endp


Alright, pretty self explainatory.  How about multiplication?

int mult(int a, int b)
{
    return(a * b);
}

.text:004010E0 mult            proc near
.text:004010E0
.text:004010E0 a               = dword ptr  8
.text:004010E0 b               = dword ptr  0Ch
.text:004010E0
.text:004010E0                 push    ebp
.text:004010E1                 mov     ebp, esp
.text:004010E3                 mov     eax, [ebp+a]
.text:004010E6                 imul    eax, [ebp+b]
.text:004010EA                 pop     ebp
.text:004010EB                 retn
.text:004010EB mult            endp

"imul" is used as the "Signed Multiplication" instruction.  It has a few variations but in general mutiplies the destination by the source ( eax * b ) and stores it in the destination.  A shortened version, using only one operand, will multiply the accumulator

Division can be the most complicated.

uint udiv(uint a, uint b)
{
    return(a / b);
}

int sdiv(int a, int b)
{
    return(a / b);
}

.text:00401120 udiv            proc near
.text:00401120
.text:00401120 a               = dword ptr  8
.text:00401120 b               = dword ptr  0Ch
.text:00401120
.text:00401120                 push    ebp
.text:00401121                 mov     ebp, esp
.text:00401123                 mov     eax, [ebp+a]
.text:00401126                 xor     edx, edx
.text:00401128                 div     [ebp+b]
.text:0040112B                 pop     ebp
.text:0040112C                 retn
.text:0040112C udiv            endp

.text:00401130 sdiv            proc near
.text:00401130
.text:00401130 a               = dword ptr  8
.text:00401130 b               = dword ptr  0Ch
.text:00401130
.text:00401130                 push    ebp
.text:00401131                 mov     ebp, esp
.text:00401133                 mov     eax, [ebp+a]
.text:00401136                 cdq
.text:00401137                 idiv    [ebp+b]
.text:0040113A                 pop     ebp
.text:0040113B                 retn
.text:0040113B sdiv            endp

As you can see we have two versions.  The first function udiv is the division of unsigned integers, and the second is the signed form.  With unsigned division we can use the simple "div" mnemonic.  It will divide eax by its source operand.  In this case you see we move argument 1 to eax then issue the "div" instruction with argument 2 as its operand.  This is faily straight forward except we have to remember that we may also have a remainder.  The remainder for a 32 bit operation will be stored in EDX and the quotient in EAX like we stated.  This often has a representation in assembly documentation of EDX:EAX.

Signed division requires an additional step, and a different instruction to handle the signed arithmetic.  Before we can divide a signed dword we must convert it to a sign extended quad word using the cdq instruction.  Then we can perform the division using the signed divide instruction "idiv".  Just like our unsigned counterpart we will divide EAX by the first operand storing it back in EAX and the remainder in EDX.  Keep in mind when working with 16-bit, and 8-bit numbers we must use their equivalent registers and they may use a single register for both the quotient and remainder.

The last one we will look at is modulo.  I put this in here because it demonstrates using only the remainder in division.

uint umod(uint a, uint b)
{
    return(a % b);
}

int smod(int a, int b)
{
    return(a % b);
}

.text:00401140 umod            proc near
.text:00401140
.text:00401140 a               = dword ptr  8
.text:00401140 b               = dword ptr  0Ch
.text:00401140
.text:00401140                 push    ebp
.text:00401141                 mov     ebp, esp
.text:00401143                 mov     eax, [ebp+a]
.text:00401146                 xor     edx, edx
.text:00401148                 div     [ebp+b]
.text:0040114B                 mov     eax, edx
.text:0040114D                 pop     ebp
.text:0040114E                 retn
.text:0040114E umod            endp

.text:00401150 smod            proc near
.text:00401150
.text:00401150 a               = dword ptr  8
.text:00401150 b               = dword ptr  0Ch
.text:00401150
.text:00401150                 push    ebp
.text:00401151                 mov     ebp, esp
.text:00401153                 mov     eax, [ebp+a]
.text:00401156                 cdq
.text:00401157                 idiv    [ebp+b]
.text:0040115A                 mov     eax, edx
.text:0040115C                 pop     ebp
.text:0040115D                 retn
.text:0040115D smod            endp

Just like our unsigned and signed division except we only keep the remainder by moving EDX into the return register EAX.

Lets take this one step further and talk about optimization.  I will cover a very very very tiny aspect of optimization as it relates to our examples above.  First thing we must realize is a lot of the complex instruction like "imul", and "idiv" are very slow.  In my documents, which only list 486 clock cycles, an "imul" instruction can take between 13 and 42 clock cycles. "idiv" with a 32-bit memory location can take a guaranteed 44 clock cycles.  This is huge when you consider "inc" takes 1 in almost every case.

Turning on optimization makes the compiler try as hard as it can to reduce these costly instructions.  You will notice our previous example of adding 1 to a argument. The unoptimized output was a sloppy.

.text:00401096                 add     eax, 1

When it should be.

.text:00401094                 inc     eax

Looking these up they both may be the same clock speed, but inc eax takes up fewer bytes, so optimization isnt *always* about speed :).  We can also see the same thing for subtraction.

.text:004010B6                 sub     eax, 1

Becomes.

.text:004010B4                 dec     eax

Not bad.  But the real savings happens in the more complex computations like multiplication and division.  Take a look at multiplying by 2.  Unoptimized we see something different from the "imul" instruction.

int mult2(int a)
{
    return(a * 2);
}

.text:004010B3                 mov     eax, [ebp+a]
.text:004010B6                 shl     eax, 1

Ahh well that makes since, a binary shift of 1 is just like multiplying by 2.  But "shl" take a couple of clock cycles, so lets look at an optimized version.

.text:00416A60                 mov     eax, [esp+a]
.text:00416A64                 add     eax, eax

Well of course.  When optimized, what was specified as multiplication by two, is simplified to the obvious addition of itself resulting in a gain of performance.  Lets look at some other multiplication time savers.

.text:00416A70 mult10          proc near
.text:00416A70
.text:00416A70 a               = dword ptr  4
.text:00416A70 b               = dword ptr  8
.text:00416A70
.text:00416A70                 mov     eax, [esp+a]
.text:00416A74                 lea     eax, [eax+eax*4]
.text:00416A77                 add     eax, eax
.text:00416A79                 retn
.text:00416A79 mult10          endp

Multiplication by 10 can be sped up by saying (a + (a * 4)) * 2.  Notice the use of "lea" for quick pointer aligned multiplication, this is used frequently since "lea" executes relatively fast.

.text:00416A90 mult12          proc near
.text:00416A90
.text:00416A90 a               = dword ptr  4
.text:00416A90 b               = dword ptr  8
.text:00416A90
.text:00416A90                 mov     eax, [esp+a]
.text:00416A94                 lea     eax, [eax+eax*2]
.text:00416A97                 add     eax, eax
.text:00416A99                 add     eax, eax
.text:00416A9B                 retn
.text:00416A9B mult12          endp

Ohh cool we do almost the equivalent of mult10 except half it and add twice.  I see some patterns forming here.

.text:00416AB0 mult14          proc near
.text:00416AB0
.text:00416AB0 a               = dword ptr  4
.text:00416AB0 b               = dword ptr  8
.text:00416AB0
.text:00416AB0                 mov     ecx, [esp+a]
.text:00416AB4                 lea     eax, ds:0[ecx*8]
.text:00416ABB                 sub     eax, ecx
.text:00416ABD                 add     eax, eax
.text:00416ABF                 retn
.text:00416ABF mult14          endp

Again "lea" is used, then subtracting the original from the product, and multiplying by 2.

.text:00416AC0 mult15          proc near
.text:00416AC0
.text:00416AC0 a               = dword ptr  4
.text:00416AC0 b               = dword ptr  8
.text:00416AC0
.text:00416AC0                 mov     ecx, [esp+a]
.text:00416AC4                 mov     eax, ecx
.text:00416AC6                 shl     eax, 4
.text:00416AC9                 sub     eax, ecx
.text:00416ACB                 retn
.text:00416ACB mult15          endp

.text:00416AD0 mult16          proc near
.text:00416AD0
.text:00416AD0 a               = dword ptr  4
.text:00416AD0 b               = dword ptr  8
.text:00416AD0
.text:00416AD0                 mov     eax, [esp+a]
.text:00416AD4                 shl     eax, 4
.text:00416AD7                 retn
.text:00416AD7 mult16          endp


Here is the ubiquitous "shl" instruction being used for multiplication.  We also see the optimization in the last example of getting the source to 16, then subtracting the original source to get 15.

The pattern we see here is to first use our fastest multiplication instruction "lea" then fix up the rest of the computation with slower instructions like "add" and "sub".  This must be fun for the developers of compilers.  But what about division?

.text:004010B0 div10           proc near
.text:004010B0
.text:004010B0 a               = dword ptr  4
.text:004010B0 b               = dword ptr  8
.text:004010B0
.text:004010B0                 mov     eax, 0CCCCCCCDh
.text:004010B5                 mul     [esp+a]
.text:004010B9                 shr     edx, 3
.text:004010BC                 mov     eax, edx
.text:004010BE                 retn
.text:004010BE div10           endp

.text:004010C0 div11           proc near
.text:004010C0
.text:004010C0 a               = dword ptr  4
.text:004010C0 b               = dword ptr  8
.text:004010C0
.text:004010C0                 mov     eax, 0BA2E8BA3h
.text:004010C5                 mul     [esp+a]
.text:004010C9                 shr     edx, 3
.text:004010CC                 mov     eax, edx
.text:004010CE                 retn
.text:004010CE div11           endp

.text:004010D0 div12           proc near
.text:004010D0
.text:004010D0 a               = dword ptr  4
.text:004010D0 b               = dword ptr  8
.text:004010D0
.text:004010D0                 mov     eax, 0AAAAAAABh
.text:004010D5                 mul     [esp+a]
.text:004010D9                 shr     edx, 3
.text:004010DC                 mov     eax, edx
.text:004010DE                 retn
.text:004010DE div12           endp

.text:004010E0 div13           proc near
.text:004010E0
.text:004010E0 a               = dword ptr  4
.text:004010E0 b               = dword ptr  8
.text:004010E0
.text:004010E0                 mov     eax, 4EC4EC4Fh
.text:004010E5                 mul     [esp+a]
.text:004010E9                 shr     edx, 2
.text:004010EC                 mov     eax, edx
.text:004010EE                 retn
.text:004010EE div13           endp

.text:004010F0 div14           proc near
.text:004010F0
.text:004010F0 a               = dword ptr  4
.text:004010F0 b               = dword ptr  8
.text:004010F0
.text:004010F0                 mov     ecx, [esp+a]
.text:004010F4                 mov     eax, 24924925h
.text:004010F9                 mul     ecx
.text:004010FB                 mov     eax, ecx
.text:004010FD                 sub     eax, edx
.text:004010FF                 shr     eax, 1
.text:00401101                 add     eax, edx
.text:00401103                 shr     eax, 3
.text:00401106                 retn
.text:00401106 div14           endp

.text:00401110 div15           proc near
.text:00401110
.text:00401110 a               = dword ptr  4
.text:00401110 b               = dword ptr  8
.text:00401110
.text:00401110                 mov     eax, 88888889h
.text:00401115                 mul     [esp+a]
.text:00401119                 shr     edx, 3
.text:0040111C                 mov     eax, edx
.text:0040111E                 retn
.text:0040111E div15           endp

.text:00401120 div16           proc near
.text:00401120
.text:00401120 a               = dword ptr  4
.text:00401120 b               = dword ptr  8
.text:00401120
.text:00401120                 mov     eax, [esp+a]
.text:00401124                 shr     eax, 4
.text:00401127                 retn
.text:00401127 div16           endp

Here is some examples of unsigned division being optimized.  Whats confusing at first is we multiply then shift to get our result.  The immediate values we see are considered "Magic numbers", or simply numbers you can multiply a constant by to make it mod 2**32.  This lets us then shift the binary number the needed amount to arrive out our result.  Lets see what happens when you do 30 / 15.

           0x0000001e
x          0x88888889
---------------------
0x00000010:0x0000000e

0x00000010 >> 2 = 0x2

Like we saw with our multiplication optimizations we see the same idea applied to division.  See if you can understand the same functions using signed values.

.text:004010B0 div10           proc near
.text:004010B0
.text:004010B0 a               = dword ptr  4
.text:004010B0 b               = dword ptr  8
.text:004010B0
.text:004010B0                 mov     ecx, [esp+a]
.text:004010B4                 mov     eax, 66666667h
.text:004010B9                 imul    ecx
.text:004010BB                 sar     edx, 2
.text:004010BE                 mov     eax, edx
.text:004010C0                 shr     eax, 1Fh
.text:004010C3                 add     eax, edx
.text:004010C5                 retn
.text:004010C5 div10           endp

.text:004010D0 div11           proc near
.text:004010D0
.text:004010D0 a               = dword ptr  4
.text:004010D0 b               = dword ptr  8
.text:004010D0
.text:004010D0                 mov     ecx, [esp+a]
.text:004010D4                 mov     eax, 2E8BA2E9h
.text:004010D9                 imul    ecx
.text:004010DB                 sar     edx, 1
.text:004010DD                 mov     eax, edx
.text:004010DF                 shr     eax, 1Fh
.text:004010E2                 add     eax, edx
.text:004010E4                 retn
.text:004010E4 div11           endp

.text:004010F0 div12           proc near
.text:004010F0
.text:004010F0 a               = dword ptr  4
.text:004010F0 b               = dword ptr  8
.text:004010F0
.text:004010F0                 mov     ecx, [esp+a]
.text:004010F4                 mov     eax, 2AAAAAABh
.text:004010F9                 imul    ecx
.text:004010FB                 sar     edx, 1
.text:004010FD                 mov     eax, edx
.text:004010FF                 shr     eax, 1Fh
.text:00401102                 add     eax, edx
.text:00401104                 retn
.text:00401104 div12           endp

.text:00401110 div13           proc near
.text:00401110
.text:00401110 a               = dword ptr  4
.text:00401110 b               = dword ptr  8
.text:00401110
.text:00401110                 mov     ecx, [esp+a]
.text:00401114                 mov     eax, 4EC4EC4Fh
.text:00401119                 imul    ecx
.text:0040111B                 sar     edx, 2
.text:0040111E                 mov     eax, edx
.text:00401120                 shr     eax, 1Fh
.text:00401123                 add     eax, edx
.text:00401125                 retn
.text:00401125 div13           endp

.text:00401130 div14           proc near
.text:00401130
.text:00401130 a               = dword ptr  4
.text:00401130 b               = dword ptr  8
.text:00401130
.text:00401130                 mov     ecx, [esp+a]
.text:00401134                 mov     eax, 92492493h
.text:00401139                 imul    ecx
.text:0040113B                 add     edx, ecx
.text:0040113D                 sar     edx, 3
.text:00401140                 mov     eax, edx
.text:00401142                 shr     eax, 1Fh
.text:00401145                 add     eax, edx
.text:00401147                 retn
.text:00401147 div14           endp

.text:00401150 div15           proc near
.text:00401150
.text:00401150 a               = dword ptr  4
.text:00401150 b               = dword ptr  8
.text:00401150
.text:00401150                 mov     ecx, [esp+a]
.text:00401154                 mov     eax, 88888889h
.text:00401159                 imul    ecx
.text:0040115B                 add     edx, ecx
.text:0040115D                 sar     edx, 3
.text:00401160                 mov     eax, edx
.text:00401162                 shr     eax, 1Fh
.text:00401165                 add     eax, edx
.text:00401167                 retn
.text:00401167 div15           endp

.text:00401170 div16           proc near
.text:00401170
.text:00401170 a               = dword ptr  4
.text:00401170 b               = dword ptr  8
.text:00401170
.text:00401170                 mov     eax, [esp+a]
.text:00401174                 cdq
.text:00401175                 and     edx, 0Fh
.text:00401178                 add     eax, edx
.text:0040117A                 sar     eax, 4
.text:0040117D                 retn
.text:0040117D div16           endp

Pretty much the same thing just accounting for the signed bit.

Arithmetic in assembly can look daunting, especially when you get into signed division.  I hope this has at least inspired you to dig deeper and try to truely grasp this topic.  It comes in handy when you see a string of instructions that look complicated.  Instead of getting frustrated, you can simply say "oh i know that, its just division by 15".

If you really want to get into this grab a copy of the book "Hackers Delight" by Henry S. Warren Jr.  Be forewarned, that book is intense.

Hope you enjoyed another installment of MindshaRE (Even if I was a day late).

-Cody
Tags:
Published On: 2008-08-15 17:45:15

Comments post a comment

  1. Anonymous commented on 2008-08-19 @ 00:25

    Multiplication by 10 can be sped up by saying (a + (a + (a * 4))).

    correct one should be
    Multiplication by 10 can be sped up by saying ((a + (a * 4)) + (a + (a * 4))).

  2. Cody Pierce commented on 2008-08-19 @ 11:14

    Thanks for the heads up, I meant to put (a + (a * 4)) * 2. In my defense I was sick when I wrote this ;)


Trackback