MindshaRE is our weekly look at some simple reverse engineering tips and tricks. The goal is to keep things small and discuss every day aspects of reversing. You can view previous entries here by going through our blog history.
Obviously basic arithmetic is an integral part of any program, and programming language. It should become second nature when looking at assembly to visualize the higher level tokens being used to produce the assembly you are looking at. Lets break them down. While some are obvious, optimizations can make things slightly more complex. The first group of examples will be compiled with the Microsoft Visual C++ Compiler cl.exe using the following command line.
cl /Fearith.exe /Fdarith.pdb /Zi arith.c
In the subsequent examples we will compile with the full optimization flags.
cl /Fearith_optimized.exe /Fdarith_optimized.pdb /Zi /Ox /Ot arith.c
When adding the compiler will use the "add" mnemonic.
int add1(int a)
{return(a + 1);
}
.text:00401090 add1 proc near
.text:00401090
.text:00401090 a = dword ptr 8
.text:00401090
.text:00401090 push ebp
.text:00401091 mov ebp, esp
.text:00401093 mov eax, [ebp+a]
.text:00401096 add eax, 1
.text:00401099 pop ebp
.text:0040109A retn
.text:0040109A add1 endp
int add(int a, int b)
{return(a + b);
}
.text:004010A0 add proc near
.text:004010A0
.text:004010A0 a = dword ptr 8
.text:004010A0 b = dword ptr 0Ch
.text:004010A0
.text:004010A0 push ebp
.text:004010A1 mov ebp, esp
.text:004010A3 mov eax, [ebp+a]
.text:004010A6 add eax, [ebp+b]
.text:004010A9 pop ebp
.text:004010AA retn
.text:004010AA add endp
Seems simple enough. Keep in mind that this is unoptimized code.
Subtraction, as you could guess, is almost identical.
int sub1(int a)
{return(a - 1);
}
int sub(int a, int b)
{return(a - b);
}
.text:004010B0 sub1 proc near
.text:004010B0
.text:004010B0 a = dword ptr 8
.text:004010B0
.text:004010B0 push ebp
.text:004010B1 mov ebp, esp
.text:004010B3 mov eax, [ebp+a]
.text:004010B6 sub eax, 1
.text:004010B9 pop ebp
.text:004010BA retn
.text:004010BA sub1 endp
.text:004010C0 sub proc near
.text:004010C0
.text:004010C0 a = dword ptr 8
.text:004010C0 b = dword ptr 0Ch
.text:004010C0
.text:004010C0 push ebp
.text:004010C1 mov ebp, esp
.text:004010C3 mov eax, [ebp+a]
.text:004010C6 sub eax, [ebp+b]
.text:004010C9 pop ebp
.text:004010CA retn
.text:004010CA sub endp
Alright, pretty self explainatory. How about multiplication?
int mult(int a, int b)
{return(a * b);
}
.text:004010E0 mult proc near
.text:004010E0
.text:004010E0 a = dword ptr 8
.text:004010E0 b = dword ptr 0Ch
.text:004010E0
.text:004010E0 push ebp
.text:004010E1 mov ebp, esp
.text:004010E3 mov eax, [ebp+a]
.text:004010E6 imul eax, [ebp+b]
.text:004010EA pop ebp
.text:004010EB retn
.text:004010EB mult endp
"imul" is used as the "Signed Multiplication" instruction. It has a few variations but in general mutiplies the destination by the source ( eax * b ) and stores it in the destination. A shortened version, using only one operand, will multiply the accumulator
Division can be the most complicated.
uint udiv(uint a, uint b)
{return(a / b);
}
int sdiv(int a, int b)
{return(a / b);
}
.text:00401120 udiv proc near
.text:00401120
.text:00401120 a = dword ptr 8
.text:00401120 b = dword ptr 0Ch
.text:00401120
.text:00401120 push ebp
.text:00401121 mov ebp, esp
.text:00401123 mov eax, [ebp+a]
.text:00401126 xor edx, edx
.text:00401128 div [ebp+b]
.text:0040112B pop ebp
.text:0040112C retn
.text:0040112C udiv endp
.text:00401130 sdiv proc near
.text:00401130
.text:00401130 a = dword ptr 8
.text:00401130 b = dword ptr 0Ch
.text:00401130
.text:00401130 push ebp
.text:00401131 mov ebp, esp
.text:00401133 mov eax, [ebp+a]
.text:00401136 cdq
.text:00401137 idiv [ebp+b]
.text:0040113A pop ebp
.text:0040113B retn
.text:0040113B sdiv endp
As you can see we have two versions. The first function udiv is the division of unsigned integers, and the second is the signed form. With unsigned division we can use the simple "div" mnemonic. It will divide eax by its source operand. In this case you see we move argument 1 to eax then issue the "div" instruction with argument 2 as its operand. This is faily straight forward except we have to remember that we may also have a remainder. The remainder for a 32 bit operation will be stored in EDX and the quotient in EAX like we stated. This often has a representation in assembly documentation of EDX:EAX.
Signed division requires an additional step, and a different instruction to handle the signed arithmetic. Before we can divide a signed dword we must convert it to a sign extended quad word using the cdq instruction. Then we can perform the division using the signed divide instruction "idiv". Just like our unsigned counterpart we will divide EAX by the first operand storing it back in EAX and the remainder in EDX. Keep in mind when working with 16-bit, and 8-bit numbers we must use their equivalent registers and they may use a single register for both the quotient and remainder.
The last one we will look at is modulo. I put this in here because it demonstrates using only the remainder in division.
uint umod(uint a, uint b)
{return(a % b);
}
int smod(int a, int b)
{return(a % b);
}
.text:00401140 umod proc near
.text:00401140
.text:00401140 a = dword ptr 8
.text:00401140 b = dword ptr 0Ch
.text:00401140
.text:00401140 push ebp
.text:00401141 mov ebp, esp
.text:00401143 mov eax, [ebp+a]
.text:00401146 xor edx, edx
.text:00401148 div [ebp+b]
.text:0040114B mov eax, edx
.text:0040114D pop ebp
.text:0040114E retn
.text:0040114E umod endp
.text:00401150 smod proc near
.text:00401150
.text:00401150 a = dword ptr 8
.text:00401150 b = dword ptr 0Ch
.text:00401150
.text:00401150 push ebp
.text:00401151 mov ebp, esp
.text:00401153 mov eax, [ebp+a]
.text:00401156 cdq
.text:00401157 idiv [ebp+b]
.text:0040115A mov eax, edx
.text:0040115C pop ebp
.text:0040115D retn
.text:0040115D smod endp
Just like our unsigned and signed division except we only keep the remainder by moving EDX into the return register EAX.
Lets take this one step further and talk about optimization. I will cover a very very very tiny aspect of optimization as it relates to our examples above. First thing we must realize is a lot of the complex instruction like "imul", and "idiv" are very slow. In my documents, which only list 486 clock cycles, an "imul" instruction can take between 13 and 42 clock cycles. "idiv" with a 32-bit memory location can take a guaranteed 44 clock cycles. This is huge when you consider "inc" takes 1 in almost every case.
Turning on optimization makes the compiler try as hard as it can to reduce these costly instructions. You will notice our previous example of adding 1 to a argument. The unoptimized output was a sloppy.
.text:00401096 add eax, 1
When it should be.
.text:00401094 inc eax
Looking these up they both may be the same clock speed, but inc eax takes up fewer bytes, so optimization isnt *always* about speed :). We can also see the same thing for subtraction.
.text:004010B6 sub eax, 1
Becomes.
.text:004010B4 dec eax
Not bad. But the real savings happens in the more complex computations like multiplication and division. Take a look at multiplying by 2. Unoptimized we see something different from the "imul" instruction.
int mult2(int a)
{return(a * 2);
}
.text:004010B3 mov eax, [ebp+a]
.text:004010B6 shl eax, 1
Ahh well that makes since, a binary shift of 1 is just like multiplying by 2. But "shl" take a couple of clock cycles, so lets look at an optimized version.
.text:00416A60 mov eax, [esp+a]
.text:00416A64 add eax, eax
Well of course. When optimized, what was specified as multiplication by two, is simplified to the obvious addition of itself resulting in a gain of performance. Lets look at some other multiplication time savers.
.text:00416A70 mult10 proc near
.text:00416A70
.text:00416A70 a = dword ptr 4
.text:00416A70 b = dword ptr 8
.text:00416A70
.text:00416A70 mov eax, [esp+a]
.text:00416A74 lea eax, [eax+eax*4]
.text:00416A77 add eax, eax
.text:00416A79 retn
.text:00416A79 mult10 endp
Multiplication by 10 can be sped up by saying (a + (a * 4)) * 2. Notice the use of "lea" for quick pointer aligned multiplication, this is used frequently since "lea" executes relatively fast.
.text:00416A90 mult12 proc near
.text:00416A90
.text:00416A90 a = dword ptr 4
.text:00416A90 b = dword ptr 8
.text:00416A90
.text:00416A90 mov eax, [esp+a]
.text:00416A94 lea eax, [eax+eax*2]
.text:00416A97 add eax, eax
.text:00416A99 add eax, eax
.text:00416A9B retn
.text:00416A9B mult12 endp
Ohh cool we do almost the equivalent of mult10 except half it and add twice. I see some patterns forming here.
.text:00416AB0 mult14 proc near
.text:00416AB0
.text:00416AB0 a = dword ptr 4
.text:00416AB0 b = dword ptr 8
.text:00416AB0
.text:00416AB0 mov ecx, [esp+a]
.text:00416AB4 lea eax, ds:0[ecx*8]
.text:00416ABB sub eax, ecx
.text:00416ABD add eax, eax
.text:00416ABF retn
.text:00416ABF mult14 endp
Again "lea" is used, then subtracting the original from the product, and multiplying by 2.
.text:00416AC0 mult15 proc near
.text:00416AC0
.text:00416AC0 a = dword ptr 4
.text:00416AC0 b = dword ptr 8
.text:00416AC0
.text:00416AC0 mov ecx, [esp+a]
.text:00416AC4 mov eax, ecx
.text:00416AC6 shl eax, 4
.text:00416AC9 sub eax, ecx
.text:00416ACB retn
.text:00416ACB mult15 endp
.text:00416AD0 mult16 proc near
.text:00416AD0
.text:00416AD0 a = dword ptr 4
.text:00416AD0 b = dword ptr 8
.text:00416AD0
.text:00416AD0 mov eax, [esp+a]
.text:00416AD4 shl eax, 4
.text:00416AD7 retn
.text:00416AD7 mult16 endp
Here is the ubiquitous "shl" instruction being used for multiplication. We also see the optimization in the last example of getting the source to 16, then subtracting the original source to get 15.
The pattern we see here is to first use our fastest multiplication instruction "lea" then fix up the rest of the computation with slower instructions like "add" and "sub". This must be fun for the developers of compilers. But what about division?
.text:004010B0 div10 proc near
.text:004010B0
.text:004010B0 a = dword ptr 4
.text:004010B0 b = dword ptr 8
.text:004010B0
.text:004010B0 mov eax, 0CCCCCCCDh
.text:004010B5 mul [esp+a]
.text:004010B9 shr edx, 3
.text:004010BC mov eax, edx
.text:004010BE retn
.text:004010BE div10 endp
.text:004010C0 div11 proc near
.text:004010C0
.text:004010C0 a = dword ptr 4
.text:004010C0 b = dword ptr 8
.text:004010C0
.text:004010C0 mov eax, 0BA2E8BA3h
.text:004010C5 mul [esp+a]
.text:004010C9 shr edx, 3
.text:004010CC mov eax, edx
.text:004010CE retn
.text:004010CE div11 endp
.text:004010D0 div12 proc near
.text:004010D0
.text:004010D0 a = dword ptr 4
.text:004010D0 b = dword ptr 8
.text:004010D0
.text:004010D0 mov eax, 0AAAAAAABh
.text:004010D5 mul [esp+a]
.text:004010D9 shr edx, 3
.text:004010DC mov eax, edx
.text:004010DE retn
.text:004010DE div12 endp
.text:004010E0 div13 proc near
.text:004010E0
.text:004010E0 a = dword ptr 4
.text:004010E0 b = dword ptr 8
.text:004010E0
.text:004010E0 mov eax, 4EC4EC4Fh
.text:004010E5 mul [esp+a]
.text:004010E9 shr edx, 2
.text:004010EC mov eax, edx
.text:004010EE retn
.text:004010EE div13 endp
.text:004010F0 div14 proc near
.text:004010F0
.text:004010F0 a = dword ptr 4
.text:004010F0 b = dword ptr 8
.text:004010F0
.text:004010F0 mov ecx, [esp+a]
.text:004010F4 mov eax, 24924925h
.text:004010F9 mul ecx
.text:004010FB mov eax, ecx
.text:004010FD sub eax, edx
.text:004010FF shr eax, 1
.text:00401101 add eax, edx
.text:00401103 shr eax, 3
.text:00401106 retn
.text:00401106 div14 endp
.text:00401110 div15 proc near
.text:00401110
.text:00401110 a = dword ptr 4
.text:00401110 b = dword ptr 8
.text:00401110
.text:00401110 mov eax, 88888889h
.text:00401115 mul [esp+a]
.text:00401119 shr edx, 3
.text:0040111C mov eax, edx
.text:0040111E retn
.text:0040111E div15 endp
.text:00401120 div16 proc near
.text:00401120
.text:00401120 a = dword ptr 4
.text:00401120 b = dword ptr 8
.text:00401120
.text:00401120 mov eax, [esp+a]
.text:00401124 shr eax, 4
.text:00401127 retn
.text:00401127 div16 endp
Here is some examples of unsigned division being optimized. Whats confusing at first is we multiply then shift to get our result. The immediate values we see are considered "Magic numbers", or simply numbers you can multiply a constant by to make it mod 2**32. This lets us then shift the binary number the needed amount to arrive out our result. Lets see what happens when you do 30 / 15.
0x0000001e
x 0x88888889
---------------------
0x00000010:0x0000000e
0x00000010 >> 2 = 0x2
Like we saw with our multiplication optimizations we see the same idea applied to division. See if you can understand the same functions using signed values.
.text:004010B0 div10 proc near
.text:004010B0
.text:004010B0 a = dword ptr 4
.text:004010B0 b = dword ptr 8
.text:004010B0
.text:004010B0 mov ecx, [esp+a]
.text:004010B4 mov eax, 66666667h
.text:004010B9 imul ecx
.text:004010BB sar edx, 2
.text:004010BE mov eax, edx
.text:004010C0 shr eax, 1Fh
.text:004010C3 add eax, edx
.text:004010C5 retn
.text:004010C5 div10 endp
.text:004010D0 div11 proc near
.text:004010D0
.text:004010D0 a = dword ptr 4
.text:004010D0 b = dword ptr 8
.text:004010D0
.text:004010D0 mov ecx, [esp+a]
.text:004010D4 mov eax, 2E8BA2E9h
.text:004010D9 imul ecx
.text:004010DB sar edx, 1
.text:004010DD mov eax, edx
.text:004010DF shr eax, 1Fh
.text:004010E2 add eax, edx
.text:004010E4 retn
.text:004010E4 div11 endp
.text:004010F0 div12 proc near
.text:004010F0
.text:004010F0 a = dword ptr 4
.text:004010F0 b = dword ptr 8
.text:004010F0
.text:004010F0 mov ecx, [esp+a]
.text:004010F4 mov eax, 2AAAAAABh
.text:004010F9 imul ecx
.text:004010FB sar edx, 1
.text:004010FD mov eax, edx
.text:004010FF shr eax, 1Fh
.text:00401102 add eax, edx
.text:00401104 retn
.text:00401104 div12 endp
.text:00401110 div13 proc near
.text:00401110
.text:00401110 a = dword ptr 4
.text:00401110 b = dword ptr 8
.text:00401110
.text:00401110 mov ecx, [esp+a]
.text:00401114 mov eax, 4EC4EC4Fh
.text:00401119 imul ecx
.text:0040111B sar edx, 2
.text:0040111E mov eax, edx
.text:00401120 shr eax, 1Fh
.text:00401123 add eax, edx
.text:00401125 retn
.text:00401125 div13 endp
.text:00401130 div14 proc near
.text:00401130
.text:00401130 a = dword ptr 4
.text:00401130 b = dword ptr 8
.text:00401130
.text:00401130 mov ecx, [esp+a]
.text:00401134 mov eax, 92492493h
.text:00401139 imul ecx
.text:0040113B add edx, ecx
.text:0040113D sar edx, 3
.text:00401140 mov eax, edx
.text:00401142 shr eax, 1Fh
.text:00401145 add eax, edx
.text:00401147 retn
.text:00401147 div14 endp
.text:00401150 div15 proc near
.text:00401150
.text:00401150 a = dword ptr 4
.text:00401150 b = dword ptr 8
.text:00401150
.text:00401150 mov ecx, [esp+a]
.text:00401154 mov eax, 88888889h
.text:00401159 imul ecx
.text:0040115B add edx, ecx
.text:0040115D sar edx, 3
.text:00401160 mov eax, edx
.text:00401162 shr eax, 1Fh
.text:00401165 add eax, edx
.text:00401167 retn
.text:00401167 div15 endp
.text:00401170 div16 proc near
.text:00401170
.text:00401170 a = dword ptr 4
.text:00401170 b = dword ptr 8
.text:00401170
.text:00401170 mov eax, [esp+a]
.text:00401174 cdq
.text:00401175 and edx, 0Fh
.text:00401178 add eax, edx
.text:0040117A sar eax, 4
.text:0040117D retn
.text:0040117D div16 endp
Pretty much the same thing just accounting for the signed bit.
Arithmetic in assembly can look daunting, especially when you get into signed division. I hope this has at least inspired you to dig deeper and try to truely grasp this topic. It comes in handy when you see a string of instructions that look complicated. Instead of getting frustrated, you can simply say "oh i know that, its just division by 15".
If you really want to get into this grab a copy of the book "Hackers Delight" by Henry S. Warren Jr. Be forewarned, that book is intense.
Hope you enjoyed another installment of MindshaRE (Even if I was a day late).
-Cody
