MindshaRE is our weekly look at some simple reverse engineering tips and tricks. The goal is to keep things small and discuss every day aspects of reversing. You can view previous entries here by going through our blog history.
There are several reasons to actually use a naming convention. It makes your IDBs easier to read (for everyone), helps you organize functions, variables, and basic blocks, and in general makes your IDBs more professional looking, among other things. If you ever read this blog you know I try really hard to make everything as simple and clear as possible.
Naming convention standards have been debated ad-nauseum by developers since programming languages were created. For some reason everyone insists that their opinion is obviously the best. As long as you, and anyone you interact with, can read your labels and glean the intended information from them you win. Besides readability one of the most important aspects of a good naming convention is simplicity. If it's too complex you won't use it. A naming convention must become a natural extension to the way you reverse.
My personal naming convention is a mixture of Hungarian, UpperCamelCase, and traditional C style. I do this because I need type information, readability, and flexibility. I tend to make my labels longer and descriptive because it's easier to understand.
I have broken down my naming conventions into their respective categories. Let's jump right in.
Functions:
I label my functions more than anything. You have three different views of functions which you'll look at almost every day. Viewing the top of a function, calling a function, or looking at the function window will display the names you have given them. IDA starts you out with the ambiguous "sub_xxxxxxxx" moniker. This is fine but hardly a description of what the function does.
When I have reversed a function I will give it a UpperCamelCase name, trying to be as descriptive as possible. For instance "sub_7E4D5E88" might become "ReadFromFile". One drawback from this method is you have to be mindful of any import names that may conflict. IDA will let you use it, but might assign prototype information to the function. If I wanted to call this function "ReadFile" I might just call it "MyReadFile"
Another common occurrence is to simply append "Wrapper" to functions. In the above example the caller may be renamed to "ReadFromFileWrapper". This can become a little cumbersome when you get 4 wrappers deep. ReadFromFileWrapperWrapperWrapperWrapper just doesn't have the same ring to it. In that case I will just shorten to "ReadFromFileWWWW".
Arguments/Locals:
For arguments and locals I use Hungarian notation for its data type definitions. This seems to be the most descriptive method for associating needed type information with a variable name.
In general arguments and locals are named in similar fashion. The only difference is I will prepend a "arg_" to the arguments name in a function. This lets me easily differentiate between the two. If you need position information as well you can append it to the original making a name like "arg0_", or "arg_4_", whichever is more natural to you.
Let's pretend we have a local integer that contains a count. Using Hungarian notation I would call it "dwCount". To me this specifies its size (I'm assuming dword ints of course) and its purpose. If this was a pointer I'd prepend the name with a "p" to become "pdwCount". I realize people may groan at how this looks. That's fine, but looking at this label I can instantly tell we have a pointer to a 32-bit integer being used as a count. If this was an argument we would use "arg_dwCount" or "arg0_dwCount". To satisfy those whom may not always be on 32-bit platforms you could also label this by size "i64Count".
If we also need the signed information for the data type we can add that as well. Sometimes signed distinction is unnecessary, but I support more information than not. Our above example of a dword integer would be "udwCount", or "sdwCount". And admittedly, the ugliest name "pudwCount" to denote a pointer to an unsigned dword.
Here is a list of the data types I often encounter.
b Byte bCount
w Word wCount
dw Dword dwCount
p Pointer pCount pwCount psdwCount
sz String szName
a Array aNames
s Struct sNames
Alternatively you could also use the c identifiers char, short, or long if you want. Whatever works for you.
Globals:
Global data varies. It could be a handle, jump table, global variable or hundreds of other things. With that said you may need to work on a case-by-case basis. Normally however, I will use the C ALL_UPPERCASE_GLOBAL nomenclature. Since I am use to this as a global variable it works well for me. If we had a jump table that handled packet processing we could name it "PACKET_HANDLER".
Branches:
Branches are your intraprocedural jumps to other basic blocks in the function. IDA names these as "loc_xxxxxxxx". Often times we want to rename this, for instance if we know the branch is a basic block that returns from the function.
For these instances I stick to the old c syntax of lower_case_underscore names. It helps me differentiate between functions and basic blocks easily. It also seems to be more readable in certain cases and stand out less. Lets pretend the basic block currently named "loc_7E4D5F56" returns True. I would label this as "return_true". If it returns false I'd go with "return_false". Some other common labels may be "check_null", "check_counter", "begin_for_loop", and "throw_exception". These labels are useful in explaining basic blocks in a single glance.
Marks:
Bookmarks are used to save a particularly interesting location. In general these can be free form and as descriptive as possible. A good label will tell you why the location is important. An example I often use is "read tcp socket data" or "read from configuration file" keeping everything lower case and forgetting punctuation. I also have the ubiquitous "im here" or "here" mark indicating my last position in the IDB.
Comments:
Comments should be readable and generally a single line. To me it's strange to see multiple lines of comments on a single address. You should insert any data you may have, or references to other addresses if need be. Remember any address IDA has a reference for that you put in a comment can be followed in the IDA GUI.
Creating names using a convention you are comfortable with helps everyone. Try to find something you feel is beneficial and it will become second nature. I don't know how many times I've gone back to an IDB and not known what was going on because I didn't name things properly. Forget trying to open someone else's IDB.
I would be very interested in hearing about the conventions you personally use. I certainly do not think my way is bulletproof or the absolute best. Everything can be improved and expanded. Please leave a comment if you have some suggestions, maybe one day everyone will use a similar style! In the very unlikely case people actually agree on a naming standard I'll draft up a document with more detail that can be used by everyone.
Hope you enjoyed this weeks MindshaRE.
-Cody
