TippingPoint Digital Vaccine Laboratories

MindshaRE: Strings!


In this week's MindshaRE we will take a look at strings.  We will cover some of the obvious uses for strings as well as some helpful ways to apply them when analyzing a binary.

MindshaRE is our weekly look at some simple reverse engineering tips and tricks.  The goal is to keep things small and discuss everyday aspects of reversing.  You can view previous entries by going through our blog history.

String examination is a frequent starting point for many reverse engineers, whether it is how they begin learning reverse engineering or how they dive into a new application.  That is not to say some people don't jump straight into the code, but it's generally a good starting point.  I personally examine strings before any other analysis, primarily because I can quickly pick up an idea of the binary's purpose, verbosity, history, and other tidbits.  The combination of interesting strings and their cross references to associated library calls allows us to label many functions in a very short period of time.

Here are some interesting examples of strings from a single binary.
004EF594 db '$Workfile: SDIBase.cpp $ Copyright (c) 1998 Selsius Systems Inc.,'
004EF594 db ' all rights reserved.',0
...
004BC8D4 db '<CiscoIPPhoneExecute><ExecuteItem URL="Play:%s"/></CiscoIPPhoneExecute>',0
...
004D34B0 db 'Password',0
...
004E3434 db 'SELECT D.Name,D.tkModel,D.tkClass, N.DNOrPattern, C.Name AS Expr1'
004E3434 db ' FROM  Device D INNER JOIN DeviceNumPlanMap M ON M.fkDevice = D.p'
004E3434 db 'kid INNER JOIN NumPlan N ON M.fkNumPlan = N.pkid LEFT OUTER JOIN '
004E3434 db 'RoutePartition C ON N.fkRoutePartition = C.pkid WHERE(N.DNOrPatte'
004E3434 db 'rn = ',27h,0
Just looking around the strings quickly reveals that the binary in question has portions licensed from another firm, does some form of XML parsing, and in some way communicates with a SQL server.
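
If you do not have a disassembler handy, a first pass over the strings can be approximated in a few lines of Python. This is only a minimal sketch, assuming plain ASCII strings and a minimum length of four characters; the sample bytes are made up for illustration and do not come from the binary above.

```python
import re

def extract_strings(data: bytes, min_len: int = 4):
    """Yield (offset, text) for each run of printable ASCII of at least min_len bytes."""
    # 0x20-0x7e is the printable ASCII range; newlines and NULs end a run.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")

# Hypothetical bytes standing in for a slice of a data section.
sample = b"\x00\x01Password\x00\xff\xfe%s:%s\n\x00ab\x00"

for offset, text in extract_strings(sample):
    print(f"{offset:08X}  {text}")
```

Note that this will miss wide-character (UTF-16) strings and anything obfuscated, so treat the output as a starting point rather than a complete listing.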

In many cases we can rely on string patterns to partially recover symbolic information.  Analyzing a binary with symbols is much easier than parsing through call graphs of anonymous subroutines.  Often developers will have a layer for logging, debugging, or tracing a process.  In many cases this is for customer support reasons.  If a customer has a problem, the developers or support personnel can quickly look at a call trace and determine the problem.  We can use this to our advantage.  In our example we can see a group of strings that appear to be class methods.
.data:004EE93C db 'ServiceParamInfo::SetServParamEventForCTI...',0
.data:004EE96C db 'ServiceParamInfo::SetServParamEventForCCM...',0
.data:004EE99C db 'ServiceParamInfo::SetServParamEventForCEF...',0
.data:004EE9CC db 'ServiceParamInfo::SetRISDCEvent...',0
.data:004EE9F0 db 'ServiceParamInfo::ResetRISDCEvent...',0
.data:004EEA18 db 'ServiceParamInfo::SetSNMPEvent...',0
You can view these by going through the "Strings" view in IDA, or by looking through the data sections in most binaries (be aware that strings can live in other segments as well).  Now if we follow the cross references of these strings, we see that each one is pushed as an argument to a function.
004957FE  push    offset `string'  ; "ServiceParamInfo::SetServParamEventForC"...
00495803  lea     ecx, [esp+420h+var_418]
00495807  call    sub_496600
Following that call, we can see a pretty telltale sign of a logging function.  I have cut out some assembly and branches for brevity.
...
0041E3B5  push    offset tmpbuf
0041E3BA  push    offset `string'  ; "%s:%s\n"
0041E3BF  push    ecx
0041E3C0  call    ds:_imp__fprintf
So we know that the function sub_496600 handles some sort of trace logging.  If we find all the cross references to this function we may be able to build a list of all the function names that are logged when tracing.



As you can see the functions have names.  This is because I have gone through each cross reference and renamed the calling function based on the string contents.  You will also notice there are only 12 cross references.  The reason is that this particular binary has several functions responsible for logging, so you may need to hunt down many different routines and repeat this process.  If you are interested in a sample script to automate these actions, take a look at resolve_symbols.py.  The script takes any cross reference from the current cursor, walks the pushed arguments for any strings, and applies those as function names to the caller.
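
To give a feel for the renaming step, here is a rough sketch of the string-to-name logic on its own. It is not resolve_symbols.py and it does not talk to IDA; the function names, the `Class__Method` naming convention, and the caller addresses are all assumptions made for illustration. It takes (caller address, pushed string) pairs, which you would collect from the cross references, and proposes a label for each caller.

```python
import re

def name_from_trace_string(text):
    """Return a label-safe name for strings like 'Class::Method...', else None."""
    match = re.match(r"([A-Za-z_]\w*(?:::[A-Za-z_]\w*)+)", text)
    if match is None:
        return None
    # '::' is not valid in most label names, so substitute a double underscore.
    return match.group(1).replace("::", "__")

def propose_renames(xrefs):
    """Map each caller address to a proposed name, given (address, string) pairs."""
    renames = {}
    used = set()
    for caller, text in xrefs:
        name = name_from_trace_string(text)
        if name is None or caller in renames:
            continue
        if name in used:
            # Keep labels unique by appending the caller address.
            name = f"{name}_{caller:X}"
        used.add(name)
        renames[caller] = name
    return renames

# Hypothetical caller addresses paired with strings taken from the listing above.
xrefs = [
    (0x401000, "ServiceParamInfo::SetServParamEventForCTI..."),
    (0x401200, "ServiceParamInfo::SetSNMPEvent..."),
    (0x401400, "%s:%s\n"),  # format string, not a method name, so it is skipped
]

for address, name in propose_renames(xrefs).items():
    print(f"{address:08X} -> {name}")
```

Inside a disassembler you would feed the resulting mapping to whatever renaming call its scripting interface provides, and review the proposals before applying them, since a function can log more than one method name.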

Another good use of strings is combining them with imported library calls to create a listing of all external functions being called from a binary.  Calls to the registry functions are a good example of this.  By reconstructing these we can generate the following example output:
RegOpenKeyEx( 0x80000002,
              'SYSTEM\CurrentControlSet\Services\Eventlog\Application\Cisco \
Extended Function',
              0x0,
              0x20019,
              *var_10 );
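
The numeric arguments in output like this are easier to read once they are mapped back to their Windows SDK names. The sketch below is one way to do that, using the documented values for the predefined registry keys and access rights; the helper names `describe_hkey` and `describe_access` are made up for this example.

```python
# Predefined registry root keys (values from the Windows SDK headers).
HKEY_NAMES = {
    0x80000000: "HKEY_CLASSES_ROOT",
    0x80000001: "HKEY_CURRENT_USER",
    0x80000002: "HKEY_LOCAL_MACHINE",
    0x80000003: "HKEY_USERS",
}

# Common composite access masks.
ACCESS_COMPOSITES = {
    0x20019: "KEY_READ",
    0x20006: "KEY_WRITE",
    0xF003F: "KEY_ALL_ACCESS",
}

# Individual access bits, used when the mask is not a known composite.
ACCESS_BITS = {
    0x0001: "KEY_QUERY_VALUE",
    0x0002: "KEY_SET_VALUE",
    0x0004: "KEY_CREATE_SUB_KEY",
    0x0008: "KEY_ENUMERATE_SUB_KEYS",
    0x0010: "KEY_NOTIFY",
    0x0020: "KEY_CREATE_LINK",
    0x20000: "READ_CONTROL",
}

def describe_hkey(value):
    """Name a predefined root key, falling back to hex for real handles."""
    return HKEY_NAMES.get(value, hex(value))

def describe_access(mask):
    """Name an access mask, decomposing it into bits when it is not a composite."""
    if mask in ACCESS_COMPOSITES:
        return ACCESS_COMPOSITES[mask]
    names = [name for bit, name in ACCESS_BITS.items() if mask & bit]
    leftover = mask
    for bit in ACCESS_BITS:
        leftover &= ~bit
    if leftover:
        names.append(hex(leftover))  # bits this table does not know about
    return " | ".join(names) if names else hex(mask)

print(describe_hkey(0x80000002), describe_access(0x20019))
```

Applied to the call above, the first argument resolves to HKEY_LOCAL_MACHINE and the access mask to KEY_READ, which tells us the binary only reads from this event log key.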
I hope this installment of MindshaRE has given you some ideas on helpful ways to use strings in a binary.  There are many uses for them when reverse engineering.  Don't be shy when it comes to digging into the strings first; there is nothing wrong with having more knowledge about a binary.  If you have any ideas please leave a comment, as I am always interested in the way other people reverse engineer.  See you next week.

-Cody
Published On: 2008-07-10 13:04:11

Comments

  1. Eric Monti commented on 2008-07-13 @ 13:23

    Leveraging strings gets an unfair rap by people claiming that it can become a crutch. The reality is reverse engineers routinely leverage strings at some point on almost every project.

    Another use I found for strings recently was to identify the load offset of an unfamiliar executable format. After finding a list of adjacent strings with an associated address table elsewhere in the binary, I realized I could identify the load offset just by comparing the table's runtime addresses to the "real" file offsets for the strings.

    I wrote a bit about it at http://www.matasano.com/log/1047/toast-spells-tsaot-in-reverse/

    Anyway, nice post. I like your resolve_symbols.py script. I've been enjoying the MindshaRE series a lot. Don't stop!

  2. Cody Pierce commented on 2008-07-15 @ 11:22

    @Eric: Heh, it's true people always clown on strings, but you have to use everything you can, right?

    That is a great blog entry, I have read it before. I'm glad you enjoy the series.

  3. Nate McFeters commented on 2008-07-18 @ 01:35

    Cody, well played. I'm definitely enjoying the MindshaRE section of the blog. I've seen too many thick client applications where I grab strings from a binary that have a hard-coded database connection string... accessible over the Internet, to clown on strings anymore.

    Besides, the quicker the pwnage is finished, the quicker the beer begins to flow.

    -Nate

  4. Rolf Rolles commented on 2008-08-28 @ 12:12

    "Leveraging strings gets an unfair rap by people claiming that it can become a crutch. The reality is reverse engineers routinely leverage strings at some point on almost every project."

    Maybe true in mainstream vulnerability analysis, but not in general. Smoke 'em if you've got 'em, but

    * What if the strings are in Chinese?
    * What about indirect addressing in C++ (or compile-time obfuscation), which severely limits the utility of cross-reference tracking and code recognition?
    * What about malware that obfuscates the strings to prevent this kind of analysis?
    * What about application domains (such as AV emulators) where developers are conscious of reverse engineering and #define away all logging functions in release builds?

    Thinking back, almost none of my non-trivial projects over the past few years, especially the C++ ones, benefited much from string references. I use them when they're applicable, they just don't tend to be.

  5. Cody Pierce commented on 2008-09-04 @ 09:25

    @Rolf: You make some good points Rolf. There are certainly cases where situations like the ones you mentioned exist. However in my experience it is an exception.

    Even in those examples using strings, when they exist in some form, is not useless. It just takes a few extra steps before they become useful. For instance, in one of your crackmes the encrypted strings simply needed to be decrypted in the IDB first. After that they proved fairly useful.

    Like you mentioned, strings are regarded by people as a crutch. This may be true for some that lack the patience to dig deeper. But as you said, when reversing, you'll take what you can get :).

