TippingPoint Digital Vaccine Laboratories

Using Pastebin for Malicious Sample Collection

Services like Malware Domain List, Virus Watch and MalC0de are great for finding URLs of malicious content that may be interesting to collect, and they provide us with a great deal of information that we use for further analysis. There are times, though, when I am looking for specific samples and these services can't help. That's when I turn to Pastebin.

Pastebin is a great service that allows for easy sharing of and collaboration on data. Recently it has been used by various groups for posting personal information, breach data, or statements. I’ve also found it to be a great source for collecting malicious samples and discovering new attack techniques.

Pastebin does not make it easy to collect this data in an automated fashion (their API does not currently provide searching capabilities), but Pastenum by nullthreat from the Corelan Team does. He wrote a nice tool that searches Pastebin and Pastie and writes the results to an HTML report. It would be fairly easy to write that output into a database instead and build a content harvesting tool.
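As a rough sketch of the database idea, the snippet below stores harvested pastes in SQLite and uses a content hash to skip duplicates. Everything in it is an assumption made for illustration: the table layout, the function names open_store and save_paste, and the idea that a harvester hands over a URL, the search term that found it, and the paste body. It does not reflect Pastenum's actual output format.

```python
import hashlib
import sqlite3


def open_store(path=":memory:"):
    """Create (or open) a small SQLite store for harvested pastes (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS pastes (
               sha256      TEXT PRIMARY KEY,  -- content hash, used to skip duplicates
               url         TEXT NOT NULL,
               search_term TEXT NOT NULL,
               body        TEXT NOT NULL
           )"""
    )
    return conn


def save_paste(conn, url, search_term, body):
    """Insert one paste; return True if it was new, False if already stored."""
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    before = conn.total_changes
    conn.execute(
        "INSERT OR IGNORE INTO pastes (sha256, url, search_term, body) "
        "VALUES (?, ?, ?, ?)",
        (digest, url, search_term, body),
    )
    conn.commit()
    # total_changes only grows when a row was actually inserted.
    return conn.total_changes > before


if __name__ == "__main__":
    store = open_store()
    # The URL below is a placeholder, not a real paste.
    print(save_paste(store, "http://pastebin.com/EXAMPLE", "unescape", "var a = 1;"))
```

Keying on the content hash rather than the URL means the same sample reposted under a different paste ID is stored only once.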

Malicious JavaScript and Shellcode

Some of the most interesting findings are the large number of malicious JavaScript samples that turn up in the search results. Simple search queries such as "unescape", "%u9090", and "string.fromCharCode" yield a large number of malicious-looking scripts. I say "looking" because not all of the results are malicious, and great care must be taken not to write filters that would block the benign ones. Most of the results appear to have been posted by people sharing with colleagues or looking for help with a site that has been exploited.
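To illustrate why these terms are leads rather than verdicts, here is a minimal triage sketch that reports which of the search terms above appear in a paste. The function name matched_indicators and the case-insensitive matching are my own assumptions; a hit only means the paste deserves a manual look.

```python
# Search terms from the text, lowercased for case-insensitive matching.
INDICATORS = ("unescape", "%u9090", "fromcharcode")


def matched_indicators(paste_text):
    """Return the indicator terms found in the paste, in INDICATORS order."""
    lowered = paste_text.lower()
    return [term for term in INDICATORS if term in lowered]


if __name__ == "__main__":
    suspicious = "var s = unescape('%u9090%u9090');"
    benign = "alert(String.fromCharCode(72, 105));"
    print(matched_indicators(suspicious))
    print(matched_indicators(benign))
```

The second example is ordinary JavaScript and still matches one term, which is exactly the kind of result that makes blanket filters on these strings risky.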

The above screenshot is a sample from an exploit page used by attackers that was found using a search for unescape. There are attempts at obfuscation on the page, such as reassigning unescape and using string concatenation on the value of the variable bigblock, but the variable name is left as shellcode and the function is named "HeapSpray", both of which are pretty clear indicators that the page is not benign.

Here's an abbreviated sample that was found on Pastebin and then run through libemu for shellcode detection after transforming the %u-encoded string into a hex string using a simple regex:

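import re
# Emulator comes from the libemu Python bindings; its import is not shown in the original.

sc = "%u204c%u0f60%u1705%u4a80%u203c%u0f60%u630f%u4a80%ueba3"
sc += "%u4a80%u2030%u4a82%u2f6e%u4a80%u4141%u4141%u0026%u0000"
sc += "%ueaa9%uf6a8%uf6ee%ufeb9%ub7bb%ub6b6%u86b7%u8686%u8686"
# Drop each %u prefix and swap the two bytes that follow it.
sc = re.sub(r'%u([0-9a-fA-F]{2})([0-9A-Fa-f]{2})', r'\g<2>\g<1>', sc)
e = Emulator()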

The above regex removes the %u and swaps the two bytes of each unit, since each %uXXXX value is laid out in memory in little-endian order when the script executes.
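The transformation can be tried on its own without an emulator. The helper below is a minimal sketch (the name unescape_to_bytes is mine) that applies the same byte swap and then turns the hex string into raw bytes. It assumes the input consists only of %uXXXX sequences; anything else would make bytes.fromhex raise a ValueError.

```python
import re


def unescape_to_bytes(js_escaped):
    """Convert a string of %uXXXX sequences to raw bytes in little-endian order."""
    # "%u204c" -> "4c20": drop the prefix and swap the high and low bytes.
    hex_string = re.sub(
        r'%u([0-9a-fA-F]{2})([0-9a-fA-F]{2})', r'\g<2>\g<1>', js_escaped
    )
    return bytes.fromhex(hex_string)


if __name__ == "__main__":
    # First two units of the sample above.
    print(unescape_to_bytes("%u204c%u0f60").hex())  # prints 4c20600f
```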

Similarly, there are quite a few examples of shellcode and other public exploits available. Some of these are repostings of Metasploit modules, while others are publicly available PoCs. Most of the data found using these queries is already known, but some interesting things turn up. Some of my favorite searches for finding these are also the easiest ones to use: "exploit", "shellcode", "sc" and "\x90". The majority of the exploit source code I've found with these searches comes from public exploits, but I have also found analysis from other researchers of exploits that have been used in attacks.

Exploit Toolkits

Pastebin is also very useful for finding rendered pages from some of the newer exploit kits that are harder to obtain. An alleged sample of Nice Pack turned up as well (seen below). Searches for terms like blackhole, eleonore and phoenix turned up pages from all of these exploit kits and helped with charting their evolution based on upload dates. Of these, BlackHole returns the most results and deserves further attention.

BlackHole Exploit Toolkit

Using Pastebin to track the constantly evolving obfuscations used by the BlackHole Exploit Kit has been very helpful for building a sample base large enough to find commonalities and help protect our customers. One of the recent obfuscations could be easily found with searches for "function setCharAt" and 'aa="func"'. Those searches yielded many samples that have been added to our database of malicious content. Even searching for "blackhole" yields a large number of pages that can be useful for finding fully rendered HTML.

The above screenshot is a sample from a recent BlackHole-infected site that someone shared on Pastebin. Positively identifying these pages as BlackHole enabled us to find many more samples and supplement our ever-growing malicious content database.

I also used a recent post on Kahu Security to investigate a new, unknown exploit kit page by searching for "ddd=new Date". Microsoft Security Essentials identified some samples as BlackHole-related, but no other vendors on VirusTotal identified them so specifically. This script uses square brackets around a string (["fromCharCode"]) instead of the standard dot notation (.fromCharCode), an evasion technique that is becoming popular on many of the malicious pages in our database.
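A search or filter that only looks for the dot form will miss the bracket form. As a small sketch of one way to cover both, the pattern below matches .fromCharCode as well as ["fromCharCode"] and ['fromCharCode']. The names FROM_CHAR_CODE and uses_from_char_code are hypothetical, and the pattern will not catch property names built by string concatenation, such as ["from" + "CharCode"].

```python
import re

# Matches .fromCharCode as well as ["fromCharCode"] / ['fromCharCode'].
FROM_CHAR_CODE = re.compile(
    r"""(?:\.\s*fromCharCode\b|\[\s*["']fromCharCode["']\s*\])"""
)


def uses_from_char_code(script):
    """Return True if the script references fromCharCode in either notation."""
    return FROM_CHAR_CODE.search(script) is not None


if __name__ == "__main__":
    print(uses_from_char_code('s = String.fromCharCode(65);'))
    print(uses_from_char_code('s = String["fromCharCode"](65);'))
    print(uses_from_char_code('s = String["from" + "CharCode"](65);'))
```

As with the other search terms, a match here is a reason to look closer, not proof that a page is malicious.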

Pastebin is a very flexible sharing service and with some extremely simple queries you can find almost anything, including malware. If you have any other good search terms for finding malicious samples please share :)

Published On: 2011-12-14 05:59:58
