Let's say, for example, that you're a security administrator charged with maintaining a network usage/security policy for your company. Let's go a step further and say that part of this policy is to block the usage of instant messaging and VoIP applications. Let's go one final step further and assume that you actually care about your job and really want to do this and not simply tell your boss you did and then run down to the bar for a drink.
"It's easy," you think. "Simply block the ports used by these various IM and VoIP applications." Ah-ha, not so fast there. Your modern, more active IM and VoIP applications don't just use one port anymore. You're lucky if they even use TCP or UDP consistently. The modern IM/VoIP application wants to make sure it isn't blocked by silly things like firewalls. It will switch protocols, hop ports, encrypt its traffic, and do all sorts of other nefarious schemes (often while twirling its mustache and tying young ladies to train tracks).
"The job is impossible," you think. Ah, but there you'd be wrong again (you're batting a thousand today, aren't you?). Sure, you can't just decrypt someone else's traffic (because if you could, then it's not really encrypted, is it?), but you can still see the traffic. The little IP datagrams skittering to and fro, hither and yon (or thither, if you prefer). You can still see those packets, and what's in them (even if it is just encrypted gobbledygook (which could also be PHP code)). You can see other interesting things about those packets, too. Things like packet size and interpacket delay.
"Wait," you think. "What if I statistically analyzed a bunch of network traffic and determined what sort of trends (covariance matrices, if you will) tend to appear in certain types of traffic. Why, if I did that, I could identify that pesky IM/VoIP traffic and block it." Well, today's your lucky day, Network Administrator. Someone already has done all that for you. And they'll be presenting it at Black Hat this August.
That's right. Rohit and I will be presenting a paper on this very topic in two weeks. We painstakingly analyzed gigabytes upon gigabytes of traffic (all by hand, in the dark, and uphill both ways), identified statistical trends within them, and wrote software to make educated guesses about which protocol you're looking at. No need to thank us. Virtue is its own reward.
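If "statistical trends" and "educated guesses" sound hand-wavy, here's a toy sketch of the general idea. To be clear, this is not the algorithm from the paper. It assumes one simple approach: build a mean and covariance matrix of (packet size, interpacket delay) for each traffic type, then label a new observation by whichever profile is closest in Mahalanobis distance. The function names, the profile labels, and every number in the training data are invented for illustration.

```python
from statistics import mean


def profile(samples):
    """Return (mean_vector, 2x2 covariance) for (size, delay_ms) pairs."""
    n = len(samples)
    mx = mean(x for x, _ in samples)
    my = mean(y for _, y in samples)
    # Sample covariance (divide by n - 1).
    sxx = sum((x - mx) ** 2 for x, _ in samples) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in samples) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in samples) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))


def mahalanobis_sq(point, mean_vec, cov):
    """Squared Mahalanobis distance from point to a 2-D profile."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx = point[0] - mean_vec[0]
    dy = point[1] - mean_vec[1]
    # Uses the closed-form inverse of a 2x2 matrix: [[d, -b], [-c, a]] / det.
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det


def classify(point, profiles):
    """Pick the profile label with the smallest distance to point."""
    return min(profiles, key=lambda label: mahalanobis_sq(point, *profiles[label]))


# Invented training data: (packet size in bytes, interpacket delay in ms).
voip_like = [(158, 19.0), (160, 20.0), (162, 21.5), (161, 19.5), (159, 20.5)]
bulk_like = [(1400, 1.0), (1450, 2.5), (1380, 1.5), (1500, 2.0), (1420, 3.0)]

profiles = {"voip_like": profile(voip_like), "bulk_like": profile(bulk_like)}

print(classify((161, 20.2), profiles))
print(classify((1410, 1.8), profiles))
```

Two features and five samples per class is nowhere near enough for real traffic, of course. The point is only that a covariance matrix captures how the measurements vary together, which is what lets a duck be told apart from an elephant.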
The paper is called "PISA - Protocol Identification via Statistical Analysis". I can't take credit for the idea (that was Rohit), but I came up with the clever acronym. It's also informally called the Duck Protocol Identifier, from the old adage: if the protocol walks like a duck and quacks like a duck, it's probably not an elephant.
(Another way of looking at it: if there's a giant animal in the room covered by a blanket, wouldn't you rather know if it's a friendly, social elephant, or an angry, maladjusted hippopotamus?)
We'll formally present our findings at Black Hat, along with a Python-based framework that you can use yourself. We'll give a full description of the math involved, along with lots of pretty graphs and some examples comparing various types of encrypted traffic against other types of traffic.
If you happen to be at Black Hat, you should come see the talk. A pleasant time is guaranteed for all.
(Disclaimer: Pleasant time may not be had by all.)
