For me, Chrismakkuh came early this year. Saturday afternoon, my girlfriend and I went to the Apple Store and I picked up a copy of Mac OS X 10.5 “Leopard”.
(Yes, I have a girlfriend. I realize that the fact that I’m posting a blog entry about an operating system means that she probably lives in Canada and none of my friends have ever actually seen her, but, dude, she totally exists. We met at Niagara Falls last year. Really.)
Leopard includes a lot of new security features, some of them good, some of them meh. Now, I know what you’re thinking: “Rob, every other security guy/girl on Earth is writing a blog post about the new security features of Leopard!” You’re absolutely right, they are.
Here’s the difference: I am much, much better looking. Also, I’m going to go into considerably more detail. I’m not writing just one blog entry, not just two, but many. And I’m going to explain the theory and the science behind it all. It’s all well and good to say that Leopard supports Address Space Layout Randomization and that it makes it harder to exploit buffer overflows. See? Right there, that’s as much content as a lot of other blogs, and I’m still writing. That’s because I’m going to explain linkers, loaders, shared libraries, prebinding, everything. It’s like a freshman essay in college; I’m going to get as many pages of filler in here as possible. Also, it’ll be educational.
So, I hereby present to you:
Mac OS X 10.5 “Leopard” New Security Features - Part I: Address Space Layout Randomization
Computer Memory, Shared Libraries, Linkers, and Loaders
Libraries are collections of commonly used code. Libraries exist for everything - XML parsing, text input, graphics, windowing systems...everything. This is incredibly useful, of course, in that it promotes code reuse. No need to reinvent the wheel if there’s one already available (actually, given the current state of software development, no need to reinvent the wheel when you have 1.7 x 10^27 vaguely round objects to choose from).
Libraries have existed in some form or another since the very first days of programmable computers. Some of the earliest libraries were on paper - books of example code that programmers could include in their own programs. As computer storage got larger and programmers got lazier, libraries moved into electronic format. Even then, libraries were still bits of code literally copied - sometimes by the programmer - into a new program. Eventually, some clever folks wrote a program, known as a link editor or linker, that would analyze programs, automatically pull code from pre-compiled libraries, and link it into the new program wherever it was referenced. This was revolutionary, and there was much rejoicing.
The fly in the ointment (there’s always one) was that computer memory was still expensive, and this approach used a lot of memory. If Fred Foobar and Suzie Cobol both ran a copy of a program that had pulled functions from a library, two copies of that code were in memory, requiring double the space. This was because the code was literally taken from the library and copied whole cloth into the new program. Sure, it saved development time and helped keep the peasants from revolting and such, but it also made people who had to pay for memory very unhappy.
Then, some more clever people had another great idea. Advances in hardware-assisted memory management and operating system design made it easier to implement shared libraries. These were libraries that were linked with a program not when it was compiled, but when it was run. When a program was run, all the libraries it needed were located and loaded into memory and the program’s code was patched at runtime to reference the new addresses of the library functions. The best part about this was that a library needed to be loaded only once - if two programs used the same library, it was still loaded only once. This was surely the pinnacle of operating system technology and everyone was pleased.
A special program, known as the loader, was responsible for, well, loading programs and fixing up their internal references to point to the actual addresses of the library functions in memory. Generally at this point, libraries were loaded at the same address in memory every time, or at least at one of a known set of addresses, and the main program was also loaded at a known address. This made things simpler, and in those halcyon days, nobody worried much about security.
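To make the fix-up step a little more concrete, here is a toy sketch in Python. It is not how any real loader works; the library, its symbol offsets, and the base address 0x1000 are all made up for illustration. It only shows the idea: the program refers to functions by name, and the loader swaps those names for absolute addresses once it knows where the library landed.

```python
# Toy model of load-time fix-ups; not how any real loader is implemented.

# Hypothetical library: function name -> offset from the library's base address.
LIBFOO_SYMBOLS = {"parse": 0x10, "render": 0x80}

def load_library(base, symbols):
    """Return the absolute address of each symbol once the library sits at `base`."""
    return {name: base + offset for name, offset in symbols.items()}

def fix_up(call_sites, resolved):
    """Replace each symbolic reference in the program with an absolute address."""
    return [resolved[name] for name in call_sites]

# A program that calls parse() and then render().
program = ["parse", "render"]

# The library is always loaded at the same (made-up) base address.
patched = fix_up(program, load_library(0x1000, LIBFOO_SYMBOLS))
print([hex(address) for address in patched])  # ['0x1010', '0x1080']
```

Because the base address never changes in this model, anyone who has seen the library once knows where parse() and render() will be every time.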
Besides, how could something like this be a security flaw? Well, it’s not a security flaw so much as something that makes exploiting security flaws easier. Say, for example, that I manage to inject code into a running program in this situation. I know exactly where the program is in memory, and (assuming I know about the program) I know the addresses of all the libraries it uses. I can therefore call any code in those libraries very easily. It’s almost as though my code was loaded by the operating system itself.
Most modern operating systems (and Windows) used this method of handling libraries until very recently. As computers became more and more connected and exposed to more and more potentially malicious input, it became obvious that a way of making the execution of injected code more difficult would be very useful. One of the techniques devised was called Address Space Layout Randomization (ASLR).
ASLR does just what it says on the tin: it randomizes the layout of address spaces. Libraries and executables aren’t always loaded in the same place every time. While this may not prevent an attacker from injecting malicious code into a running process, at least it makes him have to work harder to find useful code to call.
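Here is a rough sketch of the idea in Python, under some loud assumptions: the page size, the address range, and the offset of the "useful" function are all invented, and a real operating system picks addresses with far more care than a call to a random number generator. The point is only that the same function ends up somewhere different for each load.

```python
import random

PAGE = 0x1000  # assumed page size for this toy model

def random_base(rng, low=0x10000000, high=0x70000000):
    """Pick a page-aligned base address somewhere in [low, high)."""
    return rng.randrange(low // PAGE, high // PAGE) * PAGE

def address_of(base, offset):
    """Absolute address of a function that sits `offset` bytes into a library."""
    return base + offset

USEFUL_OFFSET = 0x4A0  # hypothetical offset of a function an attacker wants

rng = random.Random()
bases = [random_base(rng) for _ in range(3)]
for base in bases:
    # Each simulated load puts the same function at a different address.
    print(hex(address_of(base, USEFUL_OFFSET)))
```

An exploit that hard-codes one of those printed addresses would, in this toy model, almost always be pointing at the wrong place on the next load.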
Now, popping the conversation stack, let’s get back to the point of all this: Address Space Layout Randomization in Leopard. Leopard is the first version of Mac OS X to implement ASLR, and, like any first-time effort, there are a few...caveats.
What are the caveats? Well, for one, Leopard isn’t entirely egalitarian. Not all libraries have their load addresses randomized. Sure, some is better than none, but some of the libraries at fixed addresses include things like the loader itself. That means that, if I can get code injected and run, I can access functions in the loader to find other libraries’ addresses (or load new ones). This still makes it harder to run malicious code, since the malicious code has to do more work before it can get down to the dirty business of exploitation, but it’s just a speed bump, not a roadblock.
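A toy Python model of why a fixed-address loader is only a speed bump. Everything here is hypothetical: the address 0x8000, the dictionary standing in for process memory, and the lookup routine are inventions for illustration, not Leopard's actual loader interface.

```python
import random

PAGE = 0x1000           # assumed page size
LOADER_LOOKUP = 0x8000  # hypothetical fixed address of the loader's lookup routine

def start_process(rng):
    """Build a toy address space: a dict mapping addresses to callables."""
    lib_base = rng.randrange(0x10000, 0x70000) * PAGE  # the library is randomized
    useful_addr = lib_base + 0x40
    symbols = {"useful": useful_addr}
    return {
        useful_addr: lambda: "useful library function called",
        LOADER_LOOKUP: symbols.get,  # the loader itself never moves
    }

def injected_code(memory):
    """What an attacker's code might do: ask the loader where things are."""
    lookup = memory[LOADER_LOOKUP]  # known address, no guessing required
    target = lookup("useful")       # the loader hands over the random address
    return memory[target]()

result = injected_code(start_process(random.Random()))
print(result)
```

The injected code never has to guess the randomized address; it takes one extra step through the thing that didn't move.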
Another problem is that library load addresses are not randomized per-process, only per-machine. Also, the random addresses are persistent - the same address will be used next time the machine is booted too. Now, to be fair, this is not just a problem with Leopard: Linux and other operating systems that prelink their libraries share it. It is a byproduct of another technology called prebinding (or prelinking).
What is prebinding? I’m so glad you asked. Prebinding addresses one of the problems with shared libraries: the fact that it takes longer to load a program that uses shared libraries than one that doesn’t. This is because the loader has to fix up references in programs to point to the library addresses in memory, and these can change. Prebinding goes through and pre-calculates all of the load addresses for libraries known on the system and figures out where to load them in memory next time. This saves time at program startup by doing the relocation once for everything. Think of it as a compromise between statically linked programs, which load quickly (with prebinding, the loader doesn’t have to figure out addresses for all the libraries every time, unless there’s a conflict), and dynamically linked programs, which save memory (the libraries can still be shared between processes).
Prebinding is great, in that it speeds up the loading of applications by quite a bit, but it does mean that randomization can only be done at prebinding time. On my CentOS 5 Linux machine, for example, this is done every fourteen days. That means that, once a fortnight, all my libraries will get new load addresses. On Leopard, it looks like prebinding is done any time a system library is upgraded.
So does prebinding mean that ASLR is useless? No, not at all. Prebinding is still done on a per-machine basis, and at prebinding time the load addresses of libraries are still randomized. That means that every Mac running Leopard will have different load addresses for libraries. An exploit that works on one Mac probably won’t work on another one, at least not unless it goes through the added effort of trying to locate itself and libraries in memory.
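The per-machine behavior can be sketched in Python like so. This is a toy model under stated assumptions: the library names and sizes are made up, and "prebinding" here is just picking a random starting page once and packing the libraries after it, which is not a description of Apple's actual algorithm.

```python
import random

PAGE = 0x1000                                 # assumed page size
LIBRARIES = {"libA": 0x3000, "libB": 0x5000}  # hypothetical name -> size in bytes

def prebind(rng, low=0x10000000, high=0x20000000):
    """Assign every library a base address once, starting at a random page."""
    next_base = rng.randrange(low // PAGE, high // PAGE) * PAGE
    layout = {}
    for name, size in LIBRARIES.items():
        layout[name] = next_base
        next_base += size  # sizes are page multiples, so bases stay aligned
    return layout

def launch(layout):
    """Every process on the machine reuses the stored layout as-is."""
    return dict(layout)

machine_layout = prebind(random.Random())  # done once per machine
first = launch(machine_layout)
second = launch(machine_layout)
print(first == second)  # True: same addresses for every process and every boot
```

Call prebind() again with a fresh random generator to stand in for a second Mac, and the odds are good you get a different layout, which is the whole benefit.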
So, what does all this mean? Is ASLR on Leopard going to save the world? Well, no. It’s not a panacea. There are still some flaws (I don’t consider the prebinding issues flaws, but I do consider it an issue that not all libraries’ addresses are randomized). It’s still leaps and bounds better than what Tiger had, and I can only assume it will get better.
Okay, that’s it for today. Read chapters 7 - 10 tonight and stay tuned for next time, when we cover Leopard's Code Signing!