Speed!

Lately I’ve been doing some programming in my spare time, and working on putting the proverbial polish on some applications. Among other things, I’ve been using Shark to profile applications and look for hot spots to optimise.

Now before I say anything else, I just have to say that if you develop for OS X and you haven’t used Shark, you don’t know what you’re missing. It’s performance profiling how it should be done: simple to use, with output that’s easy to understand. Any non-trivial application could do with optimisation. Even if CPUs are fast enough to run slow code at acceptable speeds, users appreciate snappy applications. And who’s to say they don’t want to use spare cycles for something else? I’m compiling SDLMAME in the background while I blog, for example. And CPUs use less power when idle, so notebook batteries last longer with better-optimised applications.

Anyway, getting back to the topic at hand, using Shark was a very interesting experience. I was quite surprised at some of my findings.

First of all, on the topic of libxml2 (the Gnome XML library). Now libxml2 is really cool. It can read an entire XML file into a friendly structure that you can walk forwards and backwards, edit, and even write out as an XML file again. It can also validate the document as it goes. I wasn’t using it for any of these reasons, though. I was using it because it’s included with OS X, so I could conveniently link to it dynamically. But libxml2 was turning out to be a major performance bottleneck in my application. The interesting thing was that it was taking about twice as long to deallocate an XML document structure as it took to parse the 24 MB XML file in the first place!

Now my first thought was something along the lines of, “Gee, garbage collection really is a good idea!” If you think about it, a copying garbage collector would let you completely side-step the deallocation step: when the heap is swapped, nothing references the dead objects, so they simply aren’t copied. Using a custom allocator in libxml2 could solve the problem, too. But I side-stepped the issue in a different way: I switched to Expat, which doesn’t build a structure from the document to begin with. The trouble with that is my application has now grown by a few hundred kilobytes because of the statically linked XML parsing code.
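To give a rough idea of what parsing without building a structure looks like, here’s a minimal sketch using Python’s standard-library binding to Expat. It isn’t the code from my application; the `count_elements` helper and the `games`/`game`/`info` element names are made up purely for illustration. The parser calls a handler for each start tag as it streams through the input, so there’s never a document tree to tear down afterwards.

```python
import xml.parsers.expat


def count_elements(xml_bytes):
    """Count start tags per element name without building a document tree."""
    counts = {}

    def on_start(name, attrs):
        # Called by Expat once for every start tag it encounters.
        counts[name] = counts.get(name, 0) + 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    # The second argument marks this as the final chunk of input.
    parser.Parse(xml_bytes, True)
    return counts


# Hypothetical sample document, just to exercise the handler.
sample = b"<games><game name='a'/><game name='b'/><info/></games>"
print(count_elements(sample))
```

For a big file you’d more likely open it in binary mode and hand it to the parser’s `ParseFile` method rather than loading everything into memory first. Either way, the only state kept is whatever your handlers choose to keep.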

The second thing is that Core Foundation is considerably faster than Cocoa, even when it should be doing the same thing. In critical areas, I could get a speed gain of twenty to fifty percent by casting my Cocoa collection objects to Core Foundation objects and calling the C APIs.

So where’s the catch? Core Foundation is less forgiving than Cocoa. For example, if you try to get a value from a nil dictionary object with Cocoa, you just get nil. Core Foundation will crash. Core Foundation also has no concept of autorelease pools (but that won’t stop you from casting a CFTypeRef to id and autoreleasing it).

All in all, I can’t recommend performance profiling enough. It’s always interesting, and often surprising. And admit it, everyone loves trying to make stuff go faster 😉

This entry was posted on Tuesday, 19 September, 2006 at 2:15 pm and is filed under Apple, Technology.

One response to “Speed!”


Florian says:

Hi,
I found your blog via google by accident and have to admit that youve a really interesting blog 🙂
Just saved your feed in my reader, have a nice day 🙂
