George V. Reilly

Profiling

Despite being a bona fide performance expert (I spent a couple of years as the Performance Lead for Microsoft’s IIS web server product about 15 years ago), I still forget to measure rather than assume.

I wrote some code today that imported nearly 300,000 nodes into a graph from a 500MB XML file. The code was not particularly fast, and I assumed that the bottleneck was the XML parser. I had been using the standard library’s streaming parser, cElementTree’s iterparse. I assumed that switching to lxml’s iterparse would make the code faster. It didn’t.
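
The import loop was roughly this shape; the element tag and the callback name are illustrative, not the real code:

    import xml.etree.cElementTree as ET   # lxml.etree's iterparse is a near drop-in swap

    def import_nodes(path, handle_node):
        """Stream the XML file instead of loading all 500MB into memory."""
        count = 0
        for event, elem in ET.iterparse(path, events=("end",)):
            if elem.tag == "node":        # "node" stands in for the real element tag
                handle_node(elem)
                count += 1
            elem.clear()                  # free each element once it's handled
        return count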

Then I had the bright idea of temporarily disabling the per-node processing, which left only the XML parsing. Instead of handling 200 nodes per second, I was now parsing more than 10,000 nodes per second. Clearly it was the processing, not the parsing, despite my assumptions.
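
Measuring that took nothing more than a stub handler and a stopwatch; a rough sketch, reusing the import_nodes skeleton above, with the path as a placeholder:

    import time

    path = "export.xml"   # placeholder for the real 500MB file

    def noop(elem):
        pass    # per-node processing temporarily disabled: parse only

    start = time.time()
    count = import_nodes(path, noop)
    rate = count / (time.time() - start)
    print("parsed %d nodes at %.0f nodes/sec" % (count, rate))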

Time to profile.

I used cProfile to gather some actual data on where time was being spent. The first thing that leaped out at me was that I was spending nearly all the time in str.lower. I cached the result of lower for each node; instead of making 20 million calls, I was now making only 32,000.
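
Gathering that data takes only a couple of lines; the output file name and the call being profiled are illustrative:

    import cProfile
    import pstats

    # Profile one full import run, then list the 20 most expensive calls.
    cProfile.run("import_nodes(path, handle_node)", "import.prof")
    pstats.Stats("import.prof").sort_stats("cumulative").print_stats(20)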

The hot spot then moved to a function where I was doing a linear search through an unordered list. I introduced a dictionary and the cumulative time spent in that function dropped from 1,500 seconds to 0.56 seconds.
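
The shape of that change, in outline (the node_id attribute is assumed for illustration):

    # Before: a linear scan of an unordered list on every lookup.
    def find_node(nodes, node_id):
        for node in nodes:
            if node.node_id == node_id:
                return node
        return None

    # After: build the index once; each lookup is then a constant-time dict hit.
    def build_index(nodes):
        return {node.node_id: node for node in nodes}

    def find_node_fast(index, node_id):
        return index.get(node_id)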

So much for my intuition.
