George V. Reilly

Accidentally Quadratic: Python List Membership

We had a per­for­mance regression in a test suite recently when the median test time jumped by two minutes.

We tracked it down to this (simplified) code fragment:

task_inclusions = [ some_collection_of_tasks() ]
invalid_tasks = [t.task_id() for t in airflow_tasks
                 if t.task_id() not in task_inclusions]

This looks fairly in­nocu­ous—and it was—until the size of the result returned from some_­col­lec­tion_of_­tasks() jumped from a few hundred to a few thousand.

The in comparison operator con­ve­nient­ly works with all of Python's standard sequences and col­lec­tions, but its efficiency varies. For a list and other sequences, in must search continue.

Now You Have 32 Problems

Some people, when confronted with a problem, think “I know, I'll use regular ex­pres­sions.” Now they have two problems.

— Jaime Zawinksi

A Twitter thread about very long regexes reminded me of the longest regex that I ever ran afoul of, a par­tic­u­lar­ly horrible multilevel mess that had worked acceptably on the 32-bit .NET CLR, but brought the 64-bit CLR to its knees.

Whenever I ran our ASP.NET web ap­pli­ca­tion [on Win64], it would go berserk, eat up all 4GB of my physical RAM, push the working set of IIS's w3wp.exe to 12GB, and max out one of my 4 cores! The only way to maintain any sanity was to run iisreset every 20 minutes to continue.

Old Presentations

I uploaded some pre­sen­ta­tions to Speak­erDeck.com tonight.

Here are various pre­sen­ta­tions of mine at Speak­erDeck.com and SlideShare.net:

LKRhash: Scalable Hash Tables

LKRhash is a hashtable that scales to multiple processors and to millions of items. LKRhash was invented at Microsoft in 1997 by Per-Åke (Paul) Larson of Microsoft Research and Murali Krishnan and George Reilly of Internet In­for­ma­tion Services. LKRhash has been used in many Microsoft products. The techniques that give LKRhash its per­for­mance include linear hashing, cache-friendly data structures, and fine-grained locking.

If Microsoft had had 20% time, LKRhash would have been my main 20% project. I put a lot of continue.

Flame Graphs and Flame Charts

I was in­ves­ti­gat­ing the per­for­mance of a web app today, and I spent some time looking at the Flame Chart vi­su­al­iza­tion in Chrome's profiling tools, which helped identify some problems.

Flame Charts are like Brendan Gregg's Flame Graphs, except that the charts are sorted by time, while the graphs are sorted al­pha­bet­i­cal­ly.

Quoting from Gregg's recent ACM Queue article:

A flame graph has the following char­ac­ter­is­tics:

Profiling

Despite being a bona fide per­for­mance expert—I spent a couple of years as the Per­for­mance Lead for Mi­crosoft­'s IIS web server product about 15 years ago—I still forget to measure rather than assume.

I wrote some code today that imported nearly 300,000 nodes into a graph from a 500MB XML file. The code was not par­tic­u­lar­ly fast and I assumed that it was the XML parser. I had been using the built-in streaming parser, cEle­ment­Tree iterparse. I assumed that using the lmxl iterparse would make the code faster. It didn't.

Then I had the bright idea of tem­porar­i­ly disabling the per-node processing, which left only the XML parsing. Instead of continue.

Decrementing Loops

The canonical for-loop in C and C++ is written thus, counting up i = 0, i = 1, ..., i = N-1:

for (int i=0; i < N; i++) {
    // loop body
}

(In C, you have to declare int i before the for-loop.)

Let's unpack that for-loop into an equivalent while-loop:

int i = 0;
while (i < N) {
    // loop body
    i = i + 1
}

In other words, we initialize i to zero. Then, before every execution of either loop, we check i < N. If i is still within bounds, we execute the loop body. Then we postin­cre­ment continue.