George V. Reilly

Regex Conjunctions

Most regular expression engines make it easy to match alternations (or disjunctions) with the | operator: to match either foo or bar, use foo|bar.

Few regex engines have any provisions for conjunctions, and where they do, the syntax is often horrible. Awk makes it easy to match /pat1/ && /pat2/ && /pat3/.

$ cat <<EOF | awk '/bar/ && /foo/'
> foo bar
> bar
> barfy food
> barfly
> EOF
foo bar
barfy food

In the case of a Unix pipeline, the conjunction could also be expressed as a series of pipes: ... | grep pat1 | grep pat2 | grep pat3 | ....
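In engines that support lookahead assertions, a conjunction can also be folded into a single pattern. Here is a minimal Python sketch, reusing the four input lines from the awk example; the name `both` is made up for illustration.

```python
import re

# Each (?=.*pat) lookahead asserts that pat occurs somewhere on the line
# without consuming any characters, so the assertions act as a logical AND.
both = re.compile(r"^(?=.*bar)(?=.*foo)")

lines = ["foo bar", "bar", "barfy food", "barfly"]
matches = [line for line in lines if both.search(line)]
print(matches)  # ['foo bar', 'barfy food']
```

The order of the lookaheads does not matter, which is the point: the line only has to contain both substrings somewhere.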

The longest regex that I ever encountered was an enormous alternation, a true horror that shouldn't have …

Now You Have 32 Problems

Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.

— Jamie Zawinski

A Twitter thread about very long regexes reminded me of the longest regex that I ever ran afoul of, a particularly horrible multilevel mess that had worked acceptably on the 32-bit .NET CLR, but brought the 64-bit CLR to its knees.

Whenever I ran our ASP.NET web application [on Win64], it would go berserk, eat up all 4GB of my physical RAM, push the working set of IIS's w3wp.exe to 12GB, and max out one of my 4 cores! The only way to maintain any sanity was to run iisreset every 20 minutes to …

Explaining the epilog of fnmatch.translate, \Z(?ms)

I was debugging a filtering directory walker (on which, more to follow) and I was trying to figure out the mysterious suffix that fnmatch.translate appends to its result, \Z(?ms).

fnmatch.translate takes a Unix-style glob, like *.py or test_*.py[cod], and translates it character-by-character into a regular expression. It then appends \Z(?ms). Hence the latter glob becomes r'test\_.*\.py[cod]\Z(?ms)', using Python's raw string notation to avoid the backslash plague. Also, the ? wildcard character becomes the . regex special character, while the * wildcard becomes the .* greedy regex.
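The translation is easy to try out. The exact string that fnmatch.translate returns has changed between Python versions, so this sketch checks what the compiled pattern matches rather than printing it. The file names are invented for illustration.

```python
import fnmatch
import re

# Translate the glob and compile the result; only the matching behavior
# is examined here, not the literal text of the generated regex.
pattern = re.compile(fnmatch.translate("test_*.py[cod]"))

names = ["test_walker.pyc", "test_walker.pyo", "test_walker.py", "walker_test.pyc"]
for name in names:
    print(name, bool(pattern.match(name)))
```

The first two names match. The third lacks the trailing c, o, or d, and the fourth does not start with test_, so neither matches.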

A StackOverflow answer partially explains it, which set me on the right track. (?ms) is equivalent to compiling the regex with re.MULTILINE | re.DOTALL. The re.DOTALL modifier makes the . special character match any character, including …
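The two flags and the \Z anchor can each be tried in isolation. This is a small sketch with made-up strings. It passes the flags as arguments instead of as a trailing (?ms), since, as far as I know, newer Python versions reject global inline flags that are not at the start of the pattern.

```python
import re

text = "one\ntwo"

# Without DOTALL, "." stops at a newline; with it, "." matches any character.
print(bool(re.search(r"one.two", text)))             # False
print(bool(re.search(r"one.two", text, re.DOTALL)))  # True

# Without MULTILINE, "$" matches only at the end of the string (or just
# before a final newline); with it, "$" also matches before every newline.
print(bool(re.search(r"one$", text)))                # False
print(bool(re.search(r"one$", text, re.MULTILINE)))  # True

# \Z matches only at the very end of the string, so a trailing newline
# defeats it where a bare "$" would still succeed.
print(bool(re.search(r"two\Z", "two\n")))            # False
print(bool(re.search(r"two$", "two\n")))             # True
```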

64-bit Windows 7

I mentioned three weeks ago that I had just repaved my work dev box and installed the 64-bit version of the Windows 7 RC. Nine or ten years after I first ported parts of IIS to Win64, I am finally running my main desktop on 64-bit Windows. With one exception, it's been painless. Programs have just worked, devices have just worked. There are relatively few native x64 applications, but for the most part it doesn't matter. Where it does matter (e.g., shell extensions such as TortoiseSVN), 64-bit binaries are available.

I briefly flirted with using the 64-bit build of Python, but realized that I would have to recompile several eggs as …

Ack - Better than Grep

On a StackOverflow question about favorite Vim plugins, I learned about Ack, a replacement for grep that's smarter about searching source trees.

Ack is written in Perl. The built-in :vimgrep is rather slow. It seems to have some Vim-specific overhead, such as creating swap files and executing BufRead autocmds. Ack is noticeably faster, though somewhat slower than GNU grep.

Which would you rather type to search a tree, ignoring the .svn and .git subtrees?

$ ack -i -l foobar
$ grep --exclude='*.svn*' --exclude='*.git*' -i -l -r foobar .

The ack command takes 6 seconds to search 4500 files, while the grep completes in 2. This does not count the time that I spent trying …
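For comparison, the same kind of search can be sketched in Python by pruning the version-control directories during the walk. This is only an illustration of the idea, not how ack or grep work internally. The names `files_matching` and `SKIP_DIRS` are made up, and reading every file as UTF-8 text is an assumption.

```python
import os
import re

SKIP_DIRS = {".svn", ".git"}  # assumption: only these two subtrees are skipped


def files_matching(root, pattern):
    """Return sorted paths under root whose contents match pattern, ignoring case."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into the skipped directories.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as f:
                    if any(regex.search(line) for line in f):
                        hits.append(path)
            except OSError:
                continue  # unreadable file: skip it
    return sorted(hits)
```

Calling files_matching(".", "foobar") lists the files that contain foobar in any letter case, much like the -i -l options in the two commands above.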

Exuberant Ctags and JavaScript

Exuberant Ctags is an essential complement to Vim: it generates an index of symbol names (tags) for a set of source files. In Vim, just place the cursor on a function name and type C-] to go to its definition.

Ctags works well for most of the languages that I deal with, but falls down badly on modern JavaScript. Its built-in parser simply doesn't handle declarations like these:

Sizzle.selectors.filters.animated = function(elem) { // ...
ajaxSetup: function( settings ) {

I came across Unbad's workaround earlier tonight. His code didn't work for me, so I hacked on it until it did:

--langdef=js
--langmap=js:.js
--regex-js=/([A-Za-z0-9._$]+)[ \t]*[:=][ \t]*\{/\1/,object/
--regex-js=/([A-Za-z0-9._$()]+)[ \t]*[:=][ \t]*function[ \t]*\(/\1/,function/
--regex-js=/function[ \t]+([A-Za-z0-9._$]+)[ \t]*\(([^)]*)\)/\1/,function/
--regex-js=/([A-Za-z0-9._$]+)[ \t]*[:=][ \t]*\[/\1/,array/
--regex-js=/([^= ]+)[ …
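One quick way to sanity-check such a pattern outside ctags is to transliterate it into Python and run it over the two declarations quoted earlier. This is only an approximation, since ctags uses its own regex engine, but for this pattern the syntax carries over unchanged. The name `FUNCTION_RE` is made up for the sketch.

```python
import re

# Python transliteration of the second "function" pattern from the options
# above; inside a character class, ".", "$", "(" and ")" are all literal.
FUNCTION_RE = re.compile(r"([A-Za-z0-9._$()]+)[ \t]*[:=][ \t]*function[ \t]*\(")

samples = [
    "Sizzle.selectors.filters.animated = function(elem) { // ...",
    "ajaxSetup: function( settings ) {",
]
for line in samples:
    match = FUNCTION_RE.search(line)
    print(match.group(1) if match else None)
# Sizzle.selectors.filters.animated
# ajaxSetup
```

The captured group is what ctags would record as the tag name via \1.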

Rating with Stars

I want to be able to write some reviews and graphically rate them with stars. I put together some transparent stars in Gimp and added a macro to dasBlog.

I'm going to rate this effort:

$stars(4.5)

To get this effect, I simply wrote $stars(4.5).

(And I had to carefully construct the previous sentence so that dasBlog wouldn't invoke the stars macro.)

I'm hardnosed. I rarely give 5/5 to anything. I don't really expect to need the half stars, but I may want that fine control at some point.

To use this in your own blog, download the zipfile of star images.

Copy 5star*.gif to your blog's images directory. The *.xcf files …