George V. Reilly

Explaining the epilog of fnmatch.translate, \Z(?ms)

While debugging a filtering directory walker (on which, more to follow), I was trying to figure out the mysterious suffix that fnmatch.translate appends to its result: \Z(?ms).

fnmatch.translate takes a Unix-style glob, like *.py or test_*.py[cod], and translates it character by character into a regular expression: the ? wildcard becomes the . regex special character, while the * wildcard becomes the greedy .* regex. It then appends \Z(?ms). Hence the latter glob becomes r'test\_.*\.py[cod]\Z(?ms)', using Python’s raw string notation to avoid the backslash plague.
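To make the translation concrete, here is a minimal sketch that runs a couple of globs through fnmatch.translate and tries the results on some made-up filenames. The filenames are purely illustrative, and the exact text of the generated regex varies between Python versions, so the sketch prints it rather than relying on it.

```python
import fnmatch
import re

# Translate a glob into a regex. The exact text of the result depends on
# the Python version, so it is only printed here, never compared.
pattern = fnmatch.translate("test_*.py[cod]")
print(pattern)

regex = re.compile(pattern)

# '*' stands for any run of characters (including none), and [cod] for
# exactly one of c, o, or d. The filenames below are made up.
candidates = ["test_walker.pyc", "test_.pyo", "test_walker.py", "test_walker.pyc.bak"]
matches = [name for name in candidates if regex.match(name)]
print(matches)  # ['test_walker.pyc', 'test_.pyo']

# '?' stands for exactly one character.
single = re.compile(fnmatch.translate("?.py"))
one_char = [name for name in ["a.py", "ab.py", ".py"] if single.match(name)]
print(one_char)  # ['a.py']
```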

A StackOverflow answer partially explains this, which set me on the right track. (?ms) is equivalent to compiling the regex with re.MULTILINE | re.DOTALL. The re.DOTALL modifier makes the . special character match any character, including newline; normally . excludes newlines. The re.MULTILINE modifier makes ^ and $ operate on newline boundaries within the search string; otherwise, ^ anchors to the beginning of the string and $ to the end (or just before a single trailing newline). \A matches only at the beginning of the string; \Z matches only at the end of the string.
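A small sketch of what each flag changes, and of how $ differs from \Z, may help. The flags are passed as arguments rather than as a trailing (?ms), since recent Python versions reject global inline flags that are not at the start of the pattern. The sample strings are arbitrary.

```python
import re

text = "bar\n.git\nfoo"

# DOTALL: '.' also matches a newline.
dot_default = re.search(r"bar.\.git", text)          # None
dot_all = re.search(r"bar.\.git", text, re.DOTALL)   # match

# MULTILINE: '^' and '$' also match at line boundaries inside the string.
line_default = re.search(r"^\.git$", text)                # None
line_multi = re.search(r"^\.git$", text, re.MULTILINE)    # match

# Without MULTILINE, '$' still tolerates one trailing newline; '\Z' does not.
dollar = re.search(r"\.git$", ".git\n")     # match
end_only = re.search(r"\.git\Z", ".git\n")  # None

for name, result in [
    ("dot_default", dot_default), ("dot_all", dot_all),
    ("line_default", line_default), ("line_multi", line_multi),
    ("dollar", dollar), ("end_only", end_only),
]:
    print(name, "matched" if result else "no match")
```

The last pair is the reason a glob translator would prefer \Z over $: a filename with a stray trailing newline should not count as a match.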

Another way of saying this:

>>> import re

# No multiline, so ^ and $ anchor beginning and end of string
>>> re.search(r'^\.git$(?s)', '.git')
<_sre.SRE_Match object at 0x10e73a850>

>>> re.search(r'^\.git$(?s)', 'bar\n.git\nfoo')
# Nope

# Multiline => ^ matches after \n and $ before \n
>>> re.search(r'^\.git$(?ms)', 'bar\n.git\nfoo')
<_sre.SRE_Match object at 0x10e73a988>

# \A and \Z always anchor beginning and end of string
>>> re.search(r'\A\.git\Z(?ms)', '.git')
<_sre.SRE_Match object at 0x10e73a850>

>>> re.search(r'\A\.git\Z(?ms)', 'foo\n.git')
# Nope

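So, \Z(?ms) at the end of the pattern means: anchor the match at the very end of the string (\Z), and treat the whole pattern as if it had been compiled with re.MULTILINE | re.DOTALL ((?ms)).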
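In practice, the end anchor combines with re.match, which anchors at the start, so a translated glob has to cover the entire name. Here is a rough sketch of that effect; filter_names is a hypothetical helper written for this illustration, not part of fnmatch, and the names are invented.

```python
import fnmatch
import re


def filter_names(names, glob):
    """Return the names that the glob matches in full (hypothetical helper)."""
    regex = re.compile(fnmatch.translate(glob))
    # re.match anchors at the start of the name; the translated pattern
    # itself anchors at the end, so only whole-name matches survive.
    return [name for name in names if regex.match(name)]


# Only the exact name matches: no longer names, no leading lines,
# and no trailing newline.
exact = filter_names([".git", ".gitignore", "foo\n.git", ".git\n"], ".git")
print(exact)  # ['.git']

# Because of DOTALL, '*' happily spans an embedded newline.
wild = filter_names(["a\nb.py", "a.py", "a.pyc"], "*.py")
print(wild)  # ['a\nb.py', 'a.py']
```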
