George V. Reilly

io.StringIO and UnicodeCSV DictWriter

I like to use io.StringIO rather than the older cStringIO.StringIO, as it's Python 3–ready io.StringIO is also a context manager: if you use it in a with statement, the string buffer is au­to­mat­i­cal­ly closed as you go out of scope.

I tried using io.StringIO with unicodecsv, as I wanted to capture the CSV output into a string buffer for use with unit tests. unicodecsv is a drop-in re­place­ment for Python's built-in csv module, which supports Unicode strings.

with io.StringIO() as csv_file:
    write_csv_rows(csv_file)
    lines = csv_file.getvalue().split('\r\n')
    return lines[:-1]  # drop empty line after trailing \r\n

It failed horribly with TypeError: unicode argument expected, continue.

Doctests, Unicode Literals, and Python 2/3 Compatibility

I rarely use doctests, but I do have some code that uses them.

Although I still mostly write Python 2, I usually import several features of Python 3:

from __future__ import unicode_literals, print_function, absolute_import

Un­for­tu­nate­ly uni­code_lit­er­als doesn't play well with doctests.

The following code will pass with python2 -m doctest demo.py, but not with python3:

from __future__ import unicode_literals, print_function, absolute_import

def upper(s):
    """
    Convert `s` to upper case.

    >>> upper('Hello!')
    u'HELLO!'
    """
    return s.upper()

Python 3 complains:

Failed example:
    upper('Hello!')
Expected:
    u'HELLO!'
Got:
    'HELLO!'

The continue.

Find remote Git branch with change to a file

I needed to track down a remote branch created a couple of months ago on another machine. I knew which file had been changed, but none of the far-too-many remote branches' names rang a bell.

Turns out that using git branch --contains in the right way finds all the relevant branches.

git log --all --format=%H FILENAME \
    | while read f; do git branch --remotes --contains $f; done \
    | sort -u

The first line, git log --all --format=%H FILENAME, lists all the hashes for commits that contained changes to FILENAME. The second finds all the branches that contain those hashes (I added --remotes). The continue.

Profiling

Despite being a bona fide per­for­mance expert—I spent a couple of years as the Per­for­mance Lead for Mi­crosoft­'s IIS web server product about 15 years ago—I still forget to measure rather than assume.

I wrote some code today that imported nearly 300,000 nodes into a graph from a 500MB XML file. The code was not par­tic­u­lar­ly fast and I assumed that it was the XML parser. I had been using the built-in streaming parser, cEle­ment­Tree iterparse. I assumed that using the lmxl iterparse would make the code faster. It didn't.

Then I had the bright idea of tem­porar­i­ly disabling the per-node processing, which left only the XML parsing. Instead of continue.

Raising IOError for 'file not found'

I wanted to raise Python's IOError for a file-not-found condition, but it wasn't obvious what the parameters to the exception should be.

from errno import ENOENT

if not os.path.isfile(source_file):
    raise IOError(ENOENT, 'Not a file', source_file)
with open(source_file) as fp:
    return fp.read()

IOError can be in­stan­ti­at­ed with 1, 2, or 3 arguments:

IOError(errno, strerror, filename)
These arguments are available on the errno, strerror, and filename attributes of the exception object, re­spec­tive­ly, in both Python 2 and 3. The args attribute contains the verbatim con­struc­tor arguments as a tuple.
IOError(errno, strerror)
These are available on the errno and strerror attributes of the exception, re­spec­tive­ly, in both Python 2 and continue.

Quoting in Bash

Various Bash guides recommend putting quotes around just about everything. I had a script that contained this line:

sudo apt-get install --yes $(cat "$BUILD/install_on_aws_ubuntu.txt")

While refac­tor­ing, I put another set of quotes around the $(cat ...) out of an abundance of caution:

sudo apt-get install --yes "$(cat "$BUILD/install_on_aws_ubuntu.txt")"

Several other changes later, I couldn't figure out why my script had stopped working.

Here's what happened:

$ cat $HOME/tmp/demo.txt
foo
bar
quux

$ echo $(cat "$HOME/tmp/demo.txt")
foo bar quux

$ echo "$(cat $HOME/tmp/demo.txt)"
foo
bar
quux

The new outer quotes retained the newlines; the original replaced newlines with spaces.

The Bash FAQ has more: specif­i­cal­ly Command Substition and Word Splitting.

psutil kill

From Python, I needed to find a process that was performing SSH tunneling on port 8080 and kill it.

The following works in Bash:

ps aux | grep [s]sh.*:8080 | awk '{print $2}' | xargs kill -9

The grep [s]sh trick ensures that the grep command itself won't make it through to awk.

Here's what I came up with in Python using psutil:

def kill_port_forwarding(host_port):
    ssh_args = ["-f", "-N", "-L", "{0}:localhost:{0}".format(host_port)]
    for process in psutil.process_iter():
        try:
            if process.name().endswith('ssh'):
       
continue.

Python: Import subclass from dynamic path

I needed to import some plugin code written in Python from a directory whose path isn't known until runtime. Further, I needed a class object that was a subclass of the plugin base class.

from somewhere import PluginBase

class SomePlugin(PluginBase):
    def f1(self): ...
    def f2(self): ...

You can use the imp module to actually load the module from impl_dir. Note that impl_dir needs to be tem­porar­i­ly prepended to sys.path. Then you can find the plugin subclass using dir and issubclass.

import os, imp

def import_class(implementation_filename, base_class):
    impl_dir, impl_filename = os.path.split(implementation_filename)
    module_name, _ = os.path.splitext(impl_filename)

    
continue.

Keybase

I was sent an invite to Keybase a few weeks, which I accepted tonight.

Keybase Wants To Make Serious Encryption Accessible To Mere Mortals explains:

From a cryp­to­graph­ic standpoint, PGP is rock solid. In practice, using it is very messy. Its complexity has deterred the vast majority of people who might otherwise benefit from using encryption.

The first problem is es­tab­lish­ing a valid identity, especially with other people located oceans away. The second is dis­trib­ut­ing public keys without nefarious types posting al­ter­na­tive keys that appear to be registered to the same person. ... The third issue is getting people to install and use PGP software.

I can now be reached via https://keybase.io/georgevreil­ly. I've proved my continue.

Sorting Python Dictionaries by Value

[Pre­vi­ous­ly published at the now defunct MetaBrite Dev Blog.]

I needed to sort a Python dictionary by value today, rather than by key. I found it confusing, so I'll share what I learned.

Assume the following dictionary, where each value is a tuple of (ID, score). How do we sort by score; i.e., the second item in the value tuple? (For the purposes of this discussion, ignore the meaning of the dic­tio­nary's key.)

>>> some_dict = dict(a=(123, 0.7), b=(372, 0.2), e=(456, 0.85), d=(148, 0.23), c=(502, 0.1))
>>> some_dict
{'a': (123, 0.7), 'c': (502, 0.1), 'b': (372, 0.2), 'e': (456, 0.85), 'd': (148, 0.23)}

Python dic­tio­nar­ies are inherently unsorted, unless you continue.

Previous » « Next