2016-04-13

io.StringIO and UnicodeCSV DictWriter

I like to use io.StringIO rather than the older cStringIO.StringIO, as it's Python 3–ready io.StringIO is also a context manager: if you use it in a with statement, the string buffer is automatically closed as you go out of scope.

I tried using io.StringIO with unicodecsv, as I wanted to capture the CSV output into a string buffer for use with unit tests. unicodecsv is a drop-in replacement for Python's built-in csv module, which supports Unicode strings.

with io.StringIO() as csv_file:
    write_csv_rows(csv_file)
    lines = csv_file.getvalue().split('\r\n')
    return lines[:-1]  # drop empty line after trailing \r\n

It failed horribly with TypeError: unicode argument expected, …continue.

2016-04-05

Doctests, Unicode Literals, and Python 2/3 Compatibility

I rarely use doctests, but I do have some code that uses them.

Although I still mostly write Python 2, I usually import several features of Python 3:

from __future__ import unicode_literals, print_function, absolute_import

Unfortunately unicode_literals doesn't play well with doctests.

The following code will pass with python2 -m doctest demo.py, but not with python3:

from __future__ import unicode_literals, print_function, absolute_import

def upper(s):
    """
    Convert `s` to upper case.

    >>> upper('Hello!')
    u'HELLO!'
    """
    return s.upper()

Python 3 complains:

Failed example:
    upper('Hello!')
Expected:
    u'HELLO!'
Got:
    'HELLO!'

The …continue.

2016-04-03

Find remote Git branch with change to a file

I needed to track down a remote branch created a couple of months ago on another machine. I knew which file had been changed, but none of the far-too-many remote branches' names rang a bell.

Turns out that using git branch --contains in the right way finds all the relevant branches.

git log --all --format=%H FILENAME \
    | while read f; do git branch --remotes --contains $f; done \
    | sort -u

The first line, git log --all --format=%H FILENAME, lists all the hashes for commits that contained changes to FILENAME. The second finds all the branches that contain those hashes (I added --remotes). The …continue.

2016-03-25

Profiling

Despite being a bona fide performance expert—I spent a couple of years as the Performance Lead for Microsoft's IIS web server product about 15 years ago—I still forget to measure rather than assume.

I wrote some code today that imported nearly 300,000 nodes into a graph from a 500MB XML file. The code was not particularly fast and I assumed that it was the XML parser. I had been using the built-in streaming parser, cElementTree iterparse. I assumed that using the lmxl iterparse would make the code faster. It didn't.

Then I had the bright idea of temporarily disabling the per-node processing, which left only the XML parsing. Instead of …continue.

2016-03-24

Raising IOError for 'file not found'

I wanted to raise Python's IOError for a file-not-found condition, but it wasn't obvious what the parameters to the exception should be.

from errno import ENOENT

if not os.path.isfile(source_file):
    raise IOError(ENOENT, 'Not a file', source_file)
with open(source_file) as fp:
    return fp.read()

IOError can be instantiated with 1, 2, or 3 arguments:

IOError(errno, strerror, filename): These arguments are available on the errno, strerror, and filename attributes of the exception object, respectively, in both Python 2 and 3. The args attribute contains the verbatim constructor arguments as a tuple.
IOError(errno, strerror): These are available on the errno and strerror attributes of the exception, respectively, in both Python 2 and …continue.

2016-03-17

Quoting in Bash

Various Bash guides recommend putting quotes around just about everything. I had a script that contained this line:

sudo apt-get install --yes $(cat "$BUILD/install_on_aws_ubuntu.txt")

While refactoring, I put another set of quotes around the $(cat ...) out of an abundance of caution:

sudo apt-get install --yes "$(cat "$BUILD/install_on_aws_ubuntu.txt")"

Several other changes later, I couldn't figure out why my script had stopped working.

Here's what happened:

$ cat $HOME/tmp/demo.txt
foo
bar
quux

$ echo $(cat "$HOME/tmp/demo.txt")
foo bar quux

$ echo "$(cat $HOME/tmp/demo.txt)"
foo
bar
quux

The new outer quotes retained the newlines; the original replaced newlines with spaces.

The Bash FAQ has more: specifically Command Substition and Word Splitting.

2016-03-06

psutil kill

From Python, I needed to find a process that was performing SSH tunneling on port 8080 and kill it.

The following works in Bash:

ps aux | grep [s]sh.*:8080 | awk '{print $2}' | xargs kill -9

The grep [s]sh trick ensures that the grep command itself won't make it through to awk.

Here's what I came up with in Python using psutil:

def kill_port_forwarding(host_port):
    ssh_args = ["-f", "-N", "-L", "{0}:localhost:{0}".format(host_port)]
    for process in psutil.process_iter():
        try:
            if process.name().endswith('ssh'):

…continue.

2016-03-02

Python: Import subclass from dynamic path

I needed to import some plugin code written in Python from a directory whose path isn't known until runtime. Further, I needed a class object that was a subclass of the plugin base class.

from somewhere import PluginBase

class SomePlugin(PluginBase):
    def f1(self): ...
    def f2(self): ...

You can use the imp module to actually load the module from impl_dir. Note that impl_dir needs to be temporarily prepended to sys.path. Then you can find the plugin subclass using dir and issubclass.

import os, imp

def import_class(implementation_filename, base_class):
    impl_dir, impl_filename = os.path.split(implementation_filename)
    module_name, _ = os.path.splitext(impl_filename)

…continue.

2016-02-27

Keybase

I was sent an invite to Keybase a few weeks, which I accepted tonight.

Keybase Wants To Make Serious Encryption Accessible To Mere Mortals explains:

From a cryptographic standpoint, PGP is rock solid. In practice, using it is very messy. Its complexity has deterred the vast majority of people who might otherwise benefit from using encryption.

The first problem is establishing a valid identity, especially with other people located oceans away. The second is distributing public keys without nefarious types posting alternative keys that appear to be registered to the same person. ... The third issue is getting people to install and use PGP software.

I can now be reached via https://keybase.io/georgevreilly. I've proved my …continue.

2016-02-16

Sorting Python Dictionaries by Value

[Previously published at the now defunct MetaBrite Dev Blog.]

I needed to sort a Python dictionary by value today, rather than by key. I found it confusing, so I'll share what I learned.

Assume the following dictionary, where each value is a tuple of (ID, score). How do we sort by score; i.e., the second item in the value tuple? (For the purposes of this discussion, ignore the meaning of the dictionary's key.)

>>> some_dict = dict(a=(123, 0.7), b=(372, 0.2), e=(456, 0.85), d=(148, 0.23), c=(502, 0.1))
>>> some_dict
{'a': (123, 0.7), 'c': (502, 0.1), 'b': (372, 0.2), 'e': (456, 0.85), 'd': (148, 0.23)}

Python dictionaries are inherently unsorted, unless you …continue.