2016-04-05

Doctests, Unicode Literals, and Python 2/3 Compatibility

I rarely use doctests, but I do have some code that uses them.

Although I still mostly write Python 2, I usually import several features of Python 3:

from __future__ import unicode_literals, print_function, absolute_import

Unfortunately, unicode_literals doesn’t play well with doctests that have to pass under both Python 2 and Python 3.

The following code passes with python2 -m doctest demo.py, but fails with python3 -m doctest demo.py:

from __future__ import unicode_literals, print_function, absolute_import

def upper(s):
    """
    Convert `s` to upper case.

    >>> upper('Hello!')
    u'HELLO!'
    """
    return s.upper()

Python 3 complains:

Failed example:
    upper('Hello!')
Expected:
    u'HELLO!'
Got:
    'HELLO!'

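The problem is that doctest compares the repr of the result with the expected text, character for character. Python 2’s repr of a Unicode string carries a u prefix, while Python 3’s repr does not, because every str in Python 3 is already Unicode.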
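
To see the mismatch outside of doctest, a small sketch like the following prints the repr each interpreter produces. The names s and expected are only illustrative and are not part of demo.py.

```python
from __future__ import unicode_literals, print_function

import sys

s = 'hello!'.upper()

# With unicode_literals, Python 2 treats the literal as a unicode object,
# so its repr carries the u prefix. Python 3's str repr never does.
if sys.version_info[0] == 2:
    expected = "u'HELLO!'"
else:
    expected = "'HELLO!'"

print(repr(s))
print(repr(s) == expected)
```

A doctest can only hold one of those two spellings as its expected output, which is why the same docstring cannot pass unchanged under both interpreters.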

The test can be made to pass by removing unicode_literals from the from __future__ import, but this also removes the benefit of implicitly forcing all string literals to be Unicode.

Another workaround is to use six thus:

from __future__ import unicode_literals, print_function, absolute_import

import six

def upper(s):
    """
    Convert `s` to upper case.

    >>> upper('Hello!') == six.text_type(u'HELLO!')
    True
    """
    return s.upper()

This works, but the test is harder to read, and a failure is much less informative: doctest reports that it expected True and got False, instead of showing the string that upper actually returned.
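
One alternative that keeps failures readable is to print the result, so that doctest compares the text of the string rather than its repr. This is a sketch of a commonly used approach, not something the code above does. It assumes the output is plain ASCII (printing non-ASCII text under Python 2 can run into encoding problems), and it no longer checks whether the return value is a byte string or a Unicode string.

```python
from __future__ import unicode_literals, print_function, absolute_import


def upper(s):
    """
    Convert `s` to upper case.

    >>> print(upper('Hello!'))
    HELLO!
    """
    # print() writes the characters themselves, so there is no u prefix
    # and no surrounding quotes in the output that doctest compares.
    return s.upper()
```

If this example fails, doctest shows the text that was actually printed next to the expected HELLO!, which is easier to diagnose than a bare False.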

Lennart Regebro has a good discussion of other doctest migration problems. If you’re willing to use a more sophisticated method of running doctests, you can try a doctest output checker or a nose plugin.
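
As a rough illustration of the output checker idea, the sketch below subclasses doctest.OutputChecker and strips u prefixes from both the expected and the actual output before comparing them. It is not the implementation from any of the projects mentioned above. UnicodePrefixChecker and run_docstring are hypothetical names, and the regular expression is deliberately simple, so it would also strip a u that appears before a quote inside the output text.

```python
import doctest
import re


class UnicodePrefixChecker(doctest.OutputChecker):
    """Hypothetical checker that ignores the u prefix on string reprs."""

    # A u or U directly before a quote, at the start of the output or
    # after a non-word character (so the u in "you'" is left alone).
    _u_prefix = re.compile(r"(^|\W)[uU](['\"])")

    def check_output(self, want, got, optionflags):
        want = self._u_prefix.sub(r"\1\2", want)
        got = self._u_prefix.sub(r"\1\2", got)
        return doctest.OutputChecker.check_output(self, want, got, optionflags)


def upper(s):
    """
    Convert `s` to upper case.

    >>> upper('Hello!')
    u'HELLO!'
    """
    return s.upper()


def run_docstring(func):
    """Run the examples in func's docstring with the custom checker."""
    test = doctest.DocTestParser().get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0)
    runner = doctest.DocTestRunner(checker=UnicodePrefixChecker(), verbose=False)
    return runner.run(test)


results = run_docstring(upper)
print(results)
```

The trade-off is that python -m doctest demo.py no longer suffices: the tests have to be run through a small runner like this one, or through a test framework that lets you plug in a checker.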