George V. Reilly

io.StringIO and UnicodeCSV DictWriter

I like to use io.StringIO rather than the older cStringIO.StringIO, as it’s Python 3–ready. io.StringIO is also a context manager: if you use it in a with statement, the string buffer is automatically closed when the with block exits.
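
A minimal sketch of that context-manager behavior, run under Python 3 where io.StringIO is the standard in-memory text buffer. The buffer can be used inside the with block and reports itself closed once the block exits. The function name buffer_demo is hypothetical, chosen only for illustration.

```python
import io

def buffer_demo():
    """Write to an io.StringIO inside a with block and report its state afterwards."""
    with io.StringIO() as buf:
        buf.write("caf\u00e9\r\n")   # io.StringIO accepts text (unicode)
        contents = buf.getvalue()    # read the contents before the block exits
    # Leaving the with block called buf.close() automatically.
    return contents, buf.closed

contents, closed = buffer_demo()
print(repr(contents), closed)
```

Note that getvalue() has to be called inside the block: on a closed buffer it raises ValueError.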

I tried using io.StringIO with unicodecsv, because I wanted to capture the CSV output in a string buffer for use in unit tests. unicodecsv is a drop-in replacement for Python’s built-in csv module that adds support for Unicode strings.

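import io

def csv_output_lines():  # hypothetical wrapper: the snippet is a function body
    with io.StringIO() as csv_file:
        write_csv_rows(csv_file)  # the author's helper, writing rows with unicodecsv
        lines = csv_file.getvalue().split('\r\n')
        return lines[:-1]  # drop empty line after trailing \r\n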

It failed horribly with TypeError: unicode argument expected, got 'str'.

I managed to fix it by using cStringIO.StringIO wrapped in contextlib.closing, since cStringIO.StringIO objects are not context managers:

with contextlib.closing(cStringIO.StringIO()) as csv_file:
    ...
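
contextlib.closing is what makes that work: it wraps any object that has a close() method but no __enter__/__exit__, and calls close() when the with block exits. cStringIO no longer exists in Python 3, so this sketch uses a hypothetical stand-in class, LegacyBuffer, to play the part of an object that can be closed but is not itself a context manager.

```python
import contextlib

class LegacyBuffer(object):
    """Hypothetical stand-in for cStringIO.StringIO: closable, but not a context manager."""

    def __init__(self):
        self.parts = []
        self.closed = False

    def write(self, s):
        self.parts.append(s)

    def getvalue(self):
        return "".join(self.parts)

    def close(self):
        self.closed = True

# LegacyBuffer has no __enter__/__exit__, so closing() supplies them.
with contextlib.closing(LegacyBuffer()) as buf:
    buf.write("a,b\r\n")
    value = buf.getvalue()

print(repr(value), buf.closed)  # closing() called buf.close() on exit
```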

Writing this post, however, I realized how to fix it properly. Use io.BytesIO:

with io.BytesIO() as csv_file:
    ...
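
To keep the example runnable with only the standard library, here is a Python 3 analogue rather than unicodecsv itself: the built-in csv module writes text, and an io.TextIOWrapper encodes that text to UTF-8 before it reaches the io.BytesIO, much as unicodecsv encodes before writing to its bytestream. The function name csv_lines_via_bytes and its rows parameter are hypothetical, standing in for the post's write_csv_rows helper.

```python
import csv
import io

def csv_lines_via_bytes(rows, encoding="utf-8"):
    """Write rows as CSV into an in-memory byte buffer and return the lines as text."""
    with io.BytesIO() as raw:
        # newline="" lets the csv module control line endings (\r\n by default).
        text = io.TextIOWrapper(raw, encoding=encoding, newline="")
        csv.writer(text).writerows(rows)
        text.flush()           # push buffered text through the encoder into raw
        data = raw.getvalue()  # the encoded bytes
        text.detach()          # detach so the wrapper never closes raw on its own
    lines = data.decode(encoding).split("\r\n")
    return lines[:-1]  # drop empty string after trailing \r\n

print(csv_lines_via_bytes([["name", "city"], ["Zo\u00eb", "Z\u00fcrich"]]))
```

Splitting on '\r\n' relies on the csv writer's default line terminator, the same assumption the original helper makes.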

I now realize that io.StringIO expects Unicode strings, while the classic cStringIO.StringIO expects byte strings. unicodecsv takes care of the character encoding implicitly (UTF-8 by default), so what it writes to the file object is a stream of bytes.
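
The mismatch is easy to reproduce. In Python 3, io.StringIO and io.BytesIO behave as the io documentation describes for Python 2’s unicode and str: a text buffer rejects bytes, a byte buffer rejects text, and encoding is the explicit step between them. The TypeError wording differs from the Python 2 message above; try_write is a hypothetical helper for the demonstration.

```python
import io

def try_write(buffer, value):
    """Attempt a write; return 'ok' on success or the exception's type name."""
    try:
        buffer.write(value)
        return "ok"
    except TypeError as exc:
        return type(exc).__name__

text_buf = io.StringIO()
byte_buf = io.BytesIO()

print(try_write(text_buf, b"a,b\r\n"))  # bytes into a text buffer is rejected
print(try_write(byte_buf, "a,b\r\n"))   # text into a byte buffer is rejected
print(try_write(byte_buf, "caf\u00e9\r\n".encode("utf-8")))  # encoded bytes fit
```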

There are examples with the old StringIO in the unicodecsv code, but somehow I missed that io.BytesIO is used in unicodecsv’s GitHub README.

cStringIO: “Unlike the StringIO module, this module is not able to accept Unicode strings that cannot be encoded as plain ASCII strings.”

UnicodeCSV: “Note that unicodecsv expects a bytestream, not unicode.”

io: “Since this module has been designed primarily for Python 3.x, you have to be aware that all uses of “bytes” in this document refer to the str type (of which bytes is an alias), and all uses of “text” refer to the unicode type. Furthermore, those two types are not interchangeable in the io APIs.”
