George V. Reilly

Including Data Files in Python Packages

[Previously published at the now defunct MetaBrite Dev Blog.]

I spent some time today struggling with setuptools, trying to make a Python source package not only include a data file, but also install that file.

Building the installer

Consider the following source tree layout:

├── my_stuff/
│   ├──
│   ├──
│   ├──
│   └──
├── models/
│   └── long_ugly_name_20151221.json

I wanted to create a Python source distribution, some_package-N.N.N.tar.gz, which contains the code in the my_stuff directory, as well as models/long_ugly_name_20151221.json, using python setup.py sdist.

It’s not that hard to get models/long_ugly_name_20151221.json included in the tarball. Add an entry to MANIFEST.in:

include models/*.json

Then be sure to set include_package_data=True in the call to setup():

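from setuptools import setup, find_packages

setup(packages=find_packages(),
      include_package_data=True,  # install in-package files matched by MANIFEST.in
      # ... name, version, etc.
      )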

Or, if the JSON file is under source control, you can add package_data={'models': ['*.json']} to the setup() call (the glob is relative to the models package directory).
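package_data globs are joined to the package's own directory, not to the project root, so for the models package the pattern that matches the JSON file is '*.json'. The sketch below roughly imitates that expansion with glob; match_package_data is a made-up helper for illustration, not part of setuptools, whose real logic handles more cases.

```python
import glob
import os
import tempfile


def match_package_data(package_dir, patterns):
    """Hypothetical helper: roughly imitate how setuptools expands
    package_data globs by joining each pattern to the package directory."""
    matches = []
    for pattern in patterns:
        matches.extend(glob.glob(os.path.join(package_dir, pattern)))
    # Report matches relative to the package directory
    return sorted(os.path.relpath(m, package_dir) for m in matches)


with tempfile.TemporaryDirectory() as tmp:
    models = os.path.join(tmp, 'models')
    os.mkdir(models)
    open(os.path.join(models, 'long_ugly_name_20151221.json'), 'w').close()

    relative = match_package_data(models, ['*.json'])
    rooted = match_package_data(models, ['models/*.json'])  # looks in models/models/

print(relative)  # ['long_ugly_name_20151221.json']
print(rooted)    # []
```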

However, neither is sufficient to have the JSON file installed when you run pip install some_package-N.N.N.tar.gz. The trick is to convince setuptools that models is actually a package by placing an empty __init__.py in the models source directory:

├── models/
│   ├── __init__.py
│   └── long_ugly_name_20151221.json
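Why does an empty file make the difference? setuptools' find_packages() only reports directories that contain an __init__.py, and it does not descend into directories that lack one. The sketch below is a simplified imitation of that rule (find_regular_packages is a hypothetical name, not a setuptools function), run against a throwaway copy of the layout above.

```python
import os
import tempfile
from pathlib import Path


def find_regular_packages(root):
    """Hypothetical, simplified imitation of setuptools.find_packages():
    a directory counts as a package only if it contains an __init__.py."""
    packages = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        if rel == os.curdir:
            continue  # the project root itself is not a package
        if '__init__.py' not in filenames:
            dirnames[:] = []  # not a package: don't look any deeper
            continue
        packages.append(rel.replace(os.sep, '.'))
    return sorted(packages)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'my_stuff').mkdir()
    (root / 'my_stuff' / '__init__.py').touch()
    (root / 'models').mkdir()
    (root / 'models' / 'long_ugly_name_20151221.json').write_text('{}')

    before = find_regular_packages(tmp)  # models/ has no __init__.py yet
    (root / 'models' / '__init__.py').touch()
    after = find_regular_packages(tmp)

print(before)  # ['my_stuff']
print(after)   # ['models', 'my_stuff']
```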

More at setuptools: Including Data Files.

Using the JSON file at runtime

As you might guess, the rest of the package doesn’t know the actual name of the JSON file. So how do we discover the name at runtime so that we can load it? We use pkg_resources:

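import json
import pkg_resources

# List the resources inside the installed 'models' package; keep the JSON ones
json_files = [f for f in pkg_resources.resource_listdir('models', '')
              if f.endswith('.json')]
# Open the first JSON file as a binary stream and parse it
with pkg_resources.resource_stream('models', json_files[0]) as stream:
    model = json.load(stream)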
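pkg_resources has since been deprecated in favor of the standard library's importlib.resources. A minimal sketch of the same lookup using importlib.resources.files() (Python 3.9+) follows; load_first_model and the demo_models package it builds on disk are made-up names standing in for the real models package. Unlike the pkg_resources version, it sorts candidates by name so that the "first" JSON file is chosen deterministically.

```python
import json
import sys
import tempfile
from importlib.resources import files
from pathlib import Path


def load_first_model(package='models'):
    """Load the first *.json resource (by name) shipped inside `package`."""
    json_files = sorted((r for r in files(package).iterdir()
                         if r.name.endswith('.json')),
                        key=lambda r: r.name)
    return json.loads(json_files[0].read_text(encoding='utf-8'))


# Demo: build a throwaway package so the sketch runs anywhere.
# 'demo_models' is a hypothetical stand-in for the installed 'models' package.
tmp = Path(tempfile.mkdtemp())
(tmp / 'demo_models').mkdir()
(tmp / 'demo_models' / '__init__.py').touch()
(tmp / 'demo_models' / 'long_ugly_name_20151221.json').write_text('{"weights": [1, 2, 3]}')
sys.path.insert(0, str(tmp))

model = load_first_model('demo_models')
print(model)  # {'weights': [1, 2, 3]}
```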

package_data versus data_files

Note: there are two similarly named arguments to setup() with distinct semantics, package_data and data_files.

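Use package_data to install files inside the package; use data_files to place files outside it, in directories that are interpreted relative to the installation prefix (for example, sys.prefix for a system-wide install).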

This fragment may help to set data_files:

import os
datadir = 'data'  # hypothetical: the directory tree you want installed
data_files = [(d, [os.path.join(d, f) for f in files])
              for d, folders, files in os.walk(datadir)]
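To see what the fragment produces, here it is wrapped in a hypothetical build_data_files() helper and run against a throwaway tree; the share/ directory name is just an example. Because datadir is a relative path, each target directory mirrors the source layout, so that layout is recreated under the installation prefix.

```python
import os
import tempfile
from pathlib import Path


def build_data_files(datadir):
    """Hypothetical wrapper around the fragment: one (target_dir, [files])
    pair per directory under datadir, the shape data_files expects."""
    return [(d, [os.path.join(d, f) for f in files])
            for d, folders, files in os.walk(datadir)]


old_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)  # setup.py normally runs from the project root
    try:
        Path('share', 'models').mkdir(parents=True)
        Path('share', 'README.txt').write_text('example')
        Path('share', 'models', 'long_ugly_name_20151221.json').write_text('{}')
        # Sort for a stable display; os.walk order depends on the filesystem
        result = sorted((d, sorted(fs)) for d, fs in build_data_files('share'))
    finally:
        os.chdir(old_cwd)  # leave the temp dir before it is deleted

for target, sources in result:
    print(target, sources)
```

Each target directory such as share/models would then be created under the installation prefix, holding the files listed next to it.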