Wednesday, April 22, 2009 
string formatting

Python has long had a string interpolation operator, %.

Python 2.6 and 3.0 introduced a new, richer set of string formatting operations. See PEP 3101 for the rationale.

One trick that I liked with the old way of formatting was to put the locals() dictionary or self.__dict__ on the right-hand side:

>>> def stuff(a, b):
...  c = a+b; d = a-b
...  return "%(a)s, %(b)s, %(c)s, %(d)s" % locals()
...
>>> stuff(3, 17)
'3, 17, 20, -14'

It took me a few minutes to figure out how to do the equivalent with str.format: use the ** syntax to unpack the dict into keyword arguments.

>>> class Person(object):
...  def __init__(self, name, age):
...   self.name = name
...   self.age = age
...  def old(self):
...   return "name=%(name)s, age=%(age)d" % self.__dict__
...  def new(self):
...   return "name={name}, age={age}".format(**self.__dict__)
...  def dict(self):
...   return "name={0[name]}, age={0[age]}".format(self.__dict__)
...
>>> gb = Person('George Burns', 100)
>>> gb.old()
'name=George Burns, age=100'
>>> gb.new()
'name=George Burns, age=100'
>>> gb.dict()
'name=George Burns, age=100'

The getitem variant ({0[name]}) might be slightly more efficient, since the dict does not need to be unpacked into keyword arguments, but I doubt it makes a perceptible difference in practice.
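
A third spelling is worth sketching alongside the two above: str.format can also look up attributes on a positional argument, so the object itself can be passed instead of its __dict__. The attr method name is made up for this illustration; the rest mirrors the Person class above.

```python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def attr(self):
        # {0.name} looks up an attribute on the first positional argument,
        # so neither __dict__ nor ** unpacking is needed.
        return "name={0.name}, age={0.age}".format(self)


gb = Person("George Burns", 100)
print(gb.attr())  # name=George Burns, age=100
```

This also works for objects that compute attributes through properties, which never appear in __dict__.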

posted on Thursday, April 23, 2009 6:27:20 AM (Pacific Daylight Time, UTC-07:00) 
#    Comments [0]
Monday, March 30, 2009 
Big Ben

The Cozi Tech Blog needed some love, so I wrote a post on augmenting Python's strftime.

posted on Tuesday, March 31, 2009 6:17:58 AM (Pacific Daylight Time, UTC-07:00) 
#    Comments [0]
Friday, March 20, 2009 
List Comprehension

Python has list comprehensions, syntactic sugar for building lists from an expression.

>>> [2 * i for i in (2, 3, 5, 7, 11)]
[4, 6, 10, 14, 22]

This doesn't work so well when the comprehension expression is itself a list: you end up with a list of lists.

>>> def gen():
...     for l in [['a', 'b'], ['c'], ['d', 'e', 'f']]:
...         yield l
...
>>> [l for l in gen()]
[['a', 'b'], ['c'], ['d', 'e', 'f']]

This is ugly. Here's one way to build a flattened list, but it's less elegant than the comprehension.

>>> x = []
>>> for l in gen():
...     x.extend(l)
...
>>> x
['a', 'b', 'c', 'd', 'e', 'f']

It took me a while to find a readable list comprehension, with a little help from Google. Use sum() on the outer list and prime it with an empty list, []. Python will concatenate the inner lists, producing a flattened list.

>>> sum([l for l in gen()], [])
['a', 'b', 'c', 'd', 'e', 'f']

Alternatively, you can use itertools.chain().

>>> import itertools
>>> list(itertools.chain(*gen()))
['a', 'b', 'c', 'd', 'e', 'f']

That might be slightly more efficient, though I find the sum() to be a little more readable.
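
As a hedged aside on the efficiency point: sum() builds a new list at every step, so its cost grows roughly quadratically with the total number of items, whereas chain() walks each inner list once. itertools.chain.from_iterable (available since Python 2.6) goes a step further and takes the generator directly, so nothing has to be unpacked with * first. A minimal sketch, reusing the same gen() as above:

```python
import itertools


def gen():
    for l in [['a', 'b'], ['c'], ['d', 'e', 'f']]:
        yield l


# from_iterable consumes gen() lazily, one inner list at a time.
flat = list(itertools.chain.from_iterable(gen()))
print(flat)  # ['a', 'b', 'c', 'd', 'e', 'f']
```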

Edit: I forgot about nested comprehensions:

>>> [inner
...     for outer in gen()
...         for inner in outer]
['a', 'b', 'c', 'd', 'e', 'f']

Somewhat cryptic on one line, however:

>>> [j for i in gen() for j in i]
['a', 'b', 'c', 'd', 'e', 'f']
posted on Friday, March 20, 2009 7:05:07 AM (Pacific Daylight Time, UTC-07:00) 
#    Comments [0]
Saturday, January 31, 2009 
Eric on BuildBot

[Eric holding forth on BuildBot]

Eric and I attended Northwest Python Day 2009 today at the University of Washington. There were about 50 people present, with a few out-of-town visitors from Portland and Vancouver BC.

It was a mixed bag. I found the afternoon sessions more interesting than the morning ones.

The morning talks started with a set of five-minute lightning talks, including:

  • ctypes being used to crack open a raw binary file with arbitrary bit alignment.
  • Werkzeug: a set of WSGI utilities. Debugger sounds particularly useful.
  • BuildBot: Eric talked about using it for Continuous Integration and how easy it was to configure and extend, compared to CruiseControl.NET.

Browser Interface, Local Server: creating a desktop app that contains, in one process, both a browser app and a local HTTP server, running on separate threads. The browser app can also be used to connect to a remote web server. Used wxPython to host an HTML control for the browser part.

The afternoon lightning talks included:

  • Sphinx: a documentation generator built on top of reStructuredText.
  • NodeBox: a Mac app for creating 2D visuals.
  • vmshell: a not-yet-released toolkit for manipulating virtual machines using libvirt.

Sage is an impressive open-source package for doing mathematics, and a potential alternative to expensive commercial products like Mathematica and Matlab. Browse the Sage Notebook to get a feel for what it can do. Take a look at today's Sage talk.

Google App Engine is good for a narrow class of apps: HTTP, request+response, time-limited, sandboxed. There are many quotas, known and unknown. The non-relational data store has restricted queries: no joins, only complete entities, limited comparisons.

Cython is a Python to C compiler that seems promising. It requires slight modifications to the classes and functions that will be compiled to C: declare them with the cdef keyword. It offers significant speedups for hotspot code and it's heavily used in Sage.

Ted Leung closed the day by talking about Python at Sun. All of the dynamic languages have been trending upwards in the last few years, hence Sun's (and Microsoft's) interest in dynamic languages. Jython, after years of struggling along, is alive and well. I really have to check out DTrace on Mac or OpenSolaris soon. One way to win mindshare for Python is better tools: NBPython will provide Python support for the NetBeans IDE: code completion, debugger, etc.

There were a handful of other talks that I didn't write up.

My thanks to the organizers for putting together a successful free conference.

posted on Sunday, February 01, 2009 6:58:48 AM (Pacific Standard Time, UTC-08:00) 
#    Comments [0]
Sunday, November 23, 2008 
reStructuredText

I hate composing anything longer than a couple of paragraphs in an online HTML editor. Specifically, I hate writing posts for this blog online. I'd much rather write in Vim and upload HTML. But I don't want to compose in raw HTML either.

I use reStructuredText (reST), an unobtrusive plaintext markup language popular in the Python world. reST can generate HTML, LaTeX, native PDF, ODF, and other formats. The picture at right shows a draft of this document in MacVim; reST is, as you can see, quite readable (though I work with a larger font). I use restview to preview the HTML locally and Pygments for syntax highlighting of code. Vim has its own syntax highlighting for reST and I've developed a set of keyboard macros for my own use.

The weak link in this scheme is posting to the blog. Right now, I have a little wrapper that generates HTML, extracts the body, and copies it to the pasteboard (clipboard). I then manually paste that into a raw HTML textarea in the blog's editor. Someday, I have to adapt mtsend or Firedrop2 to make this less painful. Or I could hack dasBlog to support reST in IronPython, or switch over to a blog that supports reST natively. Someday.

For a long time, I used VST (Vim reStructuredText) to generate HTML from reST. As I began using Python more and more, I realized that I was far better off with the real thing, which is well designed and quite fast. The VimL scripting language is not that good and VST pushes it to its limits.

As of the recent Python 2.6 release, all the official Python documentation is in reST format. Sphinx is a documentation build system that wraps a collection of reST documents into a larger navigable entity.

There are many other lightweight markup languages, such as Textile, Markdown, and AsciiDoc. No doubt they have their strengths, but I now have a significant investment in reST and it's well supported by the Python community.

posted on Monday, November 24, 2008 5:14:40 AM (Pacific Standard Time, UTC-08:00) 
#    Comments [0]
Monday, November 10, 2008 
Distributed/Decentralized Version Control Systems

At work, I've been experimenting with the big three Distributed Version Control Systems, Git, Mercurial, and Bazaar, on Windows over the last ten days.

Pavel and Eric have been singing the praises of Git and git-svn on their Mac and Linux boxes respectively for the last few months. Git allows them to check in small changes locally without perturbing the build. The ease of branching and merging allows them to work in more than one branch at a time at a lower cost than Subversion did. Most of our dev team continue to work in Subversion on Windows boxes. git-svn allows Pavel and Eric to easily interoperate with the Subversion server. Pavel is also a big fan of git-stash: he stacks away in-progress work and switches easily to other patches.

Although I've worked primarily in Python on Linux since the summer, I've been working on our forthcoming mobile client recently. It's ASP.NET-based, hence I'm working on Windows again. I'm in the throes of a major refactoring, extracting the mobile client out of the main webclient and hoisting other code into shared projects, while other developers continue to work on the main webclient and the mobile client.

This seemed like a perfect opportunity to bite the DVCS bullet, since I knew that branching and merging would be less painful with git-svn than with Subversion.

Getting git-svn working on Windows turned out to be a major headache. The Cygwin version of git-svn simply doesn't work for me. And msysGit doesn't currently support git-svn. (Eric has had some success with an older version of msysGit and git-svn, but I found it to be wretchedly slow.) Moreover, Git's integration with Windows is poor. There's nothing like TortoiseSVN to ease developers into using Git.

Having written off Git on Windows for now, it was time to try Bazaar (bzr), which has its own Subversion plugin, bzr-svn. The version of bzr-svn that was available for Windows the week before last was ancient, and promptly crashed. Jelmer, the developer, mailed me yesterday to say that there should be an up-to-date copy of bzr-svn in the brand new 1.9 release of Bazaar. I'll try it at work tomorrow. Windows doesn't seem like an afterthought for Bazaar; indeed, TortoiseBzr offers Explorer integration.

On to Mercurial (hg). Alas, this has the weakest integration with Subversion. There are instructions for doing it by hand (which is what I'm doing). The hgsubversion extension looks promising, but is still immature.

Even so, Mercurial is what I've ended up using for the last week. Partly because it didn't bite me. Partly because I like it best of the three. The Mercurial book takes much of the credit for that. Windows is a first-class client and TortoiseHg offers half-way decent Explorer integration.

I'm not impressed with Git as software engineering; it strikes me as an incoherent mess of C and Perl. The attitude of superiority from some Git proponents is off-putting. I watched Linus Torvalds' Google techtalk about Git on Friday; he came across as a major jerk, repeatedly calling anyone who uses Subversion an idiot. I'd still recommend watching the video: it gives good insight into the social aspects of distributed/decentralized VCSes, how very different they are from traditional centralized VCSes, and how they afford a different way of working.

Watching my compatriot Bryan O'Sullivan's Google techtalk on Mercurial this afternoon was a far more pleasant experience. He talks more about workflow and implementation.

Both Bazaar and Mercurial are written in Python and seem to be fairly well architected. Frankly, if I do have to get my hands dirty in the code (e.g., hgsubversion), I'd much rather hack in Python. I did C/C++ for fifteen years and I'm sick of unmanaged code.

Anyway, Mercurial is where I'm going for now, though I won't categorically rule out Bazaar or Git.

posted on Monday, November 10, 2008 8:19:23 AM (Pacific Standard Time, UTC-08:00) 
#    Comments [2]
Saturday, September 13, 2008 
Cheetah Tips

At Cozi, we're writing our new web services in Python (a story for another day). I wrote up a few hard-won tips on using the Cheetah Template library at the Cozi Tech Blog.

posted on Sunday, September 14, 2008 2:34:49 AM (Pacific Daylight Time, UTC-07:00) 
#    Comments [0]
Friday, January 12, 2007 


Batchfile Wrapper

I've made some significant changes to my Python Batchfile Wrapper. The main virtue of this wrapper is that it finds python.exe and invokes it on the associated Python script, ensuring that input redirection works.

I've also adapted py2bat to work with my wrapper. I'm calling my version py2cmd.

Here's my latest batch file, which is shorter than its predecessor.

To use it, place it in the same directory as the Python script you want to run and give it the same basename; i.e., d:\some\path or other\example.cmd will run d:\some\path or other\example.py.

@echo off
setlocal
set PythonExe=
set PythonExeFlags=-u

for %%i in (cmd bat exe) do (
    for %%j in (python.%%i) do (
        call :SetPythonExe "%%~$PATH:j"
    )
)
for /f "tokens=2 delims==" %%i in ('assoc .py') do (
    for /f "tokens=2 delims==" %%j in ('ftype %%i') do (
        for /f "tokens=1" %%k in ("%%j") do (
            call :SetPythonExe %%k
        )
    )
)
"%PythonExe%" %PythonExeFlags% "%~dpn0.py" %*
goto :EOF

:SetPythonExe
if not [%1]==[""] (
    if ["%PythonExe%"]==[""] (
        set PythonExe=%~1
    )
)
goto :EOF

This is sufficiently cryptic that it merits some explanation.

The first set of nested loops attempts to find python.cmd, python.bat, and python.exe, respectively, along your PATH:

for %%i in (cmd bat exe) do (
    for %%j in (python.%%i) do (
        call :SetPythonExe "%%~$PATH:j"
    )
)

The %%~$PATH:j expression searches the PATH for %%j (i.e., python.cmd, etc). If it's found, the expression evaluates to the full path to %%j. Otherwise, it evaluates to the empty string. I've bracketed the expression with double quotes in order to handle spaces in directory names.
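
For readers who find the batch syntax opaque, the same search can be sketched in Python. This is only an illustration of the logic, not part of the wrapper: find_on_path and its directories parameter are names invented here, and a temporary directory stands in for a real PATH entry.

```python
import os
import tempfile


def find_on_path(names, directories):
    """Return the full path of the first name found in directories, else ''."""
    for name in names:                 # outer loop: python.cmd, python.bat, python.exe
        for directory in directories:  # roughly what %%~$PATH:j does for one name
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate):
                return candidate
    return ""                          # like the batch expression, empty when nothing matches


# Demonstration with a throwaway directory standing in for a PATH entry.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "python.bat"), "w").close()
    found = find_on_path(["python.cmd", "python.bat", "python.exe"], [tmp])
    print(os.path.basename(found))     # python.bat
```

As in the batch file, the first match wins, so python.cmd takes precedence over python.bat, which takes precedence over python.exe.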

We can't set %PythonExe% directly in the loop. As explained at for loops and variable expansion, environment variables in the body of the loop are evaluated once before the loop starts and won't change until after the loop terminates.

Instead, the loop calls the SetPythonExe subroutine, which simply sets %PythonExe% to %1 if and only if %PythonExe% doesn't already have a value and %1 is not empty:

:SetPythonExe
if not [%1]==[""] (
    if ["%PythonExe%"]==[""] (
        set PythonExe=%~1
    )
)
goto :EOF

Note: the %~1 notation strips off any surrounding double quotes. (ss64.com has details on parameter syntax.)

The square brackets and double quotes are necessary to make it all work if either %PythonExe% or %1 contains spaces. Getting this right was one of the hardest parts of the whole exercise.

The second set of nested loops is scarier:

for /f "tokens=2 delims==" %%i in ('assoc .py') do (
    for /f "tokens=2 delims==" %%j in ('ftype %%i') do (
        for /f "tokens=1" %%k in ("%%j") do (
            call :SetPythonExe %%k
        )
    )
)

The outer loop runs once: assoc .py yields .py=Python.File and %%i is set to Python.File. Running ftype Python.File yields Python.File="C:\Python24\python.exe" "%1" %* (on my machine).

The second loop also runs once: %%j is set to everything on the right-hand side of the =.

The third loop also runs once: %%k is set to the first token in %%j, "C:\Python24\python.exe", which is passed in to SetPythonExe.
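
The token juggling in those three loops may be easier to follow as plain string splitting. The Python sketch below imitates it on the sample assoc and ftype output quoted above; second_token and first_word are helper names invented for this illustration.

```python
def second_token(line, delim="="):
    """Imitate for /f "tokens=2 delims==": the text after the first '='."""
    return line.split(delim)[1]


def first_word(text):
    """Imitate for /f "tokens=1": the first whitespace-delimited token."""
    return text.split()[0]


assoc_output = ".py=Python.File"
ftype_output = r'Python.File="C:\Python24\python.exe" "%1" %*'

file_type = second_token(assoc_output)  # what %%i receives
rhs = second_token(ftype_output)        # what %%j receives
exe = first_word(rhs)                   # what %%k receives, quotes included

print(file_type)  # Python.File
print(exe)        # "C:\Python24\python.exe"
```

Note that splitting on whitespace ignores the quotes, so an interpreter path containing spaces would be cut short in this sketch.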

At this point, %PythonExe% will have a value if python.cmd (or python.bat or python.exe) existed on your path, or the .py extension was registered.

If it doesn't have a value, then the invocation of "%PythonExe%" will fail, setting %errorlevel% to 9009:

 "%PythonExe%" %PythonExeFlags% "%~dpn0.py" %*
goto :EOF

%PythonExeFlags% was set to -u at the beginning of the script. As explained in my Python Batchfile Wrapper post, this treats stdin, stdout, and stderr as raw streams, instead of transliterating \r\n into \n. If you want cooked input, simply remove the -u.

The "%~dpn0.py" notation yields the absolute path to the Python script with the .py extension sitting beside this batch file: another example of parameter syntax.

Finally, goto :EOF ends execution of the batchfile, skipping the :SetPythonExe subroutine.

Whew!

py2cmd

You can have a batchfile sitting alongside a Python script as above, or you can have a self-contained batchfile cum Python script.

py2bat has been kicking around for years. It takes a Python script and turns it into a batchfile, by relying on a couple of tricks.

I've adapted py2bat into a new script, py2cmd. In essence, the generated batchfile looks like this:

 @echo off
REM="""
... set PythonExe as above ...
"%PythonExe%" -x %0
goto :EOF
"""

# python code starts here
# ...

When this file is executed by cmd.exe, the control flow should be obvious: disable echoing to the screen, skip over a funny-looking REM, set %PythonExe% as before (not shown), invoke python.exe with the -x flag on the current batchfile, and finally skip past the rest of the file.

When Python is invoked with the -x flag, it skips the first line of the script (@echo off). The second line sets the variable REM to the multiline string which continues down to the closing """ below the goto :EOF. Everything after that is the original Python script. All the batchfile nonsense is wrapped up inside the REM variable.
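
The Python half of the trick can be checked without cmd.exe. The sketch below is an illustration only: POLYGLOT is a hypothetical, abbreviated stand-in for a generated file, and run_like_dash_x imitates the -x flag by dropping the first line before executing the rest.

```python
# Hypothetical stand-in for a py2cmd-generated file; the batch part is abbreviated.
POLYGLOT = '''@echo off
REM="""
"%PythonExe%" -x %0
goto :EOF
"""

answer = 6 * 7
'''


def run_like_dash_x(source):
    """Imitate `python -x`: discard the first line, then execute the rest."""
    _, _, rest = source.partition("\n")
    namespace = {}
    exec(rest, namespace)
    return namespace


ns = run_like_dash_x(POLYGLOT)
print(ns["answer"])              # 42: the Python payload ran
print("goto :EOF" in ns["REM"])  # True: the batch lines ended up inside the REM string
```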

Download py2cmd.

Other Wrappers

Fredrik Lundh's ExeMaker generates a stub executable to launch a Python script with the same basename. It requires that Python already be installed on the target machine. I couldn't get ExeMaker to work properly. The stub executable leaves me at the Python interpreter's interactive prompt.

py2exe takes a Python script and bundles up all the Python support files to make it run on a machine that doesn't have Python installed. Works fine for me, but you get 4MB+ of associated runtime. Massive overkill if the target machine is known to have Python installed.

posted on Saturday, January 13, 2007 2:49:31 AM (Pacific Standard Time, UTC-08:00) 
#    Comments [0]
Thursday, December 28, 2006 


I've been getting into Python lately. One problem that I've encountered under Windows is that input redirection doesn't work if you use the .py file association to run the script; e.g.:

 C:\> foo.py < input.txt

There's a well-known input redirection bug. The fix is to explicitly use python.exe to run the script.

A related problem for me was that stdin was opened as a text file, not a binary file, so \r bytes were being discarded from binary input files. The fix is to run python.exe -u (unbuffered binary input and output).
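
The difference between cooked and raw input can be illustrated in isolation. This sketch uses Python 3's io module as a stand-in, which is an assumption on my part: it is not the mechanism of the Python 2.4-era interpreter discussed here, only a demonstration of what newline translation does to the bytes.

```python
import io

data = b"line one\r\nline two\r\n"

# Text mode with universal newlines (newline=None) rewrites \r\n as \n.
cooked = io.TextIOWrapper(io.BytesIO(data), encoding="ascii", newline=None).read()

# Binary mode hands back exactly the bytes that were written.
raw = io.BytesIO(data).read()

print(repr(cooked))  # 'line one\nline two\n'
print(repr(raw))     # b'line one\r\nline two\r\n'
```

For a text file the translation is harmless; for a JPEG it silently corrupts the data.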

I didn't want to hardcode the path to python.exe in a batch file, so I came up with the following wrapper, which parses the output from assoc .py and ftype Python.File.

Just place this batch file in the same directory as foo.py and call it foo.bat.

@setlocal
@if (%_echo%)==()  set _echo=off
@echo %_echo%

:: You must explicitly invoke python.exe, rather than rely on the
:: file association for .py, if you want stdin redirection to work.
:: See http://mail.python.org/pipermail/python-bugs-list/2004-August/024920.html
:: The -u flag to python.exe specifies unbuffered, binary stdin,
:: so '\r\n' is not remapped to '\n'.

call :FindPythonExe
if "%PythonExe%"=="" (
    echo Can't find python.exe
    exit /B 1
)

:: Replace the extension of this batch file with .py: s/.bat$/.py/
set PythonFile=%~dpn0.py

"%PythonExe%" -u %PythonFile% %*
goto :EOF

::
:: Find python.exe in the path or via the .py association
::
:FindPythonExe
set PythonExe=
:: Search for python.{cmd,bat,exe} in %PATH%
for %%i in (cmd bat exe) do (
    if "%PythonExe%"=="" (
        for %%j in (python.%%i) do set PythonExe=%%~$PATH:j
    )
)
:: Extract path to python.exe from .py association
if "%PythonExe%"=="" call :AssocPy2Exe
goto :EOF

::
:: Return the executable associated with .py in %PythonExe%
::
:AssocPy2Exe
call :AssocExtn2Exe .py
set PythonExe=%_exe%
goto :EOF

::
:: Return the executable associated with file extension %1 in %_exe%
::
:AssocExtn2Exe
:: assoc .py -> .py=Python.File
for /f "usebackq tokens=2 delims==" %%i in (`assoc %1`) do set _ftype=%%i
:: ftype Python.File -> Python.File="C:\Python24\python.exe" "%1" %*
:: Grab everything after the '='
for /f "usebackq tokens=2 delims==" %%i in (`ftype %_ftype%`) do set _rhs=%%i
:: Get the first token of the space-separated list
for /f "tokens=1" %%i in ("%_rhs%") do set _exe=%%i
goto :EOF

Now you can run foo.bat < bar.jpg with the expected results.

Enjoy!

Update 2007/01/03: The batchfile now searches %PATH% before looking up the .py association.

Update 2007/01/12: See here for a significantly improved batchfile and for py2cmd.

posted on Friday, December 29, 2006 1:47:28 AM (Pacific Standard Time, UTC-08:00) 
#    Comments [0]