I like to use io.StringIO rather than the older cStringIO.StringIO,
as it's Python 3–ready.
io.StringIO is also a context manager:
if you use it in a with statement,
the string buffer is automatically closed as you go out of scope.
I tried using io.StringIO with unicodecsv,
as I wanted to capture the CSV output into a string buffer
for use with unit tests.
unicodecsv is a drop-in replacement for Python's built-in csv module,
which supports Unicode strings.
with io.StringIO() as csv_file:
    write_csv_rows(csv_file)
    lines = csv_file.getvalue().split('\r\n')
    return lines[:-1]  # drop empty line after trailing \r\n
It failed horribly with
TypeError: unicode argument expected, …continue.
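For comparison, here is a minimal Python 3 sketch of the same capture pattern, using the built-in csv module rather than unicodecsv. io.StringIO only accepts text strings, which is what the Python 3 csv writer produces. The body of write_csv_rows and the capture_csv_lines wrapper are hypothetical stand-ins, since the post does not show them.

```python
import csv
import io


def write_csv_rows(csv_file):
    # Hypothetical stand-in for the post's write_csv_rows helper.
    writer = csv.writer(csv_file)
    writer.writerow(["id", "name"])
    writer.writerow([1, "café"])


def capture_csv_lines():
    # newline="" leaves the csv module's \r\n line endings untouched.
    with io.StringIO(newline="") as csv_file:
        write_csv_rows(csv_file)
        # getvalue() must be called before the buffer is closed.
        lines = csv_file.getvalue().split("\r\n")
    return lines[:-1]  # drop empty string after the trailing \r\n


print(capture_csv_lines())
```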
I rarely use doctests, but I do have some code that uses them.
Although I still mostly write Python 2,
I usually import several features of Python 3:
from __future__ import unicode_literals, print_function, absolute_import
Unfortunately unicode_literals doesn't play well with doctests.
The following code will pass with python2 -m doctest demo.py,
but not with python3:
from __future__ import unicode_literals, print_function, absolute_import

def upper(s):
    """
    Convert `s` to upper case.

    >>> upper('Hello!')
    u'HELLO!'
    """
    return s.upper()
Python 3 complains:
Failed example:
    upper('Hello!')
Expected:
    u'HELLO!'
Got:
    'HELLO!'
The …continue.
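One common workaround, not necessarily the one this post goes on to describe, is to print the result inside the doctest. The printed text carries no u prefix in either Python version, so the expected output is the same for both. A minimal sketch:

```python
# In Python 2, keep the post's `from __future__ import ...` line
# as the first statement of the module.

def upper(s):
    """
    Convert `s` to upper case.

    >>> print(upper('Hello!'))
    HELLO!
    """
    return s.upper()
```

The trade-off is that the doctest no longer checks the type of the returned string, only its printed form.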
I needed to track down a remote branch created a couple of months ago
on another machine.
I knew which file had been changed,
but none of the far-too-many remote branches' names rang a bell.
Turns out that using git branch --contains in the right way
finds all the relevant branches.
git log --all --format=%H FILENAME \
| while read f; do git branch --remotes --contains $f; done \
| sort -u
The first line, git log --all --format=%H FILENAME,
lists all the hashes for commits that contained changes to FILENAME.
The second finds all the branches that contain those hashes
(I added --remotes).
The …continue.
Despite being a bona fide performance expert—I spent a couple of years as the Performance Lead
for Microsoft's IIS web server product about 15 years ago—I still forget to measure rather than assume.
I wrote some code today that imported nearly 300,000 nodes into a graph
from a 500MB XML file.
The code was not particularly fast and I assumed that it was the XML parser.
I had been using the built-in streaming parser, cElementTree iterparse.
I assumed that using the lxml iterparse would make the code faster.
It didn't.
Then I had the bright idea of temporarily disabling the per-node processing,
which left only the XML parsing.
Instead of …continue.
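A minimal sketch of that kind of measurement, assuming the standard-library xml.etree.ElementTree.iterparse and a hypothetical <node> element name; the post's actual XML schema and per-node processing are not shown here. Calling time_parse with process_node=None times the parsing alone, and passing a callback adds the per-node work back in.

```python
import io
import time
import xml.etree.ElementTree as ET


def time_parse(xml_bytes, process_node=None):
    """Return (node_count, elapsed_seconds) for one streaming pass."""
    start = time.perf_counter()
    count = 0
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == "node":  # "node" is an assumed element name
            count += 1
            if process_node is not None:
                process_node(elem)
            elem.clear()  # free the element's contents once it is handled
    return count, time.perf_counter() - start


sample = b"<graph>" + b"<node id='1'/>" * 1000 + b"</graph>"
parse_only = time_parse(sample)
with_processing = time_parse(sample, process_node=lambda e: e.get("id"))
print(parse_only, with_processing)
```

Comparing the two elapsed times shows how much of the total is spent in the parser and how much in the per-node work.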
I wanted to raise Python's IOError for a file-not-found condition,
but it wasn't obvious what the parameters to the exception should be.
import os
from errno import ENOENT

if not os.path.isfile(source_file):
    raise IOError(ENOENT, 'Not a file', source_file)
with open(source_file) as fp:
    return fp.read()
IOError can be instantiated with 1, 2, or 3 arguments:
- IOError(errno, strerror, filename)
  - These arguments are available
    on the errno, strerror, and filename attributes of the exception object,
    respectively, in both Python 2 and 3.
    For backwards compatibility, the args attribute contains
    only the first two constructor arguments, (errno, strerror).
- IOError(errno, strerror)
  - These are available on the errno and strerror attributes of the exception,
    respectively, in both Python 2 and …continue.
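A small sketch of the two- and three-argument forms under Python 3, where IOError is an alias of OSError and an errno of ENOENT yields the FileNotFoundError subclass. The filename missing.txt is made up for illustration.

```python
from errno import ENOENT

# Three-argument form: errno, strerror, filename.
three = IOError(ENOENT, "Not a file", "missing.txt")
# Two-argument form: filename is left as None.
two = IOError(ENOENT, "Not a file")

for exc in (three, two):
    print(type(exc).__name__, exc.errno, exc.strerror, exc.filename)
```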
Various Bash guides recommend putting quotes around just about everything.
I had a script that contained this line:
sudo apt-get install --yes $(cat "$BUILD/install_on_aws_ubuntu.txt")
While refactoring, I put another set of quotes around the $(cat ...)
out of an abundance of caution:
sudo apt-get install --yes "$(cat "$BUILD/install_on_aws_ubuntu.txt")"
Several other changes later, I couldn't figure out why my script had stopped working.
Here's what happened:
$ cat $HOME/tmp/demo.txt
foo
bar
quux
$ echo $(cat "$HOME/tmp/demo.txt")
foo bar quux
$ echo "$(cat $HOME/tmp/demo.txt)"
foo
bar
quux
The new outer quotes retained the newlines;
the original replaced newlines with spaces.
The Bash FAQ has more: specifically Command Substitution and Word Splitting.
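A rough Python model of the difference, for illustration only. Both forms of command substitution strip trailing newlines; the unquoted form then splits the result on whitespace into separate arguments, while the quoted form passes it on as a single argument. The function names are hypothetical, and str.split() only approximates Bash's IFS-based word splitting (it ignores glob expansion, for instance).

```python
def unquoted_substitution(output):
    """Approximate $(cmd): strip trailing newlines, then split on whitespace."""
    return output.rstrip("\n").split()


def quoted_substitution(output):
    """Approximate "$(cmd)": strip trailing newlines, keep a single argument."""
    return [output.rstrip("\n")]


file_contents = "foo\nbar\nquux\n"
print(unquoted_substitution(file_contents))  # ['foo', 'bar', 'quux']
print(quoted_substitution(file_contents))    # ['foo\nbar\nquux']
```

Under this model, the quoted apt-get line hands over one argument with embedded newlines instead of one argument per package name.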
From Python, I needed to find a process
that was performing SSH tunneling on port 8080
and kill it.
The following works in Bash:
ps aux | grep '[s]sh.*:8080' | awk '{print $2}' | xargs kill -9
The grep [s]sh trick ensures that the grep command itself
won't make it through to awk.
Here's what I came up with in Python using psutil:
def kill_port_forwarding(host_port):
    ssh_args = ["-f", "-N", "-L", "{0}:localhost:{0}".format(host_port)]
    for process in psutil.process_iter():
        try:
            if process.name().endswith('ssh'):
…continue.
I needed to import some plugin code written in Python
from a directory whose path isn't known until runtime.
Further, I needed a class object that was a subclass of the plugin base class.
from somewhere import PluginBase

class SomePlugin(PluginBase):
    def f1(self): ...
    def f2(self): ...
You can use the imp module to actually load the module from impl_dir.
Note that impl_dir needs to be temporarily prepended to sys.path.
Then you can find the plugin subclass using dir and issubclass.
import os, imp

def import_class(implementation_filename, base_class):
    impl_dir, impl_filename = os.path.split(implementation_filename)
    module_name, _ = os.path.splitext(impl_filename)
…continue.
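The imp module was removed in Python 3.12, so here is a sketch of the same idea using importlib instead. This is a swapped-in technique, not the post's own code: it loads the module directly from its file path, then uses dir and issubclass to find the first class derived from base_class. It does not touch sys.path, so a plugin that imports sibling modules from its own directory would still need that step.

```python
import importlib.util
import inspect
import os


def import_class(implementation_filename, base_class):
    """Load a module from a file path and return its subclass of base_class."""
    module_name, _ = os.path.splitext(os.path.basename(implementation_filename))
    spec = importlib.util.spec_from_file_location(module_name, implementation_filename)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # actually runs the plugin's code

    for name in dir(module):
        obj = getattr(module, name)
        # Skip non-classes and the base class itself, which the plugin imports.
        if inspect.isclass(obj) and issubclass(obj, base_class) and obj is not base_class:
            return obj
    raise ValueError("No subclass of {} found in {}".format(
        base_class.__name__, implementation_filename))
```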
I was sent an invite to Keybase a few weeks ago, which I accepted tonight.
Keybase Wants To Make Serious Encryption Accessible To Mere Mortals
explains:
From a cryptographic standpoint, PGP is rock solid.
In practice, using it is very messy.
Its complexity has deterred the vast majority of people
who might otherwise benefit from using encryption.
The first problem is establishing a valid identity,
especially with other people located oceans away.
The second is distributing public keys
without nefarious types posting alternative keys
that appear to be registered to the same person.
...
The third issue is getting people to install and use PGP software.
I can now be reached via https://keybase.io/georgevreilly.
I've proved my …continue.
[Previously published at the now defunct MetaBrite Dev Blog.]
I needed to sort a Python dictionary by value today, rather than by key.
I found it confusing, so I'll share what I learned.
Assume the following dictionary, where each value is a tuple of (ID, score).
How do we sort by score; i.e., the second item in the value tuple?
(For the purposes of this discussion, ignore the meaning of the dictionary's key.)
>>> some_dict = dict(a=(123, 0.7), b=(372, 0.2), e=(456, 0.85), d=(148, 0.23), c=(502, 0.1))
>>> some_dict
{'a': (123, 0.7), 'c': (502, 0.1), 'b': (372, 0.2), 'e': (456, 0.85), 'd': (148, 0.23)}
Python dictionaries are inherently unsorted,
unless you …continue.
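One way to do this, sketched with sorted() and a key function that picks the score out of each (key, value) pair. This is not necessarily the approach the post goes on to take.

```python
some_dict = dict(a=(123, 0.7), b=(372, 0.2), e=(456, 0.85), d=(148, 0.23), c=(502, 0.1))

# Each item is (key, (ID, score)); sort on the score, item[1][1].
by_score = sorted(some_dict.items(), key=lambda item: item[1][1])

for key, (node_id, score) in by_score:
    print(key, node_id, score)
```

The result is a list of (key, value) tuples in ascending score order; passing reverse=True to sorted() would give the highest score first.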