George V. Reilly

FSymbols for Unicode Weirdness

My display name on Twitter currently looks like @ɢᴇᴏʀɢᴇᴠʀᴇɪʟʟʏ@ᴛᴇᴄʜ.ʟɢʙᴛ, an attempt to route around Twitter's apparent censorship of Mastodon information.

I used the FSymbols Generators to produce several variants.


Many of these variants come from the Unicode block "Mathematical Alphanumeric Symbols".
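
As a rough illustration of how such variants work, here is a minimal sketch that maps ASCII letters onto the bold letters of that block. The function name is made up for this example, and the bold range is only one of several alphabets in the block. NFKC normalization folds the styled letters back to plain ASCII.

```python
import unicodedata

BOLD_UPPER = 0x1D400  # MATHEMATICAL BOLD CAPITAL A
BOLD_LOWER = 0x1D41A  # MATHEMATICAL BOLD SMALL A


def to_math_bold(text):
    """Replace ASCII letters with their mathematical bold counterparts."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(BOLD_UPPER + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(BOLD_LOWER + ord(ch) - ord("a")))
        else:
            out.append(ch)  # digits, spaces and punctuation pass through
    return "".join(out)


styled = to_math_bold("George V. Reilly")
# Compatibility normalization recovers the original ASCII text.
plain = unicodedata.normalize("NFKC", styled)
```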

There are a lot more things you can do with Unicode than just upside-down text.

Backwards Ranges in Python

In Python, if you want to specify a sequence of numbers from a up to (but excluding) b, you can write range(a, b). This generates the sequence a, a+1, a+2, ..., b-1. You start at a and keep going until the next number would be b.

In Python 3, range is lazy and the values in the sequence do not materialize until you consume the range.

>>> range(3,12)
range(3, 12)
>>> list(range(3,12))
[3, 4, 5, 6, 7, 8, 9, 10, 11]

Trey Hunner makes the point that range is a lazy iterable rather than an iterator.
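
The distinction is easy to see in a short experiment. This is only a sketch with arbitrary variable names: a range can be traversed repeatedly and supports len(), indexing, and membership tests, while an iterator obtained from it is used up after one pass.

```python
r = range(3, 12)

# An iterable can be looped over as many times as you like.
first_pass = list(r)
second_pass = list(r)

# range answers these questions without building a list.
size = len(r)
last = r[-1]
has_seven = 7 in r

# An iterator made from the range is exhausted after one pass.
it = iter(r)
once = list(it)
again = list(it)  # nothing left

# range itself is not an iterator, so next() rejects it.
try:
    next(r)
    is_iterator = True
except TypeError:
    is_iterator = False
```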

You can also step by an increment other than one: range(a, b, s). This generates a, a+s, a+2*s, ..., b-s (assuming that …
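
A few concrete cases, as a sketch. The helper name is invented for this example. Note that the last value is only b-s when the step divides the distance evenly, and that a negative step is what makes a range run backwards.

```python
def stepped(a, b, s):
    """Materialize range(a, b, s) as a list for inspection."""
    return list(range(a, b, s))


forward = stepped(3, 12, 3)    # step divides b - a evenly
uneven = stepped(3, 12, 4)     # step does not divide b - a
backward = stepped(12, 3, -3)  # negative step counts down, still excluding b
empty = stepped(3, 12, -1)     # wrong direction yields nothing

# reversed() gives the same values as range(3, 12), last to first.
reversed_range = list(reversed(range(3, 12)))
```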

Diffing a fragment of a file

A while back, I had extracted some code out of a large file into a separate file and made some modifications. I wanted to check that the differences were minimal. Let's say that the extracted code had been between lines 123 and 456 of large_old_file.

diff -u <(sed -n '123,456p;457q' large_old_file) new_file

What's happening here?
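
In short: sed -n suppresses sed's default output, 123,456p prints only those lines, 457q quits so the rest of the file is never read, and <(...) is shell process substitution, which hands diff that output as if it were a file. The same idea can be sketched in Python with difflib. The function name and labels below are made up for this example.

```python
import difflib


def diff_fragment(old_lines, new_lines, start, end):
    """Unified diff of old_lines[start..end] (1-based, inclusive) vs new_lines."""
    fragment = old_lines[start - 1:end]  # same span as sed -n 'start,endp'
    return list(difflib.unified_diff(
        fragment,
        new_lines,
        fromfile="large_old_file[{}:{}]".format(start, end),
        tofile="new_file",
    ))


# Small in-memory stand-ins for the two files.
old = ["line {}\n".format(i) for i in range(1, 11)]
new = ["line 3\n", "line 4\n", "line five\n"]
changes = diff_fragment(old, new, 3, 5)
```

With real files, the two lists would come from open(path).readlines(), and "".join(changes) gives the printable diff.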

A similar example: Diff …

Accidentally Quadratic: Python List Membership

We had a performance regression in a test suite recently when the median test time jumped by two minutes.

We tracked it down to this (simplified) code fragment:

task_inclusions = [ some_collection_of_tasks() ]
invalid_tasks = [t.task_id() for t in airflow_tasks
                 if t.task_id() not in task_inclusions]

This looks fairly innocuous, and it was, until the size of the result returned from some_collection_of_tasks() jumped from a few hundred to a few thousand.

The in comparison operator conveniently works with all of Python's standard sequences and collections, but its efficiency varies. For a list and other sequences, in must search …
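
For a list, each membership test is a linear scan, so the comprehension above does work proportional to the product of the two sizes. A common remedy, shown here as a sketch rather than as the fix this post lands on, is to build a set once and test against that. The function name is hypothetical and plain strings stand in for task ids.

```python
def find_invalid_tasks(task_ids, task_inclusions):
    """Return the ids in task_ids that are absent from task_inclusions."""
    allowed = set(task_inclusions)  # built once; lookups are O(1) on average
    return [tid for tid in task_ids if tid not in allowed]


# Stand-in data: a few thousand allowed ids plus two strays.
inclusions = ["task_{}".format(i) for i in range(5000)]
candidates = inclusions[::2] + ["orphan_a", "orphan_b"]
invalid = find_invalid_tasks(candidates, inclusions)
```

The set requires the ids to be hashable, which holds for strings.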

Now You Have 32 Problems

Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.

— Jamie Zawinski

A Twitter thread about very long regexes reminded me of the longest regex that I ever ran afoul of, a particularly horrible multilevel mess that had worked acceptably on the 32-bit .NET CLR, but brought the 64-bit CLR to its knees.

Whenever I ran our ASP.NET web application [on Win64], it would go berserk, eat up all 4GB of my physical RAM, push the working set of IIS's w3wp.exe to 12GB, and max out one of my 4 cores! The only way to maintain any sanity was to run iisreset every 20 minutes to …

Git Diff Tips

The Git Diff utility is much more functional than the standard command-line diff.

To see changes relative to the staging area (aka the index), use git diff.

To see staged changes, use git diff --staged (or --cached).

To see changes side by side on a line (where it makes sense), use the --color-words option.

To compare two arbitrary files in the file system, use git diff --no-index.

To try some other diff algorithms, use the --patience, --histogram, or --minimal options. The default diff algorithm is myers, which can also be selected explicitly with --diff-algorithm=myers.

Lots more at the docs.

OrderedDict Initialization

An OrderedDict is a Python dict which remembers insertion order. When iterating over an OrderedDict, items are returned in that order. Before Python 3.7, ordinary dicts made no guarantee about the order of their items.

Ironically, most of the ways of constructing an initialized OrderedDict end up breaking the ordering in Python 2.x and in Python 3.5 and below. Specifically, using keyword arguments or passing a dict (mapping) will not retain the insertion order of the source code.

Python 2.7.13 (default, Dec 18 2016, 07:03:39)
>>> from collections import OrderedDict

>>> odict = OrderedDict()
>>> odict['one'] = 1
>>> odict['two'] = 2
>>> odict['three'] = 3
>>> odict['four'] = 4
>>> odict['five'] = 5
>>> odict.items()
[('one', 1), ('two', 2), ('three', 
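
One construction that keeps source order on every Python version is a sequence of key/value pairs, since the pairs are inserted one at a time in the order given. A minimal sketch, reusing the keys from the session above:

```python
from collections import OrderedDict

# A list of tuples has a defined order, unlike keyword arguments or a
# plain dict literal on the older Python versions mentioned above.
pairs = [("one", 1), ("two", 2), ("three", 3), ("four", 4), ("five", 5)]
odict = OrderedDict(pairs)

keys = list(odict)
items = list(odict.items())
```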

HTML5 tables require tr inside thead

When I learned HTML tables back in the 90s, at some point I discovered the <thead> element for grouping the <th> column headers. What I missed was that there should be a <tr> element between the two. In other words, a well-formed HTML table with a header looks like this:
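
A minimal sketch of that structure, held in a Python string so that the nesting can be checked with the standard library's html.parser. The column names, class name, and helper name are all invented for this example.

```python
from html.parser import HTMLParser

TABLE = """\
<table>
  <thead>
    <tr><th>Name</th><th>Size</th></tr>
  </thead>
  <tbody>
    <tr><td>alpha</td><td>1</td></tr>
  </tbody>
</table>
"""


class HeaderNestingChecker(HTMLParser):
    """Count <th> cells inside <thead> whose direct parent is not a <tr>."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.misplaced = 0

    def handle_starttag(self, tag, attrs):
        if tag == "th" and "thead" in self.stack and self.stack[-1] != "tr":
            self.misplaced += 1
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed elements.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def misplaced_header_cells(markup):
    checker = HeaderNestingChecker()
    checker.feed(markup)
    return checker.misplaced


good = misplaced_header_cells(TABLE)
bad = misplaced_header_cells("<table><thead><th>A</th><th>B</th></thead></table>")
```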


Negative Circled Digits

I found something very useful in the dingbats range of Unicode characters: the negative circled sans-serif digits, ➊ ➋ ➌ ➍ ➎ ➏ ➐ ➑ ➒ ➓.

I've started using them to label points of interest in code. They play well with the code-block directive in reStructuredText.

sudo docker images --format '{{.Repository}}:{{.Tag}}' \
    | grep $IMAGE_NAME \
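
If typing these characters is awkward, they can be generated from their code points, since the ten of them occupy U+278A through U+2793. A small sketch with a hypothetical helper name:

```python
import unicodedata

FIRST = 0x278A  # DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ONE


def negative_circled(n):
    """Return the negative circled sans-serif character for n in 1..10."""
    if not 1 <= n <= 10:
        raise ValueError("only 1 through 10 have a dingbat form")
    return chr(FIRST + n - 1)


labels = [negative_circled(n) for n in range(1, 11)]
names = [unicodedata.name(ch) for ch in labels]
```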

JSON data from Docker Images

I was trying to get some structured information from docker images, hoping to replace some ugly Sed and AWK trickery. I could have used the docker-py library. Instead I chose to use the poorly documented --format option to docker images (and some other Docker CLI commands). Adrian Mouat gives some useful starting points at Docker Inspect Template Magic and notes that formatting is built around Go templates.

I quickly figured out that this format would meet my immediate need.

sudo docker images --format '{{.Repository}}:{{.Tag}}' \
    | grep $IMAGE_NAME \
    | grep -v latest \
    | head -1

That's fine, but …
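
One way to get structured data instead, sketched here as an assumption rather than as what this post goes on to do, is the json template function: docker images --format '{{json .}}' prints one JSON object per image, one per line. The helper names below are hypothetical, and the sketch assumes the docker CLI is on PATH and usable without sudo.

```python
import json
import subprocess


def parse_image_lines(output):
    """Parse one JSON object per non-empty line of output."""
    return [json.loads(line) for line in output.splitlines() if line.strip()]


def first_tagged(images, image_name):
    """Mirror the grep / grep -v latest / head -1 pipeline above."""
    for image in images:
        ref = "{}:{}".format(image["Repository"], image["Tag"])
        if image_name in ref and "latest" not in ref:
            return ref
    return None


def docker_images_json():
    """Run the docker CLI and return its image list as dicts (not run here)."""
    result = subprocess.run(
        ["docker", "images", "--format", "{{json .}}"],
        check=True, capture_output=True, text=True,
    )
    return parse_image_lines(result.stdout)


# Canned output standing in for a real docker invocation.
sample = (
    '{"Repository":"myapp","Tag":"latest"}\n'
    '{"Repository":"myapp","Tag":"1.2.3"}\n'
    '{"Repository":"other","Tag":"0.1"}\n'
)
images = parse_image_lines(sample)
chosen = first_tagged(images, "myapp")
```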
