George V. Reilly

Exploring Wordle

Unless you've been living under a rock, you've heard of Wordle, the online word game that has been wildly popular since late 2021. You've probably seen people posting their Wordle games on social media as grids of little green, yellow, and black (or white) emojis.

Wordle 797 4/6

⬛ ⬛ ⬛ ⬛ 🟨
🟨 ⬛ 🟩 ⬛ ⬛
⬛ ⬛ 🟩 🟨 ⬛
🟩 🟩 🟩 🟩 🟩

The problem that I want to address in this post is:

Given some GUESS=SCORE pairs for Wordle and a word list, programmatically find all the words from the list that are eligible as answers.
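One way to approach the problem stated above is to score each candidate word against every guess and keep only the candidates that reproduce the observed scores. The following is a minimal sketch under assumptions of my own: the names `score` and `eligible` are hypothetical, and the score is encoded as a string of `G` (green), `Y` (yellow), and `B` (black), which may not match the SCORE notation used in the full post.

```python
from collections import Counter


def score(guess: str, answer: str) -> str:
    """Score a guess against a known answer, Wordle style.

    Returns one character per tile: G = right letter, right place;
    Y = letter is in the answer but elsewhere; B = letter is absent.
    (This G/Y/B encoding is an assumption made for illustration.)
    """
    result = ["B"] * len(guess)
    unmatched = Counter()
    # First pass: mark greens and count the answer letters left over.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
        else:
            unmatched[a] += 1
    # Second pass: hand out yellows, consuming leftover letters so that
    # repeated letters in the guess are not over-counted.
    for i, g in enumerate(guess):
        if result[i] != "G" and unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)


def eligible(words, pairs):
    """Return the words that would have produced every (guess, score) pair."""
    return [w for w in words if all(score(g, w) == s for g, s in pairs)]
```

A word survives only if, had it been the answer, every guess would have received exactly the score that was observed.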

Let's look at this four-round game for Wordle 797:


Python Enums with Attributes

Python enumerations are useful for grouping related constants in a namespace. You can add additional behaviors to an enum class, but there isn't an easy and obvious way to add attributes to enum members.

from enum import Enum

class TileState(Enum):
    CORRECT = 1
    PRESENT = 2
    ABSENT  = 3

    def color(self):
        if self is self.CORRECT:
            return "Green"
        elif self is self.PRESENT:
          
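The excerpt above is cut off, so the approach the full post settles on is not shown here. As a hedged sketch of one documented way to give enum members attributes, each member's value can be a tuple that `Enum` unpacks into `__init__`. The attribute names `number` and `color`, and the color strings for `PRESENT` and `ABSENT`, are assumptions for illustration.

```python
from enum import Enum


class TileState(Enum):
    # Each member's value is a tuple; Enum passes its items to __init__.
    CORRECT = (1, "Green")
    PRESENT = (2, "Yellow")  # assumed color, based on Wordle's emoji grid
    ABSENT = (3, "Black")    # assumed color, based on Wordle's emoji grid

    def __init__(self, number, color):
        # Store the tuple items as per-member attributes.
        self.number = number
        self.color = color
```

With this shape, `TileState.CORRECT.color` is a plain attribute lookup rather than a chain of `if`/`elif` tests, at the cost of `.value` now being the whole tuple.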

Patching a Python Wheel

Recently, I had to create a new Python wheel for PyTorch. There is a cyclic dependency between PyTorch 2.0.1 and Triton 2.0.0: Torch depends upon Triton, but Triton also depends on Torch. Pip is okay with installing packages where there's a cyclic dependency. Bazel, however, does not handle cyclic dependencies between packages. We use Bazel extensively at Stripe and this cyclic dependency prevented us from using the latest version of Torch.

I spent a few days trying to build the PyTorch wheel from source. It was a nightmare! I ran out of disk space on the root partition on my EC2 devbox trying to install system packages, so I had to …

Backwards Ranges in Python

In Python, if you want to specify a sequence of numbers from a up to (but excluding) b, you can write range(a, b). This generates the sequence a, a+1, a+2, ..., b-1. You start at a and keep going until the next number would be b.

In Python 3, range is lazy and the values in the sequence do not materialize until you consume the range.

>>> range(3,12)
range(3, 12)
>>> list(range(3,12))
[3, 4, 5, 6, 7, 8, 9, 10, 11]

Trey Hunner makes the point that range is a lazy iterable rather than an iterator.
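The distinction between a lazy iterable and an iterator can be checked directly. This is a small sketch of my own, not code from the post: a range can be looped over repeatedly and supports `len`, indexing, and `in`, but calling `next()` on it fails because it is not an iterator.

```python
r = range(3, 12)

# A range can be consumed more than once; an iterator would be
# exhausted after the first pass.
first_pass = list(r)
second_pass = list(r)

# It also behaves like a sequence without materializing its values.
length = len(r)
third_item = r[2]
has_seven = 7 in r

# To step through it manually, ask it for an iterator first.
it = iter(r)
first = next(it)

# Calling next() on the range itself raises TypeError.
try:
    next(r)
    range_is_iterator = True
except TypeError:
    range_is_iterator = False
```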

You can also step by an increment other than one: range(a, b, s). This generates a, a+s, a+2*s, ..., b-s (assuming that …

Accidentally Quadratic: Python List Membership

We had a performance regression in a test suite recently when the median test time jumped by two minutes.

We tracked it down to this (simplified) code fragment:

task_inclusions = [ some_collection_of_tasks() ]
invalid_tasks = [t.task_id() for t in airflow_tasks
                 if t.task_id() not in task_inclusions]

This looks fairly innocuous, and it was, until the size of the result returned from some_collection_of_tasks() jumped from a few hundred to a few thousand.

The in comparison operator conveniently works with all of Python's standard sequences and collections, but its efficiency varies. For a list and other sequences, in must search …
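The excerpt stops before the fix, so what follows is not necessarily what the post does. It is a sketch of the usual remedy: build a set once, so that each membership test is a hash lookup on average instead of a scan of the whole list. The function names and the sample task ids are hypothetical.

```python
def invalid_tasks_list(task_ids, inclusions):
    # `inclusions` is a list: every `not in` test scans it linearly,
    # so the total work grows as len(task_ids) * len(inclusions).
    return [t for t in task_ids if t not in inclusions]


def invalid_tasks_set(task_ids, inclusions):
    # Build the set once; each `not in` test is then a hash lookup
    # (constant time on average), so the total work is roughly linear.
    inclusion_set = set(inclusions)
    return [t for t in task_ids if t not in inclusion_set]


# Made-up data purely for illustration.
inclusions = ["task_%d" % i for i in range(0, 3000, 2)]
task_ids = ["task_%d" % i for i in range(3000)]

slow_result = invalid_tasks_list(task_ids, inclusions)
fast_result = invalid_tasks_set(task_ids, inclusions)
```

Both functions return the same list in the same order; only the cost of each membership test changes. The set approach requires the items to be hashable.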

OrderedDict Initialization

An OrderedDict is a Python dict that remembers insertion order. When iterating over an OrderedDict, items are returned in that order. Before Python 3.7, ordinary dicts returned their items in an unspecified order.

Ironically, most of the ways of constructing an initialized OrderedDict end up breaking the ordering in Python 2.x and in Python 3.5 and below. Specifically, using keyword arguments or passing a dict (mapping) will not retain the insertion order of the source code.

Python 2.7.13 (default, Dec 18 2016, 07:03:39)
>>> from collections import OrderedDict

>>> odict = OrderedDict()
>>> odict['one'] = 1
>>> odict['two'] = 2
>>> odict['three'] = 3
>>> odict['four'] = 4
>>> odict['five'] = 5
>>> odict.items()
[('one', 1), ('two', 2), ('three', 
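Besides assigning keys one at a time as in the session above, a sequence of (key, value) pairs also preserves order, because a list has a defined order before OrderedDict ever sees it. This is a small sketch of my own rather than the post's code, and it behaves the same way on Python 2.7 and Python 3.

```python
from collections import OrderedDict

# A list of pairs is ordered, so the insertion order matches the source.
odict = OrderedDict([
    ("one", 1),
    ("two", 2),
    ("three", 3),
    ("four", 4),
    ("five", 5),
])

keys_in_order = list(odict.keys())
items_in_order = list(odict.items())
```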

Alembic: Data Migrations

We use Alembic to perform schema migrations whenever we add tables or columns to our databases (or drop them). It's less well known that Alembic can also perform data migrations, updating existing data in tables.

Here's an example adapted from a migration I put together this afternoon. I added a non-NULL Boolean stooge column to the old_timers table, with a default value of FALSE. I wanted to update certain rows to have stooge=TRUE as part of the migration. The following works with PostgreSQL.

Note the server_default=sa.false() in the declaration of the stooge column, which is needed to initially set all instances of stooge=FALSE. I then declare a table which has only the two …

Rounding

I recently learned from a StackOverflow question that the rounding behavior in Python 3.x is different from Python 2.x:

The round() function rounding strategy and return type have changed. Exact halfway cases are now rounded to the nearest even result instead of away from zero. (For example, round(2.5) now returns 2 rather than 3.)

The “away from zero” rounding strategy is the one that most of us learned at school. The “nearest even” strategy is also known as “banker’s rounding”.

There are actually five rounding strategies defined in IEEE 754:

Mode \ Example value               +11.5   +12.5   −11.5   −12.5
to nearest, ties to even           +12.0   +12.0   −12.0   −12.0
to nearest, ties away from zero    +12.0   +13.0   −12.0   −13.0
toward 0 (truncation)              +11.0   +12.0   −11.0   −12.0
toward +∞ (ceiling)                +12.0   +13.0   −11.0   −12.0
toward −∞ (floor)                  +11.0   +12.0   −12.0   −13.0
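The rows of the table can be reproduced with Python's decimal module, whose rounding constants correspond to these five strategies. This is a sketch of my own, not the post's code. The decimal module implements decimal rather than binary floating-point arithmetic, so treat the mapping below as an illustration of the modes, not as IEEE 754 binary behavior. The helper `round_with` and the `MODES` mapping are hypothetical names.

```python
from decimal import (
    Decimal,
    ROUND_CEILING,
    ROUND_DOWN,
    ROUND_FLOOR,
    ROUND_HALF_EVEN,
    ROUND_HALF_UP,
)

# Table row label -> decimal rounding constant.
MODES = {
    "to nearest, ties to even": ROUND_HALF_EVEN,
    "to nearest, ties away from zero": ROUND_HALF_UP,
    "toward 0 (truncation)": ROUND_DOWN,
    "toward +inf (ceiling)": ROUND_CEILING,
    "toward -inf (floor)": ROUND_FLOOR,
}

EXAMPLES = ["11.5", "12.5", "-11.5", "-12.5"]


def round_with(mode, text):
    # Quantizing to Decimal("1") rounds to a whole number using `mode`.
    return int(Decimal(text).quantize(Decimal("1"), rounding=mode))


table = {
    label: [round_with(mode, text) for text in EXAMPLES]
    for label, mode in MODES.items()
}

# Python 3's built-in round() uses ties-to-even for exact halfway cases.
builtin_results = [round(2.5), round(3.5)]
```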

Further …

SQLAlchemy got me Killed

I ran a script this afternoon that died mysteriously without any output. It was using SQLAlchemy to query all the rows from a large table so that they could be transformed into JSON Lines to be loaded into Elasticsearch. When I reran my script, I noticed this time that something had printed Killed at the very end.

A little research convinced me that the OOM Killer was the likely assassin. I looked in /var/log/kern.log and I found that my process had used up almost all of the 8GB on this system before being killed.

The query had to be the problem. A little more research led me to augment my …

Python: Joining URLs with posixpath.join

On Mac/Linux, os.path.join is an alias for posixpath.join, which always joins path segments with /. On Windows, os.path.join is an alias for ntpath.join, which always uses \. When dealing with URLs, we always want forward slashes, regardless of platform, so posixpath.join should be used to build URL paths.
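Both posixpath and ntpath can be imported on any platform, so the difference is easy to see side by side. This is a small standard-library sketch of my own, separate from the post's script; the path segments are arbitrary examples.

```python
import ntpath
import posixpath

segments = ("2016", "07", "12", "release", "index.html")

# posixpath.join always uses forward slashes, whatever the host OS.
url_path = posixpath.join(*segments)

# ntpath.join always uses backslashes, which is wrong for URLs.
windows_path = ntpath.join(*segments)
```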

Running:

from __future__ import print_function

from six.moves.urllib_parse import urljoin as abs_urljoin
from posixpath import join as path_urljoin

def urljoin(site, path):
    return abs_urljoin(site, path)

def test_join(site, path):
    result = urljoin(site, path)
    print("'{0}' + '{1}'\n\t-> '{2}'".format(site, path, result))
    return result

local_path = path_urljoin('2016', '07', '12', 'release', 'index.html')

test_join('https://www.example.com', 'foo/bar/quux.js')
test_join('https://www.example.com', local_path)
test_join('https://www.example.com/', local_path)
test_join('https://www.example.com/prefix', local_path)

Yields:

'https://www.example.com' + 'foo/bar/quux.js'
  