Most regular expression engines make it easy to
match alternations (or disjunctions) with the | operator:
to match either foo or bar, use foo|bar.
Few regex engines have any provisions for conjunctions,
and the syntax is often horrible.
Awk makes it easy to match /pat1/ && /pat2/ && /pat3/.
$ cat <<EOF | awk '/bar/ && /foo/'
> foo bar
> barfy food
In the case of a Unix pipeline,
the conjunction could also be expressed as a series of pipes:
... | grep pat1 | grep pat2 | grep pat3 | ....
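The same conjunction can be sketched in Python, whose regex syntax has no "and" operator either. This is only an illustration, not code from the post: the helper name matches_all and the sample lines are assumptions. It shows two approaches, one re.search per pattern combined with all(), and a single regex built from lookahead assertions.

```python
import re

def matches_all(line, patterns):
    """Return True if every pattern matches somewhere in line."""
    return all(re.search(p, line) for p in patterns)

# Single-regex alternative: each (?=.*pat) lookahead must succeed
# from the start of the line, so both patterns have to be present.
BOTH = re.compile(r"^(?=.*foo)(?=.*bar)")

lines = ["foo bar", "barfy food", "foo only", "bar only"]
selected = [ln for ln in lines if matches_all(ln, ["foo", "bar"])]
print(selected)
```

The all() version is usually easier to read and extends to any number of patterns; the lookahead version is handy when a tool accepts only one regex.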
The longest regex that I ever encountered
was an enormous alternation, a true horror that shouldn't have …
Python enumerations are useful for grouping related constants in a namespace.
You can add additional behaviors to an enum class,
but there isn't an easy and obvious way
to add attributes to enum members.
CORRECT = 1
PRESENT = 2
ABSENT = 3
if self is self.CORRECT:
elif self is self.PRESENT:
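One way to attach attributes to members is to give each member a tuple value and unpack it in __new__. The sketch below is an illustration under assumptions, not necessarily the approach the post goes on to describe: the class name Hint and the color attribute are hypothetical, chosen to fit the CORRECT, PRESENT, and ABSENT members shown above.

```python
from enum import Enum

class Hint(Enum):
    # Each member is declared as (value, color); color is a made-up attribute.
    CORRECT = (1, "green")
    PRESENT = (2, "yellow")
    ABSENT = (3, "gray")

    def __new__(cls, value, color):
        obj = object.__new__(cls)
        obj._value_ = value   # keep the plain integer as the member's value
        obj.color = color     # extra per-member attribute
        return obj

print(Hint.CORRECT.value, Hint.CORRECT.color)
```

Setting _value_ explicitly means lookups by value, such as Hint(3), still work with the plain integer.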
My display name on Twitter currently looks like @ɢᴇᴏʀɢᴇᴠʀᴇɪʟʟʏ@ᴛᴇᴄʜ.ʟɢʙᴛ,
an attempt to route around Twitter's apparent censorship of Mastodon information.
I used the FSymbols Generators to produce several variants.
Many of these variants come from
Unicode Block "Mathematical Alphanumeric Symbols".
There are a lot more things you can do with Unicode
than just upside-down text.
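As a small illustration of how such generators might work, here is a sketch that maps ASCII letters onto the bold letters in the Mathematical Alphanumeric Symbols block. The function name to_math_bold is an assumption, and this is not how FSymbols itself is implemented. It also shows that NFKC normalization folds the styled letters back to plain ASCII.

```python
import unicodedata

BOLD_UPPER_A = 0x1D400  # MATHEMATICAL BOLD CAPITAL A
BOLD_LOWER_A = 0x1D41A  # MATHEMATICAL BOLD SMALL A

def to_math_bold(text):
    """Replace ASCII letters with their mathematical bold counterparts."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(BOLD_UPPER_A + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(BOLD_LOWER_A + ord(ch) - ord("a")))
        else:
            out.append(ch)  # digits and punctuation pass through unchanged
    return "".join(out)

fancy = to_math_bold("Tech.LGBT")
print(fancy)
print(unicodedata.normalize("NFKC", fancy))  # compatibility mapping back to ASCII
```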
In Python, if you want to specify a sequence of numbers
from a up to (but excluding) b,
you can write range(a, b).
This generates the sequence
a, a+1, a+2, ..., b-1.
You start at
a and keep going until the next number would be b.
In Python 3,
range is lazy
and the values in the sequence do not materialize
until you consume the range.
[3, 4, 5, 6, 7, 8, 9, 10, 11]
Trey Hunner makes the point that
range is a lazy iterable
rather than an iterator.
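The distinction between a lazy iterable and an iterator can be checked directly. The sketch below uses arbitrary bounds and variable names chosen for illustration.

```python
r = range(3, 12)

# Nothing is materialized until the range is consumed.
values = list(r)

# A range can be consumed again, measured, and indexed;
# an iterator supports none of these.
again = list(r)
length = len(r)
last = r[-1]

# iter() hands back a separate iterator object, so the range
# is not its own iterator.
it = iter(r)
is_own_iterator = it is r

print(values, length, last, is_own_iterator)
```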
You can also step by an increment other than one:
range(a, b, s).
a, a+s, a+2*s, ..., b-s
(assuming that …
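A quick check of the stepped form, using arbitrary values chosen for illustration. When b - a is a multiple of s, the last element is b - s; otherwise the sequence simply stops at the last value below b.

```python
a, b, s = 3, 12, 3

# b - a is a multiple of s, so the last element is b - s.
stepped = list(range(a, b, s))

# With s = 4, b - a is not a multiple of s and the last element is not b - s.
uneven = list(range(a, b, 4))

print(stepped, uneven)
```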
A while back, I had extracted some code out of a large file
into a separate file and made some modifications.
I wanted to check that the differences were minimal.
Let's say that the extracted code had been between
lines 123 and 456 of large_old_file.
diff -u <(sed -n '123,456p;457q' large_old_file) new_file
What's happening here?
- sed -n '123,456p' is printing lines 123–456 of large_old_file.
- The 457q tells sed to abandon the file at line 457.
Otherwise, it will keep reading all the way to the end.
- The <(sed ...) is an example of process substitution.
The output of the sed invocation
becomes the first input of the diff command.
A similar example: Diff …
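For comparison, a rough Python equivalent of the sed-plus-diff one-liner could look like the sketch below. It swaps in the standard difflib module for the diff command, and the function names are assumptions made for illustration.

```python
import difflib
from itertools import islice

def extract_lines(lines, first, last):
    """Return lines first..last (1-based, inclusive) from an iterable of lines.

    islice stops reading once `last` lines have been consumed,
    which plays the role of sed's 457q.
    """
    return list(islice(lines, first - 1, last))

def diff_extracted(old_lines, first, last, new_lines):
    """Unified diff of the extracted range of old_lines against new_lines."""
    return list(difflib.unified_diff(
        extract_lines(old_lines, first, last),
        list(new_lines),
        fromfile="large_old_file",
        tofile="new_file",
    ))

# Usage sketch, with the file names from the text as placeholders:
# with open("large_old_file") as old, open("new_file") as new:
#     print("".join(diff_extracted(old, 123, 456, new)))
```

An empty result means the extracted range and the new file are identical.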
We had a performance regression in a test suite recently
when the median test time jumped by two minutes.
We tracked it down to this (simplified) code fragment:
task_inclusions = [ some_collection_of_tasks() ]
invalid_tasks = [t.task_id() for t in airflow_tasks
                 if t.task_id() not in task_inclusions]
This looks fairly innocuous—and it was—until the size of the result returned from some_collection_of_tasks()
jumped from a few hundred to a few thousand.
The in comparison operator conveniently works
with all of Python's standard sequences and collections,
but its efficiency varies.
For a list and other sequences,
in must search …
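A common remedy for this kind of slowdown is to build a set once and test membership against it, since a set lookup takes constant time on average while a list lookup scans the elements one by one. The sketch below is an assumption about how the fragment could be restructured, not the fix the post itself describes. The function name find_invalid_tasks is hypothetical, and it takes plain task IDs rather than task objects.

```python
def find_invalid_tasks(task_ids, task_inclusions):
    """Return the task IDs that are not in task_inclusions, preserving order."""
    allowed = set(task_inclusions)  # built once; IDs must be hashable
    return [tid for tid in task_ids if tid not in allowed]

invalid = find_invalid_tasks(
    ["extract", "transform", "load", "notify"],
    ["extract", "load"],
)
print(invalid)
```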
Some people, when confronted with a problem, think
“I know, I'll use regular expressions.”
Now they have two problems.
— Jamie Zawinski
A Twitter thread about very long regexes
reminded me of the longest regex that I ever ran afoul of,
a particularly horrible multilevel mess
that had worked acceptably on the 32-bit .NET CLR,
but brought the 64-bit CLR to its knees.
Whenever I ran our ASP.NET web application [on Win64],
it would go berserk, eat up all 4GB of my physical RAM,
push the working set of IIS's w3wp.exe to 12GB,
and max out one of my 4 cores!
The only way to maintain any sanity was to run iisreset
every 20 minutes to …
The Git Diff utility is much more functional than the standard command-line diff.
To see changes relative to the staging area (aka the index),
use git diff.
To see staged changes, use git diff --staged (or --cached).
To see changes side by side on a line (where it makes sense),
use the --color-words option.
To compare two arbitrary files in the file system,
use git diff --no-index.
To try some other diff algorithms,
use the --patience, --histogram, or --minimal options.
The default diff algorithm is myers (--diff-algorithm=myers).
Lots more at the docs.
An OrderedDict is a Python dict which remembers insertion order.
When iterating over an OrderedDict, items are returned in that order.
Before Python 3.7, ordinary dicts made no guarantee about the order of their items.
Ironically, most of the ways of constructing an initialized OrderedDict
end up breaking the ordering in Python 2.x and in Python 3.5 and below.
Specifically, using keyword arguments or passing a dict (mapping)
will not retain the insertion order of the source code.
Python 2.7.13 (default, Dec 18 2016, 07:03:39)
>>> from collections import OrderedDict
>>> odict = OrderedDict()
>>> odict['one'] = 1
>>> odict['two'] = 2
>>> odict['three'] = 3
>>> odict['four'] = 4
>>> odict['five'] = 5
[('one', 1), ('two', 2), ('three', …
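One construction that keeps source order on every Python version is to pass a sequence of key-value pairs, because a list already has a defined order before OrderedDict sees it. A minimal sketch, reusing the keys from the session above:

```python
from collections import OrderedDict

# A list of (key, value) tuples has a well-defined order.
pairs = [("one", 1), ("two", 2), ("three", 3), ("four", 4), ("five", 5)]
odict = OrderedDict(pairs)

keys = list(odict)
print(keys)
```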
When I learned HTML tables back in the 90s,
at some point I discovered the <thead> element
for grouping the <th> column headers.
What I missed was that there should be a <tr> element between the two.
In other words, a well-formed HTML table with a header looks like this: