George V. Reilly

Python: Joining URLs with posixpath.join

On Mac/Linux, os.path.join is an alias for posixpath.join, which always joins path segments with /. On Windows, os.path.join is an alias for ntpath.join, which always uses \. When dealing with URLs, we always want forward slashes, regardless of platform, so posixpath.join should be used to build URL paths.

Running:

from __future__ import print_function

from six.moves.urllib_parse import urljoin as abs_urljoin
from posixpath import join as path_urljoin

def urljoin(site, path):
    return abs_urljoin(site, path)

def test_join(site, path):
    result = urljoin(site, path)
    print("'{0}' + '{1}'\n\t-> '{2}'".format(site, path, result))
    return result

local_path = path_urljoin('2016', '07', '12', 'release', 'index.html')

test_join('https://www.example.com', 'foo/bar/quux.js')
test_join('https://www.example.com', local_path)
test_join('https://www.example.com/', local_path)
test_join('https://www.example.com/prefix', local_path)

Yields:

'https://www.example.com' + 'foo/bar/quux.js'
    …
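For readers on Python 3 only, the same experiment can be sketched with the standard library's urllib.parse in place of six. The helper name join_url and the sample calls are assumptions for illustration, not part of the original post. The point worth noticing is how urljoin treats a base URL with and without a trailing slash.

```python
from posixpath import join as path_join
from urllib.parse import urljoin


def join_url(site, *segments):
    """Join path segments with '/' and resolve the result against site.

    Hypothetical helper for illustration; not from the original post.
    """
    return urljoin(site, path_join(*segments))


# posixpath.join uses '/' on every platform, Windows included.
local_path = path_join('2016', '07', '12', 'release', 'index.html')
print(local_path)

print(join_url('https://www.example.com', 'foo', 'bar', 'quux.js'))

# No trailing slash: urljoin treats 'prefix' as a file name and replaces it.
print(join_url('https://www.example.com/prefix', local_path))

# Trailing slash: 'prefix/' is kept as a directory.
print(join_url('https://www.example.com/prefix/', local_path))
```

If the prefix must survive, make sure the base URL ends with a slash before joining.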

Logging in Python: Don't use new-fangled format

Python 2.6 introduced the format method to strings. In general, format is now the preferred way to build strings instead of the old % formatting operator.

One exception is the logging module, where the best practice is to use %s and %d. Why? First, %s is the idiomatic way to use logging, which was built years before format was introduced. Second, if there's a literal % in the interpolated values, logging will be unhappy, since there won't be corresponding arguments in the call. It won't fall over, since “The logging package is designed to swallow exceptions which occur while logging in production. This is so that errors which occur while handling logging …
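A minimal sketch of the %-style convention follows. The logger name, the messages, and the helper demo_logging are made up for illustration. Passing the values as separate arguments leaves the interpolation to logging itself, which only performs it if the record is actually emitted.

```python
import io
import logging


def demo_logging():
    """Log two %-style messages to an in-memory stream and return the text."""
    stream = io.StringIO()
    logger = logging.getLogger('format_demo')  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter('%(levelname)s %(message)s'))
    logger.addHandler(handler)

    # Values are passed as arguments; logging interpolates them on emit.
    logger.info('Loaded %d rows from %s', 42, 'users.csv')
    # A literal % inside an argument is harmless.
    logger.info('Progress: %s', '100% done')
    # DEBUG is below the logger's level, so this message is never formatted.
    logger.debug('Expensive detail: %s', 'skipped')

    logger.removeHandler(handler)
    return stream.getvalue()


print(demo_logging())
```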

Bash: Bulk Renaming

I had to rename several hundred thousand files today. Thanks to a botched invocation of ImageMagick, they all looked like unique_prefix.png.jpg, whereas we simply wanted unique_prefix.jpg.

I found a suitable answer at the Unix StackExchange. As one of the many variants of parameter substitution, Bash supports ${var/Pattern/Replacement}: “first match of Pattern within var replaced with Replacement.”

for f in *.png.jpg;
do
    mv -- "$f" "${f/.png}"
done

The target expression could also have been written as "${f/.png.jpg/.jpg}"
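The same rename can be sketched in Python with pathlib, which may be handy where Bash is not available. The function name drop_png_infix and the sample file names are assumptions for illustration. Unlike ${f/.png}, which removes the first .png anywhere in the name, this version only rewrites the trailing .png.jpg.

```python
import tempfile
from pathlib import Path


def drop_png_infix(directory):
    """Rename every *.png.jpg file in directory to *.jpg; return the new names."""
    new_names = []
    # sorted() collects the matches up front, before any file is renamed.
    for src in sorted(Path(directory).glob('*.png.jpg')):
        dst = src.with_name(src.name[:-len('.png.jpg')] + '.jpg')
        src.rename(dst)
        new_names.append(dst.name)
    return new_names


# Try it on throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ('a.png.jpg', 'b.png.jpg', 'c.jpg'):
        (Path(tmp) / name).touch()
    print(drop_png_infix(tmp))  # ['a.jpg', 'b.jpg']
```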

reStructuredText Anonymous Hyperlinks

While researching yesterday's post about nested markup in reStructuredText, I finally learned how to use anonymous hyperlinks.

Hitherto, I used one of these three forms for hyperlinks:

1. The central conceit of the fictional `Flashman Papers`_ is that Flashy
2. besieging Breda_ in 1625.
3. my club, `Freely Speaking Toastmasters <http://freelyspeaking.org/>`_.

.. _Flashman Papers:
    https://en.wikipedia.org/wiki/The_Flashman_Papers
.. _Breda: http://en.wikipedia.org/wiki/Siege_of_Breda

The first, `Flashman Papers`_, is a named hyperlink reference, which refers to an external hyperlink target, .. _Flashman Papers: URI. Note that the reference name starts with a backquote, `, and ends with backquote-underscore, `_.

The second, Breda_, is a simple reference name—the backquotes are optional, but the trailing _ is crucial.

`Freely …

reStructuredText Nested Markup

I use reStructuredText for both this blog and the MetaBrite DevBlog. This blog is built with Acrylamid, while the MetaBrite blog is built with Nikola.

Yesterday I used a link (~/.pgpass) that was styled as an inline literal; i.e., in the code font. reStructuredText doesn't support nested markup, but you can pull it together with a two-step substitution reference:

Here you have |optparse.OptionParser|_.

.. |optparse.OptionParser| replace:: ``optparse.OptionParser`` documentation
.. _optparse.OptionParser: http://docs.python.org/library/optparse.html

This is tedious, as you have to create a pair of definitions (a substitution and a hyperlink target) for every such link that you wish to style.

Nested inline markup has been on the todo list for 15 years—it ain't happening.
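One way to take some of the tedium out is to generate the pair of definitions. The helper below, literal_link, is hypothetical and not part of docutils or either blog engine; it is only a sketch that emits the same two lines shown above. It assumes the name contains no characters (such as a colon) that would need extra quoting in a hyperlink target.

```python
def literal_link(name, url, suffix=''):
    """Return (reference, definitions) for a link styled as an inline literal.

    Hypothetical helper: it only builds text, it does not parse or render rst.
    """
    reference = '|{0}|_'.format(name)
    definitions = (
        '.. |{0}| replace:: ``{0}``{1}\n'
        '.. _{0}: {2}'
    ).format(name, suffix, url)
    return reference, definitions


ref, defs = literal_link(
    'optparse.OptionParser',
    'http://docs.python.org/library/optparse.html',
    suffix=' documentation',
)
print('Here you have {0}.'.format(ref))
print()
print(defs)
```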

Creating a New PostgreSQL Database at RDS

Many of us are guilty of saying “database” when we mean a database server or a DBMS. A database is a collection of tables storing related data, schemas, stored procs, and permissions. Most database servers are capable of managing many databases simultaneously.

I needed to create a new PostgreSQL database at Amazon's RDS last week. I already had an RDS instance; I needed a new database on that instance. My Google searches turned up various recipes for creating a new RDS instance.

The following worked for me:

psql --host=SOME-DBMS-HOST --dbname …

What3Words

“Without an Address, You’re No One” introduced me to What3Words, an innovative system that uses just three English words to address any 3m×3m square on the planet. These words are drawn from a 40,000-word vocabulary. A 3m×3m square is precise enough to identify a particular doorway in a large building or a towel on a crowded beach.
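A quick back-of-the-envelope check shows why three words are enough. The Earth's surface area used here, roughly 510 million km², is my own assumption and not a figure from the article; the vocabulary size and square size are the ones quoted above.

```python
VOCABULARY = 40000           # words, as quoted above
EARTH_SURFACE_KM2 = 510e6    # assumption: about 510 million km², land and sea
SQUARE_M2 = 3 * 3            # one 3m x 3m square

addresses = VOCABULARY ** 3                      # ordered three-word combinations
squares = EARTH_SURFACE_KM2 * 1e6 / SQUARE_M2    # km² -> m², then 9 m² per square

print(addresses)             # 64000000000000
print(addresses > squares)   # True
```

Under these assumptions there are about 6.4 × 10^13 possible addresses for roughly 5.7 × 10^13 squares.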

I spent much of my childhood living at uses.pills.crunch (Dublin). I spent ten years working at navy.clear.poems (Microsoft's Redmond campus). If I pinpointed the buildings that I worked in, they would each have completely different w3w addresses.

It reminds me a little of Diceware, which strings together several words to form an easy-to-remember but …

Shrinking PDF File Size

Our poster designer sent me a PDF of this year's Bloomsday poster. I thought the file was too large at 7.2MB and I wanted to reduce the file size without significant loss of image quality. I was unable to achieve this in Preview or Acrobat Reader, but Ghostscript did the trick, thanks to an answer on AskUbuntu:

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
    -dPDFSETTINGS=/prepress -dNOPAUSE -dQUIET -dBATCH \
    -sOutputFile=output.pdf input.pdf

The results speak for themselves.

Crop of the original PDF. PDF size: 7.2MB

Crop of -dPDFSETTINGS=/screen. PDF size: 78KB

Crop of -dPDFSETTINGS=/ebook. PDF size: 234KB

Crop of -dPDFSETTINGS=/prepress. PDF size: 1.75MB
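To compare the presets side by side, the Ghostscript command can be wrapped in a small script. This is only a sketch: the function names gs_command and shrink_all and the output file names are made up, and shrink_all assumes that gs is installed and on the PATH. The flags are the ones from the command above.

```python
import subprocess

PRESETS = ('screen', 'ebook', 'prepress')  # the three presets compared above


def gs_command(src, dst, preset='prepress'):
    """Build the Ghostscript argument list for one preset."""
    if preset not in PRESETS:
        raise ValueError('unexpected preset: {0}'.format(preset))
    return [
        'gs', '-sDEVICE=pdfwrite', '-dCompatibilityLevel=1.4',
        '-dPDFSETTINGS=/' + preset, '-dNOPAUSE', '-dQUIET', '-dBATCH',
        '-sOutputFile=' + dst, src,
    ]


def shrink_all(src):
    """Write screen.pdf, ebook.pdf and prepress.pdf; needs gs on the PATH."""
    for preset in PRESETS:
        subprocess.run(gs_command(src, preset + '.pdf', preset), check=True)


# Show the command without running it.
print(' '.join(gs_command('input.pdf', 'output.pdf')))
```

Calling shrink_all('input.pdf') would then produce one file per preset for a size and quality comparison.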

Subtracting Compound Objects

Quick! How many days between 2014-11-29 and 2016-05-17? What's the angle between the hour hand and the minute hand on an analog clock when the time reads 11:37?

The hard way to compute the difference between the two dates is to start counting back months and days until you reach the earlier date, or equivalently to count forward from the beginning. (Don't forget that Feb 2016 has 29 days but Feb 2015 has 28.) Similarly with the angle between the hands.

The easier way is to compute the number of units between the first point and some reference (or base) point, to do the same for the second point, and to subtract …
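The reference-point idea can be sketched for both opening questions. The function names are my own. For dates, the reference is day 1 of Python's proleptic Gregorian calendar (0001-01-01); for the clock, both hands are measured clockwise from the 12 o'clock position.

```python
from datetime import date


def days_between(earlier, later):
    """Subtract two day counts measured from the same reference date."""
    # toordinal() numbers days from 0001-01-01, which is day 1.
    return later.toordinal() - earlier.toordinal()


def hand_angle(hour, minute):
    """Smaller angle in degrees between the hour and minute hands."""
    hour_deg = (hour % 12) * 30 + minute * 0.5  # 30 degrees per hour, 0.5 per minute
    minute_deg = minute * 6                     # 6 degrees per minute
    diff = abs(hour_deg - minute_deg)
    return min(diff, 360 - diff)


print(days_between(date(2014, 11, 29), date(2016, 5, 17)))  # 535
print(hand_angle(11, 37))                                   # 126.5
```

In everyday code, date(2016, 5, 17) - date(2014, 11, 29) gives the same answer as a timedelta; the explicit ordinals are spelled out here only to make the reference point visible.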

The Unix file Command

I had forgotten all about the file command until it was mentioned in a StackOverflow answer today. If you run file some.iso, it will display the label embedded in the disk image. More generally, you can run file on many different kinds of files and it will do a decent job of identifying the type of data.
