George V. Reilly

Homograph Attacks

During an internal training exercise today, as a sort of one-man Chaos Monkey, I deliberately broke a test system by changing a config setting to read:

itemfinder.url = http://test-і

The correct value should have been:

itemfinder.url =

What's that, you say? There's no difference, you say?

There is a difference, but it's subtle. The first i in the URL is 'CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I' (U+0456), not 'LATIN SMALL LETTER I' (U+0069). Depending upon the font, the two i's may be visually indistinguishable or merely very similar, or the Cyrillic i may not render at all.
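A cheap way to catch this kind of sabotage is to list every non-ASCII character in a setting, along with its code point and Unicode name. Here's a minimal sketch using Python's standard `unicodedata` module; `find_non_ascii` is a hypothetical helper, and the sample value is just the truncated prefix shown above.

```python
import unicodedata


def find_non_ascii(value: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, Unicode name) for each non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(value)
        if ord(ch) > 0x7F
    ]


# The final character is U+0456, written as an escape so it can't hide.
for index, codepoint, name in find_non_ascii("http://test-\u0456"):
    print(index, codepoint, name)
# → 12 U+0456 CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
```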

This is an example of an Internationalized Domain Name Homograph Attack. There are Greek letters and Cyrillic letters that look …
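On the wire, DNS only ever sees the ASCII-compatible (Punycode) form of an internationalized label, so the two look-alike hosts are entirely different names. Python's built-in `idna` codec (which implements the older IDNA 2003 rules) makes that visible. `test-і` here is a hypothetical host label, since the original URL is truncated.

```python
# Compare a pure-Latin label with one whose final letter is Cyrillic U+0456.
latin = "test-i"
cyrillic = "test-\u0456"

print(latin.encode("idna"))     # ASCII labels pass through unchanged: b'test-i'
print(cyrillic.encode("idna"))  # non-ASCII labels become an "xn--" Punycode label
```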

Unicode Upside-Down Mapping, Part 2

Yesterday I showed FileFormat's ɹǝʇɹǝʌuoↃ uʍo◖-ǝpısd∩ ǝpoɔıu∩. Although the lowercase letters generally looked good, several of the uppercase letters and numerals were unsatisfactory. Looking through the Unicode Table site, I came across the Fraser Lisu alphabet, which is unfortunately not well supported in most fonts. The following renders in Hack and Source Code Pro in MacVim, but not in the Source Code Pro webfont from Google Fonts:

B: ꓭ u+A4ED  Lisu Letter Gha
D: ꓷ u+A4F7  Lisu Letter Oe
J: ꓩ u+A4E9  Lisu Letter Fa
K: ꓘ u+A4D8  Lisu Letter Kha
L: ꓶ u+A4F6  Lisu Letter Uh
R: ꓤ u+A4E4  Lisu Letter Za
T: ꓕ u+A4D5  Lisu 
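Since font support for Lisu is spotty, it's safer to confirm these code points by name than by eye. A sketch using `unicodedata`; `LISU_SUBSTITUTES` is a hypothetical name, filled in from the code points in the table above.

```python
import unicodedata

# Latin capital -> Lisu letter used as its upside-down form (code points from the table above).
LISU_SUBSTITUTES = {
    "B": "\uA4ED",
    "D": "\uA4F7",
    "J": "\uA4E9",
    "K": "\uA4D8",
    "L": "\uA4F6",
    "R": "\uA4E4",
    "T": "\uA4D5",
}

for latin, lisu in LISU_SUBSTITUTES.items():
    # Escapes avoid depending on whether your editor's font can show Lisu.
    print(latin, f"U+{ord(lisu):04X}", unicodedata.name(lisu))
```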

Unicode Upside-Down Mapping

Unicode is so versatile that you can (more or less) invert the Latin alphabet:

ɐqɔpǝɟƃɥıɾʞʃɯuodbɹsʇnʌʍxʎz ∀𐐒Ↄ◖ƎℲ⅁HIſ⋊⅂WᴎOԀΌᴚS⊥∩ᴧMX⅄Z 012Ɛᔭ59Ɫ86
abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789
68Ɫ95ᔭƐ210 Z⅄XMᴧ∩⊥SᴚΌԀOᴎW⅂⋊ſIH⅁ℲƎ◖Ↄ𐐒∀ zʎxʍʌnʇsɹbdouɯʃʞɾıɥƃɟǝpɔqɐ
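The three lines above also describe the algorithm: map each character through the table, then reverse the string so it reads correctly when rotated 180°. A minimal Python sketch, using the first two rows as a translation table (`upside_down` is a hypothetical name; characters outside the table, such as spaces and hyphens, pass through unchanged):

```python
# Normal and flipped rows, copied from the mapping above; position i in one
# row corresponds to position i in the other.
NORMAL = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
FLIPPED = "ɐqɔpǝɟƃɥıɾʞʃɯuodbɹsʇnʌʍxʎz∀𐐒Ↄ◖ƎℲ⅁HIſ⋊⅂WᴎOԀΌᴚS⊥∩ᴧMX⅄Z012Ɛᔭ59Ɫ86"

# str.maketrans needs both strings to have the same number of code points.
TABLE = str.maketrans(NORMAL, FLIPPED)


def upside_down(text: str) -> str:
    """Flip each mapped character, then reverse the whole string."""
    return text.translate(TABLE)[::-1]


print(upside_down("Unicode Upside-Down Converter"))
# → ɹǝʇɹǝʌuoↃ uʍo◖-ǝpısd∩ ǝpoɔıu∩
```

Improving an individual glyph is then a one-character edit to `FLIPPED`.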

Obtained via the ɹǝʇɹǝʌuoↃ uʍo◖-ǝpısd∩ ǝpoɔıu∩. More at Unicode Upside-Down Mapping.

Update: more tomorrow.

URLs from Unicode Strings

[Previously published at the now defunct MetaBrite Dev Blog.]

Some time ago, we made an ill-considered decision to use recipe names for image URLs, which simplified image management with our then-rudimentary tools. For example, the recipe named "Twisted Pasta With Browned Butter, Sage, and Walnuts" becomes a URL ending in "Twisted%20Pasta%20With%20Browned%20Butter%2C%20Sage%2C%20and%20Walnuts.jpg".
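Python's `urllib.parse.quote` produces exactly this encoding. It's shown here as one plausible way to generate such names, not necessarily what our tools actually did:

```python
from urllib.parse import quote

recipe = "Twisted Pasta With Browned Butter, Sage, and Walnuts"

# quote() leaves ASCII letters, digits, "_.-~" (and "/", its default safe
# character) alone and percent-encodes the rest: spaces become %20, commas %2C.
print(quote(recipe + ".jpg"))
# → Twisted%20Pasta%20With%20Browned%20Butter%2C%20Sage%2C%20and%20Walnuts.jpg
```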

Life becomes more interesting when you escape the confines of 7-bit ASCII and use Unicode. How should u"Sautéed crème fraîche Provençale" be handled? The only reasonable thing to do is to first convert the Unicode string to UTF-8 and then hex-encode those octets: "Saut%C3%A9ed%20cr%C3%A8me%20fra%C3%AEche%20Proven%C3%A7ale".
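Those two steps, encode to UTF-8 and then hex-encode each octet, are easy to write out by hand. A sketch (`percent_encode` is a hypothetical helper) that treats the RFC 3986 unreserved characters as safe, so it matches `urllib.parse.quote` with no extra safe characters. The accented letters are written as escapes so the exact code points are visible:

```python
from urllib.parse import quote

# RFC 3986 unreserved characters, which are never percent-encoded.
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def percent_encode(text: str) -> str:
    """Encode text as UTF-8, then hex-encode every octet that isn't unreserved."""
    return "".join(
        chr(octet) if octet in UNRESERVED else f"%{octet:02X}"
        for octet in text.encode("utf-8")
    )


name = "Saut\u00e9ed cr\u00e8me fra\u00eeche Proven\u00e7ale"
print(percent_encode(name))
# → Saut%C3%A9ed%20cr%C3%A8me%20fra%C3%AEche%20Proven%C3%A7ale

# The standard library agrees when "/" is not treated as safe.
assert percent_encode(name) == quote(name, safe="")
```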

That seems reasonable, but it was giving us inconsistent results when the images were uploaded to an S3 bucket. When …