During an internal training exercise today,
as a sort of one-man Chaos Monkey,
I deliberately broke a test system by changing a config setting to read:
itemfinder.url = http://test-іtemfinder.example.com/
The correct value should have been:
itemfinder.url = http://test-itemfinder.example.com/
What's that, you say? There's no difference, you say?
There is a difference, but it's subtle.
The first i in the URL is
'CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I' (U+0456),
not 'LATIN SMALL LETTER I' (U+0069).
Depending upon the font, the two i's may be visually indistinguishable,
very similar-looking, or the Cyrillic i may not render at all.
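Because the eye can't be trusted here, a small script can help. The sketch below is a minimal Python 3 example, not a tool from the original exercise, and `suspicious_chars` is a hypothetical helper name. It lists every non-ASCII character in a string along with its code point and Unicode name. It then shows the ASCII-compatible (Punycode) hostname that Python's built-in `idna` codec (IDNA 2003) produces for the look-alike.

```python
import unicodedata

def suspicious_chars(text):
    """Hypothetical helper: (index, char, code point, name) for each non-ASCII char."""
    return [
        (i, ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if ord(ch) > 0x7F
    ]

good = "http://test-itemfinder.example.com/"
bad = "http://test-\u0456temfinder.example.com/"  # Cyrillic і, as in the broken config

for url in (good, bad):
    # str.isascii() (Python 3.7+) is a cheap first check for plain-ASCII hostnames
    print(url.isascii(), suspicious_chars(url))

# The "idna" codec reveals that the look-alike is a completely different hostname.
print("test-itemfinder.example.com".encode("idna"))
print("test-\u0456temfinder.example.com".encode("idna"))
```

The good URL yields an empty list. The bad one reports a single `CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I`, and its hostname encodes to an `xn--` label.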
This is an example of an Internationalized Domain Name (IDN) homograph attack.
There are Greek letters and Cyrillic letters that look …
Yesterday I showed FileFormat's ɹǝʇɹǝʌuoↃ uʍo◖-ǝpısd∩ ǝpoɔıu∩.
Although the lowercase letters generally looked good,
several of the uppercase letters and numerals were unsatisfactory.
Looking through the Unicode Table site,
I came across the Fraser Lisu alphabet,
which is unfortunately not well supported in most fonts.
The following renders in Hack and Source Code Pro in MacVim,
but not in the Source Code Pro webfont from Google Fonts:
B: ꓭ U+A4ED Lisu Letter Gha
D: ꓷ U+A4F7 Lisu Letter Oe
J: ꓩ U+A4E9 Lisu Letter Fa
K: ꓘ U+A4D8 Lisu Letter Kha
L: ꓶ U+A4F6 Lisu Letter Uh
R: ꓤ U+A4E4 Lisu Letter Za
T: ꓕ U+A4D5 Lisu …
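A font can silently drop a glyph, but Python's `unicodedata` module reports a character's official name regardless of how it renders. That makes it a quick way to confirm that a substitution table like the one above contains the characters you intended. This is a minimal sketch that covers only the fully listed entries. The dictionary name `LISU_LOOKALIKES` is hypothetical.

```python
import unicodedata

# Latin capitals and the Lisu look-alikes listed above (code points from the list).
LISU_LOOKALIKES = {
    "B": "\uA4ED",
    "D": "\uA4F7",
    "J": "\uA4E9",
    "K": "\uA4D8",
    "L": "\uA4F6",
    "R": "\uA4E4",
}

for latin, lisu in LISU_LOOKALIKES.items():
    # The name comes from the Unicode database, not from whatever font is in use.
    print(f"{latin}: {lisu} U+{ord(lisu):04X} {unicodedata.name(lisu)}")
```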
Unicode is so versatile that you can (more or less) invert the Latin alphabet:
ɐqɔpǝɟƃɥıɾʞʃɯuodbɹsʇnʌʍxʎz ∀𐐒Ↄ◖ƎℲ⅁HIſ⋊⅂WᴎOԀΌᴚS⊥∩ᴧMX⅄Z 012Ɛᔭ59Ɫ86
abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789
68Ɫ95ᔭƐ210 Z⅄XMᴧ∩⊥SᴚΌԀOᴎW⅂⋊ſIH⅁ℲƎ◖Ↄ𐐒∀ zʎxʍʌnʇsɹbdouɯʃʞɾıɥƃɟǝpɔqɐ
Obtained via the ɹǝʇɹǝʌuoↃ uʍo◖-ǝpısd∩ ǝpoɔıu∩.
More at Unicode Upside-Down Mapping.
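The third row of the table hints at the recipe: to turn text upside down, reverse it and then swap each character for its rotated look-alike. Here is a minimal Python 3 sketch that uses only the lowercase letters and digits from the first row. Uppercase is omitted for brevity, and `upside_down` is a hypothetical name. It is not necessarily how FileFormat's converter works.

```python
# Lowercase letters and digits paired with their rotated look-alikes,
# copied from the first two rows of the table above.
LATIN = "abcdefghijklmnopqrstuvwxyz0123456789"
FLIPPED = "ɐqɔpǝɟƃɥıɾʞʃɯuodbɹsʇnʌʍxʎz012Ɛᔭ59Ɫ86"

# str.maketrans requires both strings to have the same number of code points.
UPSIDE_DOWN = str.maketrans(LATIN, FLIPPED)

def upside_down(text):
    """Rotate text 180 degrees: reverse it, then map each character via the table."""
    return text[::-1].translate(UPSIDE_DOWN)

print(upside_down("hello world"))
```

Characters missing from the table, such as spaces and capitals here, pass through `translate` unchanged.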
Update: more tomorrow.
[Previously published at the now defunct MetaBrite Dev Blog.]
Some time ago,
we made an ill-considered decision to use recipe names for image URLs,
which simplified image management with our then-rudimentary tools.
For example, the recipe named
"Twisted Pasta With Browned Butter, Sage, and Walnuts"
becomes a URL ending in
Life becomes more interesting when you escape the confines of 7-bit ASCII and use Unicode.
How should u"Sautéed crème fraîche Provençale" be handled?
The only reasonable thing to do is to first convert the Unicode string to UTF-8
and then hex-encode those octets:
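A minimal Python 3 sketch of that approach follows. It is not the original code, which isn't shown here. `urllib.parse.quote` percent-encodes a string as UTF-8 by default, so calling `encode("utf-8")` first only makes the two steps explicit.

```python
from urllib.parse import quote

# "Sautéed crème fraîche Provençale", written with precomposed (NFC) characters
name = "Saut\u00e9ed cr\u00e8me fra\u00eeche Proven\u00e7ale"

# Step 1: Unicode string -> UTF-8 octets (é becomes the two octets 0xC3 0xA9).
utf8_octets = name.encode("utf-8")
print(utf8_octets)

# Step 2: hex-encode every octet outside the unreserved ASCII set (spaces become %20).
print(quote(utf8_octets))
```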
That seems reasonable, but it was giving us inconsistent results
when the images were uploaded to an S3 bucket.