George V. Reilly

Homograph Attacks

During an internal training exercise today, as a sort of one-man Chaos Monkey, I deliberately broke a test system by changing a config setting to read:

itemfinder.url = http://test-іtemfinder.example.com/

The correct value should have been:

itemfinder.url = http://test-itemfinder.example.com/

What’s that, you say? There’s no difference, you say?

There is a difference, but it’s subtle. The first i in the URL is CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I (U+0456), not LATIN SMALL LETTER I (U+0069). Depending upon the font, the two i’s may be visually indistinguishable or merely very similar, or the Cyrillic i may not render at all.
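One quick way to tell such look-alikes apart is to ask Python's standard unicodedata module for each character's code point and name. This is a minimal sketch; the helper name `describe` is hypothetical.

```python
import unicodedata

def describe(text):
    """Return a (code point, Unicode name) pair for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

# The two look-alike letters, written as escapes so they cannot be confused.
cyrillic_i = "\u0456"
latin_i = "\u0069"

for ch in (cyrillic_i, latin_i):
    print(describe(ch))
# [('U+0456', 'CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I')]
# [('U+0069', 'LATIN SMALL LETTER I')]
```

Running `describe` over a whole config value makes any stray non-Latin letter stand out immediately.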

This is an example of an Internationalized Domain Name (IDN) Homograph Attack. There are Greek letters and Cyrillic letters that look very similar to Latin letters, but which have distinct meanings, histories, and code points. Internationalized Domain Names permit the construction of non-Latin domain names.

Since domain name labels may contain only ASCII letters, digits, and hyphens, an encoding scheme known as Punycode transforms Unicode domain names into ASCII domain names. For example, test-іtemfinder.example.com (Cyrillic i) becomes xn--test-temfinder-99l.example.com in Punycode.
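The conversion can be reproduced with Python's built-in codecs. This is a sketch, assuming the standard library's "idna" codec (which implements the older IDNA 2003 rules) is good enough for illustration; the helper name `to_ascii` is hypothetical.

```python
def to_ascii(hostname):
    """Convert a Unicode hostname to its ASCII (Punycode) form, label by label."""
    return hostname.encode("idna").decode("ascii")

# Build the hostname with an escape so the Cyrillic letter is explicit.
spoofed = "test-\u0456temfinder.example.com"
genuine = "test-itemfinder.example.com"

print(to_ascii(spoofed))   # xn--test-temfinder-99l.example.com
print(to_ascii(genuine))   # test-itemfinder.example.com (pure ASCII is unchanged)

# The lower-level "punycode" codec encodes one label and omits the "xn--" prefix.
print(spoofed.split(".")[0].encode("punycode"))   # b'test-temfinder-99l'
```

Comparing a configured hostname with its `to_ascii` form is a cheap check: if the two differ, the name contains non-ASCII characters.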

Obviously, homographs can be used to spoof URLs. Browsers generally display the Punycode form of an IDN in the address bar if the name looks like it might contain suspicious homographs, while displaying valid IDNs in Unicode; e.g., http://ουτοπία.δπθ.gr/.
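A crude version of such a check can be sketched in Python. This is not how any particular browser decides; it is a toy heuristic that flags a hostname when a single label mixes scripts. It assumes that the first word of a letter's Unicode name (LATIN, CYRILLIC, GREEK) is an adequate stand-in for its script. The function names are hypothetical.

```python
import unicodedata

def label_scripts(label):
    """Approximate the set of scripts used by the letters in one label.

    Assumption: the first word of a letter's Unicode name stands in for
    its script. Digits, hyphens, and combining marks are skipped.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def looks_suspicious(hostname):
    """Return True if any single label mixes letters from several scripts."""
    return any(len(label_scripts(label)) > 1 for label in hostname.split("."))

print(looks_suspicious("test-\u0456temfinder.example.com"))  # True: Latin plus Cyrillic
print(looks_suspicious("test-itemfinder.example.com"))       # False: Latin only
print(looks_suspicious("ουτοπία.δπθ.gr"))                    # False: each label uses one script
```

A label written entirely in look-alike Cyrillic letters would pass this test, so a real check needs more than mixed-script detection.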
