[idn] Re: Unicode and Security

Elliotte Rusty Harold <elharo@metalab.unc.edu> Thu, 07 February 2002 21:21 UTC

Date: Thu, 07 Feb 2002 12:22:18 -0500
To: Unicode List <unicode@unicode.org>
From: Elliotte Rusty Harold <elharo@metalab.unc.edu>
Subject: [idn] Re: Unicode and Security
Cc: idn@ops.ietf.org, schneier@counterpane.com
Message-Id: <E16Yvm4-00091o-00@psg.com>

I've been thinking about security issues in Unicode, and I've come up
with one that's quite scary and worse than any I've heard before. It
uses only plain text, no special fonts, doesn't require buggy
software, and works over e-mail instead of the Web. All it requires
added to the existing infrastructure is internationalized domain
names. So in the hope that this becomes a self-defeating prophecy,
here's the scenario:

I, as a reporter, industrial spy, or detective working on a divorce
case, have learned the identities and internal e-mail addresses of
two people, call them Alice and Bob, at Microsoft (or just about any
other large company). I've somehow communicated with these people
personally, for instance on an e-mail list completely unrelated to
work but for which they use their work e-mail, so I'm familiar with
their style and signature files. Or perhaps I've communicated with
them on work-related matters before. In any case, it's not hard to
get two people who know each other at a large company to send you
e-mail. Of course, they would presumably be careful not to give me
secret company information since they know they're talking to an
outsider.

For the sake of argument, let's call the company they work at
Microsoft, but this attack could hit most companies with a .com
address. Let's say I register microsoft.com, except that the fifth
letter isn't a lower-case Latin o. It's actually a lower-case Greek
omicron. I then forge a believable letter from alice@microsoft.com,
using my look-alike domain, to bob@microsoft.com at the real one,
saying "Can you please update me on your budget?"
Bob, noticing that the e-mail appears to come from Alice, whom he
knows and trusts, fires off a reply with his confidential
information. Only it doesn't go to Alice. It goes to me. I can then
reply to Bob, asking for clarification or more details. I can ask him
to attach the latest build of his software. I can carry on a
conversation in which Bob believes me to be Alice and spills his
guts. This is very, very bad.
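A minimal Python sketch of the substitution, using only the standard library. The addresses are the hypothetical ones from the scenario. It shows that the spoofed domain differs from the real one by a single code point, so a reply to the forged address goes to a different domain. The last line uses Python's built-in `idna` codec (an implementation of IDNA 2003, RFC 3490, a later IETF encoding, assumed here purely for illustration) to show that the look-alike would be a distinct ASCII name in the DNS.

```python
import unicodedata

GREEK_OMICRON = "\u03bf"  # U+03BF GREEK SMALL LETTER OMICRON

real_domain = "microsoft.com"
# Swap the fifth letter (index 4, a Latin "o") for a Greek omicron.
spoof_domain = real_domain[:4] + GREEK_OMICRON + real_domain[5:]

# The forged From: address (hypothetical) uses the look-alike domain,
# so that is where Bob's reply will be delivered.
forged_from = "alice@" + spoof_domain
reply_domain = forged_from.rpartition("@")[2]


def non_ascii_chars(text: str) -> list[str]:
    """Describe every non-ASCII character in `text` by code point and name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text if ord(ch) > 0x7F]


if __name__ == "__main__":
    print(reply_domain == real_domain)    # False: the reply leaves the company
    print(non_ascii_chars(spoof_domain))  # ['U+03BF GREEK SMALL LETTER OMICRON']
    # IDNA 2003 turns the look-alike label into a separate "xn--" DNS label.
    print(spoof_domain.encode("idna"))
```

On screen the two domains may be indistinguishable; to software they are simply different strings.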

E-mail forgery has been a problem for a long time, but it's always
been one-way. You couldn't trick somebody into sending you a reply
because doing so required using a different e-mail address than the
one they expected, thus revealing the message as forged. With a
Unicode-enabled mailer, that's no longer true. If the fonts Bob (not
me, but Bob) chooses for his e-mail program do not make a clear
distinction between an o and an omicron, this works. And there are
many other variations: the Cyrillic and Greek alphabets provide
plenty of look-alikes for single letters in Latin domain names.
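Those single-letter substitutions can be enumerated mechanically. The Python sketch below uses a small, hand-picked table of Latin letters and Cyrillic or Greek look-alikes; the table and the function name `single_letter_spoofs` are illustrative assumptions, and the table is far from a complete list of confusable characters.

```python
# Hypothetical, deliberately tiny table: Latin letter -> look-alike letters
# from other scripts. Real confusable data is much larger.
LOOKALIKES: dict[str, list[str]] = {
    "a": ["\u0430"],            # CYRILLIC SMALL LETTER A
    "c": ["\u0441"],            # CYRILLIC SMALL LETTER ES
    "e": ["\u0435"],            # CYRILLIC SMALL LETTER IE
    "i": ["\u0456"],            # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    "o": ["\u03bf", "\u043e"],  # GREEK SMALL LETTER OMICRON, CYRILLIC SMALL LETTER O
    "p": ["\u0440"],            # CYRILLIC SMALL LETTER ER
    "x": ["\u0445"],            # CYRILLIC SMALL LETTER HA
    "y": ["\u0443"],            # CYRILLIC SMALL LETTER U
}


def single_letter_spoofs(label: str) -> list[str]:
    """Return every variant of `label` with exactly one letter swapped for a look-alike."""
    variants = []
    for i, ch in enumerate(label):
        for substitute in LOOKALIKES.get(ch, []):
            variants.append(label[:i] + substitute + label[i + 1:])
    return variants


if __name__ == "__main__":
    for variant in single_letter_spoofs("microsoft"):
        print(variant)
```

Run against "microsoft", this yields six variants: one each for the i and the c, and two (Greek omicron and Cyrillic o) for each of the two o's. Even this toy table gives an attacker several candidate registrations per name.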

I'm not sure whether the internationalized domain names working
group has fully grokked this. Like the Unicode Consortium, they seem
to be trying to pass the buck. In particular, they state in
<http://www.ietf.org/internet-drafts/draft-ietf-idn-requirements-09.txt>:

    Specifying requirements for internationalized domain names does not
    itself raise any new security issues. However, any change to the DNS
    MAY affect the security of any protocol that relies on the DNS or on
    DNS names. A thorough evaluation of those protocols for security
    concerns will be needed when they are developed. In particular, IDNs
    MUST be compatible with DNSSEC and, if multiple charsets or
    representation forms are permitted, the implications of this
    name-spoof MUST be throughly understood.

In other words, it's not our fault. Blame the client software. Sounds
distressingly like the Unicode Consortium's approach to these issues.
Interestingly, my attack works with a single character representation
(Unicode). It is not dependent on multiple charsets. I don't know if
the IDN working group has thought of this problem. I hope they have,
and consider it their responsibility to prevent. I also hope the
Unicode Consortium and vendors of client software think about these
problems. But I don't think we can count on client software getting
this right. (Hell, Microsoft can't even stop e-mail from running
scripts.) The problem needs to be fixed closer to the source.
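One kind of check that could live closer to the source, for instance at a registry deciding whether to accept a label, is to refuse labels that mix letters from several scripts. The Python sketch below is a hypothetical illustration, not anything the IDN working group specified: it uses the first word of each letter's Unicode character name (via the standard `unicodedata` module) as a crude stand-in for the letter's script, and the function names are assumptions.

```python
import unicodedata


def letter_scripts(label: str) -> set[str]:
    """Crudely guess each letter's script from the first word of its Unicode name."""
    # e.g. "LATIN SMALL LETTER O" -> "LATIN", "GREEK SMALL LETTER OMICRON" -> "GREEK"
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in label if ch.isalpha()}


def is_mixed_script(label: str) -> bool:
    """True if the label's letters come from more than one (approximate) script."""
    return len(letter_scripts(label)) > 1


if __name__ == "__main__":
    for label in ["microsoft", "micr\u03bfsoft", "\u03b1\u03b2\u03b3"]:
        print(label, is_mixed_script(label))
```

This catches the omicron trick, but it is only a sketch: a label spelled entirely in Cyrillic letters that happen to look Latin (such as "\u0430\u0441\u0435") passes, and legitimate Japanese labels that mix kanji with kana would be wrongly rejected.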
--

+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|          The XML Bible, 2nd Edition (Hungry Minds, 2001)           |
|              http://www.ibiblio.org/xml/books/bible2/              |
|   http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/   |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      |
|  Read Cafe con Leche for XML News: http://www.ibiblio.org/xml/     |
+----------------------------------+---------------------------------+