Re: [idn] nameprep2 and the slash homograph issue

John C Klensin <klensin@jck.com> Wed, 23 February 2005 08:47 UTC

Date: Wed, 23 Feb 2005 03:39:58 -0500
From: John C Klensin <klensin@jck.com>
To: IETF idn working group <idn@ops.ietf.org>
Subject: Re: [idn] nameprep2 and the slash homograph issue
Message-ID: <D872CCF059514053ECF8A198@scan.jck.com>
In-Reply-To: <20050223072837.GA21463~@nicemice.net>
References: <421B8484.3070802@vanderpoel.org> <20050223072837.GA21463~@nicemice.net>


--On Wednesday, 23 February, 2005 07:28 +0000 "Adam M. Costello"
<idn.amc+0@nicemice.net.RemoveThisWord> wrote:

> Erik van der Poel <erik@vanderpoel.org> wrote:
> 
>> Another argument against banning the slash homograph is that
>> any new banning would require a new ACE prefix, which is a
>> lot of work, and, as John said, there should be a high
>> threshold for any demonstration that tries to show that a new
>> prefix is necessary.
> 
> An alternative, rather than banning the character, is to
> recommend that it not be shown; the ACE form could be shown
> instead.  This would effectively make the character useless in
> domain names (for both phishers and honest folks) without
> requiring a new ACE prefix.
> 
> We could push ToUnicode down inside a wrapper function,
> ToDisplay. Applications would never call ToUnicode directly
> anymore.  Whenever they wanted to display a domain name,
> they'd call ToDisplay, which would call ToUnicode, check the
> result, and if it didn't like it, call ToASCII.  (Of course,
> since ToUnicode typically calls ToASCII, there are
> opportunities to optimize that logic.)
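
For concreteness, the ToDisplay wrapper described in the quoted text
might look roughly like the sketch below.  It assumes Python's
built-in "idna" codec (IDNA 2003 with Nameprep) as a stand-in for
ToASCII and ToUnicode.  The names to_display and SUSPECT_CHARS are
hypothetical, and the two slash look-alikes in the set are only an
example of a display policy, not an agreed list.

```python
# Sketch only: to_display and SUSPECT_CHARS are hypothetical names, and the
# policy set below is an illustrative assumption, not an agreed list.
SUSPECT_CHARS = frozenset("\u2044\u2215")  # FRACTION SLASH, DIVISION SLASH


def to_display(domain: str) -> str:
    """Return the Unicode form of `domain`, or its ACE form if the
    Unicode form contains a character the display policy rejects."""
    # ToASCII on every label; labels already in ACE form pass through.
    ace = domain.encode("idna").decode("ascii")
    # ToUnicode on every label.
    unicode_form = ace.encode("ascii").decode("idna")
    if any(ch in SUSPECT_CHARS for ch in unicode_form):
        return ace  # fall back to the xn-- form
    return unicode_form


# ascii() keeps the output printable on any terminal encoding.
print(ascii(to_display("b\u00fccher.example")))
print(ascii(to_display("a\u2044b.example")))
```

The wrapper never changes what is looked up; it only chooses which
of the two equivalent forms is shown.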

Adam, there are two problems with this.  First, it effectively
dictates UI behavior, which is generally a bad idea.  It is a
particularly bad idea in this case because you are proposing the
sort of UI behavior that generates a lot of very confused
questions and trouble reports, which is something no sane
implementer wants.  And "won't display" is not the right answer
on the registration side of the process, even if it were right
on the lookup side.  The second problem is that it is a kludge,
and inserting kludges into critical protocols or procedures
(and IDNs are certainly critical) almost always turns out to
be a seriously bad idea sooner or later.

If we find a need to start banning characters that we could not
agree on banning the first time around, there is another
approach, also unpleasant but IMO less problematic, that could
be considered.  Just as RFC 2822 moved past a lot of legacy
nonsense by having two separate "create" and "accept" syntaxes,
we could define an additional profile, say "NameRegisterPrep".
It would look a lot like Nameprep but would ban the characters
you are now suggesting banning and, based on what I think is
growing experience in the registries, would also ban any
character that Nameprep maps to anything else.  The effect would
be to permit as input to the registration process only those
code points that could be output into Punycode and the DNS.
Several registries
have adopted the latter part of that model already: basically
what you register is ToASCII(string) and/or
ToUnicode(ToASCII(string)), but never "string".  The lookup
process would remain the same, with no changes to Nameprep being
made at all.  And, by eliminating all of the mapping tables and
replacing them with prohibitions, it would make the question
"can this character appear in an IDN" a great deal less
complicated, which would certainly be an advantage.
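
As a rough illustration of the registration-side idea, the sketch
below accepts a label only if Nameprep leaves it unchanged and it
contains none of an extra set of banned characters.  It assumes
Python's encodings.idna.nameprep and "idna" codec (IDNA 2003).
The names register_form, EXTRA_BANNED and DOTS are hypothetical,
and the banned set is an example only.

```python
from encodings.idna import nameprep
from typing import Optional

# Hypothetical extra prohibitions for a "NameRegisterPrep"-style profile.
EXTRA_BANNED = frozenset("\u2044\u2215")  # FRACTION SLASH, DIVISION SLASH
# Characters that IDNA treats as label separators.
DOTS = frozenset(".\u3002\uff0e\uff61")


def register_form(label: str) -> Optional[str]:
    """Return the ACE form to register, or None if the label is refused."""
    if not label or any(ch in DOTS or ch in EXTRA_BANNED for ch in label):
        return None
    try:
        # Refuse anything Nameprep would map, normalize, or prohibit.
        if nameprep(label) != label:
            return None
        return label.encode("idna").decode("ascii")
    except UnicodeError:
        return None


print(register_form("b\u00fccher"))   # xn--bcher-kva
print(register_form("B\u00fccher"))   # None: Nameprep would map "B" to "b"
```

Lookup is untouched in this sketch: a client may still type the
mapped form, and Nameprep will fold it to the registered one.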

This type of registration restriction is rather different from
our asking/expecting ICANN and the registries to adopt rules
about, e.g., mixed-script registrations that would help people
stay out of trouble.  For better or worse, ICANN has a great
deal less trouble asking (or demanding) that people conform to
the protocols than it does with making up a somewhat fuzzy
guideline and enforcing it.  In the most extreme of cases,
violating a protocol in a significant way is an instance of the
"not acting in the best interests of the local users and the
Internet community" behavior that RFC 1591 warns against and
indicates could be grounds for redelegating a registry.

Such a profile would leave the registries and ICANN stuck with
the problem of what to do about any existing registration that
violated the new rules, but that problem would exist for _any_
substantive change we made.

    john