Re: [idn] process

"Adam M. Costello" <idn.amc+0@nicemice.net.RemoveThisWord.cnri.reston.va.us> Sat, 26 February 2005 08:23 UTC

Date: Sat, 26 Feb 2005 08:19:13 +0000
From: "Adam M. Costello" <idn.amc+0@nicemice.net.RemoveThisWord.cnri.reston.va.us>
To: idn@ops.ietf.org
Subject: Re: [idn] process
Message-ID: <20050226081913.GD14956~@nicemice.net>
Reply-To: IETF idn working group <idn@ops.ietf.org>
References: <D872CCF059514053ECF8A198@scan.jck.com> <421D8411.9030006@vanderpoel.org> <p06210208be4390618c81@[192.168.0.101]> <421E0D0C.2000309@vanderpoel.org> <p06210202be43c3888991@[192.168.0.101]> <E07CE813AD23B2D95DA0C740@scan.jck.com> <421E30F2.1040408@vanderpoel.org> <0E7F74C71945B923C52211F3@scan.jck.com> <421EA0C9.1010500@vanderpoel.org> <00a401c51af3$7863aae0$030aa8c0@DEWELL>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <421FCBD7.8000805@vanderpoel.org> <421FA55B.9000308@vanderpoel.org> <A574CA1BE87BFDA3C2A1AC0E@scan.jck.com> <00a401c51af3$7863aae0$030aa8c0@DEWELL>
User-Agent: Mutt/1.5.6+20040722i
Sender: owner-idn@ops.ietf.org
Precedence: bulk

Doug Ewell <dewell@adelphia.net> wrote:

> Is it really possible that we spent a year and a half, two years on
> putting together an IDN architecture, and during all that time nobody
> ever gave the slightest thought to the possibility of someone using
> IDNs for spoofing purposes,

No, it was thought about, and it was decided that the IDNA protocol was
not the place to address those issues; that they should be addressed in
registries and user interfaces.

IDNA could have addressed the easier portion of the problem (prohibiting
punctuation and symbols) (and for a while I was arguing for that), but
it still would have left the harder part of the problem (dealing with
script mixtures and homographs among letters) for the registries and
user interfaces to deal with, so why not let them deal with the easier
part too?

(Of course, one could then ask why that argument doesn't apply to all
the invisible characters that IDNA does prohibit.  I have no good answer
at the moment.  Maybe invisibility was the only disqualifying attribute
that everyone could agree on.)

John C Klensin <klensin@jck.com> wrote:

> I hope that those who wrote the IDNA specs will agree with the
> statement of those principles I'm about to make, or at least that they
> are close... they may not.
>
> (1) To the extent possible, we should accommodate all Unicode
> characters, excluding as little as possible.

That (or something very similar) was a principle that went into the
IDNA spec.  I personally was inclined to define both internationalized
domain names and internationalized host names, where the former would
be completely general (allowing *all* Unicode characters, even the
invisible ones), and the latter would be much narrower (excluding most
punctuation and symbols).  This would be analogous to traditional
domain names (which allow all ASCII characters, even control characters)
and traditional host names (which allow only the ASCII letters, digits,
and one punctuation mark, the hyphen-minus).

On the other hand, there was an argument that the traditional
distinction between domain names and host names was the source of
endless confusion and debate, and was a mistake that should not be
repeated with IDNs.  I have some sympathy for that argument.

In any case, we ended up with just one set of non-ASCII characters for
IDNs, between the two extremes: only invisible characters are excluded.
(I think there's one exception--a visible space character that is also
excluded).

> (2) When code points had been identified by UTC as the same as, or
> equivalent to, others, we tended to map them together, rather than
> picking one and prohibiting the others.

This was more than a tendency; it was strictly followed.

> This has caused more problems than most of us expected, with people
> being surprised when they register or query using one character and
> the result that comes back uses another.

I think this happens only for the case-folding mappings.  The
normalization mappings should not surprise anyone.

> It also creates a near-homograph problem that we haven't "discovered"
> in the last couple of weeks: If we have character X mapping to
> character Y, but X looks vaguely like Z, then there may be no Y-Z
> homograph, but there may be an X-Z one.

True.  And again, I think it's just the case-folding mappings that do
this, not the normalization mappings.
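
The difference between the two kinds of mapping can be seen in a small
sketch.  It assumes Python's standard encodings.idna module (which
follows the RFC 3490/3491 rules under discussion) as a stand-in for a
Nameprep implementation; the sample labels are illustrative choices,
not taken from this thread.

```python
from encodings.idna import nameprep

# Case-folding mappings: the result can contain characters other than
# the ones that were typed.
folded_upper = nameprep("B\u00fccher")     # capital B, u with diaeresis
folded_sharp_s = nameprep("stra\u00dfe")   # contains LATIN SMALL LETTER SHARP S

# Normalization mapping: "u" followed by COMBINING DIAERESIS is replaced
# by the single precomposed character, which looks the same.
composed = nameprep("bu\u0308cher")

for original, mapped in [
    ("B\u00fccher", folded_upper),
    ("stra\u00dfe", folded_sharp_s),
    ("bu\u0308cher", composed),
]:
    print(repr(original), "->", repr(mapped))
```

In the first two cases the mapped label is visibly different from the
input; in the third it is not, which is the sense in which the
normalization mappings are unlikely to surprise anyone.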

> Curiously, if we followed existing precedents, we could even move
> IDNA from Proposed to Draft and change the tables to eliminate many
> mappings and characters: no change to the algorithm, just elimination
> of some features that didn't work in practice.

If we want to place further restrictions on the set of characters used
in IDNs, I think it would be pretty rude of us to simply add them to
the set of prohibited characters in Nameprep.  What about the guy who
registered <not_equal>.com?  What if people had already bookmarked that
site, and created links to it?  Are we just going to break those links?

A less rude approach would be to recommend that domain labels containing
certain characters not be displayed.  Their ACE forms could still be
display, and they could still be looked up.  The domain holder in this
example could register a new displayable domain name, and could put an
HTTP redirector at the old site, and existing bookmarks and links would
continue to work.
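
A minimal sketch of such a display policy follows, assuming Python's
standard encodings.idna module for ToASCII/ToUnicode.  The DISCOURAGED
set and the display_form function are hypothetical names introduced
here for illustration; no such list is defined by IDNA.

```python
from encodings.idna import ToASCII, ToUnicode

# Hypothetical set of characters whose labels should not be shown in
# Unicode form.  U+2260 (NOT EQUAL TO) is used only because it is the
# example in the text above.
DISCOURAGED = frozenset({"\u2260"})


def display_form(ace_label: str) -> str:
    """Return the Unicode form of a label, or the ACE form unchanged if
    the label does not validate or contains a discouraged character.
    Only display is affected; the label can still be looked up."""
    try:
        unicode_label = ToUnicode(ace_label)
    except UnicodeError:
        return ace_label
    if DISCOURAGED & set(unicode_label):
        return ace_label
    return unicode_label


not_equal_ace = ToASCII("\u2260").decode("ascii")
print(display_form("xn--bcher-kva"))   # shown in Unicode form
print(display_form(not_equal_ace))     # shown in ACE form
```

Because nothing is added to the prohibited table, the old label keeps
resolving, so existing bookmarks and links are not broken.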

Erik van der Poel <erik@vanderpoel.org> wrote:

> I believe it would be difficult to reach consensus on a relatively
> narrow extension of the LDH rule.  Just for starters, the hyphen used
> to separate names and other strings in the Western world is not used
> in Japan for Katakana, because Katakana uses a middle dot (U+30FB) to
> separate 2 Katakana strings.  In fact, this character is allowed in
> .jp.

But notice how seldom the hyphen-minus is actually used in domain
names.  People prefer to just run words together, even in languages that
customarily use word breaks.  Maybe the analogous characters in other
scripts (like the katakana middle dot) would likewise be very seldom
used in practice (especially in Japan where the lack of word breaks is
the norm), and would not be missed if they were deprecated.

> It may be possible to "tune" the tables, but nowhere in your email do
> I find any reference to the ACE prefix.  I think that we should also
> figure out exactly which types of changes would absolutely require a
> new ACE prefix,

Coming up with the necessary and sufficient conditions will be tricky,
but now that you've got me thinking about it, I think I can supply
one sufficient condition:  If the only changes you make are to add
characters to the prohibited table, I don't think you need to change the
ACE prefix.  This would cause some valid IDN labels under the old spec
to become invalid under the new spec, and would cause some valid ACE
labels under the old spec to become bogo-ACE labels under the new spec.
(The bogo-ACE phenomenon already exists: there are labels that begin
with the ACE prefix but don't validate during ToUnicode and therefore
display as literal ASCII strings.)  It would not cause anything to
encode or decode to something different than it used to.
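
The sufficient condition can be sketched as follows, assuming Python's
standard encodings.idna module as the "old spec" implementation.  The
function names, the NEWLY_PROHIBITED set, and the sample labels are
hypothetical and exist only for this illustration.

```python
from encodings.idna import ToASCII, ToUnicode, nameprep

# Hypothetical addition to the prohibited table.
NEWLY_PROHIBITED = frozenset({"\u2260"})  # NOT EQUAL TO


def to_ascii_new(label: str, extra_prohibited=frozenset()) -> bytes:
    """ToASCII under a spec that only adds prohibited characters.

    A label either becomes invalid or encodes exactly as before."""
    banned = set(nameprep(label)) & set(extra_prohibited)
    if banned:
        raise ValueError(f"newly prohibited characters: {sorted(banned)!r}")
    return ToASCII(label)  # unchanged encoding, unchanged ACE prefix


def to_unicode_or_literal(ace_label: str, extra_prohibited=frozenset()) -> str:
    """ToUnicode that never fails: a label that does not validate is
    returned as its literal ASCII string (a bogo-ACE label)."""
    try:
        decoded = ToUnicode(ace_label)
    except UnicodeError:
        return ace_label          # bogo-ACE under the old rules already
    if set(decoded) & set(extra_prohibited):
        return ace_label          # bogo-ACE only under the new rules
    return decoded


# A label that stays valid encodes identically under both rule sets.
print(to_ascii_new("B\u00fccher"), to_ascii_new("B\u00fccher", NEWLY_PROHIBITED))

# This label starts with the prefix but does not survive the round-trip
# check in ToUnicode, so it is displayed as a literal ASCII string.
print(to_unicode_or_literal("xn--strae-oqa"))
```

The only observable differences between the two rule sets are the
ValueError on encoding and the literal-ASCII fallback on decoding;
no label maps to a different string than it did before.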

But I don't advocate making such a change (see my argument above about
rudeness).

AMC
