Re: [idn] IDNA section 3.1 requirement 3
"Adam M. Costello" <idn.amc+0@nicemice.net.RemoveThisWord.cnri.reston.va.us> Sun, 27 March 2005 23:28 UTC

I have been persuaded that the recommendations in my proposal were too
narrow, and did not leave enough room for implementations to experiment
with alternative ways of presenting suspicious labels, like color and
decoration. Here's a less restrictive proposal:

--begin--

[section 3.1]

3) When a domain label occupying or obtained from a domain name slot
   is to be shown to a user, it SHOULD NOT simply be shown in whatever
   form it was found in; before being shown it SHOULD be forced into
   either ASCII form (which can be obtained by applying ToASCII) or
   non-ACE form (which can be obtained by applying ToUnicode, see
   section 4). Implementors are encouraged to develop policies that
   balance the conflicting goals of not showing unintelligible ACE
   strings and not showing misleading Unicode strings. See appendix A
   for suggestions. When the user has explicitly requested to see one
   form or the other, that form SHOULD be shown. When requirements 2
   and 3 both apply, requirement 2 takes precedence.

[appendix A]

This appendix offers suggestions only, not recommendations or
requirements.

For labels that are ACEs or have ACE forms, there are various factors
that an application can consider when deciding how to display the label
to a user.

The ACE form is unsuitable for presentation to a user because it is
unintelligible, unrecognizable, not very useful, and quite unfriendly.
Its one redeeming feature is that it is ASCII-only, and therefore has
the best chance of being displayable and copy-and-pastable, and the
least chance of containing misleading characters.

The non-ACE form is intelligible to the user (and therefore much
friendlier and more useful), if it is displayable. But in an
environment that cannot handle the characters, the non-ACE form could
turn out to be even less useful than the ACE form.

The non-ACE form can be misleading, by containing characters that look
like delimiters (for example, U+2044 looks like a slash), or that look
like characters in other scripts (for example, many Latin, Greek, and
Cyrillic letters look alike).

The misleading-label problem is the most complex to deal with. A
general approach is to identify suspicious characters, and then use
some means to avoid displaying the suspicious characters, or to display
them safely. Below are a few (but certainly not all) possible methods.
Implementations are free to experiment and innovate.

Identifying suspicious characters:

    Some characters could be considered suspicious in all contexts;
    for example, all characters outside Unicode categories L (letter),
    N (number), and M (mark), except U+002D hyphen-minus.

    Some characters could be considered suspicious in labels that are
    children of (or descendants of) certain domains. For example, in
    a domain whose registration rules are believed to avoid confusion
    only for certain scripts, characters outside those scripts could
    be considered suspicious. In a domain believed to have no
    restrictions on registered names, all non-ASCII characters could
    be considered suspicious.

    Some characters could be considered suspicious depending on what
    other characters appear in the same label. For example, in a
    label containing Cyrillic, Greek, and Latin characters, one of
    those scripts could be chosen as the main script (possibly the one
    that appears first, or appears most often), and the others could
    be considered suspicious.

Avoiding displaying suspicious characters:

    Showing the ACE form avoids displaying all non-ASCII characters,
    but see above for the disadvantages of the ACE form.

    Showing a replacement character for the suspicious characters,
    while displaying non-suspicious characters normally, will have a
    friendlier appearance than the ACE form, but is likely to break
    copy-and-paste.

    In some contexts, an escape mechanism is available that can be
    used to obscure characters. For example, in an Internationalized
    Resource Identifier (IRI), any character can be represented by
    percent-encoded UTF-8. This will interject some ugliness, but is
    still likely to have a friendlier appearance than the ACE form,
    and will not break copy-and-paste.

Displaying suspicious characters safely:

    Color, highlighting, underlining, etc. could be used to flag
    suspicious characters. One concern with this approach is whether
    the user will understand the significance of such markings.

--end--

The rest of this message contains a few responses to some criticisms of
my previous proposal. The responses are probably moot, since that
proposal has now been replaced, but anyway...

Paul Hoffman <phoffman@imc.org> wrote:

> > As long as they don't see "paypal" or whatever, they're at least not
> > being misled.
>
> The example given at the start of your message was the
> homograph-slash. A spoof domain name that has that character and
> "paypal" will *still* display "paypal" in the Punycode.

True. My explanation was a bit sloppy. Let me try again: As long as
the user doesn't see the misleading characters, the user is not being
misled. If the misleading character is a Cyrillic "a" in "paypal",
then it's enough to avoid displaying the Cyrillic "a". If the
misleading character is a slash-homograph immediately following
"paypal.com", it's enough to avoid displaying the slash-homograph.

> As you remember from a few years ago, cut-and-paste often doesn't work
> reliably with non-ASCII characters even under good conditions. This
> seems like a red herring.

For me, the inability to copy and paste URIs into and out of a browser's
location field would be a major inconvenience, probably enough to make
me abandon the browser in favor of another. Once IRIs become
widespread, I expect I'll feel the same way about them.

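To make the escape-mechanism suggestion concrete, here is a minimal
Python sketch, offered as an illustration only. It assumes the
"suspicious in all contexts" example rule from appendix A (anything
outside Unicode categories L, N, and M, except U+002D) and
percent-encodes just those characters as UTF-8. The function names
is_suspicious and escape_suspicious are hypothetical, not taken from
any specification or library.

```python
import unicodedata


def is_suspicious(ch: str) -> bool:
    """Example rule only: a character is suspicious if it falls outside
    Unicode general categories L, N and M, except U+002D hyphen-minus."""
    if ch == "-":
        return False
    return unicodedata.category(ch)[0] not in ("L", "N", "M")


def escape_suspicious(label: str) -> str:
    """Percent-encode the UTF-8 bytes of each suspicious character and
    leave every other character untouched."""
    out = []
    for ch in label:
        if is_suspicious(ch):
            out.append("".join("%%%02X" % b for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)


# U+2044 (fraction slash) is category Sm, so it is escaped; the
# ASCII letters around it are shown normally.
print(escape_suspicious("paypal\u2044login"))  # paypal%E2%81%84login
```

The slash look-alike is no longer displayed, yet the escaped string
still identifies the same characters, which is why this approach should
survive copy-and-paste in contexts that accept percent-encoding.
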
Mark Davis <mark.davis@jtcsv.com> wrote:

> Use of the raw punycode in place of the represented characters will
> cause more user confusion, not less.
> User sees:
>
> xn--tlralit-byabbe.fr versus xn--tlralit-byabb390f.fr
>
> Presented with a collection of apparently random letters, eyes quickly
> glaze over, and people really can't distinguish between two names
> in any sensible fashion. Users are not going to memorize which
> gobbledygook is the one they want.

I agree, but that's beside the point. I don't expect anyone to
recognize an ACE, or distinguish between one ACE and another. I expect
people to consider all ACEs as unrecognized domains. I expect
registrants who want recognizable domains to pick domain names that
won't display as ACE to the target audience.

AMC
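
As a rough illustration of one possible display policy along the lines
of the proposal, the Python sketch below picks a main script for a
label (the most frequent one, with ties going to the script that
appears first), treats characters of the other scripts as suspicious,
and falls back to the ACE form when any are found. It rests on several
assumptions: Python's built-in "idna" codec stands in for ToASCII and
ToUnicode, a character's script is guessed crudely from the first word
of its Unicode name, and the names script_of, suspicious_positions, and
display_form are hypothetical.

```python
import unicodedata
from collections import Counter

# Only the three scripts named in the proposal are distinguished here.
SCRIPTS = ("LATIN", "GREEK", "CYRILLIC")


def script_of(ch: str):
    """Crude script guess from the Unicode character name (an
    approximation; Python's stdlib has no script property)."""
    name = unicodedata.name(ch, "")
    for script in SCRIPTS:
        if name.startswith(script):
            return script
    return None


def suspicious_positions(label: str):
    """Indexes of characters whose script differs from the main script.
    Main script = most frequent; ties go to the first one seen."""
    scripts = [script_of(ch) for ch in label]
    counts = Counter(s for s in scripts if s is not None)
    if len(counts) < 2:
        return []
    main = counts.most_common(1)[0][0]
    return [i for i, s in enumerate(scripts) if s is not None and s != main]


def display_form(label: str) -> str:
    """Force a single label into non-ACE form, or into ACE form if the
    non-ACE form mixes scripts."""
    ace = label.encode("idna")          # plays the role of ToASCII
    unicode_form = ace.decode("idna")   # plays the role of ToUnicode
    if suspicious_positions(unicode_form):
        return ace.decode("ascii")
    return unicode_form


print(display_form("xn--bcher-kva"))   # single-script label: non-ACE form
print(display_form("p\u0430ypal"))     # Cyrillic "a" among Latin: ACE form
```

This is only one point in the design space; an application could
equally keep the non-ACE form and flag the positions returned by
suspicious_positions with color or underlining.
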