Re: looking up XN-labels with unassigned characters
Erik van der Poel <erikv@google.com> Mon, 23 March 2009 18:15 UTC
In-Reply-To: <30b660a20903231043m36451991x26f199d6101951e4@mail.gmail.com>
References: <c07a32650903231009i303ee8bar4fc11d6375e43443@mail.gmail.com> <30b660a20903231043m36451991x26f199d6101951e4@mail.gmail.com>
Date: Mon, 23 Mar 2009 11:15:26 -0700
Message-ID: <c07a32650903231115y3bf780eaoa54caff2011ce9e5@mail.gmail.com>
Subject: Re: looking up XN-labels with unassigned characters
From: Erik van der Poel <erikv@google.com>
To: Mark Davis <mark@macchiato.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-System-Of-Record: true
Cc: James Seng <james@seng.sg>, "idna-update@alvestrand.no" <idna-update@alvestrand.no>
Ah, OK, I see your point of view now. I was thinking along the lines
of using A-labels in outgoing HTML because MSIE6 does not support
U-labels. To me, it's somewhat analogous to 7-bit email that could
"pass through" any intermediary -- i.e. XN-labels should "pass
through" any obstacle, unimpeded. Display is a separate issue.

Erik

On Mon, Mar 23, 2009 at 10:43 AM, Mark Davis <mark@macchiato.com> wrote:
> I agree that it would be nice, and it would be equally nice to lookup any
> Unicode-Label*. That is, if there is any reason to be suspicious of a
> Unicode label, then we should be equally suspicious of the corresponding
> XN-Label. If XN-Labels are presumed to be safe, then we have no reason to
> mistrust the corresponding Unicode label.
>
> It may very well be that there is a particular set of cooperating
> systems, with a gatekeeper, and that anything in XN-Label form has been
> validated as being an A-Label. And in that case, it clearly doesn't make any
> sense to validate over and over again. But you'd better validate on input to
> that set of systems, otherwise you have no guarantee that it is in fact an
> A-Label.
>
> Similarly, you could have a particular set of cooperating systems, with a
> gatekeeper, and that anything in Unicode-Label form has been validated as
> being a U-Label. And in that case, it clearly doesn't make any sense to
> validate over and over again. But you'd better validate on input to that set
> of systems, otherwise you have no guarantee that it is in fact a U-Label.
>
> Mark
>
>
> On Mon, Mar 23, 2009 at 10:09, Erik van der Poel <erikv@google.com> wrote:
>>
>> Hi Mark,
>>
>> I was referring to your comment:
>>
>> "if it is important to check those requirements, then it is important
>> to test both A and U Labels; if it is not important to test them, then
>> it should not be a requirement for either one."
>>
>> All I'm saying is that it would be nice if a client could lookup any
>> label that starts with "xn--". This would allow e.g. Google Search to
>> convert legal U-labels to legal A-labels, and it would allow the
>> browser to follow the links, even if the browser implemented an old
>> version of IDNA.
>>
>> It would then be up to the browser to display the underlying Unicode
>> string in a safe way. Note that browsers are already expected to apply
>> certain restrictions when displaying the domain name.
>>
>> I do agree with your point that, now that we are at Unicode 5.1,
>> future additions to Unicode are not quite as impactful to the DNS.
>> We're in the "long tail".
>>
>> Erik
>>
>> On Mon, Mar 23, 2009 at 9:38 AM, Mark Davis <mark@macchiato.com> wrote:
>> > First off, I have not been pushing for allowing UNASSIGNED on lookup in
>> > IDNA2008. This is for two reasons:
>> >
>> > 1. We have had many Unicode versions since 3.2, so the urgency is not
>> >    as prominent
>> > 2. Because IDNA2008 updates more regularly, there is less need.
>> >
>> > What I have been saying is that allowing UNASSIGNED on lookup wouldn't
>> > make a difference, and that's the case even if a character maps to ".".
>> >
>> > Let's take a specific example: àbcdèf.com, where the middle character,
>> > \u0378, is currently unassigned as far as the client is concerned
>> > (because it is back-reved), while the registry is on Unicode 6.0. The
>> > XN form is xn--bcdf-zna5c481a.com.
>> >
>> > Here's what happens when the client software (browser, emailer, etc)
>> > looks the domain name up, depending on what \u0378 turns into under 6.0.
>> >
>> > 1. \u0378 becomes DISALLOWED. No problem. No conformant registry can
>> >    support it, even on Unicode 6.0; the lookup is denied.
>> > 2. \u0378 becomes PVALID. No problem - the lookup works.
>> > 3. \u0378 becomes mapped to X (assuming we allow mapping on lookup)
>> >
>> >    3.1. X is DISALLOWED, say "$". No problem. No conformant registry
>> >         can support it, even on Unicode 6.0; the lookup is denied.
>> >    3.2. X is PVALID, say "X". The lookup fails. The remapped domain
>> >         name would work as xn--bcxdf-qqa4d.com, but the original URL
>> >         would not work until the client is updated, or unless the user
>> >         learns to type X instead until s/he updates his/er client.
>> >    3.3. X is ".". The lookup fails. The remapped domain name would
>> >         work as xn--bc-iia.xn--df-7ia.com, but the original URL would
>> >         not work until the client is updated, or unless the user learns
>> >         to type X instead until s/he updates his/er client.
>> >
>> > Whether the character maps to a dot or not in Unicode 6.0 doesn't make
>> > any difference in the scenario. It just fails the lookup in a different
>> > way (3.3 instead of 3.2), but the lookup fails in either case.
>> >
>> > Mark
>> >
>> > On Sun, Mar 22, 2009 at 17:00, Erik van der Poel <erikv@google.com>
>> > wrote:
>> >>
>> >> Hi again James, thank you for the email. I am quite aware of the dot
>> >> issues in IDNA. I have first-hand experience with Japanese input
>> >> methods and their modes, and I understand the motivation for the
>> >> addition of non-ASCII dot processing in IDNA2003.
>> >>
>> >> The issue with U+2CFE COPTIC FULL STOP is a bit subtle, so let me
>> >> explain. U+2CFE was added in Unicode 4.1. This means that, from the
>> >> point of view of an IDNA2003 implementation, it is simply an
>> >> unassigned character. Let's say we have a domain name like:
>> >>
>> >> aaa <U+2CFE> bbb . com
>> >>
>> >> Suppose that aaa and bbb are Coptic characters, and the typist
>> >> happened to have a Coptic input method (though I have no idea whether
>> >> such things exist!). Further, let's suppose that the client is using
>> >> IDNA2003 with the flag "allow unassigned" set to true. If aaa and bbb
>> >> are already lower-case, the client will do the right thing with them
>> >> (leaving them as is). However, the client will not know that U+2CFE is
>> >> a new dot-like character, so it will treat the entire sequence
>> >> "aaa<U+2CFE>bbb" as a single label. It will then encode it in Punycode
>> >> (including the dot-like character), and try to resolve that in DNS.
>> >>
>> >> Of course, this will not work because the intention was to resolve
>> >> aaa.bbb.com, not aaa<U+2CFE>bbb.com. In other words, a new client and
>> >> an old client would resolve this name differently.
>> >>
>> >> I don't know how many IDNA2003 clients actually set the "allow
>> >> unassigned" flag to true. It is obviously very dangerous, since the
>> >> client cannot possibly know how to case-fold the new characters,
>> >> including Coptic.
>> >>
>> >> (And this is also why Mark is wrong when he says that if clients are
>> >> allowed to lookup XN-labels with unassigned characters, then they
>> >> should also be allowed to lookup Unicode labels with unassigned
>> >> characters.)
>> >>
>> >> Erik
>> >>
>> >> On Sun, Mar 22, 2009 at 2:33 PM, James Seng <james@seng.sg> wrote:
>> >> > I think you misunderstood the "dot" problem. It is not that these
>> >> > "dots" are allowed in domain names, but that they are identified
>> >> > as a "separator" like "."
>> >> >
>> >> > The main reason is that when a user switches to a CJK input method
>> >> > and presses ".", most IMEs will emit U+3002 instead. If you do not
>> >> > identify U+3002 as a separator, then a user will have to enter CJK
>> >> > IME, switch back to English, enter a ".", switch back to CJK IME etc.
>> >> >
>> >> > See http://tools.ietf.org/html/draft-jet-idnabis-cjk-localmapping-00
>> >> >
>> >> > -James Seng
>> >> >
>> >> > On Mon, Mar 23, 2009 at 1:51 AM, Erik van der Poel <erikv@google.com>
>> >> > wrote:
>> >> >> Another question from the summary:
>> >> >>
>> >> >>> A. Multiple characters are allowed as "dots" in domain names under
>> >> >>> IDNA2003 and presumably under IDNAV2. This is a general problem for
>> >> >>> all versions of IDNA but may be exacerbated by the variants for
>> >> >>> "dots" that are permitted under IDNA2003 and IDNAv2. What is the
>> >> >>> WG view?
>> >> >>
>> >> >> In my view, non-ASCII dots should never have been allowed in
>> >> >> IDNA2003. However, now that many IDNA2003 implementations have been
>> >> >> distributed to users and a few stored domain names use these
>> >> >> non-ASCII dots, some may feel that we have to support them (forever).
>> >> >>
>> >> >> Having said that, I am quite concerned about adding yet another
>> >> >> non-ASCII dot in IDNAv2 (U+2CFE COPTIC FULL STOP) because IDNA2003
>> >> >> includes a flag that allows for the lookup of unassigned (in Unicode
>> >> >> 3.2) characters. Such applications would not only fail to case-fold
>> >> >> post-Unicode-3.2 characters correctly, they would fail to divide the
>> >> >> full domain name into individual labels, and since DNS labels are
>> >> >> "owned" by different owners, this just seems like an invitation to
>> >> >> further problems.
>> >> >>
>> >> >> In my view, the dot is a keyboard and UI issue. Of course, it would
>> >> >> be nice if we could push ALL mappings out to the keyboard and UI,
>> >> >> but, to use one of John's favorite words, this may be
>> >> >> "unrealistic". ;-)
>> >> >>
>> >> >> Erik
>> >> >> _______________________________________________
>> >> >> Idna-update mailing list
>> >> >> Idna-update@alvestrand.no
>> >> >> http://www.alvestrand.no/mailman/listinfo/idna-update
>> >> >>
>> >> >
>> >> _______________________________________________
>> >> Idna-update mailing list
>> >> Idna-update@alvestrand.no
>> >> http://www.alvestrand.no/mailman/listinfo/idna-update
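[Illustration, not part of the original message.] The label-splitting
difference described in the quoted U+2CFE discussion can be sketched in
Python. This is only a rough sketch under stated assumptions: the names
split_labels, to_xn, and NEWER_DOTS are hypothetical, treating U+2CFE as a
separator is the proposal under discussion rather than settled behaviour,
and the Coptic letters are arbitrary stand-ins for "aaa" and "bbb". No
Nameprep mapping or validation is performed.

```python
# The four label separators recognised by IDNA2003 (RFC 3490, section 3.1).
IDNA2003_DOTS = "\u002e\u3002\uff0e\uff61"
# Hypothetical newer separator set that also treats U+2CFE COPTIC FULL STOP
# as a dot (an assumption made purely for illustration).
NEWER_DOTS = IDNA2003_DOTS + "\u2cfe"


def split_labels(name, dots):
    """Split a domain name into labels on any character found in `dots`."""
    labels, current = [], []
    for ch in name:
        if ch in dots:
            labels.append("".join(current))
            current = []
        else:
            current.append(ch)
    labels.append("".join(current))
    return labels


def to_xn(label):
    """Punycode-encode a non-ASCII label with the "xn--" prefix.

    Deliberately skips all mapping and validity checks.
    """
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


# "aaa <U+2CFE> bbb . com", with Coptic small letters standing in for a and b.
name = "\u2c81\u2c81\u2c81\u2cfe\u2c83\u2c83\u2c83.com"

old_client = split_labels(name, IDNA2003_DOTS)  # U+2CFE stays inside a label
new_client = split_labels(name, NEWER_DOTS)     # U+2CFE separates two labels

print(len(old_client), ".".join(to_xn(label) for label in old_client))
print(len(new_client), ".".join(to_xn(label) for label in new_client))
```

The old client ends up with two labels and the new client with three, so the
two clients would send different queries to the DNS for the same typed name.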
- looking up XN-labels with unassigned characters Erik van der Poel
- Re: looking up XN-labels with unassigned characte… Mark Davis
- Re: looking up XN-labels with unassigned characte… Erik van der Poel