Re: [I18ndir] Guidance on Return of A-Labels in a URL?

John C Klensin <john-ietf@jck.com> Thu, 10 September 2020 23:19 UTC

Date: Thu, 10 Sep 2020 19:19:37 -0400
From: John C Klensin <john-ietf@jck.com>
To: Patrik Fältström <patrik=40frobbit.se@dmarc.ietf.org>, "Hollenbeck, Scott" <shollenbeck=40verisign.com@dmarc.ietf.org>
cc: i18ndir@ietf.org
Message-ID: <4D25DED743AC92D995786A92@PSB>
In-Reply-To: <0864F4FF-A615-451E-8828-433F3098A599@frobbit.se>
References: <326c954da33646f79a4e3bc4f27b7cb7@verisign.com> <0864F4FF-A615-451E-8828-433F3098A599@frobbit.se>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/YSNZIAg6uiT2ISnEA8m08feawBk>
Subject: Re: [I18ndir] Guidance on Return of A-Labels in a URL?

Scott,

Let me add a slightly different perspective (and a warning) to
what Patrik and John Levine have said without in any way
disagreeing with them.  It should also give you a somewhat deeper
analysis and a set of references to cite if needed.

As long as IDNA2008 is strictly followed, it really makes no
difference whether U-labels or A-labels are returned, and the
choice should be made according to what works best for the
application protocol.  If the URLs you are referring to are
strictly conformant to RFC 3986, then there is a strong case
for using A-labels because the alternative is %-encoding,
presumably of UTF-8 [1].  That form is harder to read, less
optimized for DNS labels, and much less compact than the
A-label form.
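
As a rough illustration of the readability and size difference,
the following Python sketch puts the two forms side by side.  The
helper names and the sample labels are assumptions chosen for
illustration.  The "punycode" codec performs only the RFC 3492
encoding step, so the sketch assumes each label is already a
valid, fully processed U-label and does no IDNA validation.

```python
from urllib.parse import quote


def percent_encoded(label: str) -> str:
    """Percent-encode the UTF-8 bytes of a label (RFC 3986 style)."""
    return quote(label, safe="")


def ace_form(label: str) -> str:
    """Punycode-encode a label and prepend the ACE prefix.

    Assumption: the label is already a valid U-label.  No IDNA
    validation or mapping is performed here.
    """
    return "xn--" + label.encode("punycode").decode("ascii")


# Hypothetical sample labels: "bücher" and the Japanese word "テスト".
for label in ["b\u00fccher", "\u30c6\u30b9\u30c8"]:
    pct = percent_encoded(label)
    ace = ace_form(label)
    print(f"{pct} ({len(pct)} chars)  vs  {ace} ({len(ace)} chars)")
```

For a mostly ASCII label the two lengths are close; the size gap
shows up once most of the characters are non-ASCII, where every
character costs nine characters in %-encoded UTF-8.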

_However_ note "strictly" above.  There are communities out
there that consider 3986 (and 3987) obsolete and are developing
their own URL specs.  There are overlapping ones that use or
assume profiles of Unicode UTR#46 which involve some specific
mappings and, for all intents and purposes, rely on IDNA2003.
For IDNA2003 and those profiles of UTR#46, one cannot make an
identity comparison between native character Unicode strings
(UTF-8 or otherwise) without IDNA-specific processing.
Whatever the arguments for and against reliance on UTR#46 rather
than strict conformance to IDNA2008, that is a very strong
argument for RDAP (and other protocols involving registration
data) to work in terms of A-labels rather than anything else.
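
To make the comparison problem concrete, here is a small Python
sketch.  It relies on Python's built-in "idna" codec, which
implements the IDNA2003 ToASCII operation (RFC 3490 with
Nameprep) and not UTR#46 or IDNA2008, so it only illustrates the
IDNA2003-style mapping behaviour.  The sample strings and the
helper name are assumptions made for illustration.

```python
def to_a_label_idna2003(label: str) -> str:
    """Map one label to its A-label using IDNA2003 ToASCII.

    Python's stdlib "idna" codec applies Nameprep (case folding,
    NFKC) before Punycode-encoding the label.
    """
    return label.encode("idna").decode("ascii")


# Three distinct code point sequences a user might supply
# (hypothetical inputs): precomposed lower case, precomposed with
# a capital B, and "u" followed by a combining diaeresis.
candidates = ["b\u00fccher", "B\u00fccher", "bu\u0308cher"]

# Compared code point by code point, no two of them are equal ...
distinct_strings = len(set(candidates))

# ... yet all of them map to the same A-label.
a_labels = {to_a_label_idna2003(c) for c in candidates}

print(distinct_strings)  # 3
print(a_labels)          # {'xn--bcher-kva'}
```

Comparing the A-labels is a plain ASCII comparison; comparing the
native-character strings gives the wrong answer unless the
mapping is applied first.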

FWIW and AFAIK, while, as Patrik indicates, the IETF has not taken a
position on this issue for RDAP and similar protocols, ICANN
has.  Long ago, when the Board created two committees chaired by
Katoh-san, there were conclusions that, regardless of what
interfaces registrars chose to present to users and what they
accepted as input, all registration databases and access
protocols [2] should work strictly in terms of already mapped
and processed Punycode-encoded form and that, if native
characters were presented to users from those systems, they
should be the result of
   ToUnicode(ToASCII(user-supplied-string))
and not the user-supplied string.  Those conclusions and the
reasoning that went into them strongly influenced RFC 4690 and
hence IDNA2008 but, more important from your standpoint, I don't
believe ICANN has ever deprecated or formally abandoned them.
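
The round trip above can be sketched in Python using the built-in
"idna" codec, whose encode and decode directions correspond to
the IDNA2003 ToASCII and ToUnicode operations.  The function name
and the sample input are assumptions for illustration, and the
sketch handles a single label only.

```python
def display_form(user_supplied_label: str) -> str:
    """Return ToUnicode(ToASCII(label)) for a single label.

    Uses Python's stdlib "idna" codec (IDNA2003).  The stored form
    is the A-label; the displayed form is derived from it and never
    taken from the user's input directly.
    """
    a_label = user_supplied_label.encode("idna")  # ToASCII
    return a_label.decode("idna")                 # ToUnicode


# Hypothetical input: capital B, and "u" plus a combining diaeresis.
typed = "Bu\u0308cher"
shown = display_form(typed)

print(shown == typed)          # False
print(shown == "b\u00fccher")  # True
```

What is shown to the user is therefore the processed form of what
was registered, which can differ from what the user typed.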

     john



[1] See the rather flexible text in Section 2.5 of 3986.

[2] While WHOIS was the only game in town then, the Whois++
specs had been published a half-dozen years earlier and the
debates about databases and database access permissions and
tools were well underway.



--On Thursday, September 10, 2020 22:30 +0200 Patrik Fältström
<patrik=40frobbit.se@dmarc.ietf.org> wrote:

> Now some time...
> 
> Regarding the use of U-Label and A-Label nothing specific is
> said about RDAP.
> 
> What you have is RFC 5890 2.3.2.6. Domain Name Slot and some
> previous sections that talk about the equivalence between
> U-Label and A-Label which is a 1:1 mapping (compared to
> earlier versions of IDNA standard).
> 
> Note section 2.3.2.1.  IDNA-valid strings, A-label, and
> U-label in the same RFC which says:
> 
>> A "U-label" is an IDNA-valid string of Unicode characters, in
>> Normalization Form C (NFC)...
> 
> My view is that it is up to the protocol that passes around
> domain names in a Domain Name Slot how to handle the
> situation, and because of that ensure that if U-Label is in
> use that the requirements on that protocol element matches the
> definition of a U-Label, and not just "random Unicode code
> points in some random encoding and unknown normalisation".
> 
> I can see situations where you in RDAP do want to be able to
> send random Unicode Code Points just with the intention to do
> some search on the server, but that is then NOT a U-Label. The
> same way I do see interest in sending A-Labels to be sure the
> string is stable and as it was supposed to be during the whole
> transaction.
> 
> Does this help?
> 
>    Patrik
> 
> On 10 Sep 2020, at 14:41, Hollenbeck, Scott wrote:
> 
>> Does anyone know of any text in any of the IDN RFCs that
>> would support a recommendation to return A-Labels in RDAP
>> response URLs?
>> 
>> Marc Blanchet wrote an I-D
>> (https://www.ietf.org/archive/id/draft-blanchet-regext-rdap-deployfindings-05.txt)
>> a while ago in which he described RDAP deployment findings.
>> In Section 3.6, he suggests that "All
>> links of any "rel" types should always be returned in the
>> A-Label form for IDNs in the href or value members,
>> independent of if the query was a U-Label or A-Label or a
>> mix". That seems like a good idea since a server doesn't know
>> what a client is capable of consuming, but I was hoping to
>> support this recommendation with a reference to something in
>> IDNA. I didn't see anything obvious.
>> 
>> I sent this same basic question to both Marc and Patrik
>> individually. I haven't heard back from either of them, so I
>> apologize if I'm "jumping the gun" by asking the directorate
>> without waiting to see if they respond.
>> 
>> Scott