Re: [I18ndir] Guidance on Return of A-Labels in a URL?

Patrik Fältström <patrik@frobbit.se> Fri, 11 September 2020 04:29 UTC

From: Patrik Fältström <patrik@frobbit.se>
To: John C Klensin <john-ietf@jck.com>
Cc: "Hollenbeck, Scott" <shollenbeck@verisign.com>, i18ndir@ietf.org
Date: Fri, 11 Sep 2020 06:29:50 +0200
X-Mailer: MailMate (1.13.2r5673)
Message-ID: <B6ED8D5D-39D3-4D12-AF2F-BD2C18B70393@frobbit.se>
In-Reply-To: <4D25DED743AC92D995786A92@PSB>
References: <326c954da33646f79a4e3bc4f27b7cb7@verisign.com> <0864F4FF-A615-451E-8828-433F3098A599@frobbit.se> <4D25DED743AC92D995786A92@PSB>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/s22FNP2WOppJ05enkGEL1JxwpD4>
Subject: Re: [I18ndir] Guidance on Return of A-Labels in a URL?

Scott: the TL;DR of all of our messages is: A-Labels and U-Labels are 1:1 mappings of each other. The question is what you want to happen with (and who is doing validation of):

- Strings starting with xn-- that are not A-Labels
- Unicode strings that are not U-Labels
- URLs that do not follow RFC 3986 or 3987
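
A rough sketch of how the first two cases could be detected, using only the Python standard library. This is an illustration under stated assumptions, not a full IDNA2008 validator: it checks only the ACE prefix, a clean Punycode round trip, NFC, and the 63-octet limit, and it skips the code point, Bidi, and contextual rules. The function names and the result strings are made up for this example.

```python
import unicodedata

ACE_PREFIX = "xn--"


def looks_like_a_label(label: str) -> bool:
    """Partial check only: prefix, length, Punycode round trip, NFC."""
    lower = label.lower()
    if not lower.startswith(ACE_PREFIX) or len(lower) > 63:
        return False
    encoded = lower[len(ACE_PREFIX):]
    try:
        decoded = encoded.encode("ascii").decode("punycode")
    except ValueError:  # UnicodeError is a subclass of ValueError
        return False
    if decoded.isascii():
        # An A-label must correspond to a U-label, which is never pure ASCII.
        return False
    if unicodedata.normalize("NFC", decoded) != decoded:
        return False
    # The decoded string must encode back to exactly what we received.
    return decoded.encode("punycode").decode("ascii") == encoded


def looks_like_u_label(label: str) -> bool:
    """Partial check only: non-ASCII, NFC, and a short enough ACE form."""
    if label.isascii():
        return False
    if unicodedata.normalize("NFC", label) != label:
        return False
    ace = ACE_PREFIX + label.encode("punycode").decode("ascii")
    return len(ace) <= 63


def classify(label: str) -> str:
    """Hypothetical helper: sort a label into one of the cases above."""
    if label.lower().startswith(ACE_PREFIX):
        return "a-label-candidate" if looks_like_a_label(label) else "fake-a-label"
    if label.isascii():
        return "ascii-label"
    return "u-label-candidate" if looks_like_u_label(label) else "not-u-label"
```

A label that passes is only a candidate. The protocol still has to decide who runs the remaining IDNA2008 checks and what happens to the labels that fail.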

In this mix you absolutely must include (for natural reasons) people who believe the IETF and W3C are nutheads, including for example the ccTLD registries that provide domain names that are not IDNA2008-conformant, and those who follow UTR#46 rather than the IETF specifications.

Basically, as with all protocol design, the question is not what you do with the ones that do follow the specification, but the ones that do not.

   Patrik

On 11 Sep 2020, at 1:19, John C Klensin wrote:

> Scott,
>
> Let me add a slightly different perspective (and a warning) to what Patrik and John Levine have said without in any way
> disagreeing with them.  It will also give you a somewhat deeper analysis and set of references to refer to if needed.
>
> As long as IDNA2008 is strictly followed, then it really makes no difference whether U-labels and A-labels are returned and the choice should be made according to what works best for the
> application protocol.  If the URLs you are referring to are
> strictly conformant to RFC 3986, then there is a strong case for using A-labels because the alternative is %-encoding,
> presumably of UTF-8 [1].  That form is hard to read, less
> optimized for DNS labels, much less compact, etc., than the
> A-label form.
>
> _However_ note "strictly" above.  There are communities out
> there that consider 3986 (and 3987) obsolete and are developing their own URL specs.  There are overlapping ones that use or assume profiles of Unicode UTR#46 which involve some specific mappings and, for all intents and purposes, rely on IDNA2003.
> For IDNA2003 and those profiles of UTR#46, one cannot make an identity comparison between native character Unicode strings (UTF-8 or otherwise) without IDNA-specific processing.
> Whatever the arguments for and against reliance on UTR#46 rather than strict conformance to IDNA2008, that is a very strong
> argument for RDAP (and other protocols involving registration data) to work in terms of A-labels rather than anything else.
>
> FWIW and AFAIK, while, as Patrik indicates, IETF has not taken a position on this issue for RDAP and similar protocols, ICANN has.  Long ago, when the Board created two committees chaired by Katoh-san, there were conclusions that, regardless of what
> interfaces registrars chose to present to users and what they accepted as input, all registration databases and access
> protocols [2] should work strictly in terms of already mapped and processed Punycode-encoded form and that, if native
> characters were presented to users from those systems, they
> should be the result of
>    ToUnicode(ToASCII(user-supplied-string))
> and not the user supplied string.   Those conclusions and the reasoning that went into them strongly influenced RFC 4690 and hence IDNA2008 but, more important from your standpoint, I don't believe ICANN has ever deprecated or formally abandoned them.
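
The ToUnicode(ToASCII(...)) round trip described above can be sketched with Python's built-in "idna" codec, which implements the IDNA2003 operations of RFC 3490 (not IDNA2008). The helper names here are illustrative assumptions and not part of any registry system.

```python
def to_ascii(name: str) -> str:
    """IDNA2003 ToASCII on each label of a dotted name (stdlib codec)."""
    return name.encode("idna").decode("ascii")


def to_unicode(name: str) -> str:
    """IDNA2003 ToUnicode on each label of an all-ASCII dotted name."""
    return name.encode("ascii").decode("idna")


def display_form(user_supplied: str) -> str:
    """What gets shown to users: the round-tripped form, not the raw input."""
    return to_unicode(to_ascii(user_supplied))


# The stored and transmitted form is the Punycode-encoded one.
stored = to_ascii("B\u00fccher.example")

# The displayed form comes from the stored form and may differ from the input.
shown = display_form("B\u00fccher.example")
```

With this codec the IDNA2003 mapping step lowercases the input, so the displayed string differs from what the user typed. That is also why identity comparison of raw Unicode strings is unsafe under IDNA2003-style mappings.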
>
>      john
>
>
>
> [1] See the rather flexible text in Section 2.5 of 3986.
>
> [2] While WHOIS was the only game in town then, the Whois++
> specs had been published a half-dozen years earlier and the
> debates about databases and database access permissions and
> tools were well underway.
>
>
>
> --On Thursday, September 10, 2020 22:30 +0200 Patrik Fältström
> <patrik=40frobbit.se@dmarc.ietf.org> wrote:
>
>> Now some time...
>>
>> Regarding the use of U-Label and A-Label nothing specific is
>> said about RDAP.
>>
>> What you have is RFC 5890 2.3.2.6. Domain Name Slot and some
>> previous sections that talk about the equivalence between
>> U-Label and A-Label which is a 1:1 mapping (compared to
>> earlier versions of IDNA standard).
>>
>> Note section 2.3.2.1.  IDNA-valid strings, A-label, and
>> U-label in the same RFC which says:
>>
>>> A "U-label" is an IDNA-valid string of Unicode characters, in
>>> Normalization Form C (NFC)...
>>
>> My view is that it is up to the protocol that passes around
>> domain names in a Domain Name Slot to decide how to handle the
>> situation, and therefore to ensure that, if U-Labels are in
>> use, the requirements on that protocol element match the
>> definition of a U-Label, and not just "random Unicode code
>> points in some random encoding and unknown normalisation".
>>
>> I can see situations where you in RDAP do want to be able to
>> send random Unicode Code Points just with the intention to do
>> some search on the server, but that is then NOT a U-Label. The
>> same way I do see interest in sending A-Labels to be sure the
>> string is stable and as it was supposed to be during the whole
>> transaction.
>>
>> Does this help?
>>
>>    Patrik
>>
>> On 10 Sep 2020, at 14:41, Hollenbeck, Scott wrote:
>>
>>> Does anyone know of any text in any of the IDN RFCs that
>>> would support a recommendation to return A-Labels in RDAP
>>> response URLs?
>>>
>>> Marc Blanchet wrote an I-D
>>> (https://www.ietf.org/archive/id/draft-blanchet-regext-rdap-deployfindings-05.txt)
>>> a while ago in which he described RDAP
>>> deployment findings. In Section 3.6, he suggests that "All
>>> links of any "rel" types should always be returned in the
>>> A-Label form for IDNs in the href or value members,
>>> independent of if the query was a U-Label or A-Label or a
>>> mix". That seems like a good idea since a server doesn't know
>>> what a client is capable of consuming, but I was hoping to
>>> support this recommendation with a reference to something in
>>> IDNA. I didn't see anything obvious.
>>>
>>> I sent this same basic question to both Marc and Patrik
>>> individually. I haven't heard back from either of them, so I
>>> apologize if I'm "jumping the gun" by asking the directorate
>>> without waiting to see if they respond.
>>>
>>> Scott
>>>