Re: [xmpp] RFC 3920bis and IDNA2008

Peter Saint-Andre <stpeter@stpeter.im> Tue, 15 December 2009 17:18 UTC

Message-ID: <4B27C4C2.8000407@stpeter.im>
Date: Tue, 15 Dec 2009 10:17:54 -0700
From: Peter Saint-Andre <stpeter@stpeter.im>
To: Bernard Aboba <bernard.aboba@gmail.com>
References: <223709240912150842v1b4429b2wf11928797e071838@mail.gmail.com>
In-Reply-To: <223709240912150842v1b4429b2wf11928797e071838@mail.gmail.com>
Cc: xmpp@ietf.org
Subject: Re: [xmpp] RFC 3920bis and IDNA2008

On 12/15/09 9:42 AM, Bernard Aboba wrote:
> In looking at 3920bis there would appear to be some text that may be
> worth updating for IDNA2008.

Many thanks for this post (and your previous one about e2e encryption,
to which I still need to reply).

The IDNA2008 work introduces a number of interesting issues for XMPP.
And those issues are not limited to domain labels, because we also use
stringprep (as did IDNA2003) for localparts and resourceparts of XMPP
addresses. We've been dancing around these issues for quite a while,
hoping that a more general approach to "fixing i18n" would emerge within
the IETF. That has not happened yet, but I have heard rumors that there
will be more discussion about this in Anaheim. I've been meaning to
write a more general report about these issues as they impinge on XMPP
(based on my in-depth reading of the IDNA2008 specs and my own imperfect
knowledge of i18n challenges), but I haven't found the time yet. Perhaps
I'll do that in the next 10 days...

> Section 17: References
> 
> You may wish to remove the [IDNA] reference and substitute references to
> the IDNA2008 document set:
> 
> [IDNA2008-Defs] Klensin, J., "Internationalized Domain Names for
> Applications (IDNA): Definitions and Document Framework", Internet draft
> (work in progress), draft-ietf-idnabis-defs-12.txt, October 2009.
> 
> [IDNA2008-Rationale] Klensin, J., "Internationalized Domain Names for
> Applications (IDNA): Background, Explanation, and Rationale", Internet
> draft (work in progress), draft-ietf-idnabis-rationale-14.txt, October
> 2009.
> 
> [IDNA2008-Protocol] Klensin, J., "Internationalized Domain Names for
> Applications (IDNA): Protocol", Internet draft (work in progress),
> draft-ietf-idnabis-protocol-17.txt, October 2009.
> 
> [IDNA2008-Tables] Falstrom, P., "The Unicode Code Points and IDNA",
> Internet draft (work in progress), draft-ietf-idnabis-tables-08.txt,
> November 2009.
> 
> [IDNA2008-Bidi] Alvestrand, H. and C. Karp, "Right-to-left scripts for
> IDNA", Internet draft (work in progress),
> draft-ietf-idnabis-bidi-06.txt, September 2009.

If we decide that we need to "upgrade" to IDNA2008, yes. I don't see how
we could *not* do that, but it might have some nasty implications (e.g.,
perhaps existing deployments use both A-labels and U-labels).

> Section 13
> 
> "As specified under Section 3, a server MUST support and enforce [IDNA]
> for domain identifiers..."
> 
> [BA] Suggest that [IDNA] be changed to [IDNA2008-Protocol] here.

Yes, with the foregoing proviso.

> Section 3.2
> 
> A domain identifier MUST be an "internationalized domain name" as
> defined in [IDNA], that is, "a domain name in which every label is an
> internationalized label".
> 
> [BA]  My suggestion is that the [IDNA] reference here be changed to
> [IDNA2008-Defs].  In terms of the definition, I think you may want to be
> more specific, such as using the term "U-label" instead of
> internationalized label in the above sentence.

Yes, the IDNA2008 specs are much clearer in their terminology. Ideally
we'd recommend the use of U-labels only, but we probably have deployment
of A-labels on the XMPP network, which makes things messier.
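
To make the mixed-deployment problem concrete, here is a rough Python
sketch of comparing two domainparts by first converting any A-labels to
U-labels. The function names are hypothetical (nothing in 3920bis or
IDNA2008 defines them), the code uses only the standard library's
"punycode" codec, and it skips the IDNA2008 validity checks, so it
illustrates the idea rather than implementing the specs.

```python
# Hypothetical sketch: compare XMPP domainparts that may mix A-labels and U-labels.
ACE_PREFIX = "xn--"


def label_to_u_label(label: str) -> str:
    """Return the U-label form of an A-label; lowercase other ASCII labels.

    Non-ASCII labels are returned unchanged, since IDNA2008 does no mapping.
    Raises ValueError (UnicodeError is a subclass) for a malformed A-label.
    """
    lowered = label.lower()
    if lowered.startswith(ACE_PREFIX):
        # Decode the Punycode (RFC 3492) payload that follows the prefix.
        u_label = lowered[len(ACE_PREFIX):].encode("ascii").decode("punycode")
        # A real A-label decodes to something non-ASCII and round-trips exactly.
        round_trip = ACE_PREFIX + u_label.encode("punycode").decode("ascii")
        if u_label.isascii() or round_trip != lowered:
            raise ValueError(f"not a valid A-label: {label!r}")
        return u_label
    if label.isascii():
        # LDH labels compare case-insensitively in the DNS.
        return lowered
    return label


def same_domain(a: str, b: str) -> bool:
    """Compare two domainparts label by label in U-label form."""
    def canonical(domain: str) -> list:
        return [label_to_u_label(part) for part in domain.rstrip(".").split(".")]
    return canonical(a) == canonical(b)


print(same_domain("xn--bcher-kva.example", "bücher.example"))  # True
print(same_domain("bücher.example", "bucher.example"))         # False
```

Normalizing on U-labels (rather than A-labels) matches the preference for
UTF-8 everywhere; the round-trip check rejects "xn--" labels that are not
genuine A-labels.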

> Note that [IDNA2008-Defs] Section 2.3.2.3 defines an "internationalized
> domain name" a bit differently than [IDNA] did:
> 
> 2.3.2.3.  Internationalized Domain Name
> 
>    An "internationalized domain name" (IDN) is a domain name that
>    contains at least one A-label or U-label, but that otherwise may
>    contain any mixture of NR-LDH-labels, A-labels, or U-labels.

True: in IDNA2008, being internationalized is a property of individual
labels, not of the domain name as a whole. Vint Cerf had a good post
about this a few days ago on the IDNA list.
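
A syntactic sketch of that definition in Python follows. classify_label
and is_idn are hypothetical helpers; they sort labels into the
[IDNA2008-Defs] categories by shape alone and do not perform the real
A-label or U-label validation.

```python
import re

# LDH label: ASCII letters, digits, hyphens; no leading or trailing hyphen; 1-63 octets.
LDH_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def classify_label(label: str) -> str:
    """Roughly classify one label using the IDNA2008-Defs vocabulary.

    Purely syntactic: a non-ASCII label is only a candidate U-label and an
    "xn--" label only a candidate A-label, because the full validity checks
    (code point tables, contextual rules, Bidi) are not performed here.
    """
    if not label.isascii():
        return "U-label"
    if not LDH_LABEL.fullmatch(label):
        return "non-LDH"
    if label[2:4] == "--":
        # Hyphens in positions 3 and 4 make this a reserved (R-LDH) label;
        # the "xn--" subset are the XN-labels that can be A-labels.
        return "A-label" if label[:2].lower() == "xn" else "R-LDH"
    return "NR-LDH"


def is_idn(domain: str) -> bool:
    """At least one A-label or U-label; otherwise only NR-LDH labels."""
    kinds = [classify_label(part) for part in domain.rstrip(".").split(".")]
    allowed = {"NR-LDH", "A-label", "U-label"}
    return all(k in allowed for k in kinds) and any(k != "NR-LDH" for k in kinds)


for name in ["example.com", "bücher.example", "xn--bcher-kva.example", "ab--cd.example"]:
    print(name, is_idn(name))
```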

> --------------------------------------------
> 
> When preparing a text label (consisting of a sequence of Unicode code
> points) for representation as an internationalized label in the process
> of constructing an XMPP domain identifier or comparing two XMPP domain
> identifiers, an application MUST ensure that for each text label it is
> possible to apply without failing the ToASCII operation specified in
> [IDNA] with the UseSTD3ASCIIRules flag set (thus forbidding ASCII code
> points other than letters, digits, and hyphens). If the ToASCII
> operation can be applied without failing, then the label is an
> internationalized label.
> 
> An internationalized domain name (and therefore an XMPP domain
> identifier) is constructed from its constituent internationalized labels
> by following the rules specified in [IDNA].
> 
> [BA] Suggest that this be changed to:
> 
> "When preparing a U-label (consisting of a sequence of Unicode code
> points) for representation as an A-label in the process of constructing
> an XMPP domain identifier or comparing two XMPP
> domain identifiers, an application MUST ensure that each U-label can be
> successfully converted to an A-label according to the "Punycode"
> specification [RFC3492], and that other constraints on the validity
> of a U-label are met, as described in [IDNA2008-Defs] Section 2.3.2.1.
> 
> An internationalized domain name (and therefore an XMPP domain
> identifier) is constructed from its constituent internationalized
> labels by following the rules specified in [IDNA2008-Protocol]."
> 
> "Note: The ToASCII operation includes application of the [NAMEPREP]
> profile of [STRINGPREP] and encoding using the algorithm specified in
> [PUNYCODE]; for details, see [IDNA]. Although the output of the ToASCII
> operation is not used in XMPP, it MUST be possible to apply that
> operation without failing."
> 
> [BA] As described in draft-ietf-idnabis-rationale Section 7.3, IDNA2008
> is not dependent on Nameprep.  Therefore it is not clear to me that the
> Note is still necessary.

I think the note is no longer needed, and that your suggested text is
consistent with IDNA2008 (though I'll need to double-check it against my
grasp of IDNA2008, which always feels rather tenuous).
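
For what it's worth, the Punycode step in Bernard's suggested text is
small. This Python sketch (u_label_to_a_label is a hypothetical name)
uses the standard library's "punycode" codec and checks only NFC, the
non-ASCII requirement, and the 63-octet label limit; the rest of the
U-label rules in [IDNA2008-Defs] Section 2.3.2.1 are assumed to be
handled elsewhere.

```python
import unicodedata

ACE_PREFIX = "xn--"
MAX_LABEL_OCTETS = 63  # DNS limit on a single label


def u_label_to_a_label(u_label: str) -> str:
    """Convert a U-label to an A-label with the stdlib Punycode codec (RFC 3492).

    Assumption: the remaining IDNA2008 U-label rules (code point tables,
    contextual and Bidi rules) are enforced elsewhere; this sketch only
    checks NFC, the non-ASCII requirement, and the label length.
    """
    if u_label.isascii():
        raise ValueError("a U-label must contain at least one non-ASCII character")
    if not unicodedata.is_normalized("NFC", u_label):
        raise ValueError("a U-label must be in Unicode Normalization Form C")
    a_label = ACE_PREFIX + u_label.encode("punycode").decode("ascii")
    if len(a_label) > MAX_LABEL_OCTETS:
        raise ValueError(f"A-label is {len(a_label)} octets; the limit is 63")
    return a_label


print(u_label_to_a_label("bücher"))  # xn--bcher-kva
print(u_label_to_a_label("straße"))  # xn--strae-oqa
```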

One point of clarification: I think we need to be especially careful
about statements such as "a sequence of Unicode code points" because (as
was pointed out at the IRI BoF in Hiroshima) Unicode code points are
just identifiers for characters, for example:

U+13AB U+13AA U+13F4 U+13F4 U+13AC U+13D2

Those aren't encoded at all -- I could write a sequence of Unicode code
points in pen or pencil or whatever. So we need to specify more
carefully that for XMPP the sequence of code points is encoded as UTF-8.
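
A quick Python illustration of the difference: the six code points above
are just integers until they are serialized, and different encodings
yield different bytes. XMPP would use the UTF-8 form; UTF-16 appears
only for contrast.

```python
# The six Cherokee code points from the example above, as abstract integers.
code_points = [0x13AB, 0x13AA, 0x13F4, 0x13F4, 0x13AC, 0x13D2]
text = "".join(chr(cp) for cp in code_points)

# The same characters serialized two different ways.
utf8 = text.encode("utf-8")        # what XMPP would put on the wire
utf16 = text.encode("utf-16-be")   # a different, equally valid encoding

print(len(code_points), len(utf8), len(utf16))  # 6 18 12
print(utf8[:3].hex())                           # e18eab (UTF-8 form of U+13AB)
```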

> 7.3.  Character Mapping
> 
>    As discussed at length in Section 6, IDNA2003, via Nameprep (see
>    Section 7.5), mapped many characters into related ones.  Those
>    mappings no longer exist as requirements in IDNA2008.  These
>    specifications strongly prefer that only A-labels or U-labels be used
>    in protocol contexts and as much as practical more generally.
>    IDNA2008 does anticipate situations in which some mapping at the time
>    of user input into lookup applications is appropriate and desirable.
>    The issues are discussed in Section 6 and specific recommendations
>    are made in [IDNA2008-Mapping].
> 
> ------------------------------------------------------------------------
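
The mapping difference is easy to see with Python's standard library,
whose encodings.idna module implements IDNA2003, including Nameprep. In
this sketch, punycode_only is a hypothetical stand-in for the IDNA2008
A-label step: it applies no mapping at all, so it assumes its input is
already a valid U-label.

```python
import encodings.idna


# IDNA2003 behavior: the stdlib encodings.idna module implements RFC 3490,
# so ToASCII runs Nameprep (case folding, NFKC, etc.) before Punycode.
def idna2003_to_ascii(label: str) -> bytes:
    return encodings.idna.ToASCII(label)


# Hypothetical stand-in for the IDNA2008 A-label step: Punycode with no mapping.
# It assumes the input is already a valid U-label.
def punycode_only(label: str) -> str:
    return "xn--" + label.encode("punycode").decode("ascii")


print(encodings.idna.nameprep("BÜCHER"))  # bücher
print(idna2003_to_ascii("BÜCHER"))        # b'xn--bcher-kva'
print(idna2003_to_ascii("straße"))        # b'strasse' (ß mapped to "ss")
print(punycode_only("straße"))            # xn--strae-oqa
```

Under IDNA2003, "straße" silently becomes "strasse", while (as I
understand the IDNA2008 tables) U+00DF is a valid code point in its own
right and gets its own A-label. An uppercase label such as "BÜCHER"
would instead have to be mapped at input time, along the lines of
[IDNA2008-Mapping]. Note also that the stdlib ToASCII runs with
UseSTD3ASCIIRules unset (ToASCII("foo_bar") returns b"foo_bar"), so the
STD3 restriction in the current 3920bis text would need a separate LDH
check.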

I think we would prefer U-labels over A-labels (XMPP is pure UTF-8, so
why use A-label representations at all?). And this text feels a bit
vague to me, but I don't have clearer text to offer at the moment.

Peter

-- 
Peter Saint-Andre
https://stpeter.im/