Re: [apps-discuss] i18n intro, Sunday 14:00-16:00

Joe Hildebrand, Thu, 21 July 2011 16:29 UTC


On 7/21/11 1:03 AM, "Martin J. Dürst" wrote:

> Slide 123: Good to see that. By the way, I seem to remember both John and me
> begging you for an explanation of why Jabber wants to use NFD a few months
> ago, and I'm not sure I have seen an answer. Now might be a good time (if you
> already sent one, a pointer would be appreciated).

Let me try.  First some assumptions:
- Stringprep is currently one of the performance hotspots of some XMPP
servers.
- XMPP does not guarantee that the original form of the address that is
entered by the user or sent on the first hop is transmitted without
modification to other hops in the system.
- As such, many XMPP servers optimize by performing canonicalization at the
edges of their system and even store the canonical version for future use.
- If the spec is written that clients SHOULD perform canonicalization, many
in our community will, particularly if they know that they will get better
performance from the server.

The property of NFK?D that we like is that if you have a string of
codepoints that is already in NFK?D, you can check that the string is in the
correct normalization form without having to allocate memory.  With NFK?C,
you'll have to decompose (allocating memory), reorder (at some finite CPU
cost), then recompose (possibly allocating *again*) just to check if you
have already done the normalization.
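
To make the "check without allocating" property concrete, here is a minimal
sketch of an NFD check that only walks the codepoints and never builds a new
string.  It assumes Python's standard unicodedata module as a stand-in for
whatever tables a server would really use, and the name is_nfd is
hypothetical.  Python itself allocates small objects while iterating, so this
illustrates the logic of the check, not the memory behavior of a C or Java
implementation.

```python
import unicodedata

# Precomposed Hangul syllables decompose algorithmically, so they do not
# show up in unicodedata.decomposition() and need an explicit range test.
HANGUL_FIRST = 0xAC00
HANGUL_LAST = 0xD7A3


def is_nfd(s: str) -> bool:
    """Return True if s is already in NFD, scanning it once in place."""
    last_ccc = 0
    for ch in s:
        cp = ord(ch)
        if HANGUL_FIRST <= cp <= HANGUL_LAST:
            return False  # would decompose into jamo
        decomp = unicodedata.decomposition(ch)
        # A leading "<tag>" marks a compatibility mapping, which NFD ignores.
        if decomp and not decomp.startswith("<"):
            return False  # has a canonical decomposition
        ccc = unicodedata.combining(ch)
        # Combining marks must appear in non-decreasing combining class.
        if ccc != 0 and ccc < last_ccc:
            return False
        last_ccc = ccc
    return True
```

For example, is_nfd("e\u0301") is true, while is_nfd("\u00e9") is false
because U+00E9 has a canonical decomposition.  An NFKD variant of the same
sketch would also reject codepoints whose decomposition carries a "<tag>".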

For the K portion, I have found John's argument compelling that codepoints
with compatibility decompositions should just be prohibited in our
localparts.  In our resourceparts, I'm of the opinion that we don't need to
compatibility map -- it's fine for all of those codepoints to stay distinct.
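
As a rough illustration of prohibiting rather than mapping, the sketch below
rejects a localpart if any codepoint has a compatibility decomposition.  It
assumes Python's unicodedata module, where compatibility mappings are
reported with a leading "<tag>"; the function names are hypothetical, and the
check covers only this one rule, not the rest of what a localpart must
satisfy.

```python
import unicodedata


def has_compat_decomposition(ch: str) -> bool:
    """True if ch has a compatibility (not canonical) decomposition."""
    # unicodedata.decomposition() returns e.g. "<compat> 0066 0069" for
    # U+FB01; canonical mappings have no "<tag>" prefix.
    return unicodedata.decomposition(ch).startswith("<")


def localpart_allowed(localpart: str) -> bool:
    """Reject localparts that contain any compatibility-decomposable codepoint."""
    return not any(has_compat_decomposition(ch) for ch in localpart)
```

With this rule, "alice" and "re\u0301sume\u0301" pass, while a localpart
containing the "fi" ligature U+FB01 or the circled digit U+2460 is refused
outright instead of being silently mapped.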

The idea is that clients SHOULD normalize, servers double-check inputs from
non-trusted sources (like clients and other servers), then always store and
forward the normalized version.
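
One way to picture that flow is the sketch below.  It assumes Python 3.8 or
later for unicodedata.is_normalized, uses NFD as the target form, and chooses
to repair non-normalized input rather than reject it; the email does not say
which of those a server should do, and the name canonicalize_at_edge and the
trusted flag are hypothetical.

```python
import unicodedata


def canonicalize_at_edge(address: str, trusted: bool = False) -> str:
    """Return the form of address that the server would store and forward."""
    if trusted:
        # Already canonicalized by an earlier hop inside our own system.
        return address
    if unicodedata.is_normalized("NFD", address):
        # Common case when the client followed the SHOULD: just a check.
        return address
    # Slow path: the sender did not normalize, so do it once at the edge.
    return unicodedata.normalize("NFD", address)
```

A client that already normalized pays only for the check on the server side;
everyone else pays for one normalization at the edge, after which the stored
and forwarded value is the canonical one.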

Joe Hildebrand