Re: [apps-discuss] i18n intro, Sunday 14:00-16:00

Joe Hildebrand <joe.hildebrand@webex.com> Thu, 21 July 2011 16:29 UTC

To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, Peter Saint-Andre <stpeter@stpeter.im>
Cc: xmpp@ietf.org, apps-discuss@ietf.org

On 7/21/11 1:03 AM, "Martin J. Dürst" <duerst@it.aoyama.ac.jp> wrote:

> Slide 123: Good to see that. By the way, I seem to remember both John and me
> begging you for an explanation of why Jabber wants to use NFD a few months
> ago, and I'm not sure I have seen an answer. Now might be a good time (if you
> already sent one, a pointer would be appreciated).


Let me try.  First some assumptions:
- Stringprep is currently one of the performance hotspots of some XMPP
servers.
- XMPP does not guarantee that the original form of the address that is
entered by the user or sent on the first hop is transmitted without
modification to other hops in the system.
- As such, many XMPP servers optimize by performing canonicalization at the
edges of their system and even store the canonical version for future
comparison.
- If the spec says that clients SHOULD perform canonicalization, many
in our community will, particularly if they know that they will get better
performance from the server.

The property of NFK?D (that is, NFD or NFKD) that we like is that if you have
a string of codepoints that is already in NFK?D, you can check that the string
is in the correct normalization form without having to allocate memory.  With
NFK?C, you'll have to decompose (allocating memory), reorder (at some finite
CPU cost), then recompose (possibly allocating *again*) just to check if you
have already done the normalization.
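
A minimal sketch of that one-pass check, assuming Python's standard
unicodedata module. The function name is_nfd and the Hangul range constants
are illustrative, not taken from any XMPP implementation. Python itself still
creates small per-character objects, so this only shows the shape of the
algorithm: a single scan that never builds a decomposed copy of the string.

```python
import unicodedata

# Precomposed Hangul syllables decompose algorithmically, so this sketch
# rejects them by range instead of relying on the decomposition table.
HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3


def is_nfd(s: str) -> bool:
    """Return True if s is already in NFD, using one scan and no output buffer."""
    last_ccc = 0
    for ch in s:
        if HANGUL_FIRST <= ord(ch) <= HANGUL_LAST:
            return False  # precomposed Hangul syllable would be decomposed
        decomp = unicodedata.decomposition(ch)
        if decomp and not decomp.startswith("<"):
            return False  # a canonical decomposition exists for this code point
        ccc = unicodedata.combining(ch)
        if ccc != 0 and ccc < last_ccc:
            return False  # combining marks are not in canonical order
        last_ccc = ccc
    return True
```

Compatibility decompositions (the ones tagged with "<...>" in the table) are
deliberately ignored here, since they only matter for the K forms.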

For the K portion, I have found John's argument compelling that codepoints
with compatibility decompositions should just be prohibited in our
localparts.  In our resourceparts, I'm of the opinion that we don't need to
compatibility map -- it's fine for all of those codepoints to stay distinct.
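
As a rough illustration of the localpart rule, the sketch below rejects any
code point that carries a compatibility decomposition, again assuming
Python's unicodedata module. The names has_compat_decomposition and
localpart_allowed are hypothetical, and this covers only this single rule,
not a complete localpart profile.

```python
import unicodedata


def has_compat_decomposition(ch: str) -> bool:
    """True if ch has a compatibility (tagged) decomposition, e.g. '<compat> 0066 0069'."""
    return unicodedata.decomposition(ch).startswith("<")


def localpart_allowed(localpart: str) -> bool:
    """Reject localparts containing any code point with a compatibility decomposition."""
    return not any(has_compat_decomposition(ch) for ch in localpart)
```

Under this rule a precomposed letter such as U+00E9 is still allowed (its
decomposition is canonical), while a ligature such as U+FB01 or a fullwidth
letter such as U+FF21 is refused outright instead of being mapped.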

The idea is that clients SHOULD normalize, servers double-check inputs from
non-trusted sources (like clients and other servers), then always store and
forward the normalized version.
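
The flow described above might look roughly like the following at a server
edge. This is a sketch under stated assumptions: the function name
canonicalize_at_edge and the trusted flag are hypothetical, NFD is used
because no compatibility mapping is wanted, and the server is assumed to
repair non-normalized input rather than reject it (the text above does not
say which). unicodedata.is_normalized requires Python 3.8 or later.

```python
import unicodedata


def canonicalize_at_edge(address: str, trusted: bool) -> str:
    """Return the form of address that the server stores and forwards."""
    if trusted:
        # Assumption: trusted peers have already normalized, so skip the check.
        return address
    if unicodedata.is_normalized("NFD", address):
        # Fast path: a well-behaved client already sent NFD.
        return address
    # Slow path: input from a non-trusted source was not normalized.
    return unicodedata.normalize("NFD", address)
```

Clients that follow the SHOULD always hit the fast path, which is where the
performance benefit for the server comes from.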

-- 
Joe Hildebrand