Re: [apps-discuss] i18n intro, Sunday 14:00-16:00

Bjoern Hoehrmann <derhoermi@gmx.net> Thu, 21 July 2011 16:56 UTC

From: Bjoern Hoehrmann <derhoermi@gmx.net>
To: Joe Hildebrand <joe.hildebrand@webex.com>
Date: Thu, 21 Jul 2011 18:57:02 +0200
Message-ID: <a3lg275sr9j8bnrkb3cdr3e4ap9kh0n0dk@hive.bjoern.hoehrmann.de>
References: <4E27CF30.5050205@it.aoyama.ac.jp> <CA4DAFEB.BECC%joe.hildebrand@webex.com>
In-Reply-To: <CA4DAFEB.BECC%joe.hildebrand@webex.com>
Cc: apps-discuss@ietf.org, xmpp@ietf.org
Subject: Re: [apps-discuss] i18n intro, Sunday 14:00-16:00

* Joe Hildebrand wrote:
>The property of NFK?D that we like is that if you have a string of
>codepoints that is already in NFK?D, you can check that the string is in the
>correct normalization form without having to allocate memory.  With NFK?C,
>you'll have to decompose (allocating memory), recompose (at some finite CPU
>cost), then recompose (possibly allocating *again*) just to check if you
>have already done the normalization.

The set of strings in Normalization Form C is a regular language (see
<http://lists.w3.org/Archives/Public/www-archive/2009Feb/0071.html>),
so recognizing NFC strings is just as easy as recognizing NFD strings,
if you ignore that automata for NFC are bigger and harder to build than
those for NFD. It's easier to use the simple heuristic in the
specification and then do what you suggest above for complicated
strings, but it's not necessary.
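
To make the two approaches concrete, here is a minimal Python sketch,
not a reference implementation. The function names is_nfd_streaming and
is_nfc are my own, hypothetical names. The first walks the string one
code point at a time and never builds a normalized copy (Python itself
still allocates small objects internally, so this only illustrates the
idea). The second relies on unicodedata.is_normalized (Python 3.8+),
which, to my understanding, uses the quick-check heuristic first and
only normalizes when that is inconclusive. It does not build the
automaton described above.

```python
import unicodedata

# Precomposed Hangul syllables decompose algorithmically, so they carry
# no decomposition entry in the character database and need a range test.
HANGUL_FIRST = 0xAC00
HANGUL_LAST = 0xD7A3


def is_nfd_streaming(s: str) -> bool:
    """Check NFD in one pass without building a normalized copy."""
    prev_ccc = 0
    for ch in s:
        if HANGUL_FIRST <= ord(ch) <= HANGUL_LAST:
            return False
        decomp = unicodedata.decomposition(ch)
        # Entries starting with "<" are compatibility mappings, which
        # NFD leaves alone; anything else is a canonical decomposition.
        if decomp and not decomp.startswith("<"):
            return False
        ccc = unicodedata.combining(ch)
        # Combining marks must appear in non-decreasing class order.
        if ccc != 0 and prev_ccc > ccc:
            return False
        prev_ccc = ccc
    return True


def is_nfc(s: str) -> bool:
    """Check NFC via the standard library (Python 3.8 or later)."""
    return unicodedata.is_normalized("NFC", s)
```

For example, is_nfd_streaming("e\u0301") is true, while
is_nfd_streaming("\u00e9") is false because U+00E9 has a canonical
decomposition. The automaton approach would replace the fallback inside
is_nfc with a table-driven state machine, at the cost of building those
tables.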
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/