[idn] Unicode categories

Erik van der Poel <erik@vanderpoel.org> Fri, 11 March 2005 23:51 UTC

Message-ID: <42322CE2.4040509@vanderpoel.org>
Date: Fri, 11 Mar 2005 15:42:26 -0800
From: Erik van der Poel <erik@vanderpoel.org>
User-Agent: Mozilla Thunderbird 1.0 (X11/20041206)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: John C Klensin <klensin@jck.com>
CC: idn@ops.ietf.org
Subject: [idn] Unicode categories
References: <421B8484.3070802@vanderpoel.org> <20050223072837.GA21463~@nicemice.net> <D872CCF059514053ECF8A198@scan.jck.com> <421D8411.9030006@vanderpoel.org> <p06210208be4390618c81@[192.168.0.101]> <421E0D0C.2000309@vanderpoel.org> <p06210202be43c3888991@[192.168.0.101]> <E07CE813AD23B2D95DA0C740@scan.jck.com> <421E30F2.1040408@vanderpoel.org> <0E7F74C71945B923C52211F3@scan.jck.com> <421EA0C9.1010500@vanderpoel.org> <00a401c51af3$7863aae0$030aa8c0@DEWELL> <A574CA1BE87BFDA3C2A1AC0E@scan.jck.com>
In-Reply-To: <A574CA1BE87BFDA3C2A1AC0E@scan.jck.com>
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 7bit

John C Klensin wrote:
> the view that, at the time, the Unicode
> classifications of characters were considered a little soft

FYI, I asked about Unicode category stability on the Unicode list, and 
received the following info:

From: "Andrew C. West" <andrewcwest@alumni.princeton.edu>

According to my calculations, the number of characters whose General 
Category changed from one version of Unicode to the next is:

1.1.5 -> 2.0.14 = 474 (1.384%)
2.0.14 -> 2.1.2 = 1 (0.0025%)
2.1.2 -> 2.1.5 = 16 (0.0410%)
2.1.5 -> 2.1.8 = 18 (0.0462%)
2.1.8 -> 2.1.9 = 3 (0.0077%)
2.1.9 -> 3.0.0 = 85 (0.2182%)
3.0.0 -> 3.0.1 = 0 (0%)
3.0.1 -> 3.1.0 = 3 (0.0061%)
3.1.0 -> 3.2.0 = 7 (0.0074%)
3.2.0 -> 4.0.0 = 16 (0.0168%)
4.0.0 -> 4.0.1 = 1 (0.0010%)
4.0.1 -> 4.1.0 = 12 (0.0124%)

I don't know what this tells you about the stability of the UCD data, though.
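
For anyone who wants to reproduce counts like these, here is a minimal sketch in Python of one way to do the comparison. It is not necessarily the method used for the figures above. It assumes both versions are available as text in the semicolon-separated UnicodeData.txt layout (code point in field 0, name in field 1, General Category in field 2), and it assumes the percentage is taken relative to the number of characters in the older version. The function names are made up for illustration.

```python
def parse_categories(unicode_data_text):
    """Map code point -> General Category from UnicodeData.txt-style text."""
    categories = {}
    range_start = None
    for line in unicode_data_text.splitlines():
        if not line.strip():
            continue
        fields = line.split(";")
        code_point = int(fields[0], 16)
        name, category = fields[1], fields[2]
        if name.endswith(", First>"):
            # Start of a compressed range such as "<CJK Ideograph, First>".
            range_start = code_point
        elif name.endswith(", Last>") and range_start is not None:
            # End of the range: every code point in it shares one category.
            for cp in range(range_start, code_point + 1):
                categories[cp] = category
            range_start = None
        else:
            categories[code_point] = category
    return categories


def count_category_changes(old_text, new_text):
    """Return (changed code points, percentage of the older repertoire).

    Only characters present in both versions are compared; characters
    added in the newer version are ignored.
    """
    old = parse_categories(old_text)
    new = parse_categories(new_text)
    changed = sorted(cp for cp in old if cp in new and old[cp] != new[cp])
    # Assumption: the denominator is the older version's character count.
    percent = 100.0 * len(changed) / len(old) if old else 0.0
    return changed, percent
```

To use it, read the two UnicodeData.txt files for the versions of interest into strings and pass them to count_category_changes(); the returned list shows which code points moved, so the affected characters can be inspected as well as counted.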