Re: [idn] Re: Unicode categories

"Mark Davis" <mark.davis@jtcsv.com> Sat, 12 March 2005 18:24 UTC

Message-ID: <003801c52730$2660d1c0$72703009@sanjose.ibm.com>
From: Mark Davis <mark.davis@jtcsv.com>
To: John C Klensin <klensin@jck.com>, Erik van der Poel <erik@vanderpoel.org>, idn@ops.ietf.org
Cc: Kenneth Whistler <kenw@sybase.com>
References: <421B8484.3070802@vanderpoel.org> <20050223072837.GA21463~@nicemice.net> <D872CCF059514053ECF8A198@scan.jck.com> <421D8411.9030006@vanderpoel.org> <p06210208be4390618c81@[192.168.0.101]> <421E0D0C.2000309@vanderpoel.org> <p06210202be43c3888991@[192.168.0.101]> <E07CE813AD23B2D95DA0C740@scan.jck.com> <421E30F2.1040408@vanderpoel.org> <0E7F74C71945B923C52211F3@scan.jck.com> <421EA0C9.1010500@vanderpoel.org> <00a401c51af3$7863aae0$030aa8c0@DEWELL> <A574CA1BE87BFDA3C2A1AC0E@scan.jck.com> <42322CE2.4040509@vanderpoel.org> <4232B2FD.1080104@vanderpoel.org> <59DD38FB83B7216C06E61E59@scan.jck.com>
Subject: Re: [idn] Re: Unicode categories
Date: Sat, 12 Mar 2005 10:15:18 -0800
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
Sender: owner-idn@ops.ietf.org
Precedence: bulk

We do not make any absolute guarantees of stability for the general category
and many other properties, because a miscategorized character would cause
incorrect behavior in computers for the languages that use it, so we need to
be able to correct it. And as new, increasingly obscure characters are added
to the standard, it may take some time to get really accurate information.

However, I do want to call attention to certain properties of the Unicode
Standard that may be relevant -- those characterizing identifier and pattern
syntax characters -- which do have strict requirements on stability. There is
a draft of the newest specification at
http://www.unicode.org/reports/tr31/tr31-5.html. Programming identifiers
have somewhat different requirements from IDN, but they are related enough
that the information may be useful.

The data for Pattern_Syntax and Pattern_White_Space is in
http://www.unicode.org/Public/4.1.0/ucd/PropList-4.1.0d12.txt. For XID_Start
and XID_Continue, it is in
http://www.unicode.org/Public/4.1.0/ucd/DerivedCoreProperties-4.1.0d12.txt

All of these will be finalized and released as part of Unicode 4.1.0.
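
For anyone who wants to experiment with these properties, here is a minimal
sketch of reading lines in the "range ; property # comment" layout that those
data files use. The embedded sample is a handful of hand-copied lines for
illustration only, not the full data, and the function names
(parse_properties, has_property, is_xid_identifier) are hypothetical helpers
invented for this example. The identifier check is deliberately the plain
"XID_Start followed by XID_Continue*" form; it is not a statement about what
IDN should permit.

```python
import re

# Assumption: a tiny illustrative excerpt in the UCD property-file layout.
# The real files contain far more ranges; do not treat this as complete data.
SAMPLE = """\
# Illustrative sample only
0009..000D    ; Pattern_White_Space # <control-0009>..<control-000D>
0020          ; Pattern_White_Space # SPACE
0021..0023    ; Pattern_Syntax # EXCLAMATION MARK..NUMBER SIGN
0041..005A    ; XID_Start # LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; XID_Start # LATIN SMALL LETTER A..LATIN SMALL LETTER Z
0030..0039    ; XID_Continue # DIGIT ZERO..DIGIT NINE
0041..005A    ; XID_Continue # LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
005F          ; XID_Continue # LOW LINE
0061..007A    ; XID_Continue # LATIN SMALL LETTER A..LATIN SMALL LETTER Z
"""

# One code point, or a "start..end" range, then ";" and the property name.
LINE_RE = re.compile(
    r"^([0-9A-Fa-f]{4,6})(?:\.\.([0-9A-Fa-f]{4,6}))?\s*;\s*(\w+)"
)


def parse_properties(text):
    """Return {property name: [(first code point, last code point), ...]}."""
    props = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip anything that is not a data line
        first = int(match.group(1), 16)
        last = int(match.group(2) or match.group(1), 16)
        props.setdefault(match.group(3), []).append((first, last))
    return props


def has_property(props, name, ch):
    """True if the single character ch falls in any range listed for name."""
    cp = ord(ch)
    return any(first <= cp <= last for first, last in props.get(name, []))


def is_xid_identifier(props, s):
    """Plain check: first character XID_Start, the rest XID_Continue."""
    if not s:
        return False
    return has_property(props, "XID_Start", s[0]) and all(
        has_property(props, "XID_Continue", ch) for ch in s[1:]
    )


props = parse_properties(SAMPLE)
print(is_xid_identifier(props, "abc1"))
print(is_xid_identifier(props, "1abc"))
```

Feeding the text of the actual data files to parse_properties in place of
SAMPLE should work the same way, assuming they keep this line layout. A linear
scan over the ranges is enough for experimenting; a sorted table with binary
search would be the obvious next step for anything embedded in a client.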

Mark

----- Original Message ----- 
From: "John C Klensin" <klensin@jck.com>
To: "Erik van der Poel" <erik@vanderpoel.org>; <idn@ops.ietf.org>
Cc: "Kenneth Whistler" <kenw@sybase.com>
Sent: Saturday, March 12, 2005 08:40
Subject: [idn] Re: Unicode categories


>
>
> --On Saturday, 12 March, 2005 01:14 -0800 Erik van der Poel
> <erik@vanderpoel.org> wrote:
>
> > All,
> >
> > Please do not draw any conclusions from the raw Unicode
> > category stability data that I sent earlier. Ken Whistler, a
> > Technical Director at the Unicode Consortium, was so kind to
> > provide further information to put the data into their proper
> > perspective. See below.
> >...
>
> Erik, Ken, and others,
>
> The difficulty here is not, IMO, the specific numbers or
> percentages.  It is an important difference in perspective.
> From the standpoint of UTC, these changes are few, minor, and
> corrections to obscure errors.  That is a perfectly sensible
> position.
>
> From the standpoint of the IETF, or anyone else worried about a
> piece of protocol that must support many applications, the
> problem is a little different.  Some of the recent developments
> in automatic updating tools notwithstanding, IDNA (and its
> supporting tables) are designed to be embedded in and used from
> clients.  Many of those clients, and the associated operating
> systems, have been historically updated only when the machine in
> which they run is replaced.   That argues for an extremely
> conservative view of protocol design and compatibility, with
> very high thresholds for justifying incompatible changes of any
> sort.  From that viewpoint, the difference between 0.01%
> changes and 5% changes is like measures of being partially
> pregnant: perhaps helpful in some types of risk assessment, but
> less so in making the next design decision.
>
>       john
>
>
>
>