Re: draft-klensin-net-utf8-06
"Frank Ellermann" <nobody@xyzzy.claranet.de> Mon, 15 October 2007 08:39 UTC
To: discuss@apps.ietf.org
From: "Frank Ellermann" <nobody@xyzzy.claranet.de>
Subject: Re: draft-klensin-net-utf8-06
Date: Mon, 15 Oct 2007 10:17:33 +0200
Lines: 51
Message-ID: <fev7so$bt0$1@ger.gmane.org>
References: <93F25E18AB3DA3EB0599F092@p3.JCK.COM>
Mime-Version: 1.0
Content-Type: text/plain;
charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
John C Klensin wrote:

> less ready than the unicode-escapes one, partially because of
> the addition of a lot of new or revised text, but I hope we
> are converging

Looking only at the diff, yes.  You've added a recommendation to
avoid private use code points, because they "do not have standard
definitions or normalization interpretations".

I think the latter is incorrect; they're by decree normalized.
Maybe what you want to say is that they can't have canonical or
compatibility decompositions.  IMO the obvious "no standard
definition" alone is clearer and good enough to justify your
recommendation.

Not directly related to our draft, in 5.2 you write:

| The latter is important because an unassigned code point
| always normalizes to itself.

Something's really odd with this in the Unicode standard:  For
some unassigned code points it's "almost obvious" that they'll
never end up in any NFX, unless they also get the non-character
property.

What I have in mind are "obvious gaps" in dingbats, mathematical
symbols, and other blocks, where the abstract character
corresponding to the given unassigned code point was already
encoded elsewhere.  One of the earlier examples is u+2073.

In theory they could say "whatever we'll do with that code point,
it will be either a non-character, or have a canonical
decomposition, or stay as is (unassigned) forever".

Back to the draft, s/have been be tied/have been tied/ in 5.2.

> I've added some words about both HT and FF.  One may need to
> look in the appendices to find all of them.

Fine now.  Over the weekend I read a related chapter in the
Unicode standard, and I think you have to mention LS u+2028 and
maybe also PS u+2029 somewhere.

Admittedly banning LS might upset some folks.  IMO it's similar
to the NEL case; LS might be even more obscure.  OTOH I think
that you shouldn't discuss IND u+0084, it's dead.  TUS 5.0 says
"formerly known as INDEX".

 Frank
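
The normalization points above can be checked with a short Python sketch. It assumes the standard `unicodedata` module, whose answers depend on the Unicode version bundled with the interpreter; the helper name `normalizes_to_itself` and the choice of u+E000 and u+00B3 as sample code points are illustrative assumptions, not taken from the draft.

```python
import unicodedata

# The four normalization forms referred to collectively as "NFX".
FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalizes_to_itself(ch: str) -> bool:
    """Return True if ch is left unchanged by all four normalization forms."""
    return all(unicodedata.normalize(form, ch) == ch for form in FORMS)


samples = [
    ("private use", "\ue000"),        # general category Co, no decomposition
    ("unassigned gap", "\u2073"),     # general category Cn (reserved)
    ("superscript three", "\u00b3"),  # already encoded; has a compatibility mapping
]

for label, ch in samples:
    print(
        f"u+{ord(ch):04X} {label}: "
        f"category={unicodedata.category(ch)}, "
        f"decomposition={unicodedata.decomposition(ch)!r}, "
        f"stable={normalizes_to_itself(ch)}"
    )

# The line-ending candidates mentioned above: NEL, LS, PS.
for name, ch in [("NEL", "\u0085"), ("LS", "\u2028"), ("PS", "\u2029")]:
    print(f"u+{ord(ch):04X} {name}: category={unicodedata.category(ch)}")
```

In this sketch the private use and unassigned code points come out unchanged under every form, while u+00B3 changes only under the compatibility forms, which is the distinction between "normalized" and "has no decomposition" drawn above.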