Encoding of small characters in draft-klensin-unicode-escapes-04

Stephane Bortzmeyer <bortzmeyer@nic.fr> Fri, 05 October 2007 14:24 UTC

Date: Fri, 5 Oct 2007 16:24:34 +0200
From: Stephane Bortzmeyer <bortzmeyer@nic.fr>
To: discuss@apps.ietf.org
Subject: Encoding of small characters in draft-klensin-unicode-escapes-04
Message-ID: <20071005142434.GA26901@nic.fr>
References: <3A8797AD0BB8B1EF4FAA7DE8@p3.JCK.COM>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <3A8797AD0BB8B1EF4FAA7DE8@p3.JCK.COM>
X-Operating-System: Debian GNU/Linux 4.0
X-Kernel: Linux 2.6.18-4-686 i686
Organization: NIC France
X-URL: http://www.nic.fr/
User-Agent: Mutt/1.5.13 (2006-08-11)

On Thu, Oct 04, 2007 at 05:10:43PM -0400,
 John C Klensin <john-ietf@jck.com> wrote 
 a message of 49 lines which said:

> It now recommends either \u'NNNN..' 

[Small syntax detail, do not spend too much time on it.]

Section 5.1 describes the content of the "Backslash-U with Delimiters"
form as "4*6HEXDIG". Why not "2*6HEXDIG"? It would be more compact for
characters with small code points (below U+1000) and would be more
consistent with the other forms, such as the "XML and HTML" one.

My personal taste is that \u'20' is better than \u'0020'.
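To make the difference concrete, here is a minimal Python sketch that
assumes nothing beyond the two digit counts discussed above. The function
names `escape_code_point` and `make_parser` are hypothetical and do not
come from the draft.

```python
import re

# Hypothetical helpers illustrating the two minimum widths discussed above:
# "4*6HEXDIG" (as in the draft's section 5.1) vs. the proposed "2*6HEXDIG".

def escape_code_point(cp: int, min_digits: int) -> str:
    """Render a code point as \\u'...' using at least min_digits hex digits."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    # Zero-pad to min_digits; larger code points simply use more digits (up to 6).
    return "\\u'" + format(cp, f"0{min_digits}X") + "'"

def make_parser(min_digits: int):
    """Build a parser accepting \\u'...' with min_digits to 6 hex digits."""
    # ABNF HEXDIG letters are case-insensitive, so accept a-f as well as A-F.
    pattern = re.compile(r"\\u'([0-9A-Fa-f]{%d,6})'" % min_digits)

    def parse(text: str) -> int:
        m = pattern.fullmatch(text)
        if m is None:
            raise ValueError(f"malformed escape: {text!r}")
        cp = int(m.group(1), 16)
        if cp > 0x10FFFF:
            raise ValueError(f"beyond the Unicode range: {text!r}")
        return cp

    return parse

print(escape_code_point(0x20, 4))   # \u'0020'
print(escape_code_point(0x20, 2))   # \u'20'
print(make_parser(2)("\\u'20'"))    # 32
```

With a minimum of 2, a parser built by make_parser(2) still accepts the
padded \u'0020', so the shorter form costs nothing for existing output.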

Section 6.2 contains a note that seems related (the risk that small
numbers are mistaken for octets in the current locale), but I do not
think it is a serious risk, since section 2 clearly states that we
encode Unicode code points, not octets.