Re: draft-klensin-unicode-escapes-02.txt

Frank Ellermann <nobody@xyzzy.claranet.de> Mon, 19 February 2007 23:10 UTC

To: discuss@apps.ietf.org
From: Frank Ellermann <nobody@xyzzy.claranet.de>
Subject: Re: draft-klensin-unicode-escapes-02.txt
Date: Mon, 19 Feb 2007 22:34:44 +0100
Organization: <URL:http://purl.net/xyzzy>
Lines: 41
Message-ID: <45DA17F4.4857@xyzzy.claranet.de>
References: <74711BCF624DBEC4F2C000C5@p3.JCK.COM>

John C Klensin wrote:

> Anyone who prefers the latter should please send the ABNF they would
> like to see.

Here's what I'd do with the existing ABNF:

   EmbeddedUnicodeChar =  BMP-form / Full-form
   BMP-form  =  %x5C.75 4HEXDIG   ; starting with lower case "\u"
   Full-form =  %x5C.55 8HEXDIG   ; starting with upper case "\U"

IOW, <Hex-quad> goes away because you didn't use it anywhere else,
and <HexDigit> becomes <HEXDIG>, because the latter is already
defined in RFC 4234.
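
To make the intent of that ABNF concrete, here is a rough Python
sketch of a matcher and decoder for the two forms.  It is only an
illustration, not text for the draft: the names EMBEDDED_UNICODE_CHAR
and decode_escapes are made up, and leaving surrogates and values
above 10FFFF untouched is my assumption, not something the ABNF says.

```python
import re

# Hypothetical names; this only mirrors the ABNF above.
#   BMP-form  = %x5C.75 4HEXDIG  -> "\u" plus exactly four hex digits
#   Full-form = %x5C.55 8HEXDIG  -> "\U" plus exactly eight hex digits
# The "u"/"U" are case sensitive (given as %x values); HEXDIG accepts
# both cases because ABNF quoted strings are case insensitive.
EMBEDDED_UNICODE_CHAR = re.compile(
    r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})"
)


def decode_escapes(text: str) -> str:
    """Replace every well-formed escape with the character it names."""

    def repl(match: re.Match) -> str:
        value = int(match.group(1) or match.group(2), 16)
        # Assumption: surrogates and out-of-range values are left as-is.
        if value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
            return match.group(0)
        return chr(value)

    return EMBEDDED_UNICODE_CHAR.sub(repl, text)
```

With this, decode_escapes(r"caf\u00E9") gives "café", while a short
or wrongly cased escape such as \u12 or \U00E9 is left alone.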

For the XML version I propose to adopt and fix the RFC 4646 ABNF:

   UNICHAR    = %x26.23.78 2*6HEXDIG ";"    ; starts with "&#x"

Add a remark in the prose that a literal "&" can be expressed as
&#x26; when using this style.  Otherwise folks could be tempted to
use &amp;, and we don't want them to try that.
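
A small Python sketch of the "&#x26;" point, under my own
assumptions: encode_unichar and decode_unichar are hypothetical
names, and escaping everything above 7E is just one possible policy.
The encoder writes a literal "&" as &#x26;, so the decoder never has
to know about &amp; or any other named entity.

```python
import re

# UNICHAR = %x26.23.78 2*6HEXDIG ";"  -> "&#x", two to six hex digits, ";"
UNICHAR = re.compile(r"&#x([0-9A-Fa-f]{2,6});")


def encode_unichar(text: str) -> str:
    """Escape "&" and everything above U+007E (assumed policy)."""
    out = []
    for ch in text:
        if ch == "&" or ord(ch) > 0x7E:
            # %02X pads to the two-digit minimum the ABNF requires.
            out.append("&#x%02X;" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


def decode_unichar(text: str) -> str:
    """Decode well-formed escapes; anything else stays untouched."""

    def repl(match: re.Match) -> str:
        value = int(match.group(1), 16)
        return chr(value) if value <= 0x10FFFF else match.group(0)

    return UNICHAR.sub(repl, text)
```

So "AT&T" is written as "AT&#x26;T" and decodes back to "AT&T",
while "&amp;" simply passes through the decoder unchanged.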

You don't need ABNF in 5.4; it's in essence the same as in 5.1.

For 5.3 (Perl) I don't know the correct syntax.  If it's like 5.2:

  UNICODEPOINT = %x5C.78 "{" 2*6HEXDIG "}"  ; starts with "\x"

If the x is case insensitive, it's simply (quoted strings in ABNF
match either case):

  UNICODEPOINT = "\x{" 2*6HEXDIG "}"
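
The difference between those two rules, sketched in Python.  STRICT
and RELAXED are names I made up for this illustration; which of the
two matches what Perl really accepts is exactly what I don't know.

```python
import re

HEX = r"[0-9A-Fa-f]{2,6}"  # 2*6HEXDIG

# %x5C.78 "{" ... "}"  -> only a lower case "x" is allowed
STRICT = re.compile(r"\\x\{(" + HEX + r")\}")

# "\x{" ... "}"        -> quoted ABNF string, so "x" or "X" both match
RELAXED = re.compile(r"\\[xX]\{(" + HEX + r")\}")
```

STRICT accepts \x{E9} but rejects \X{E9}; RELAXED accepts both.
Either way a single digit such as \x{1} is rejected by the 2*6 rule.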

The name "UNICODEPOINT" is horrible; I only tried to use a new name,
so please find something better.

Please add the [CharMod] reference (informative) with a note about
its C042..C048 conformance criteria in ch. 4.6 (character escaping).

Frank