I-D.klensin-unicode-escapes (was: New Draft)

Frank Ellermann <nobody@xyzzy.claranet.de> Fri, 02 February 2007 13:34 UTC

To: discuss@apps.ietf.org
From: Frank Ellermann <nobody@xyzzy.claranet.de>
Subject: I-D.klensin-unicode-escapes (was: New Draft)
Date: Fri, 02 Feb 2007 14:05:34 +0100

Clive D.W. Feather wrote:

> If you check the HTML specification (section 5.3), it says that SGML
> allows the semicolon to be omitted in certain contexts, but "strongly
> suggest" not to do that.

Yes, never ever mention that HTML exists; it's horrible.  The [Charmod]
bible permits no (SGML) nonsense, see http://www.w3.org/TR/charmod/#C044

The I-D should IMO adopt and cite [Charmod] C042 up to C048 verbatim.

A few other conformance criteria in [Charmod] might also be interesting:
http://www.w3.org/TR/charmod/#C070  Don't exclude arbitrary code points
http://www.w3.org/TR/charmod/#C077  Don't allow anything above U+10FFFF
http://www.w3.org/TR/charmod/#C078  Don't (ab)use surrogates
http://www.w3.org/TR/charmod/#C079  Don't (ab)use non-characters

http://www.w3.org/TR/charmod/#C015  n/a (covered by the better RFC 2277)
http://www.w3.org/TR/charmod/#C016  n/a (covered by the better RFC 2277)
http://www.w3.org/TR/charmod/#C017  Stick to working encoding rules
http://www.w3.org/TR/charmod/#C018  n/a (covered by the better RFC 2277)

http://www.w3.org/TR/charmod/#C049  n/a (for the I-D US-ASCII is given)
http://www.w3.org/TR/charmod/#C026  n/a (covered by the better RFC 2277)

Etc.  The "better RFC 2277" idea is a single default, UTF-8, instead of a
choice between UTF-8, UTF-16, UTF-16LE, UTF-16BE, UTF-32, UTF-32LE, and
UTF-32BE in [Charmod], let alone a hypothetical UTF-32 "2143" or "3412".
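
As a rough illustration of the code point criteria listed above (C077,
C078, C079), here is a minimal Python sketch.  The function name
is_acceptable_code_point is made up for this example, the comments
follow the one-line summaries in this message rather than the [Charmod]
wording, and rejecting non-characters outright is a policy assumption,
not something the I-D says.

```python
def is_acceptable_code_point(cp: int) -> bool:
    """Check an escaped code point against the three summaries above."""
    if cp < 0 or cp > 0x10FFFF:
        return False  # C077: nothing above U+10FFFF
    if 0xD800 <= cp <= 0xDFFF:
        return False  # C078: surrogate code points
    if 0xFDD0 <= cp <= 0xFDEF:
        return False  # C079: non-characters U+FDD0..U+FDEF
    if (cp & 0xFFFE) == 0xFFFE:
        return False  # C079: U+xxFFFE and U+xxFFFF in every plane
    return True


if __name__ == "__main__":
    for cp in (0x41, 0xD800, 0xFFFE, 0x10FFFD, 0x110000):
        print(f"U+{cp:04X}", is_acceptable_code_point(cp))
```

Everything else passes, which matches the C070 point about not
excluding arbitrary code points.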

Probably the I-D should mention that one famous exception to its rule
to avoid encoded UTF-8 is the URL form of IRIs.
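
For the IRI case, a small Python sketch of what "encoded UTF-8" looks
like in that form: each non-ASCII character is turned into its UTF-8
octets and every octet is percent-encoded.  The helper name is
hypothetical, and it handles only a single path segment; host names and
the rest of the IRI mapping are left out.

```python
from urllib.parse import quote


def iri_segment_to_uri(segment: str) -> str:
    # Hypothetical helper: percent-encode the UTF-8 octets of one path
    # segment.  safe="" means "/" is escaped as well.
    return quote(segment, safe="", encoding="utf-8")


if __name__ == "__main__":
    # U+00E9 is C3 A9 in UTF-8, so this prints r%C3%A9sum%C3%A9
    print(iri_segment_to_uri("r\u00e9sum\u00e9"))
```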

Frank