Re: draft-klensin-unicode-escapes-05 and draft-klensin-net-utf8-05

"Frank Ellermann" <> Mon, 08 October 2007 09:33 UTC

From: "Frank Ellermann" <>
Subject: Re: draft-klensin-unicode-escapes-05 and draft-klensin-net-utf8-05
Date: Mon, 8 Oct 2007 11:30:03 +0200

John C Klensin wrote:

> New versions of these documents have been posted for your
> reading pleasure

unicode-escapes-05 is fine.  One observation:  in 5.2
you mention how to encode a literal '&' as "&#x26;"; that's
enough to solve all issues with literal "&#x" constructs.
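To make the 5.2 point concrete, here is a minimal Python sketch. It assumes a protocol that escapes '&' and every non-ASCII code point in the "&#xHEX;" form, and the function names are hypothetical, not taken from the draft. Because '&' itself is escaped, a literal "&#x41;" in the source text survives the round trip instead of turning into "A":

```python
import re

# Matches one hexadecimal character reference such as "&#x26;" or "&#x1F600;".
_HEX_REF = re.compile(r"&#x([0-9A-Fa-f]+);")


def escape(text: str) -> str:
    """Escape '&' and every non-ASCII code point as &#xHEX; (hypothetical helper)."""
    out = []
    for ch in text:
        if ch == "&" or ord(ch) > 0x7F:
            out.append(f"&#x{ord(ch):X};")
        else:
            out.append(ch)
    return "".join(out)


def unescape(text: str) -> str:
    """Replace each &#xHEX; reference with the code point it names."""
    return _HEX_REF.sub(lambda m: chr(int(m.group(1), 16)), text)


# The literal "&#x41;" is protected because its '&' becomes "&#x26;".
original = "caf\u00e9 costs &#x41;"
wire = escape(original)
print(wire)  # caf&#xE9; costs &#x26;#x41;
assert unescape(wire) == original
```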

In section 5.1 you don't mention a similar trick for "\u'"
constructs, so "implementors" (i.e. protocol designers)
might pick some '\\' convention to get a literal '\'.  Is
that as it should be?
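For comparison, a sketch of the "\u'" case under one possible convention that the draft does not specify: a doubled backslash stands for a literal '\', and \u'HEX' (assumed here to carry 1 to 6 hex digits) stands for a code point. The names escape_u and unescape_u are hypothetical:

```python
import re

# Hypothetical convention: "\\" is a literal backslash, \u'HEX' is a code point.
# The doubled-backslash alternative is tried first at each position.
_TOKEN = re.compile(r"\\\\|\\u'([0-9A-Fa-f]{1,6})'")


def escape_u(text: str) -> str:
    out = []
    for ch in text:
        if ch == "\\":
            out.append("\\\\")                  # double every literal backslash
        elif ord(ch) > 0x7F:
            out.append(f"\\u'{ord(ch):04X}'")   # at least four hex digits
        else:
            out.append(ch)
    return "".join(out)


def unescape_u(text: str) -> str:
    def repl(m: re.Match) -> str:
        if m.group(1) is None:                  # matched the doubled backslash
            return "\\"
        return chr(int(m.group(1), 16))
    return _TOKEN.sub(repl, text)
```

Without the doubling rule, unescape_u would turn the literal text \u'0041' into "A", which is exactly the ambiguity in question.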

In net-utf8-05 2.1 I'm not sure about this recommendation:
| CR SHOULD NOT appear except when followed by LF.

There's no similar statement about a bare LF.  If you want
to permit bare LF for cursor control on dumb terminals,
then you could also permit bare CR for this purpose.  

If you want to prohibit bare CR as "line ending" (the topic
of item 2 in section 2.1), then you could also prohibit a
bare LF for this purpose; maybe you could also mention NEL.

Combined with item 3 in 2.1, I think what you really want
is to ban both bare CR _and_ bare LF, something like this:

| CR SHOULD NOT appear except when followed by LF, and vice versa.
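A rough Python sketch of the rule proposed above, checking a byte string for any CR not followed by LF and any LF not preceded by CR (the function name is hypothetical). It does not look for NEL (U+0085), and it would also flag CR NUL:

```python
def bare_line_ends(data: bytes) -> list[tuple[int, str]]:
    """Return (offset, "CR" or "LF") for each bare CR or bare LF."""
    problems = []
    for i, b in enumerate(data):
        if b == 0x0D and (i + 1 >= len(data) or data[i + 1] != 0x0A):
            problems.append((i, "CR"))   # CR not followed by LF
        elif b == 0x0A and (i == 0 or data[i - 1] != 0x0D):
            problems.append((i, "LF"))   # LF not preceded by CR
    return problems


print(bare_line_ends(b"a\r\nb\rc\nd"))  # [(4, 'CR'), (6, 'LF')]
```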

In 2.1 item 3 please s/FF/Formfeed/ and simplify this part:
"FF should be used only with caution: if its use assumes a
 page length, such assumptions may not be appropriate in
 international contexts (e.g., considering 8.5x11 inch
 paper versus A4)".

There's more to this than paper size (fonts, font sizes);
you could simply state:  "Formfeeds should be used only with
caution, e.g. in legacy formats like Internet Drafts".

At the end of 2.1 you use a "MUST NOT" about the CR.
Either it's MUST NOT or SHOULD NOT.  ITYM the latter;
otherwise the "SHOULD" about CR NUL that follows makes
no sense (to me).

There's no section 2.2 and no intro before section
2.1; please get rid of the unused subsection title.

Section 3:
| The section above requires that all Net-Unicode
| strings be transmitted in normalized form.

s/requires/recommends/ (SHOULD in 2.1 item 4)

Section 4:
| The normalization specified here, NFC

s/specified/recommended/ (you don't specify NFC;
section 3 explicitly says that NFC is specified
by Unicode in UAX #15).
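For reference, NFC as defined in UAX #15 is available in Python's standard unicodedata module. This small sketch shows why the recommendation matters: the decomposed and precomposed forms of "é" are different code point (and byte) sequences until they are normalized:

```python
import unicodedata

decomposed = "e\u0301"   # 'e' + COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)

assert composed == "\u00e9"                      # LATIN SMALL LETTER E WITH ACUTE
assert decomposed.encode("utf-8") == b"e\xcc\x81"
assert composed.encode("utf-8") == b"\xc3\xa9"
assert unicodedata.is_normalized("NFC", composed)  # Python 3.8+
```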

Section 5.2:
| IETF Standard UTF-8 is dependent on some definitions
| not changing after Unicode Version 4.0.

I'm not aware of such dependencies.  UTF-8 depends on
two points:  the will to limit Unicode to 21 bits (no
code points above U+10FFFF), and the will to outlaw
"overlong" encodings.  Both are guaranteed by STD 63
(RFC 3629); they don't depend on anything defined by,
say, the UTC.
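To illustrate that STD 63 alone pins this down, here is a minimal strict decoder sketch (decode_utf8_strict is a hypothetical name). It rejects overlong forms and code points above U+10FFFF, plus surrogates, which RFC 3629 also excludes. Python's built-in UTF-8 codec already enforces the same rules; this only spells the checks out:

```python
def decode_utf8_strict(data: bytes) -> list[int]:
    """Decode UTF-8 into code points, rejecting what STD 63 forbids."""
    code_points = []
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        # The lead byte gives the payload bits, the number of continuation
        # bytes, and the smallest code point that sequence length may encode.
        if lead < 0x80:
            cp, extra, minimum = lead, 0, 0x0
        elif 0xC0 <= lead <= 0xDF:
            cp, extra, minimum = lead & 0x1F, 1, 0x80
        elif 0xE0 <= lead <= 0xEF:
            cp, extra, minimum = lead & 0x0F, 2, 0x800
        elif 0xF0 <= lead <= 0xF7:
            cp, extra, minimum = lead & 0x07, 3, 0x10000
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
        if i + extra >= n:
            raise ValueError(f"truncated sequence at offset {i}")
        for j in range(i + 1, i + extra + 1):
            if (data[j] & 0xC0) != 0x80:
                raise ValueError(f"bad continuation byte at offset {j}")
            cp = (cp << 6) | (data[j] & 0x3F)
        if cp < minimum:
            raise ValueError(f"overlong encoding at offset {i}")
        if cp > 0x10FFFF:
            raise ValueError(f"code point above U+10FFFF at offset {i}")
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"surrogate code point at offset {i}")
        code_points.append(cp)
        i += extra + 1
    return code_points
```

For example, b"\xc0\xaf" (the classic overlong "/") and b"\xf4\x90\x80\x80" (U+110000) both raise ValueError.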

The whole IDNA issue is IMO unrelated to "net-utf8".
I'd delete section 5.2 entirely or rewrite it from
scratch.

Appendix A:
s/2068/2616/ (HTTP/1.1) or s/2068/1945/ (?).  ITYM RFC 1945, HTTP/1.0.

Back to section 2.1:
Before discussing FF (Formfeed) at all, you should talk
about HT.  Many decent protocols use WSP as defined in
the upcoming ABNF standard (4234bis), i.e. SP or HT.
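In the ABNF core rules WSP = SP / HTAB, so FF is not whitespace there. A small sketch with a hypothetical helper that splits a line on 1*WSP; Python's str.split() with no argument is deliberately avoided because it also treats FF, VT, CR and LF as separators:

```python
# ABNF core rules: SP = %x20, HTAB = %x09, WSP = SP / HTAB
WSP = {" ", "\t"}


def split_on_wsp(line: str) -> list[str]:
    """Split a line into tokens separated by runs of WSP (1*WSP)."""
    tokens, current = [], []
    for ch in line:
        if ch in WSP:
            if current:                  # end of a token
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)           # FF and other controls stay in tokens
    if current:
        tokens.append("".join(current))
    return tokens


print(split_on_wsp("a \t b\fc"))  # ['a', 'b\x0cc']
```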