Re: draft-klensin-net-utf8-06

"Frank Ellermann" <> Mon, 22 October 2007 11:48 UTC

From: "Frank Ellermann" <>
Subject: Re: draft-klensin-net-utf8-06
Date: Mon, 22 Oct 2007 13:45:18 +0200

Tim Bray wrote:

> - Lots of payload formats have their own rules about line ends

The draft is about Internet protocols based on or closely related
to "telnet", like "whois"; it's not about payload formats like XML.

 [your "code point" proposal]
> introduces the term "code point", and is more precise:

>| Unicode identifies each character by an integer, called its
>| "code point", in the range 0-0x10ffff.

That might backfire: not all code points are characters, and not
all abstract characters can be encoded by a single code point.
The glossary in TUS 5.0 offers:

| Code point: Any value in [...] the range of integers from 0
| to 10FFFF [hex.].

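To make the distinction concrete, here is a minimal Python sketch.
The helper names (is_surrogate, is_noncharacter) are made up for
this illustration and come from neither the draft nor TUS. The
sketch assumes the usual definitions: surrogates are D800..DFFF, and
the 66 noncharacters are FDD0..FDEF plus the last two code points of
each of the 17 planes.

```python
import unicodedata

MAX_CODE_POINT = 0x10FFFF  # upper end of the code point range


def is_surrogate(cp: int) -> bool:
    """Surrogate code points are reserved for UTF-16 and are not characters."""
    return 0xD800 <= cp <= 0xDFFF


def is_noncharacter(cp: int) -> bool:
    """FDD0..FDEF plus xxFFFE and xxFFFF in every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


# Code points that are not characters: a surrogate cannot even be
# encoded as UTF-8 by a strict encoder.
try:
    "\ud800".encode("utf-8")
    surrogate_encodable = True
except UnicodeEncodeError:
    surrogate_encodable = False

# An abstract character without a single code point: "q with tilde"
# has no precomposed form, so even after NFC normalization it stays
# a sequence of two code points (q + U+0303 COMBINING TILDE).
q_tilde = unicodedata.normalize("NFC", "q\u0303")

# By contrast, "e with acute" composes to the single code point U+00E9.
e_acute = unicodedata.normalize("NFC", "e\u0301")
```

So "identifies each character by an integer" holds for e_acute, but
not for q_tilde, and the integer D800 identifies no character at all.
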
> It might be worth adding a note something along these lines: "The
> BOM is useful in establishing the endian-ness of UTF-16 and UTF-32
> encodings, but serves no useful purpose in the context of UTF-8."

It's an important "signature" for plain text files on platforms
where UTF-8 is not the local codepage.  One of these platforms is
quite popular. :-)  For the purpose of the net-utf8 I-D just saying
NO to BOMs is IMO good enough; RFC 3629 contains the fine print.
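
For illustration, a rough Python sketch of what "saying NO to BOMs"
could look like at the boundary between local files and the wire.
The function name strip_utf8_signature is hypothetical; the draft
does not prescribe any particular implementation. The signature is
the three octets EF BB BF, i.e. U+FEFF encoded as UTF-8.

```python
import codecs


def strip_utf8_signature(data: bytes) -> bytes:
    """Drop a leading UTF-8 signature (EF BB BF), if present.

    Hypothetical helper: a local file may carry the signature, but
    the octets put on the wire should not.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


local_file = codecs.BOM_UTF8 + "whois example".encode("utf-8")
on_the_wire = strip_utf8_signature(local_file)

# A plain UTF-8 decoder keeps the signature as U+FEFF in the text,
# while Python's "utf-8-sig" codec removes it while decoding.
kept = local_file.decode("utf-8")
removed = local_file.decode("utf-8-sig")
```

Only a leading signature is removed here; a U+FEFF anywhere else in
the text is left alone.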

> I'd ruthlessly whack about 80% of the history and suchlike
> explication.

Oops, no, the history is essential to understand old telnet issues
like IAC without reading pre-historic RFCs with numbers below 821.
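
As a small illustration of the IAC issue, a Python sketch under the
assumption that RFC 854 rules apply: IAC is octet 255, and a data
octet 255 has to be sent doubled. The helper names escape_iac and
utf8_contains_iac are made up for this example.

```python
IAC = 0xFF  # telnet "Interpret As Command" octet (RFC 854)


def escape_iac(data: bytes) -> bytes:
    """Double every data octet 255 so it is not taken as a command."""
    return data.replace(b"\xff", b"\xff\xff")


def utf8_contains_iac(text: str) -> bool:
    """Check whether the UTF-8 encoding of text contains octet 255."""
    return IAC in text.encode("utf-8")


# In ISO 8859-1 the letter U+00FF (y with diaeresis) is exactly the
# IAC octet, so it needs escaping on a telnet-like connection.
latin1_octets = "\u00ff".encode("iso-8859-1")
escaped = escape_iac(b"a" + latin1_octets + b"b")

# In UTF-8 the same letter is the two octets C3 BF; octet FF never
# occurs in well-formed UTF-8, so there is nothing to escape.
utf8_octets = "\u00ff".encode("utf-8")
```

That is one reason why the old 8-bit telnet issues look different
once the text is UTF-8.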

> Future consumers of this document, of which I predict there will
> be many, will need to consult and then cite the meat

I think the future consumers are standards developers and protocol
designers trying to update, say, RFC 3912.  Appendix C lists future
work; indirectly that defines the main audience for net-utf8.