Re: draft-klensin-net-utf8-06

"Frank Ellermann" <nobody@xyzzy.claranet.de> Mon, 22 October 2007 11:48 UTC

To: discuss@apps.ietf.org
From: Frank Ellermann <nobody@xyzzy.claranet.de>
Subject: Re: draft-klensin-net-utf8-06
Date: Mon, 22 Oct 2007 13:45:18 +0200
Lines: 44
Message-ID: <ffi2lc$rl1$1@ger.gmane.org>
References: <93F25E18AB3DA3EB0599F092@p3.JCK.COM> <517bf110710220323l493c61ccrcc2d72ee3801f60a@mail.gmail.com>

Tim Bray wrote:

> - Lots of payload formats have their own rules about line ends

The draft is about Internet protocols based on or closely related
to "telnet", such as "whois"; it's not about payload formats like XML.

 [your "code point" proposal]
> introduces the term "code point", and is more precise:

>| Unicode identifies each character by an integer, called its
>| "code point", in the range 0-0x10ffff.

That might backfire: not all code points are characters, and not
all abstract characters can be encoded by a single code point.
The glossary in TUS 5.0 offers:

| Code point: Any value in [...] the range of integers from 0
| to 10FFFF [hex.]. 
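
To make both halves of that objection concrete, here is a rough Python
sketch.  The helper names are made up for illustration, and "x with
circumflex" is just one convenient example of something a user sees as
a single character that has no precomposed code point:

```python
import unicodedata

def is_surrogate(cp: int) -> bool:
    # U+D800..U+DFFF are code points, but they are reserved for UTF-16
    # and never represent characters on their own.
    return 0xD800 <= cp <= 0xDFFF

def is_noncharacter(cp: int) -> bool:
    # U+FDD0..U+FDEF plus the last two code points of each of the
    # 17 planes (U+FFFE, U+FFFF, U+1FFFE, ... U+10FFFF).
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

# Code points in 0..0x10FFFF that are not characters:
noncharacters = sum(is_noncharacter(cp) for cp in range(0x110000))
surrogates = sum(is_surrogate(cp) for cp in range(0x110000))

# "e" + COMBINING ACUTE ACCENT composes to one code point under NFC,
# "x" + COMBINING CIRCUMFLEX ACCENT stays at two code points.
e_acute = unicodedata.normalize("NFC", "e\u0301")
x_circumflex = unicodedata.normalize("NFC", "x\u0302")

print(noncharacters, surrogates, len(e_acute), len(x_circumflex))
```

So a definition saying "each character is identified by a code point"
would be wrong in both directions.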

 [BOM]
> It might be worth adding a note something along these lines: "The
> BOM is useful in establishing the endian-ness of UTF-16 and UTF-32
> encodings, but serves no useful purpose in the context of UTF-8."

It's an important "signature" for plain text files on platforms
where UTF-8 is not the local codepage.  One of these platforms is
quite popular. :-)  For the purpose of the net-utf8 I-D, just saying
NO to BOMs is IMO good enough; RFC 3629 contains the fine print.
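
For illustration only, the UTF-8 "signature" is U+FEFF encoded as the
three octets EF BB BF.  A minimal Python sketch of how a receiver might
detect or drop it follows; the function names are hypothetical, and
whether a protocol strips or rejects the signature is a policy choice
that the draft and RFC 3629 have to settle, not this snippet:

```python
import codecs

def has_utf8_signature(data: bytes) -> bool:
    # codecs.BOM_UTF8 is b"\xef\xbb\xbf", i.e. U+FEFF encoded in UTF-8.
    return data.startswith(codecs.BOM_UTF8)

def strip_utf8_signature(data: bytes) -> bytes:
    # Drop one leading signature if present, leave everything else alone.
    if has_utf8_signature(data):
        return data[len(codecs.BOM_UTF8):]
    return data

file_bytes = codecs.BOM_UTF8 + "whois example".encode("utf-8")

# A plain UTF-8 decode keeps U+FEFF as the first character,
# the "utf-8-sig" codec removes it.
kept = file_bytes.decode("utf-8")
dropped = file_bytes.decode("utf-8-sig")
stripped = strip_utf8_signature(file_bytes).decode("utf-8")
```

On the wire there is no local codepage to guess, so the signature
carries no information there; it only helps with files.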

> I'd ruthlessly whack about 80% of the history and suchlike
> explication.

Oops, no, the history is essential to understand old telnet issues
like IAC without reading pre-historic RFCs with numbers below 821.

> Future consumers of this document, of which I predict there will
> be many, will need to consult and then cite the meat

I think the future consumers are standards developers and protocol
designers trying to update, say, RFC 3912.  Appendix C lists future
work; indirectly, that defines the main audience for net-utf8.

 Frank
