[Ltru] [OT] Re: UTF-8
"Doug Ewell" <dewell@adelphia.net> Fri, 15 September 2006 06:37 UTC
Message-ID: <007501c6d890$b96ff410$6401a8c0@DGBP7M81>
From: Doug Ewell <dewell@adelphia.net>
To: LTRU Working Group <ltru@ietf.org>
References: <E1GNzAK-0005WV-Uf@megatron.ietf.org>
Date: Thu, 14 Sep 2006 23:32:31 -0700
Cc: Frank Ellermann <nobody@xyzzy.claranet.de>
Subject: [Ltru] [OT] Re: UTF-8
Frank Ellermann <nobody at xyzzy dot claranet dot de> wrote:

>> up to 11: [ & # x 1 0 F F F F ; ], although in practice nothing more
>> than 10 will ever be required.
>
> My "7" was for a Latin-1 friendly UTF-4, not escape sequences:
>
> Hex. 86 91 90 9F 9F 9F 9F (lead byte + 6 hex. digits), in that form a
> legacy text viewer or editor could display 191 visible Latin-1
> characters as is, instead of "only" 95 ASCII for UTF-8.

First, you said UTF-8, not this. Second, highly non-standard, but a fun
thought experiment of the kind I used to indulge in.

I'm not sure why you chose 0x86 as your sequence introducer (U+0086
START OF SELECTED AREA, or in Windows-1252, U+2020 DAGGER).

If all C1 control bytes are available, you could make each sequence 1
byte shorter by marking the lead or trail byte specially:

81 90 9F 9F 9F 9F   <- lead byte 8x, all others 9x
81 80 8F 8F 8F 9F   <- trail byte 9x, all others 8x

"Terminal jockeys" such as Frank da Cruz (inventor of the Kermit
protocol) would argue that C1 controls are also part of "Latin-1" and
hence this format is still not Latin-1-friendly (a complaint that also
used to be brought against UTF-8).

> Of course we can't use this for the registry, and we also won't try
> BOCU-1. But maybe IANA should offer a gzip-ped version.

BOCU-1 text tends to include a lot of bytes in the C1 range (0x80
through 0x9F) and might not travel through e-mail very well. And I
still haven't received a clear answer about how patent-encumbered
BOCU-1 might be.

I like the gzip idea.

--
Doug Ewell
Fullerton, California, USA
http://users.adelphia.net/~dewell/
RFC 4645 * UTN #14

_______________________________________________
Ltru mailing list
Ltru@ietf.org
https://www1.ietf.org/mailman/listinfo/ltru
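The three byte layouts discussed in this message can be sketched in Python for U+10FFFF. This is a minimal illustration under stated assumptions, not a specification: none of these is a registered encoding, the function names are hypothetical, and the sketch assumes every escaped code point is written as exactly six hex digits, with only the 191 visible Latin-1 characters (0x20-0x7E and 0xA0-0xFF) passed through unescaped.

```python
# Hypothetical sketch of the ad-hoc "Latin-1 friendly" escape layouts.
# Assumptions (not from any spec): each escape carries exactly six hex
# digits, and only visible Latin-1 bytes pass through unescaped.

def nibbles(cp: int) -> list[int]:
    """Six hex digits of a code point, most significant first."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    return [(cp >> shift) & 0xF for shift in range(20, -1, -4)]


def frank_7byte(cp: int) -> bytes:
    """0x86 introducer, then each digit d as 0x90 + d (7 bytes)."""
    return bytes([0x86] + [0x90 | d for d in nibbles(cp)])


def lead_marked(cp: int) -> bytes:
    """First digit as 0x80 + d, remaining digits as 0x90 + d (6 bytes)."""
    first, *rest = nibbles(cp)
    return bytes([0x80 | first] + [0x90 | d for d in rest])


def trail_marked(cp: int) -> bytes:
    """Last digit as 0x90 + d, earlier digits as 0x80 + d (6 bytes)."""
    *init, last = nibbles(cp)
    return bytes([0x80 | d for d in init] + [0x90 | last])


def is_visible_latin1(cp: int) -> bool:
    """Printable ASCII (0x20-0x7E) or printable Latin-1 (0xA0-0xFF)."""
    return 0x20 <= cp <= 0x7E or 0xA0 <= cp <= 0xFF


def encode_text(text: str, escape=lead_marked) -> bytes:
    """Pass visible Latin-1 through as single bytes; escape everything else."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        out += bytes([cp]) if is_visible_latin1(cp) else escape(cp)
    return bytes(out)


if __name__ == "__main__":
    print(frank_7byte(0x10FFFF).hex(" "))   # 86 91 90 9f 9f 9f 9f
    print(lead_marked(0x10FFFF).hex(" "))   # 81 90 9f 9f 9f 9f
    print(trail_marked(0x10FFFF).hex(" "))  # 81 80 8f 8f 8f 9f
    print(sum(map(is_visible_latin1, range(256))))  # 191
```

Because no pass-through byte falls in 0x80-0x9F, each escape can be located unambiguously in a mixed byte stream; the price, as the Kermit objection above points out, is that every escape is built entirely from C1 bytes.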