[Ltru] Re: [OT] Re: UTF-8
Frank Ellermann <nobody@xyzzy.claranet.de> Fri, 15 September 2006 11:07 UTC
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com) by megatron.ietf.org with esmtp (Exim 4.43) id 1GOBXe-0005qN-FK; Fri, 15 Sep 2006 07:07:30 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org) by megatron.ietf.org with esmtp (Exim 4.43) id 1GOBXd-0005qI-B5 for ltru@lists.ietf.org; Fri, 15 Sep 2006 07:07:29 -0400
Received: from main.gmane.org ([80.91.229.2] helo=ciao.gmane.org) by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1GOBXa-0001I6-Sg for ltru@lists.ietf.org; Fri, 15 Sep 2006 07:07:29 -0400
Received: from list by ciao.gmane.org with local (Exim 4.43) id 1GOBXP-0005po-NY for ltru@lists.ietf.org; Fri, 15 Sep 2006 13:07:15 +0200
Received: from pd9fbad2d.dip0.t-ipconnect.de ([217.251.173.45]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for <ltru@lists.ietf.org>; Fri, 15 Sep 2006 13:07:15 +0200
Received: from nobody by pd9fbad2d.dip0.t-ipconnect.de with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for <ltru@lists.ietf.org>; Fri, 15 Sep 2006 13:07:15 +0200
X-Injected-Via-Gmane: http://gmane.org/
To: ltru@lists.ietf.org
From: Frank Ellermann <nobody@xyzzy.claranet.de>
Date: Fri, 15 Sep 2006 13:03:35 +0200
Organization: <URL:http://purl.net/xyzzy>
Lines: 64
Message-ID: <450A8887.EAB@xyzzy.claranet.de>
References: <E1GNzAK-0005WV-Uf@megatron.ietf.org> <007501c6d890$b96ff410$6401a8c0@DGBP7M81>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Complaints-To: usenet@sea.gmane.org
X-Gmane-NNTP-Posting-Host: pd9fbad2d.dip0.t-ipconnect.de
X-Mailer: Mozilla 3.0 (OS/2; U)
X-Spam-Score: 0.0 (/)
X-Scan-Signature: 52f7a77164458f8c7b36b66787c853da
Cc:
Subject: [Ltru] Re: [OT] Re: UTF-8
X-BeenThere: ltru@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Language Tag Registry Update working group discussion list <ltru.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/ltru>, <mailto:ltru-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www1.ietf.org/pipermail/ltru>
List-Post: <mailto:ltru@ietf.org>
List-Help: <mailto:ltru-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/ltru>, <mailto:ltru-request@ietf.org?subject=subscribe>
Errors-To: ltru-bounces@ietf.org
Doug Ewell wrote:

> I'm not sure why you chose 0x86 as your sequence introducer

Six is the number of trailing octets: 91909F9F9F9F (for John's example U+10FFFF).

> you could make each sequence 1 byte shorter by marking the lead or trail byte specially

Yes, but then only 2 octets (80 and 81) would never occur (instead of 11), and a lost 9x byte wouldn't cause an error. UTF-8 now has 13 "impossible" octets, and similar features.

> "Terminal jockeys" such as Frank da Cruz (inventor of the Kermit protocol) would argue that C1 controls are also part of "Latin-1" and hence this format is still not Latin-1-friendly (a complaint that also used to be brought against UTF-8).

Yes, you can only use UTF-8 or UTF-4 with legacy applications for windows-1252, not with applications that need the C1 control codes as is. He could use UTF-1 (it protects C0, SP, DEL, and C1) or UTF-7.

For my local purposes UTF-4 is fine; my text editor supports hex. I'm less good at doing the modulo-64 arithmetic of UTF-8 by heart; I need extra macros to decode and encode it.

>> we can't use this for the registry, and we also won't try BOCU-1. But maybe IANA should offer a gzip-ped version.

> BOCU-1 text tends to include a lot of bytes in the C1 range (0x80 through 0x9F) and might not travel through e-mail very well.

For that purpose it's as good or as bad as UTF-8 (or, in theory, UTF-4): with 8BITMIME or news you need no CTE (Content-Transfer-Encoding), otherwise you need Base64 or quoted-printable. If e-mail has a problem with 8-bit data, the problem isn't limited to C1; it could be anything, most likely a parity bit.

> I still haven't received a clear answer about how patent-encumbered BOCU-1 might be.

IBM's statement in UTS #40 is "royalty-free". I didn't ask them for a US license. In the EU and some other parts of the world it's AFAIK (and IANAL) a complete waste of time and money to patent algorithms. Patenting modulo 243 arithmetic is an odd idea.
For UTF-4 (= modulo 16 with 64 lines of CharMapML) it would be ridiculous (but one of these 64 lines is a copyright line, just in case).

> I like the gzip idea.

    lstreg6.txt        82218  (2006-08-04)
    lstreg6.txt.gz     11788
    lstreg6.xml       104627  (see my reply to Debbie)
    lstreg6.xml.gz     12141

That matches your observations in UniCompress. For the 4646bis registry it makes sense for some folks (for me it's less relevant; the V.90 bottleneck has its own compression).

Frank

_______________________________________________
Ltru mailing list
Ltru@ietf.org
https://www1.ietf.org/mailman/listinfo/ltru
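The UTF-4 format discussed in the message above is only described in passing. A minimal sketch, assuming one reading of Frank's answers: a lead octet 0x80 + n, where n (2 to 6) is the number of trailing octets, followed by n octets 0x90 + hex digit, so U+10FFFF becomes 86 91 90 9F 9F 9F 9F. Which code points are escaped at all is a further assumption (ASCII and U+00A0..U+00FF pass through as single octets, everything else is escaped with at least two hex digits), and the names `utf4_encode` and `utf4_decode` are hypothetical. Treat it as an illustration of the idea, not as Frank's specification.

```python
# Hypothetical reconstruction of the "UTF-4" escape scheme discussed above.
# Assumptions (not confirmed by the message): U+0000..U+007F and
# U+00A0..U+00FF are stored as single octets; every other code point is a
# lead octet 0x80 + n followed by n octets 0x90 + hex digit, 2 <= n <= 6.


def utf4_encode(text: str) -> bytes:
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80 or 0xA0 <= cp <= 0xFF:
            out.append(cp)  # assumed pass-through range
            continue
        digits = f"{cp:02X}"            # at least two hex digits
        out.append(0x80 + len(digits))  # lead octet: number of trailing octets
        out.extend(0x90 + int(d, 16) for d in digits)  # one octet per nibble
    return bytes(out)


def utf4_decode(data: bytes) -> str:
    chars = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80 or b >= 0xA0:
            chars.append(chr(b))  # single-octet character
            i += 1
            continue
        n = b - 0x80
        if not 2 <= n <= 6:
            # 0x80, 0x81, 0x87..0x8F and stray 0x90..0x9F octets end up here
            raise ValueError(f"invalid lead octet 0x{b:02X} at offset {i}")
        trail = data[i + 1:i + 1 + n]
        if len(trail) != n or any(not 0x90 <= t <= 0x9F for t in trail):
            raise ValueError(f"truncated or malformed sequence at offset {i}")
        cp = 0
        for t in trail:
            cp = cp * 16 + (t - 0x90)  # accumulate hex digits
        chars.append(chr(cp))
        i += 1 + n
    return "".join(chars)
```

Under these assumptions the octets 0x80, 0x81 and 0x87..0x8F can never start a sequence, which would account for the 11 never-occurring octets mentioned in the reply; a stray 9x octet without its lead octet is rejected by the decoder.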
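The remark that UTF-8 has 13 "impossible" octets in the message above can be checked by brute force: encode every Unicode scalar value and see which octet values never show up. This sketch relies only on Python's built-in UTF-8 codec; the function name is illustrative.

```python
def impossible_utf8_octets():
    """Return the sorted octet values that never occur in well-formed UTF-8."""
    scalars = (
        chr(cp) for cp in range(0x110000)
        if not 0xD800 <= cp <= 0xDFFF  # surrogates are not encodable
    )
    seen = set("".join(scalars).encode("utf-8"))  # iterating bytes gives ints
    return sorted(set(range(256)) - seen)


if __name__ == "__main__":
    print([f"{b:02X}" for b in impossible_utf8_octets()])
```

The result is C0 and C1 (which could only start overlong two-octet forms) plus F5..FF (lead octets for code points above U+10FFFF, or octets that were never lead octets at all).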
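The "modulo 64" remark in the message above refers to how UTF-8 spreads a code point over its octets: each continuation octet carries six bits (the remainder modulo 64) behind a 10 prefix, and the lead octet carries what is left. A hand-rolled encoder, written only to make that arithmetic visible (Python's own `str.encode("utf-8")` is the tool to use in practice); `utf8_by_hand` is an illustrative name:

```python
def utf8_by_hand(cp: int) -> bytes:
    """Encode one code point as UTF-8 using explicit divmod-by-64 steps."""
    if cp < 0x80:
        return bytes([cp])  # ASCII is a single octet
    if 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")
    # Number of continuation octets and the lead-octet prefix for each length.
    if cp < 0x800:
        n, lead = 1, 0xC0
    elif cp < 0x10000:
        n, lead = 2, 0xE0
    else:
        n, lead = 3, 0xF0
    trail = []
    for _ in range(n):
        cp, low6 = divmod(cp, 64)   # peel off six bits at a time
        trail.append(0x80 | low6)   # continuation octet 10xxxxxx
    return bytes([lead | cp] + trail[::-1])  # remaining high bits go in the lead
```

For U+10FFFF this yields F4 8F BF BF, which is much harder to read off than the hex digits 10FFFF in Frank's 91909F9F9F9F example.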
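Frank's point in the message above, that UTF-8 (like UTF-4) clashes with applications needing the C1 control codes as is, can be made concrete by counting octets in the 0x80..0x9F range. Python ships a UTF-7 codec but no UTF-1 codec, so this sketch compares only UTF-8 and UTF-7; `c1_octets` is a hypothetical helper and the sample string is arbitrary.

```python
def c1_octets(data: bytes) -> int:
    """Count octets in the C1 range 0x80..0x9F."""
    return sum(1 for b in data if 0x80 <= b <= 0x9F)


sample = "Gr\u00fc\u00dfe, \u20ac 5"  # "Grüße, € 5"

if __name__ == "__main__":
    for codec in ("utf-8", "utf-7"):
        encoded = sample.encode(codec)
        # UTF-7 output is pure ASCII, so it can never contain C1 octets.
        print(f"{codec}: {encoded!r}, C1 octets: {c1_octets(encoded)}")
```

In UTF-8, ß (C3 9F) and € (E2 82 AC) each contribute one C1 octet; the UTF-7 form contains none.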
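On the e-mail question in the message above: without 8BITMIME, any body containing octets at or above 0x80 (UTF-8, BOCU-1, or UTF-4 alike) needs a Content-Transfer-Encoding. A small sketch with Python's standard `quopri` and `base64` modules shows both options producing 7-bit output; the sample text is arbitrary.

```python
import base64
import quopri

# Arbitrary UTF-8 sample containing octets >= 0x80.
body = "Gr\u00fc\u00dfe aus K\u00f6ln, \u20ac 5\n".encode("utf-8")

qp = quopri.encodestring(body)   # quoted-printable: =XX escapes for 8-bit octets
b64 = base64.encodebytes(body)   # base64, wrapped at 76 characters per line

if __name__ == "__main__":
    for name, encoded in (("8bit", body), ("quoted-printable", qp), ("base64", b64)):
        seven_bit = max(encoded) < 0x80
        print(f"{name:16} {len(encoded):3} octets, 7-bit clean: {seven_bit}")
```

Quoted-printable stays readable for mostly-ASCII text such as the registry, while Base64 costs roughly a third more regardless of content.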
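The gzip figures quoted in the message above work out to about 14.3% of the original size for lstreg6.txt (11788 of 82218 octets) and about 11.6% for lstreg6.xml (12141 of 104627). A minimal way to produce such numbers for a local copy of a registry file, using Python's standard `gzip` module; `gzip_report` is a hypothetical helper, and actual sizes will vary with the compression level and the registry version.

```python
import gzip
import sys


def gzip_report(data: bytes, level: int = 9):
    """Return (original size, gzip size, ratio) for a non-empty byte string."""
    packed = gzip.compress(data, compresslevel=level)
    return len(data), len(packed), len(packed) / len(data)


if __name__ == "__main__":
    # Pass file paths on the command line, e.g. a local copy of the registry.
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            raw, packed, ratio = gzip_report(f.read())
        print(f"{path}: {raw} -> {packed} octets ({ratio:.1%})")
```

Text with many repeated field names, like the registry's "Type:" and "Subtag:" records, compresses well, which is consistent with the ratios above.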