[Ltru] Broken folding (was: (editor response) Review of 4646bis-10, sections 1 to 3.4)
"Frank Ellermann" <nobody@xyzzy.claranet.de> Sat, 08 December 2007 15:33 UTC
Return-path: <ltru-bounces@ietf.org>
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com) by megatron.ietf.org with esmtp (Exim 4.43) id 1J11fz-0006Cm-W1; Sat, 08 Dec 2007 10:33:11 -0500
Received: from ltru by megatron.ietf.org with local (Exim 4.43) id 1J11fz-00069t-3M for ltru-confirm+ok@megatron.ietf.org; Sat, 08 Dec 2007 10:33:11 -0500
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org) by megatron.ietf.org with esmtp (Exim 4.43) id 1J11fy-000692-Ow for ltru@lists.ietf.org; Sat, 08 Dec 2007 10:33:10 -0500
Received: from main.gmane.org ([80.91.229.2] helo=ciao.gmane.org) by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1J11fy-00074A-5a for ltru@lists.ietf.org; Sat, 08 Dec 2007 10:33:10 -0500
Received: from list by ciao.gmane.org with local (Exim 4.43) id 1J11fp-0006O7-Dr for ltru@lists.ietf.org; Sat, 08 Dec 2007 15:33:01 +0000
Received: from c-180-160-62.hh.dial.de.ignite.net ([62.180.160.62]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for <ltru@lists.ietf.org>; Sat, 08 Dec 2007 15:33:01 +0000
Received: from nobody by c-180-160-62.hh.dial.de.ignite.net with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for <ltru@lists.ietf.org>; Sat, 08 Dec 2007 15:33:01 +0000
X-Injected-Via-Gmane: http://gmane.org/
To: ltru@lists.ietf.org
From: Frank Ellermann <nobody@xyzzy.claranet.de>
Date: Sat, 08 Dec 2007 16:34:59 +0100
Organization: <http://purl.net/xyzzy>
Lines: 70
Message-ID: <fjedf4$n21$1@ger.gmane.org>
References: <20071206163755.GP10807@mercury.ccil.org> <4759B2E9.5000106@yahoo-inc.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
X-Complaints-To: usenet@ger.gmane.org
X-Gmane-NNTP-Posting-Host: c-180-160-62.hh.dial.de.ignite.net
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 6.00.2800.1914
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1914
X-Spam-Score: -0.0 (/)
X-Scan-Signature: 92df29fa99cf13e554b84c8374345c17
Cc:
Subject: [Ltru] Broken folding (was: (editor response) Review of 4646bis-10, sections 1 to 3.4)
X-BeenThere: ltru@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: Frank Ellermann <hmdmhdfmhdjmzdtjmzdtzktdkztdjz@gmail.com>
List-Id: Language Tag Registry Update working group discussion list <ltru.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/ltru>, <mailto:ltru-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www1.ietf.org/pipermail/ltru>
List-Post: <mailto:ltru@ietf.org>
List-Help: <mailto:ltru-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/ltru>, <mailto:ltru-request@ietf.org?subject=subscribe>
Errors-To: ltru-bounces@ietf.org

Addison Phillips wrote:

>> 3.1.1: change the definition of folding to "Folding is always done on
>> Unicode default grapheme boundaries". That says what the current text
>> says, and also prohibits folding in the middle of a Hangul syllable
>> written as separate jamo.
> :: shiver ::
> Yes, I know. (laughing) I hoped to avoid implementing it in record-jar
> though. But you're right.
> The sentence was rewritten as follows, to include the example:
>    Folding is always done on Unicode default grapheme boundaries (that
>    is, never in the middle of a multibyte UTF-8 sequence nor in the
>    middle of a combining character sequence).

Do they offer a list of combining characters somewhere, or
is that a case of "grep 996 KB list for non-zero combining
class" ?

You need a note that "folding" is supposed to be replaced
by NO space in your definition. In other words it MUST NOT
occur where WSP is allowed (replacing a real WSP by folding,
which is later unfolded into nothing, joins words):

1 - input  was Example:<SP>fold<SP>me<CRLF>
2 - folded is  Example:<SP>fold<CRLF><SP>me<CRLF>
3 - output is  Example:<SP>foldme<CRLF>

Your "folding" lost the space here, I see no way to protect
it. The folks who created STD 11 etc. knew what they were
up to, and the whole world knows how STD 11 folding works.

[In another message =========================================]

> IOW the registry is supposed to be viewable and readable as
> a plain-text (UTF-8) file.

The folding business should be similar with NCRs for graphemes,
not folding "within" a NCR is obvious, like not folding within
an UTF-8 code point.

Record-jar is rather pointless for UTF-8, I always wanted
"UTF-8 => XML", how about simply deleting the line length
limit 72 ? "72 bytes" is a pointless concept for UTF-8, and
72 "graphemes" don't help with half- vs. full width.

(Actually I've no clue, maybe NFC gets rid of the width hurdle)

> They don't have WSP separated words in Korean, typically

Ugh.
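
For readers who want to reproduce the three-step example above, here is a
rough sketch in Python. The function names are made up for illustration, and
unfold_to_nothing encodes only my reading of the draft's rule (fold marker
replaced by no space), not text taken from the draft itself.

```python
import re

def unfold_std11(text):
    # STD 11 style: drop a CRLF that is directly followed by WSP,
    # but keep the WSP itself, so the word separator survives.
    return re.sub(r"\r\n(?=[ \t])", "", text)

def unfold_to_nothing(text):
    # Assumed reading of the draft: CRLF plus the following WSP run is
    # replaced by nothing, so a fold placed at a real space joins words.
    return re.sub(r"\r\n[ \t]+", "", text)

original = "Example: fold me\r\n"        # step 1 - input
folded = "Example: fold\r\n me\r\n"      # step 2 - folded at the space

print(repr(unfold_std11(folded)))        # round-trips to the input
print(repr(unfold_to_nothing(folded)))   # step 3 - the space is gone
```

Under these assumptions only the STD 11 variant round-trips; the other one
yields "Example: foldme", which is the lost space described above.
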
Transform the 72 MUST in a SHOULD, and allow folding only at
1*WSP, allowing folding within a word can't work as expected.
Violating the SHOULD for longer words is perfectly fine. We're
not interested in 2047 / 2231 encoded "words" for the registry
[ MIME has smart rules about SP between encoded words, but not
  trivial, this killed several EAI downgrade drafts from my POV ]

>> Talking about "bytes" in conjunction with UTF-8 makes me nervous.

> Why so? UTF-8 is a multibyte character encoding. So it's code
> unit is the byte.

Sure, but what are "72 bytes" supposed to do, if that could be
18 to 72 code points, and hopefully more than zero graphemes per
line. The 72 only made sense for the NCR registry when viewed
as ASCII.

 Frank
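
The "18 to 72 code points" arithmetic can be checked with a small sketch.
The helper name utf8_profile and the sample lines are assumptions made up
for illustration. Counting code points with canonical combining class 0 is
only a crude stand-in for graphemes; real default grapheme boundaries follow
the Unicode segmentation rules, which this does not implement.

```python
import unicodedata

def utf8_profile(line):
    # Returns (UTF-8 bytes, code points, code points with combining class 0).
    # The third number is a rough proxy only, not a grapheme cluster count.
    byte_len = len(line.encode("utf-8"))
    starters = sum(1 for ch in line if unicodedata.combining(ch) == 0)
    return byte_len, len(line), starters

ascii_line = "a" * 72              # 1 byte per code point
astral_line = "\U00010000" * 18    # 4 bytes per code point
accent_line = "e\u0301" * 24       # e + COMBINING ACUTE ACCENT, 3 bytes per pair

for sample in (ascii_line, astral_line, accent_line):
    print(utf8_profile(sample))
```

All three samples are exactly 72 bytes long, yet they hold 72, 18, and 48
code points. unicodedata.combining() returns the canonical combining class
from the Unicode Character Database, so it also covers the "non-zero
combining class" lookup without grepping the data file by hand.
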