Re: [Ltru] Re: no teleconference this week

"Doug Ewell" <dewell@roadrunner.com> Wed, 05 December 2007 06:54 UTC

Message-ID: <005a01c8370b$b29498f0$6601a8c0@DGBP7M81>
From: Doug Ewell <dewell@roadrunner.com>
To: LTRU Working Group <ltru@ietf.org>
References: <E1IzkOt-0006Yt-T7@megatron.ietf.org> <001101c836f9$261ed2d0$6601a8c0@DGBP7M81> <30b660a20712042051s28df0f50ib44c8beca615c5cc@mail.gmail.com> <47562F67.6060905@yahoo-inc.com> <004001c836fe$b495d310$6601a8c0@DGBP7M81> <47563861.7010701@yahoo-inc.com>
Subject: Re: [Ltru] Re: no teleconference this week
Date: Tue, 04 Dec 2007 22:54:34 -0800

Addison Phillips <addison at yahoo dash inc dot com> wrote:

>> I'm opposed to spending time on a revision of RFC 4647, since we 
>> expect everyone to ignore it anyway and stick with remove-from-right.
>
> I'm completely mystified by this statement, which you've made several 
> times already. I don't at all expect that 4647 will be ignored. Quite 
> the contrary. However, "remove-from-right" *is* the heart of the 
> algorithms described in that document. Removal of extlang means we 
> don't have to modify the actual algorithms to do other than RFR. Or 
> did I miss something?

Remove-from-right is, indeed, the essence of basic filtering and basic 
lookup.  However, if RFR were all there was to it, we wouldn't have 
bothered creating a separate matching document that wasn't on the 
original LTRU 1.0 charter.  We would have covered matching within 4646, 
just as Harald did in 1766 and 3066.

But with the advent of script subtags, and the expectation of ISO 
639-3-based extlangs, we felt something more was needed.  John Cowan 
came up with the idea of "scored matching," which disassembled the tag 
and assigned point values to individual subtags, and this led to 
extended filtering, in which wildcard asterisks take the place of 
optional subtags.  These algorithms are *not* primarily based on RFR; 
they allow for possibilities like "hi-Deva-IN" matching "hi-IN", and 
they would have provided additional flexibility in matching "zh-yue" 
with "zh".

However, during the recent extlang debate, almost the entire focus was 
on whether extlangs would behave well in the presence of basic filtering 
and basic lookup.  Even John Cowan, in throwing in the towel, emphasized 
this:

"The strongest argument against extlang tags is that our simple 
remove-from-right lookup algorithm ends up losing too much information. 
If you ask for 'zh-yue-Hant', and there are no Cantonese resources, then 
by the time you have truncated the tag to just 'zh' you have forgotten 
that Traditional Han script was required."
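The information loss John describes can be made concrete with a minimal Python sketch of the remove-from-right fallback used by basic lookup (RFC 4647, section 3.4). The function names and the sample set of available tags are illustrative assumptions, not anything defined by the RFC.

```python
def lookup_fallbacks(language_range):
    """Return the tags that basic lookup would try, most specific first."""
    subtags = language_range.split("-")
    chain = []
    while subtags:
        chain.append("-".join(subtags))
        # Remove the rightmost subtag.
        subtags.pop()
        # A single-letter or single-digit subtag left at the end
        # (such as "x") is removed together with the subtag after it.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return chain


def lookup(language_range, available):
    """Return the first fallback present in `available`, else None."""
    lowered = {tag.lower(): tag for tag in available}
    for candidate in lookup_fallbacks(language_range):
        match = lowered.get(candidate.lower())
        if match is not None:
            return match
    return None
```

With these definitions, lookup_fallbacks("zh-yue-Hant") gives ["zh-yue-Hant", "zh-yue", "zh"]. The tag "zh-Hant" is never tried, so lookup("zh-yue-Hant", {"zh-Hant", "zh"}) returns "zh" even though a Traditional Han resource was available.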

Not necessarily if you use extended filtering.

"In order to prevent it, we would have to change 4647 to specify a 
sequence like zh-yue-Hant > zh-yue > zh-Hant > zh, where the extlang tag 
is treated magically; in essence, stripping it restores all other tags."

The way I read 4647, section 3.3.2, extended filtering would give you 
two options:

- "zh-*-Hant" would yield zh-yue-Hant > zh-Hant > zh
- "zh-yue-*" would yield zh-yue-Hant > zh-yue > zh
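For anyone who wants to experiment with these ranges, here is a minimal Python sketch of the extended filtering steps as I understand section 3.3.2. The function name and the candidate list are illustrative assumptions. Note that filtering only decides which tags match a range; any preference order among the matches has to be imposed by the caller.

```python
def extended_filter_match(language_range, tag):
    """Return True if `tag` matches the extended language range."""
    range_subtags = language_range.lower().split("-")
    tag_subtags = tag.lower().split("-")

    # The first subtags must match, unless the range starts with "*".
    if range_subtags[0] != "*" and range_subtags[0] != tag_subtags[0]:
        return False

    r, t = 1, 1
    while r < len(range_subtags):
        if range_subtags[r] == "*":
            # A wildcard after the first position is simply skipped.
            r += 1
        elif t >= len(tag_subtags):
            # The range still requires a subtag the tag does not have.
            return False
        elif range_subtags[r] == tag_subtags[t]:
            r += 1
            t += 1
        elif len(tag_subtags[t]) == 1:
            # Never skip over a singleton such as "x" in the tag.
            return False
        else:
            # Skip a non-matching subtag in the tag and keep looking.
            t += 1
    return True


def extended_filter(language_range, tags):
    """Return the tags that match, preserving the input order."""
    return [tag for tag in tags if extended_filter_match(language_range, tag)]
```

Run against the candidates ["zh-yue-Hant", "zh-yue", "zh-Hant", "zh"], this sketch returns ["zh-yue-Hant", "zh-Hant"] for "zh-*-Hant" and ["zh-yue-Hant", "zh-yue"] for "zh-yue-*". The bare tag "zh" matches neither range under this reading, so reaching it would take a further fallback step outside the filter. The same function reports that the range "hi-IN" matches the tag "hi-Deva-IN".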

Which one you choose would depend on whether "Cantonese" or "Traditional 
Han" is more important to you.  This is no different from requesting 
"sr-Latn-ME" with the understanding that, if this exact combination is 
not available, a choice must be made as to whether "Latin script" or 
"Montenegro usage" is more important.

"That would be fine, except that no one actually does it or is likely to 
for some time to come."

And that is the crux: we don't expect people to take advantage of the 
advanced features of RFC 4647.  Meanwhile, the basic features of 4647 
are little changed from (though much better explained than) what we had 
in 3066.  Thus my claim that, by expecting people to stick with blind 
RFR, we are expecting them to ignore 4647.

--
Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
http://home.roadrunner.com/~dewell
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages


