Re: [Ltru] Mail regarding draft-ietf-ltru-4646bis and draft-ietf-ltru-matching

Florian Rivoal <florian@rivoal.net> Fri, 30 August 2019 03:55 UTC

From: Florian Rivoal <florian@rivoal.net>
Message-Id: <3B2C913C-DE8F-43E2-9819-55DD349EF0BA@rivoal.net>
Date: Fri, 30 Aug 2019 12:55:17 +0900
In-Reply-To: <CAD2gp_TJcRiMQYPpV=R7XEd-wZS=UOgQEzCb6t=0iWSBMx2cvQ@mail.gmail.com>
Cc: LTRU Working Group <ltru@ietf.org>, Doug Ewell <doug@ewellic.org>
To: John Cowan <cowan@ccil.org>
References: <20190827104755.665a7a7059d7ee80bb4d670165c8327d.0f79efb126.wbe@email03.godaddy.com> <910CB6C8-9F66-4255-B149-B146DA8E5695@rivoal.net> <CAJ2xs_GWQH=zOvzVqUqpFKHmLKWZTR=ybJOv+K_SMhCW==X23g@mail.gmail.com> <73BE5AF7-0C62-425F-834E-8759628D2C5F@rivoal.net> <CAD2gp_S+dDdgo9WsOixT_-jHkWZxmajWmx2MRKi0iDHVSwd-3g@mail.gmail.com> <94B0FC03-B793-43BB-B864-38A306E6B5CA@rivoal.net> <A8BF9658-CB3C-4DC5-AECB-840C0053A943@rivoal.net> <CAD2gp_TJcRiMQYPpV=R7XEd-wZS=UOgQEzCb6t=0iWSBMx2cvQ@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ltru/m06Ko7suh_YTdofg2LzbCqfyW8k>


> On Aug 30, 2019, at 11:17, John Cowan <cowan@ccil.org> wrote:
> 
> Got it. Yes, with extended filtering, you need to canonicalize toward extlangs.  I thought you were talking about simple equality matching after canonicalization.
> 
> Note that almost all macrolanguages do not have extlang tags; the languages can be identified by the macrolanguage or by the individual language tag, but not both.  Indeed, only ar, kok, lv, ms, sw, uz, zh have extlang tags subordinated to them;

Ah, indeed, I had failed to notice that. That doesn't make what I am proposing to do wrong, but it does limit its usefulness to a subset of the macrolanguage families. So my Chinese example would work, but my Norwegian one wouldn't.
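To make that concrete, here is a rough Python sketch of what I have in mind. It is only an illustration under stated assumptions: EXTLANG_PREFIX is a hand-written, hypothetical subset of the registry's extlang data, the function names are made up for this message, and extended_filter is my reading of the extended filtering steps in RFC 4647, section 3.3.2.

```python
# Hypothetical, hand-written subset of extlang data:
# individual language subtag -> macrolanguage prefix.
EXTLANG_PREFIX = {"yue": "zh", "cmn": "zh", "hak": "zh"}


def to_extlang_form(tag):
    """Rewrite e.g. 'yue' as 'zh-yue'; leave every other tag unchanged."""
    subtags = tag.lower().split("-")
    prefix = EXTLANG_PREFIX.get(subtags[0])
    if prefix is not None:
        subtags.insert(0, prefix)
    return "-".join(subtags)


def extended_filter(lang_range, tag):
    """Extended filtering, following the steps of RFC 4647, section 3.3.2."""
    r = lang_range.lower().split("-")
    t = tag.lower().split("-")
    # The first subtags must match, unless the range starts with a wildcard.
    if r[0] != "*" and r[0] != t[0]:
        return False
    ri, ti = 1, 1
    while ri < len(r):
        if r[ri] == "*":
            ri += 1              # a wildcard matches zero or more subtags
        elif ti >= len(t):
            return False         # range has subtags left, tag does not
        elif r[ri] == t[ti]:
            ri += 1
            ti += 1
        elif len(t[ti]) == 1:
            return False         # never skip over a singleton
        else:
            ti += 1              # skip a non-matching subtag in the tag
    return True


def matches(lang_range, tag):
    """Canonicalize both sides toward extlang form, then filter."""
    return extended_filter(to_extlang_form(lang_range), to_extlang_form(tag))
```

With this, matches("zh", "yue") and matches("zh", "zh-yue") both succeed, while matches("no", "nb") fails because there is no extlang entry to canonicalize nb toward.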

Is there a reason why the other macrolanguages lack extlang subtags? Inferring the equivalent information by inverting the "Preferred-Value" information doesn't sound hard, but if it wasn't done, presumably there's a reason.
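For the macrolanguages that do have extlang records, the inversion could look roughly like the Python sketch below. The SAMPLE text is a trimmed illustration in the registry's record layout, not a copy of the real file (real records carry more fields), and parse_records and extlang_map are hypothetical helper names.

```python
# Trimmed sample in the layout of the IANA Language Subtag Registry.
# Only the fields used below are kept; this is an illustration, not real data.
SAMPLE = """\
Type: extlang
Subtag: yue
Preferred-Value: yue
Prefix: zh
%%
Type: extlang
Subtag: cmn
Preferred-Value: cmn
Prefix: zh
%%
Type: language
Subtag: nb
Macrolanguage: no
"""


def parse_records(text):
    """Split on '%%' and read 'Field: value' lines into dictionaries.

    Simplification: folded continuation lines and repeated fields
    (such as several Description lines) are not handled.
    """
    records = []
    for chunk in text.split("%%"):
        record = {}
        for line in chunk.strip().splitlines():
            key, _, value = line.partition(":")
            record[key.strip()] = value.strip()
        if record:
            records.append(record)
    return records


def extlang_map(records):
    """Map each extlang's Preferred-Value to its 'prefix-subtag' form."""
    return {
        r["Preferred-Value"]: r["Prefix"] + "-" + r["Subtag"]
        for r in records
        if r.get("Type") == "extlang"
        and "Preferred-Value" in r
        and "Prefix" in r
    }
```

On the sample, extlang_map(parse_records(SAMPLE)) yields entries for yue and cmn only; nb gets nothing, because it has a Macrolanguage field but no extlang record to invert.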

> sgn does too but is not a macrolanguage.

It doesn't sound terribly useful to me to do the extlang canonicalization for the sake of sgn, but neither does it sound harmful.

—Florian

> 
> On Thu, Aug 29, 2019 at 9:14 PM Florian Rivoal <florian@rivoal.net> wrote:
> 
> 
> > On Aug 30, 2019, at 10:08, Florian Rivoal <florian@rivoal.net> wrote:
> > 
> > 
> > 
> >> On Aug 29, 2019, at 22:44, John Cowan <cowan@ccil.org> wrote:
> >> 
> >> 
> >> 
> >> On Thu, Aug 29, 2019 at 5:00 AM Florian Rivoal <florian@rivoal.net> wrote:
> >> 
> >> In case that influences what you have to say, note that I do not intend to store the canonicalized-to-extlang form anywhere. It would only be for internal processing: when performing an extended filtering operation, where it is unknown whether the ranges and tags are in extlang form or not, canonicalize both to extlang form and do the extended filtering operation on that.
> >> 
> >> In that case you can equally canonicalize away from the extlang form as toward it.  I recommend that.
> > 
> > Can you?
> > 
> > Let's say you want to match (using extended filtering) the zh range against documents that may contain the zh-yue or yue tags (and possibly others: zh-cmn, zh-hak, zh, zh-HK…). A typesetter might want to do this in order to use a particular font and set of line breaking rules for any chunk of Chinese (in the broad sense) text.
> > 
> > If we canonicalize to extlang form: 
> >  zh -> zh
> >  zh-yue -> zh-yue
> >  yue -> zh-yue
> > Therefore, the zh range will match documents that contained either zh-yue or yue. This is what I want.
> > 
> > If we canonicalize away from extlang form: 
> >  zh -> zh
> >  zh-yue -> yue
> >  yue -> yue
> > Therefore, the zh range will match neither the documents that contained zh-yue nor those that contained yue. This is not what I want, and is worse than not canonicalizing at all.
> > 
> > So it seems to me that no, we cannot canonicalize away from the extlang form and get the same results.
> > 
> > If the extended filtering operation did something smart with macrolanguages, then I wouldn't need canonicalization at all, but it doesn't, so I feel I need to canonicalize, and as described above, only canonicalization to extlang actually seems to help.
> > 
> > Am I missing something?
> > 
> > —Florian
> 
> Sorry for not including that in the previous message, but to give another example, if I want to use the no range to match any of: no, no-bok, no-nyn, nb, or nn, canonicalization to extlang form works, and canonicalization away from it doesn't.
> 
> —Florian