Re: [Ltru] Mail regarding draft-ietf-ltru-4646bis and draft-ietf-ltru-matching

John Cowan <cowan@ccil.org> Fri, 30 August 2019 04:06 UTC

From: John Cowan <cowan@ccil.org>
Date: Fri, 30 Aug 2019 00:06:34 -0400
Message-ID: <CAD2gp_SywW8Fgpq7e747ses_JhqciwX=tqpqZ2TZ9rM+bfjvkA@mail.gmail.com>
To: Florian Rivoal <florian@rivoal.net>
Cc: LTRU Working Group <ltru@ietf.org>, Doug Ewell <doug@ewellic.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ltru/x3fbuayK4FY_K_bN_4SSRMHtV_w>
Subject: Re: [Ltru] Mail regarding draft-ietf-ltru-4646bis and draft-ietf-ltru-matching

It was only done for macrolanguages that already had compound tags like
zh-yue, plus a few others that seemed likely to be treated similarly,
often because they have only a single written form between them.  Aymara is a
macrolanguage with two individual languages, Central and Southern, but only
Central has a written form and is an official language (of Bolivia).  If
you don't care about the difference, you can use ay, or you can use ayc or
ayr to distinguish, but nobody has to worry about abstracting over them.



On Thu, Aug 29, 2019 at 11:55 PM Florian Rivoal <florian@rivoal.net> wrote:

>
>
> On Aug 30, 2019, at 11:17, John Cowan <cowan@ccil.org> wrote:
>
> Got it. Yes, with extended filtering, you need to canonicalize toward
> extlangs.  I thought you were talking about simple equality matching after
> canonicalization.
>
> Note that almost all macrolanguages do not have extlang tags; the
> languages can be identified by the macrolanguage or by the individual
> language tag, but not both.  Indeed, only ar, kok, lv, ms, sw, uz, zh have
> extlang tags subordinated to them;
>
>
> Ah, indeed, I had failed to notice that. That doesn't make what I am
> proposing to do wrong, but it does limit its usefulness to a subset of the
> macrolanguage families. So my Chinese example would work, but my Norwegian
> one wouldn't.
>
> Is there a reason for the lack of extlang on these macrolanguages that
> don't have it? Inferring the equivalent information by inverting the
> "Preferred-Value" information doesn't sound hard, but if it wasn't done,
> presumably there's a reason.
>
> sgn does too but is not a macrolanguage.
>
>
> It doesn't sound terribly useful to me to do the extlang canonicalization
> for the sake of sgn, but neither does it sound harmful.
>
> —Florian
>
>
> On Thu, Aug 29, 2019 at 9:14 PM Florian Rivoal <florian@rivoal.net> wrote:
>
>>
>>
>> > On Aug 30, 2019, at 10:08, Florian Rivoal <florian@rivoal.net> wrote:
>> >
>> >
>> >
>> >> On Aug 29, 2019, at 22:44, John Cowan <cowan@ccil.org> wrote:
>> >>
>> >>
>> >>
>> >> On Thu, Aug 29, 2019 at 5:00 AM Florian Rivoal <florian@rivoal.net>
>> wrote:
>> >>
>> >> In case that influences what you have to say, note that what I intend
>> to do is not to store the canonicalized-to-extlang form anywhere. It would
>> only be for internal processing: when performing an extended filtering
>> operation, where it is unknown whether the ranges and tags are in extlang
>> form or not, canonicalize both to extlang form and do the extended
>> filtering operation on that.
>> >>
>> >> In that case you can equally canonicalize away from the extlang form
>> as toward it.  I recommend that.
>> >
>> > Can you?
>> >
>> > Let's say you want to match (using extended filtering) the zh range
>> against documents that may contain the zh-yue or yue tags (and possibly
>> other zh-cmn, zh-hak, zh, zh-HK…). This could be something a typesetter
>> wants to do to use a particular font and set of line breaking rules for any
>> chunk of Chinese (in the broad sense) text.
>> >
>> > If we canonicalize to extlang form:
>> >  zh -> zh
>> >  zh-yue -> zh-yue
>> >  yue -> zh-yue
>> > Therefore, the zh range will match both the documents that contained
>> zh-yue or yue. This is what I want.
>> >
>> > If we canonicalize away from extlang form:
>> >  zh -> zh
>> >  zh-yue -> yue
>> >  yue -> yue
>> > Therefore, the zh range will match neither documents that contained
>> zh-yue nor yue. This is not what I want, and is worse than not
>> canonicalizing at all.
>> >
>> > So it seems to me that no, we cannot canonicalize away from the extlang
>> form and get the same results.
>> >
>> > If the extended filtering operation did something smart with
>> macrolanguages, then I wouldn't need canonicalization at all, but it
>> doesn't, so I feel I need to canonicalize, and as described above, only
>> canonicalization to extlang actually seems to help.
>> >
>> > Am I missing something?
>> >
>> > —Florian
>>
>> Sorry for not including that in the previous message, but to give another
>> example, if I want to use the no range to match any of: no, no-bok, no-nyn,
>> nb, or nn, canonicalization to extlang form works, and canonicalization
>> away from it doesn't.
>>
>> —Florian
>
>
>
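
The zh / zh-yue / yue comparison quoted above can be reproduced with a small
sketch. This is an illustration only, not code from anyone in the thread: the
EXTLANG_PREFIX table is a hand-written, hypothetical excerpt (a real
implementation would load the extlang records and their Prefix fields from
the IANA Language Subtag Registry), the function names are made up for this
example, and extended_filter is a simplified reading of the extended
filtering steps in RFC 4647, section 3.3.2.

```python
# Hypothetical excerpt of extlang -> prefix data. A real implementation
# would read every extlang record from the IANA Language Subtag Registry.
EXTLANG_PREFIX = {"yue": "zh", "cmn": "zh", "hak": "zh"}


def to_extlang_form(tag):
    """Canonicalize toward extlang form: yue -> zh-yue."""
    subtags = tag.lower().split("-")
    prefix = EXTLANG_PREFIX.get(subtags[0])
    if prefix is not None:
        subtags = [prefix] + subtags
    return "-".join(subtags)


def from_extlang_form(tag):
    """Canonicalize away from extlang form: zh-yue -> yue."""
    subtags = tag.lower().split("-")
    if len(subtags) > 1 and EXTLANG_PREFIX.get(subtags[1]) == subtags[0]:
        subtags = subtags[1:]
    return "-".join(subtags)


def extended_filter(lang_range, tag):
    """Simplified extended filtering (after RFC 4647, section 3.3.2)."""
    r = lang_range.lower().split("-")
    t = tag.lower().split("-")
    # The first subtags must match, unless the range starts with a wildcard.
    if r[0] != "*" and r[0] != t[0]:
        return False
    ri, ti = 1, 1
    while ri < len(r):
        if r[ri] == "*":
            ri += 1                 # wildcard: move to the next range subtag
        elif ti >= len(t):
            return False            # tag exhausted before the range
        elif r[ri] == t[ti]:
            ri += 1
            ti += 1
        elif len(t[ti]) == 1:
            return False            # singleton in the tag stops the search
        else:
            ti += 1                 # skip a non-matching tag subtag
    return True


tags = ["zh", "zh-yue", "yue"]
toward = [extended_filter(to_extlang_form("zh"), to_extlang_form(t)) for t in tags]
away = [extended_filter(from_extlang_form("zh"), from_extlang_form(t)) for t in tags]
print(toward)
print(away)
```

Under these assumptions, canonicalizing both sides toward extlang form lets
the zh range match all three tags, while canonicalizing away from it leaves
only the plain zh tag matching, which is the asymmetry Florian describes.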