Re: [Ltru] Mail regarding draft-ietf-ltru-4646bis and draft-ietf-ltru-matching

John Cowan <cowan@ccil.org> Fri, 30 August 2019 02:17 UTC

References: <20190827104755.665a7a7059d7ee80bb4d670165c8327d.0f79efb126.wbe@email03.godaddy.com> <910CB6C8-9F66-4255-B149-B146DA8E5695@rivoal.net> <CAJ2xs_GWQH=zOvzVqUqpFKHmLKWZTR=ybJOv+K_SMhCW==X23g@mail.gmail.com> <73BE5AF7-0C62-425F-834E-8759628D2C5F@rivoal.net> <CAD2gp_S+dDdgo9WsOixT_-jHkWZxmajWmx2MRKi0iDHVSwd-3g@mail.gmail.com> <94B0FC03-B793-43BB-B864-38A306E6B5CA@rivoal.net> <A8BF9658-CB3C-4DC5-AECB-840C0053A943@rivoal.net>
In-Reply-To: <A8BF9658-CB3C-4DC5-AECB-840C0053A943@rivoal.net>
From: John Cowan <cowan@ccil.org>
Date: Thu, 29 Aug 2019 22:17:19 -0400
Message-ID: <CAD2gp_TJcRiMQYPpV=R7XEd-wZS=UOgQEzCb6t=0iWSBMx2cvQ@mail.gmail.com>
To: Florian Rivoal <florian@rivoal.net>
Cc: LTRU Working Group <ltru@ietf.org>, Doug Ewell <doug@ewellic.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ltru/2VQDaPiI7RA6j-0xuHFoq4lcVqY>
Subject: Re: [Ltru] Mail regarding draft-ietf-ltru-4646bis and draft-ietf-ltru-matching

Got it. Yes, with extended filtering, you need to canonicalize toward
extlangs.  I thought you were talking about simple equality matching after
canonicalization.

Note that almost all macrolanguages have no extlang subtags at all; their
languages can be identified by the macrolanguage subtag or by the individual
language subtag, but the two cannot be combined in one tag.  Indeed, only
ar, kok, lv, ms, sw, uz, and zh have extlang subtags subordinated to them;
sgn does too, but it is not a macrolanguage.
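
To make the difference concrete, here is a minimal Python sketch. It is only
an illustration under stated assumptions, not code from either draft. The
names EXTLANG_PREFIX, to_extlang_form, from_extlang_form, and extended_filter
are hypothetical, the mapping table is a tiny hand-picked subset of the
registry, and the filter follows my reading of extended filtering in
RFC 4647, Section 3.3.2.

```python
# Illustrative subset only: extlang subtag -> its prefix (macrolanguage).
# A real implementation would load this from the IANA Language Subtag Registry.
EXTLANG_PREFIX = {"yue": "zh", "cmn": "zh", "hak": "zh"}


def to_extlang_form(tag):
    """Canonicalize toward extlang form: 'yue' -> 'zh-yue'."""
    subtags = tag.lower().split("-")
    prefix = EXTLANG_PREFIX.get(subtags[0])
    if prefix is not None:
        subtags.insert(0, prefix)
    return "-".join(subtags)


def from_extlang_form(tag):
    """Canonicalize away from extlang form: 'zh-yue' -> 'yue'."""
    subtags = tag.lower().split("-")
    if len(subtags) > 1 and EXTLANG_PREFIX.get(subtags[1]) == subtags[0]:
        subtags.pop(0)
    return "-".join(subtags)


def extended_filter(lang_range, tag):
    """Return True if tag matches lang_range under extended filtering."""
    r = lang_range.lower().split("-")
    t = tag.lower().split("-")
    # The first subtags must match (or the range starts with a wildcard).
    if r[0] != "*" and r[0] != t[0]:
        return False
    ri, ti = 1, 1
    while ri < len(r):
        if r[ri] == "*":
            ri += 1              # a wildcard matches zero or more subtags
        elif ti >= len(t):
            return False         # range has subtags left, tag does not
        elif r[ri] == t[ti]:
            ri += 1
            ti += 1
        elif len(t[ti]) == 1:
            return False         # never skip past a singleton
        else:
            ti += 1              # skip a non-matching subtag in the tag
    return True


tags = ["zh", "zh-yue", "yue", "zh-HK"]

# Canonicalize range and tags toward extlang form, then filter.
toward = [t for t in tags
          if extended_filter(to_extlang_form("zh"), to_extlang_form(t))]

# Canonicalize range and tags away from extlang form, then filter.
away = [t for t in tags
        if extended_filter(from_extlang_form("zh"), from_extlang_form(t))]

print(toward)  # ['zh', 'zh-yue', 'yue', 'zh-HK']
print(away)    # ['zh', 'zh-HK']
```

With the zh range, canonicalizing toward extlang form keeps yue and zh-yue in
the result, while canonicalizing away from it drops both.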




On Thu, Aug 29, 2019 at 9:14 PM Florian Rivoal <florian@rivoal.net> wrote:

>
>
> > On Aug 30, 2019, at 10:08, Florian Rivoal <florian@rivoal.net> wrote:
> >
> >
> >
> >> On Aug 29, 2019, at 22:44, John Cowan <cowan@ccil.org> wrote:
> >>
> >>
> >>
> >> On Thu, Aug 29, 2019 at 5:00 AM Florian Rivoal <florian@rivoal.net> wrote:
> >>
> >> In case that influences what you have to say, note that I do not intend
> >> to store the canonicalized-to-extlang form anywhere. It would only be for
> >> internal processing: when performing an extended filtering operation where
> >> it is unknown whether the ranges and tags are in extlang form, canonicalize
> >> both to extlang form and do the extended filtering operation on that.
> >>
> >> In that case you can equally canonicalize away from the extlang form as
> >> toward it.  I recommend that.
> >
> > Can you?
> >
> > Let's say you want to match (using extended filtering) the zh range
> > against documents that may contain the zh-yue or yue tags (and possibly
> > others: zh-cmn, zh-hakka, zh, zh-HK…). This could be something a
> > typesetter wants to do in order to use a particular font and set of
> > line-breaking rules for any chunk of Chinese (in the broad sense) text.
> >
> > If we canonicalize to extlang form:
> >  zh -> zh
> >  zh-yue -> zh-yue
> >  yue -> zh-yue
> > Therefore, the zh range will match documents that contained either zh-yue
> > or yue. This is what I want.
> >
> > If we canonicalize away from extlang form:
> >  zh -> zh
> >  zh-yue -> yue
> >  yue -> yue
> > Therefore, the zh range will match neither the documents that contained
> > zh-yue nor those that contained yue. This is not what I want, and is worse
> > than not canonicalizing at all.
> >
> > So it seems to me that no, we cannot canonicalize away from the extlang
> > form and get the same results.
> >
> > If the extended filtering operation did something smart with
> > macrolanguages, then I wouldn't need canonicalization at all, but it
> > doesn't, so I feel I need to canonicalize, and as described above, only
> > canonicalization to extlang form actually seems to help.
> >
> > Am I missing something?
> >
> > —Florian
>
> Sorry for not including that in the previous message, but to give another
> example, if I want to use the no range to match any of: no, no-bok, no-nyn,
> nb, or nn, canonicalization to extlang form works, and canonicalization
> away from it doesn't.
>
> —Florian