Re: [Ltru] Mail regarding draft-ietf-ltru-4646bis and draft-ietf-ltru-matching

Florian Rivoal <florian@rivoal.net> Fri, 30 August 2019 01:08 UTC

Content-Type: text/plain; charset="utf-8"
Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.11\))
From: Florian Rivoal <florian@rivoal.net>
In-Reply-To: <CAD2gp_S+dDdgo9WsOixT_-jHkWZxmajWmx2MRKi0iDHVSwd-3g@mail.gmail.com>
Date: Fri, 30 Aug 2019 10:08:33 +0900
Cc: Mark Davis ☕️ <mark@macchiato.com>, LTRU Working Group <ltru@ietf.org>, Doug Ewell <doug@ewellic.org>
Content-Transfer-Encoding: quoted-printable
Message-Id: <94B0FC03-B793-43BB-B864-38A306E6B5CA@rivoal.net>
References: <20190827104755.665a7a7059d7ee80bb4d670165c8327d.0f79efb126.wbe@email03.godaddy.com> <910CB6C8-9F66-4255-B149-B146DA8E5695@rivoal.net> <CAJ2xs_GWQH=zOvzVqUqpFKHmLKWZTR=ybJOv+K_SMhCW==X23g@mail.gmail.com> <73BE5AF7-0C62-425F-834E-8759628D2C5F@rivoal.net> <CAD2gp_S+dDdgo9WsOixT_-jHkWZxmajWmx2MRKi0iDHVSwd-3g@mail.gmail.com>
To: John Cowan <cowan@ccil.org>
X-Mailer: Apple Mail (2.3445.104.11)
Archived-At: <https://mailarchive.ietf.org/arch/msg/ltru/iC3oPLeFIrmxMdLNKNe8b6U6DaA>
Subject: Re: [Ltru] Mail regarding draft-ietf-ltru-4646bis and draft-ietf-ltru-matching


> On Aug 29, 2019, at 22:44, John Cowan <cowan@ccil.org> wrote:
> 
> 
> 
> On Thu, Aug 29, 2019 at 5:00 AM Florian Rivoal <florian@rivoal.net> wrote:
> 
> In case that influences what you have to say, note that what I intend to do is not to store the canonicalized-to-extlang form anywhere. It would only be for internal processing: when performing an extended filtering operation, where it is unknown whether the ranges and tags are in extlang form or not, canonicalize both to extlang form and do the extended filtering operation on that.
> 
> In that case you can equally canonicalize away from the extlang form as toward it.  I recommend that.

Can you?

Let's say you want to match (using extended filtering) the zh range against documents that may contain the zh-yue or yue tags (and possibly others such as zh-cmn, zh-hak, zh, zh-HK…). This could be something a typesetter wants to do in order to use a particular font and set of line breaking rules for any chunk of Chinese (in the broad sense) text.

If we canonicalize to extlang form: 
  zh -> zh
  zh-yue -> zh-yue
  yue -> zh-yue
Therefore, the zh range will match documents tagged either zh-yue or yue. This is what I want.

If we canonicalize away from extlang form: 
  zh -> zh
  zh-yue -> yue
  yue -> yue
Therefore, the zh range will match neither the documents tagged zh-yue nor those tagged yue. This is not what I want, and is worse than not canonicalizing at all: without canonicalization, zh would at least still match zh-yue.

So it seems to me that no, we cannot canonicalize away from the extlang form and get the same results.
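
To make the comparison concrete, here is a minimal sketch in Python of the two approaches. The names (EXTLANG_PREFIX, to_extlang, from_extlang, extended_filter) are made up for illustration, and the extlang table is a tiny hand-written subset rather than the real registry. The matching function is my reading of extended filtering in RFC 4647, not a reference implementation.

```python
# Hypothetical, hand-written subset of extlang -> prefix records.
# A real implementation would load these from the IANA registry.
EXTLANG_PREFIX = {"yue": "zh", "cmn": "zh", "hak": "zh"}


def to_extlang(tag: str) -> str:
    """Canonicalize toward extlang form: yue -> zh-yue."""
    subtags = tag.split("-")
    prefix = EXTLANG_PREFIX.get(subtags[0].lower())
    if prefix is not None:
        subtags.insert(0, prefix)
    return "-".join(subtags)


def from_extlang(tag: str) -> str:
    """Canonicalize away from extlang form: zh-yue -> yue."""
    subtags = tag.split("-")
    if (len(subtags) > 1
            and EXTLANG_PREFIX.get(subtags[1].lower()) == subtags[0].lower()):
        subtags = subtags[1:]
    return "-".join(subtags)


def extended_filter(lang_range: str, tag: str) -> bool:
    """Extended filtering as I understand RFC 4647, section 3.3.2."""
    r = lang_range.lower().split("-")
    t = tag.lower().split("-")
    # The first subtags must match, unless the range starts with a wildcard.
    if r[0] != "*" and r[0] != t[0]:
        return False
    ri, ti = 1, 1
    while ri < len(r):
        if r[ri] == "*":
            ri += 1            # a wildcard matches zero or more subtags
        elif ti >= len(t):
            return False       # the tag ran out before the range did
        elif r[ri] == t[ti]:
            ri += 1
            ti += 1
        elif len(t[ti]) == 1:
            return False       # never skip past a singleton
        else:
            ti += 1            # skip a non-matching subtag in the tag
    return True


def matches(lang_range, tags, canonicalize):
    """Return the original tags that match after canonicalizing both sides."""
    canonical_range = canonicalize(lang_range)
    return [tag for tag in tags
            if extended_filter(canonical_range, canonicalize(tag))]


docs = ["zh-yue", "yue", "zh-cmn", "zh", "zh-HK"]
toward = matches("zh", docs, to_extlang)
away = matches("zh", docs, from_extlang)
```

With this sketch, `toward` keeps all five tags, while `away` keeps only zh and zh-HK, which is the asymmetry I describe above.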

If the extended filtering operation did something smart with macrolanguages, then I wouldn't need canonicalization at all, but it doesn't, so I feel I need to canonicalize, and as described above, only canonicalization to extlang actually seems to help.

Am I missing something?

—Florian