Re: [Ltru] Mail regarding draft-ietf-ltru-4646bis and draft-ietf-ltru-matching

Florian Rivoal <florian@rivoal.net> Thu, 29 August 2019 09:00 UTC

From: Florian Rivoal <florian@rivoal.net>
In-Reply-To: <CAJ2xs_GWQH=zOvzVqUqpFKHmLKWZTR=ybJOv+K_SMhCW==X23g@mail.gmail.com>
Date: Thu, 29 Aug 2019 18:00:14 +0900
Cc: Doug Ewell <doug@ewellic.org>, LTRU Working Group <ltru@ietf.org>
Message-Id: <73BE5AF7-0C62-425F-834E-8759628D2C5F@rivoal.net>
References: <20190827104755.665a7a7059d7ee80bb4d670165c8327d.0f79efb126.wbe@email03.godaddy.com> <910CB6C8-9F66-4255-B149-B146DA8E5695@rivoal.net> <CAJ2xs_GWQH=zOvzVqUqpFKHmLKWZTR=ybJOv+K_SMhCW==X23g@mail.gmail.com>
To: Mark Davis ☕️ <mark@macchiato.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ltru/6rYYxIBxVtyL3YuH80AvJvlxrbU>
Subject: Re: [Ltru] Mail regarding draft-ietf-ltru-4646bis and draft-ietf-ltru-matching


> On Aug 29, 2019, at 15:19, Mark Davis ☕️ <mark@macchiato.com> wrote:
> 
> Canonicalizing to the extlang form has a number of disadvantages, and I would recommend strongly against it. Don't have time to discuss now, will see about next week.

Very interested in what you have to say about this.

In case that influences what you have to say, note that I do not intend to store the canonicalized-to-extlang form anywhere. It would only be used for internal processing: when performing an extended filtering operation where it is unknown whether the ranges and tags are in extlang form or not, canonicalize both to extlang form and do the extended filtering operation on the result.

That way, a zh-*-Hant selector would match both a zh-yue-Hant document element and a yue-Hant one.
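
As an illustration only, here is a minimal Python sketch of that approach. It assumes a tiny hand-written excerpt of the registry's extlang records (the hypothetical EXTLANG_PREFIX table below) and skips the rest of RFC 5646 canonicalization, such as replacing deprecated subtags. The names to_extlang_form, extended_filter and matches are made up for this example. The matching loop follows the extended filtering steps of RFC 4647 Section 3.3.2.

```python
# Hypothetical excerpt of the registry's extlang records: extlang subtag -> Prefix.
# A real implementation would load this from the IANA Language Subtag Registry.
EXTLANG_PREFIX = {"yue": "zh", "cmn": "zh"}


def to_extlang_form(tag_or_range: str) -> str:
    """Prepend the extlang Prefix when the first subtag is also an extlang.

    Wildcards are left alone, so the same function serves tags and ranges.
    Full canonicalization (deprecated subtags, etc.) is out of scope here.
    """
    subtags = tag_or_range.lower().split("-")
    prefix = EXTLANG_PREFIX.get(subtags[0])
    if prefix is not None:
        subtags.insert(0, prefix)
    return "-".join(subtags)


def extended_filter(lang_range: str, tag: str) -> bool:
    """Extended filtering of one tag against one extended language range."""
    r = lang_range.lower().split("-")
    t = tag.lower().split("-")
    # The first subtags must match, unless the range starts with a wildcard.
    if r[0] != "*" and r[0] != t[0]:
        return False
    ri, ti = 1, 1
    while ri < len(r):
        if r[ri] == "*":
            ri += 1            # a wildcard matches zero or more subtags
        elif ti >= len(t):
            return False       # the tag ran out before the range did
        elif r[ri] == t[ti]:
            ri += 1
            ti += 1
        elif len(t[ti]) == 1:
            return False       # never skip past a singleton (including "x")
        else:
            ti += 1            # skip an unmatched subtag in the tag
    return True


def matches(lang_range: str, tag: str) -> bool:
    """Canonicalize both sides to extlang form, then filter."""
    return extended_filter(to_extlang_form(lang_range), to_extlang_form(tag))
```

With this sketch, matches("zh-*-Hant", "zh-yue-Hant") and matches("zh-*-Hant", "yue-Hant") both succeed, whereas calling extended_filter("zh-*-Hant", "yue-Hant") directly, without the canonicalization step, fails on the first subtag.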

—Florian


> On Wed, Aug 28, 2019 at 11:56 AM Florian Rivoal <florian@rivoal.net> wrote:
> 
> 
> > On Aug 28, 2019, at 2:47, Doug Ewell <doug@ewellic.org> wrote:
> > 
> > On July 27, Florian Rivoal wrote:
> > 
> >> However, RFC5646 Section 4.5, which defines canonicalization, only
> >> does so for language tags, not for language ranges. Presumably, the
> >> process is largely the same, with wildcards in the language subtag
> >> being preserved, and I suppose wildcards in other subtags would likely
> >> be dropped. But as it stands, that seems undefined.
> > 
> > I think you are on the right track by assuming that ranges are
> > canonicalized just like tags, with asterisks left alone.
> 
> Thanks for confirming.
> 
> > It's not very likely that most LTRU participants will be eager to start
> > up a new IETF project to update 5646 for something like this. Best to go
> > with your assumption.
> > 
> >> Also, while giving recommendations about canonicalization for the
> >> purpose of filtering, it would seem useful to mention (and possibly to
> >> recommend) canonicalizing to the "extlang form". The definition of the
> >> extlang form itself (in RFC5646 Section 4.5) mentions that it is
> >> useful for matching and selecting, but that information isn't relayed
> >> anywhere in RFC4647.
> > 
> > At the time these documents were written, there was a strong sentiment
> > around de-emphasizing extlangs in general. It's good to know that
> > there's a real-world use case for using them here. Again, it's unlikely
> > that people will want to rev 4647 for this.
> 
> The use case is CSS selectors, when writing rules for typography/styling in a document. On the one hand, the document is marked up to indicate which parts of it are in which language. On the other hand, the style sheet describes which parts of the document must be styled in which way, and can make that styling dependent on the language.
> 
> The need for normalization comes from the fact that stylesheet authoring and document authoring are not coordinated in the general case, so a stylesheet author cannot know, generally speaking, if a document will be marked up with, for example, zh-yue or yue. The stylesheet author is then faced with two options, both unattractive for different reasons:
> * Use the deprecated tag: it's more likely to be found in existing documents due to being older. The first downside is that it doesn't always work. The second is that this slows down adoption of the newer preferred tag, as document authors wanting to be compatible with existing stylesheets will keep using the deprecated one as well, and we get into a vicious cycle of everybody continuing to use the deprecated variant.
> 
> * Use both the deprecated and the preferred tag in the stylesheet's selector. This works, but it means that stylesheet authors need to be aware of, and manually replicate, the information in <https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry>. Asking people to manually do what software could do isn't great, as they tend not to do it, or to do it with bugs, or not to update when the registry updates, etc.
> 
> So it seems preferable, given that this correspondence is maintained in a neatly usable format, to have CSS renderers deal with the correspondence between deprecated and preferred tags by way of canonicalizing to the extlang form and doing the selector matching on that.
> 
> In the long run, both document authors and stylesheet authors should use the preferred tag without the extlang prefix, and the canonicalization to extlang form will be invisible to them. But even if some don't, everything works.
> 
> —Florian