Re: [Ltru] Mail regarding draft-ietf-ltru-4646bis and draft-ietf-ltru-matching

Florian Rivoal <florian@rivoal.net> Wed, 28 August 2019 09:56 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/ltru/8rczTT5G1vz5gJdAm42jkh5njws>


> On Aug 28, 2019, at 2:47, Doug Ewell <doug@ewellic.org> wrote:
> 
> On July 27, Florian Rivoal wrote:
> 
>> However, RFC5646 Section 4.5, which defines canonicalization, only
>> does so for language tags, not for language ranges. Presumably, the
>> process is largely the same, with wildcards in the language subtag
>> being preserved, and I suppose wildcards in other subtags would likely
>> be dropped. But as it stands, that seems undefined.
> 
> I think you are on the right track by assuming that ranges are
> canonicalized just like tags, with asterisks left alone.

Thanks for confirming.
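
To make that assumption concrete, here is a minimal Python sketch of canonicalizing a range while leaving asterisks alone. The function name and the two small tables are hypothetical illustrations: the tables hold only a handful of Preferred-Value mappings from the registry, and the sketch ignores extlangs, variants, grandfathered tags and extension ordering, all of which full RFC5646 Section 4.5 canonicalization would handle.

```python
# Tiny illustrative excerpt of Preferred-Value mappings from the IANA
# language subtag registry; a real implementation would load the whole file.
DEPRECATED_LANGUAGE = {"iw": "he", "in": "id", "ji": "yi"}
DEPRECATED_REGION = {"BU": "MM", "DD": "DE"}


def canonicalize_range(language_range: str) -> str:
    """Canonicalize a language range like a tag, preserving "*" subtags."""
    out = []
    in_extension = False
    for position, subtag in enumerate(language_range.split("-")):
        if subtag == "*":
            # Wildcards are left alone, wherever they appear.
            out.append(subtag)
        elif in_extension:
            # Subtags after a singleton are extension or private-use data.
            out.append(subtag.lower())
        elif len(subtag) == 1:
            in_extension = True
            out.append(subtag.lower())
        elif position == 0:
            lowered = subtag.lower()
            out.append(DEPRECATED_LANGUAGE.get(lowered, lowered))
        elif len(subtag) == 2 and subtag.isalpha():
            # Simplification: treat any later two-letter subtag as a region.
            upper = subtag.upper()
            out.append(DEPRECATED_REGION.get(upper, upper))
        elif len(subtag) == 4 and subtag.isalpha():
            # Four-letter alphabetic subtags are scripts: conventional title case.
            out.append(subtag.title())
        else:
            out.append(subtag.lower())
    return "-".join(out)


print(canonicalize_range("iw-*"))
print(canonicalize_range("*-bu"))
```

With these tables, "iw-*" becomes "he-*" and "*-bu" becomes "*-MM", while a bare "*" is returned unchanged.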

> It's not very likely that most LTRU participants will be eager to start
> up a new IETF project to update 5646 for something like this. Best to go
> with your assumption.
> 
>> Also, while giving recommendations about canonicalization for the
>> purpose of filtering, it would seem useful to mention (and possibly to
>> recommend) canonicalizing to the "extlang form". The definition of the
>> extlang form itself (in RFC5646 Section 4.5) mentions that it is
>> useful for matching and selecting, but that information isn't relayed
>> anywhere in RFC4647.
> 
> At the time these documents were written, there was a strong sentiment
> around de-emphasizing extlangs in general. It's good to know that
> there's a real-world use case for using them here. Again, it's unlikely
> that people will want to rev 4647 for this.

The use case is CSS selectors, when writing rules for typography/styling in a document. On one side, the document is marked up to indicate which parts of it are in which language. On the other side, the style sheet describes which parts of the document must be styled in which way, and can make that styling dependent on the language.

The need for normalization comes from the fact that stylesheet authoring and document authoring are not coordinated in the general case, so a stylesheet author generally cannot know whether a document will be marked up with, for example, zh-yue or yue. The stylesheet author is then faced with two options, both unattractive for different reasons:
* Use the deprecated tag: it's more likely to be found in existing documents, being older. The first downside is that it doesn't always work. The second is that it slows down adoption of the newer preferred tag: document authors wanting to be compatible with existing stylesheets will keep using the deprecated one too, and we get into a vicious cycle of everybody continuing to use the deprecated variant.
* Use both the deprecated and the preferred tag in the stylesheet's selector. This works, but it means that stylesheet authors need to be aware of, and manually replicate, the information in https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry. Asking people to manually do what software could isn't great, as they tend not to do it, or to do it with bugs, or to not update when the registry updates, etc.

So it seems preferable, given that this correspondence is maintained in a neatly usable format, to have CSS renderers deal with the correspondence between deprecated and preferred tags by way of canonicalizing to the extlang form and doing the selector matching on that.
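
As a rough illustration of what I mean, here is a Python sketch of matching on the extlang form. It rests on several assumptions: the matching scheme is extended filtering (RFC4647 Section 3.3.2), the EXTLANG_PREFIX table is a three-entry excerpt of the registry's extlang records, all function names are hypothetical, and the ordinary canonicalization step that precedes conversion to extlang form in RFC5646 Section 4.5 is skipped for brevity.

```python
# Illustrative excerpt of extlang records (subtag -> Prefix) from the IANA
# registry; a real renderer would load the whole registry.
EXTLANG_PREFIX = {"yue": "zh", "cmn": "zh", "hak": "zh"}


def to_extlang_form(tag_or_range: str) -> str:
    """Prepend the registered prefix when the primary subtag is also an extlang."""
    subtags = tag_or_range.lower().split("-")
    prefix = EXTLANG_PREFIX.get(subtags[0])
    if prefix is not None:
        subtags.insert(0, prefix)  # "yue" -> "zh-yue"; "zh-yue" is untouched
    return "-".join(subtags)


def extended_filter_matches(language_range: str, tag: str) -> bool:
    """Extended filtering as described in RFC4647 Section 3.3.2."""
    range_subtags = language_range.lower().split("-")
    tag_subtags = tag.lower().split("-")
    if range_subtags[0] != "*" and range_subtags[0] != tag_subtags[0]:
        return False
    r, t = 1, 1
    while r < len(range_subtags):
        if range_subtags[r] == "*":
            r += 1                      # wildcard: skip to next range subtag
        elif t >= len(tag_subtags):
            return False                # range has subtags the tag lacks
        elif range_subtags[r] == tag_subtags[t]:
            r += 1
            t += 1
        elif len(tag_subtags[t]) == 1:
            return False                # never skip past a singleton
        else:
            t += 1                      # skip a non-matching tag subtag
    return True


def lang_matches(selector_range: str, document_tag: str) -> bool:
    """Compare a selector's range and a document's tag in extlang form."""
    return extended_filter_matches(
        to_extlang_form(selector_range), to_extlang_form(document_tag)
    )


print(lang_matches("yue", "zh-yue"))
print(lang_matches("zh-yue", "yue-HK"))
print(lang_matches("yue", "zh-cmn"))
```

The first two calls match and the third does not: a selector written with either yue or zh-yue finds content marked up with either form, without the stylesheet author listing both.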

In the long run, both document authors and stylesheet authors should use the preferred tag without the extlang prefix, and the canonicalization to extlang form will be invisible to them. But even if some don't, everything works.

—Florian