Re: [I18ndir] Working Group Last Call: Structured Headers for HTTP

John C Klensin <john-ietf@jck.com> Tue, 04 February 2020 19:24 UTC

Date: Tue, 04 Feb 2020 14:24:33 -0500
From: John C Klensin <john-ietf@jck.com>
To: John R Levine <johnl@taugh.com>
cc: i18ndir@ietf.org
Message-ID: <4A65258034E64E1A97EFDF7A@PSB>
In-Reply-To: <alpine.OSX.2.21.99999.374.2002041149130.34062@ary.qy>
References: <20200203173404.88EE813AA055@ary.qy> <E2361F8BA970A15043416C2D@PSB> <alpine.OSX.2.21.99999.374.2002031653540.31381@ary.qy> <D03AE38116EF15538E10CFAF@PSB> <7D31FE0A-D4EC-4096-83FE-97D2BF4908F5@frobbit.se> <alpine.OSX.2.21.99999.374.2002041007110.33467@ary.qy> <47AEE7D582019051ACF36647@PSB> <alpine.OSX.2.21.99999.374.2002041149130.34062@ary.qy>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/TATOCsTNPAQNLjXQW0g-sLp2bCE>
Subject: Re: [I18ndir] Working Group Last Call: Structured Headers for HTTP


--On Tuesday, February 4, 2020 11:51 -0500 John R Levine
<johnl@taugh.com> wrote:

> On Tue, 4 Feb 2020, John C Klensin wrote:
>>> NFC is specific to UTF-8.  MIME doesn't have that as far as I
>>> know so I don't think it makes sense here.
>> 
>> Huh?   NFC is certainly specific to Unicode, but there is
>> nothing specific to UTF-8 (or any other encoding form) about
>> it.
> 
> Sorry, I meant specific to Unicode.  If there is some old page
> in big5 or ISO-2022-x, there aren't any normalization rules
> I'm aware of.

IIRC, Big5 would not need one because it is strictly Traditional
Chinese characters.  As soon as one has something that would
contain both Traditional characters for which Simplified forms
exist and those Simplified forms, one _may_ need a mechanism for
comparing them equal, but that wouldn't be normalization in the
Unicode sense either [1].  If by "ISO-2022-x" you mean either
ISO 8859-x or code pages that can be accessed or switched by ISO
2022 methods, definitely no normalization for the first case and
probably not for the latter, because at least most of them were
designed to avoid the problems (like composing character
sequences) that normalization addresses.
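
To make the points above concrete, here is a minimal sketch using
Python's standard unicodedata module.  It is an illustration under
my own assumptions, not anything taken from the draft under review:
the sample characters (e-acute, and U+570B / U+56FD as a
Traditional/Simplified pair) and all variable names are chosen only
for the example.  It shows that NFC behaves the same whichever
encoding form the text passed through, that Unicode normalization
does not unify a Traditional/Simplified pair, and that text decoded
from ISO 8859-1 is already in NFC.

```python
import unicodedata


def nfc(s: str) -> str:
    """Return the NFC form of s (normalization works on code points)."""
    return unicodedata.normalize("NFC", s)


# The same abstract character, e-acute, as one code point and as a
# base letter followed by a combining acute accent.
composed = "\u00e9"
decomposed = "e\u0301"

# Round-trip both strings through several encoding forms.  The raw
# strings differ, but their NFC forms match, whatever the encoding.
results = {}
for enc in ("utf-8", "utf-16-le", "utf-32-be"):
    a = composed.encode(enc).decode(enc)
    b = decomposed.encode(enc).decode(enc)
    results[enc] = (a == b, nfc(a) == nfc(b))

# A Traditional/Simplified pair (illustrative choice): even the
# compatibility form NFKC leaves the two characters distinct.
traditional = "\u570b"
simplified = "\u56fd"
han_equal_after_nfkc = (
    unicodedata.normalize("NFKC", traditional)
    == unicodedata.normalize("NFKC", simplified)
)

# ISO 8859-1 has no combining characters, so every one of its 256
# code points decodes to text that NFC leaves unchanged.
latin1_stable = all(
    nfc(bytes([i]).decode("latin-1")) == bytes([i]).decode("latin-1")
    for i in range(256)
)

print(results)
print(han_equal_after_nfkc, latin1_stable)
```

Comparing Traditional and Simplified forms as equal would need a
separate mapping table, which is a different mechanism from
normalization in the Unicode sense.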

> I think we agree, normalization is a level up from what
> they're describing here.

Ack.

   best,
    john

[1] In retrospect, Unicode's using, and effectively preempting,
the terms "normalization" and, especially, "canonicalization"
for very specific purposes in the context of their standard was
unfortunate, because it makes any use of, for example, "canonical
form" in describing a string or identifier ambiguous as to whether
Unicode canonicalization (as in NFKC) or some other set of rules
was intended.  Having, or converting to, some "normal form"
isn't much better.  Nothing can be done about that now, but it
means we should remind ourselves that we need to be very careful
to be clear when we use those terms or similar ones.