Re: [I18ndir] Working Group Last Call: Structured Headers for HTTP

John C Klensin <john@jck.com> Tue, 04 February 2020 16:27 UTC

Date: Tue, 04 Feb 2020 11:27:04 -0500
From: John C Klensin <john@jck.com>
To: John R Levine <johnl@taugh.com>, Patrik Fältström <patrik@frobbit.se>
cc: i18ndir@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/KQqjpmwXJ0zWBxfIqUSdwlfhdJ4>


--On Tuesday, February 4, 2020 10:08 -0500 John R Levine
<johnl@taugh.com> wrote:

> On Tue, 4 Feb 2020, Patrik Fältström wrote:
>>> And omission of a clear model for specifying what is really
>>> what we call a "charset" elsewhere (not just an encoding) is
>>> the one clear defect in that particular discussion in the
>>> document.
>> 
>> Don't forget normalization or otherwise how comparison is to
>> be done of the UTF-8 encoded characters in the string.
>> Nothing about that either.
> 
> NFC is specific to UTF-8.  MIME doesn't have that as far as I
> know so I don't think it makes sense here.

Huh?  NFC is certainly specific to Unicode, but there is
nothing specific to UTF-8 (or any other encoding form) about it.
My highly abstracted (but I hope not far off the mark) view of
Patrik's comments is that, if one is going to compare or sort
strings -- or do much else with them other than maybe displaying
them -- one needs a plan other than the equivalent of
"just use UTF-8".  Fairly recent W3C discussions, a few around
ICANN, and comments in TUS (The Unicode Standard) about
script-specific or language-specific rendering strongly suggest
that a more specific plan is needed even to display strings
reasonably in some scripts and circumstances.
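
To make the point about normalization concrete, here is a
minimal sketch (mine, not anything from the draft) using
Python's standard unicodedata module.  The sample name and the
helper nfc_equal are illustrative assumptions only.  It shows
that two canonically equivalent strings differ under every
encoding form until they are normalized, and that NFC operates
on code points rather than on UTF-8 bytes.

```python
import unicodedata

# Same name written two ways: precomposed letters versus base
# letters followed by U+0308 COMBINING DIAERESIS.
composed = "F\u00e4ltstr\u00f6m"
decomposed = unicodedata.normalize("NFD", composed)

def nfc_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Without normalization the strings differ, whatever encoding is used.
raw_equal = composed == decomposed
utf8_equal = composed.encode("utf-8") == decomposed.encode("utf-8")
utf16_equal = composed.encode("utf-16-le") == decomposed.encode("utf-16-le")

# After normalization they compare equal, independent of encoding.
normalized_equal = nfc_equal(composed, decomposed)

print(raw_equal, utf8_equal, utf16_equal, normalized_equal)
```

A specification that says only "use UTF-8" leaves the first
three comparisons as the default behavior.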

Note that this is another example of a situation in which the
problem exists even with basic Latin characters and moving
beyond them just makes the situation more important or worse or
the examples more dramatic.  The obvious case involves sorting
strings into alphabetic order: one has to decide what to do
about case mapping or ordering, what to do about spaces or other
punctuation, and so on.
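
A small sketch of the sorting case, again in Python and again
only an illustration under my own assumptions: the word lists
and the helper ignore_punctuation are made up for the example,
and real collation would normally follow a locale-aware
algorithm rather than these ad hoc keys.  It shows that even
pure ASCII input sorts differently depending on the decisions
made about case and punctuation.

```python
def ignore_punctuation(s: str) -> str:
    """Hypothetical sort key: case-fold, then drop anything not alphanumeric."""
    return "".join(ch for ch in s.casefold() if ch.isalnum())

words = ["apple", "Banana", "cherry"]

# Plain code point order puts all uppercase letters before lowercase.
by_code_point = sorted(words)

# Case-folded order treats "B" and "b" alike.
case_insensitive = sorted(words, key=str.casefold)

phrases = ["new york", "newark", "new-found"]

# Code point order: space sorts before hyphen, hyphen before letters.
phrases_by_code_point = sorted(phrases)

# Ignoring spaces and punctuation gives a different order.
phrases_ignoring_punct = sorted(phrases, key=ignore_punctuation)

print(by_code_point)
print(case_insensitive)
print(phrases_by_code_point)
print(phrases_ignoring_punct)
```

None of these orders is the "right" one in general, which is
why the choice has to be specified somewhere.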

Where Patrik and I disagree (if we disagree at all - I may have
misread his intent) is whether this particular document needs to
say anything on the subject or anything beyond "definitions for
individual header fields need to include an analysis and
guidance for those issues".

I also think that, in our capacity of giving advice to the ART
ADs, there is an important lesson here and that is that _any_
document that even hints at non-ASCII characters, even if the
hint is nothing more than "don't do that unless you have to"
MUST get serious review by experts in i18n issues, not just the
WG that happens to be writing the spec.  That, in turn, almost
certainly implies the need for an organized and functioning
directorate, not just one to which one of those on the list can
post a note (as Martin did) and hope for responses from the
usual suspects.  If they can't make such a directorate work,
the alternatives are either some other plan to accomplish the
same thing or explicit acknowledgment that IETF work on these
issues is not getting sufficient, and sufficiently skilled,
review to be trusted.

Watch for what may be a more complicated example soon.

   john