Re: [I18ndir] Working Group Last Call: Structured Headers for HTTP

John C Klensin <john-ietf@jck.com> Tue, 04 February 2020 16:06 UTC

Date: Tue, 04 Feb 2020 11:06:20 -0500
From: John C Klensin <john-ietf@jck.com>
To: Patrik Fältström <patrik@frobbit.se>
cc: John R Levine <johnl@taugh.com>, i18ndir@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/n3-29BpdG4LtJguncehSZsf2Kis>


--On Tuesday, February 4, 2020 13:07 +0100 Patrik Fältström
<patrik@frobbit.se> wrote:

> On 4 Feb 2020, at 2:30, John C Klensin wrote:
> 
>> --On Monday, February 3, 2020 17:05 -0500 John R Levine
>> <johnl@taugh.com> wrote:
>> 
>>> ...
>>> In section 3.3.3 it says that non-ASCII text is to be encoded
>>> as base64, which seems reasonable, "along with a character
>>> encoding (preferably UTF-8)" but it doesn't have any
>>> convention for where the encoding goes. Experience with MIME
>>> suggests that life is simpler if there's a standard form for
>>> an encoded text blob so libraries can decode and if need be
>>> transliterate in one go.
>> 
>> And omission of a clear model for specifying what is really
>> what we call a "charset" elsewhere (not just an encoding) is
>> the one clear defect in that particular discussion in the
>> document.
> 
> Don't forget normalization or otherwise how comparison is to
> be done of the UTF-8 encoded characters in the string. Nothing
> about that either.
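
[For reference, the MIME convention John Levine alludes to is the
RFC 2047 "encoded word", in which the charset label travels inside
the encoded blob.  The sketch below uses Python's standard
email.header module purely as an illustration of that email
convention; it is an analogy and not something the Structured
Headers draft specifies.  The sample string is an assumption chosen
for the example.]

```python
from email.header import Header, decode_header, make_header

original = "F\u00e4ltstr\u00f6m"  # sample non-ASCII text (illustrative only)

# Encode: the charset name and the transfer encoding are embedded in the
# encoded word itself, so no side channel is needed to name the charset.
encoded = Header(original, charset="utf-8").encode()

# Decode: the library reads the charset label back out of the encoded word.
parts = decode_header(encoded)      # list of (bytes, charset) pairs
decoded = str(make_header(parts))   # back to a Unicode string in one go

print(encoded)              # pure-ASCII encoded word
print(parts[0][1])          # charset label recovered from the blob
print(decoded == original)  # round trip preserves the text
```

[Because the label is part of the standard form, a library can
decode without being told out of band which charset was used.]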

The omission was deliberate, so I should explain.  Let me take
advantage of John Levine's observation about email headers, with
which we have much more experience.  Very loosely and
informally, there are two types of header field data: the stuff
that is highly structured, like addresses (e.g., "From:", "To:",
and "Cc:") and maybe even "Date:", and the stuff, like "Subject:"
fields, that is basically free text.  For the latter, in most
cases one doesn't care whether it can be compared to other
strings or not -- the important thing is that it can be read by
the user and not be confusing.  Of course, someone might want to
sort or select on unstructured fields, but that may not impose
as strong a requirement.  For HTTP headers that contain
structured material, strict normalization and comparison rules
might be less important -- as I suggested to John, one place
where the analogy breaks down is that many or most HTTP headers
are closer to envelope information in email than they are to
user presentation information.  

What is more important is that, as The Unicode Standard has been
suggesting all along and more recent W3C work addresses in a
more refined way, in many or most cases (and even for more
structured fields), the right answer is to normalize (and apply
other rules as needed) only at comparison time, normalizing both
(all) strings to be compared, and not even bothering to store
the normalized strings after the comparison is completed.  I
still believe that we got IDN labels right by requiring that
both the stored form and any lookup keys be normalized and
structured early, but they are, for several reasons, an unusual
case.  
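
[A minimal sketch of the "normalize only at comparison time"
approach described above, using Python's standard unicodedata
module.  The helper name, the choice of NFC, and the sample
strings are assumptions made for illustration; a real protocol
would need to say which form and which additional rules apply.]

```python
import unicodedata


def equal_at_comparison_time(a: str, b: str, form: str = "NFC") -> bool:
    """Normalize both operands for this comparison only.

    The normalized copies are temporaries; neither input is modified
    and nothing normalized is stored. (Hypothetical helper.)
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# The same visible text in two different code point sequences.
precomposed = "F\u00e4ltstr\u00f6m"          # single code points for the accented letters
decomposed = "Fa\u0308ltstro\u0308m"         # base letters + U+0308 COMBINING DIAERESIS

print(precomposed == decomposed)                          # False: raw code points differ
print(equal_at_comparison_time(precomposed, decomposed))  # True: equal once both are normalized
print(len(precomposed), len(decomposed))                  # 9 11: stored forms are left as they were
```

[The stored values keep whatever form they arrived in; only the
comparison sees normalized text, which is the opposite of the
normalize-early choice made for IDN labels.]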

So, out of concern that anything that looked like "one size fits
all" might turn out to be closer to "...fits none", I decided to
avoid touching those issues in the context of this particular
document.  Maybe that is wrong but, if it is, I'd be inclined to
suggest that any document that specifies a new HTTP header field
type and that allows characters in the value/data that are not
natively ASCII MUST include an Internationalization
Considerations section that explicitly addresses these issues.

Does that make sense?  Do we think that sort of requirement is
needed and would be helpful?

   john