Re: [I18ndir] Working Group Last Call: Structured Headers for HTTP
John C Klensin <john-ietf@jck.com> Wed, 05 February 2020 02:32 UTC
Date: Tue, 04 Feb 2020 21:32:25 -0500
From: John C Klensin <john-ietf@jck.com>
To: Patrik Fältström <patrik@frobbit.se>
cc: John R Levine <johnl@taugh.com>, i18ndir@ietf.org
Message-ID: <3E143C646E27AEB48F08B065@PSB>
In-Reply-To: <E3DA1665-DB13-46D2-9212-33E647D92716@frobbit.se>
References: <20200203173404.88EE813AA055@ary.qy> <E2361F8BA970A15043416C2D@PSB> <alpine.OSX.2.21.99999.374.2002031653540.31381@ary.qy> <D03AE38116EF15538E10CFAF@PSB> <7D31FE0A-D4EC-4096-83FE-97D2BF4908F5@frobbit.se> <alpine.OSX.2.21.99999.374.2002041007110.33467@ary.qy> <47AEE7D582019051ACF36647@PSB> <alpine.OSX.2.21.99999.374.2002041149130.34062@ary.qy> <4A65258034E64E1A97EFDF7A@PSB> <E3DA1665-DB13-46D2-9212-33E647D92716@frobbit.se>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/e74aCFNPChYfJprGLuIMQLA5kog>
Subject: Re: [I18ndir] Working Group Last Call: Structured Headers for HTTP

--On Tuesday, February 4, 2020 23:35 +0100 Patrik Fältström
<patrik@frobbit.se> wrote:

> On 4 Feb 2020, at 20:24, John C Klensin wrote:
>
>>> I think we agree, normalization is a level up from what
>>> they're describing here.
>>
>> Ack.
>
> Ok, I have tried to really understand what they are doing
> here, and it feels like if Unicode was pasted in after they
> had designed the whole thing.

Indeed.  It feels as if they started with "all ASCII" and then
someone said "well, how about cases that need non-ASCII
characters", and they responded with a lot of circling around,
much of which amounts to "just use UTF-8".  That is something we
know is almost always inadequate or worse, and often a sign of
deeper problems and lack of thought.

> Look for example at B.1 when they compare with JSON. They say
> one advantage of their format is that JSON do allow Unicode
> data which gives interoperability issues, but they do allow it
> themselves as well.
>
> Then this is 3.3.3:
>
>> Unicode is not directly supported in strings, because it
>> causes a number of interoperability issues, and - with few
>> exceptions - header values do not require it.
>
> How do they know header values do not need it -- with a few
> exceptions?
>
>> When it is necessary for a field value to convey non-ASCII
>> content, a byte sequence (Section 3.3.5) SHOULD be specified,
>> along with a character encoding (preferably [UTF-8]).
>
> I think that the default encoding MUST be UTF-8 OR specified
> explicitly.
>
> I further think it should be noted comparison of parameter
> values is NOT specified in this base specification as
> normalization might create non-interoperability. If needed,
> the specification of the header must say how comparison is
> managed.
>
> I also think 4.2.7 should mention the resulting sequence of
> the parsing of a binary structure might be a UTF-8 encoded
> string with a reference to 3.3.3.
>
> I.e. I find it hard to read (and understand why I
> misunderstood this at first) that the only mentioning of
> Unicode and UTF-8 is in "string" but the only thing it does is
> to reference byte sequence, which in turn do never talk about
> it. Neither at serialization or deserialization. So where
> UTF-8 strings are described they are not mentioned.
>
> But ok...you are right.

Right about what?  We may be in agreement after all.  I think
there are at least three different issues here.

(1) I took Martin's note, especially the "in this day and age"
part, as suggesting that the document should encourage, rather
than discourage, non-ASCII strings in header field values.  I
suggested that, in most cases, discouraging such values was
entirely appropriate, although it was reasonable to make
provision for them in special cases.

(2) Then we got to the question of whether non-ASCII strings
should be allowed in header field names at all.  I think we (at
least you, John Levine, and me) are in agreement that it is
almost certainly a bad idea.

(3) In the situations in which non-ASCII characters are allowed
in header field values, we are in agreement that the description
is inadequate in multiple ways.  They include that the spec
should either require UTF-8 everywhere (e.g., "encoding MUST be
Unicode in UTF-8") or the description / container syntax must
specify how and in what syntax the encoding (or "charset") is
specified.  They also include a requirement that the definitions
of any header fields defined according to this spec, at least
ones that allow non-ASCII characters, MUST have either an
"Internationalization Considerations" section or an equivalent.
And that section MUST either explain why comparison, searching,
or ordering of all or part of the header field value is never
going to be required or describe the mechanisms or conventions
needed to carry out the operations.
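
To make the comparison problem in (3) concrete, here is a minimal
sketch in Python.  It is only an illustration, not anything taken
from the draft: the function names are hypothetical, and NFC is
used purely as an example of a convention a field definition
might pick.  It shows two canonically equivalent spellings of the
same name that differ as UTF-8 byte sequences and compare equal
only once a normalization form has been agreed on.

```python
import unicodedata


def utf8_equal(a: str, b: str) -> bool:
    """Compare two strings by their raw UTF-8 bytes ("just use UTF-8")."""
    return a.encode("utf-8") == b.encode("utf-8")


def nfc_equal(a: str, b: str) -> bool:
    """Compare after normalizing both sides to NFC.

    NFC is an assumption made for this sketch; a header field
    definition could specify a different form or none at all.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The same name spelled two ways: precomposed letters versus base
# letters followed by U+0308 COMBINING DIAERESIS.
precomposed = "F\u00e4ltstr\u00f6m"
decomposed = "Fa\u0308ltstro\u0308m"

print(utf8_equal(precomposed, decomposed))  # False
print(nfc_equal(precomposed, decomposed))   # True
```

Which of the two results is "right" depends on the header field,
which is why the rule has to come from the definition of the
field rather than from the container format.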

For (3), I first understood you to be suggesting a discussion
of, e.g., normalization in the container/format document that
would apply to all header fields defined under it.  I think that
would be wrong because different header fields and their values
might require different treatment.  However, I certainly agree
that information has to be provided somewhere and that this
document is deficient in saying as little as it does about
strings, byte sequences, and non-ASCII characters without
spelling out the requirement for that type of information to be
specified somewhere.

I also think that the IETF must never again produce a technical
specification or BCP that deals with or allows non-ASCII
characters by saying the equivalent of "just use UTF-8".

Does that put us closer together?

    john