Re: AD Review of draft-ietf-httpbis-sfbis-05

Mark Nottingham <mnot@mnot.net> Sun, 11 February 2024 23:52 UTC

From: Mark Nottingham <mnot@mnot.net>
In-Reply-To: <a5d724bd-d84d-4207-8bfd-3d18f1018cf0@it.aoyama.ac.jp>
Date: Mon, 12 Feb 2024 10:51:42 +1100
Cc: Carsten Bormann <cabo@tzi.org>, Francesca Palombini <francesca.palombini@ericsson.com>, "draft-ietf-httpbis-sfbis@ietf.org" <draft-ietf-httpbis-sfbis@ietf.org>, HTTP Working Group <ietf-http-wg@w3.org>
Message-Id: <751653E8-F2B4-47ED-B4E8-414945EA6D71@mnot.net>
References: <AM0PR07MB6019C5D8DF60CE53F27E0722987E2@AM0PR07MB6019.eurprd07.prod.outlook.com> <56617A72-D775-41DC-88E8-3A82DC5225C7@tzi.org> <5614883D-76A9-4157-A9B6-694AB1B5FB63@mnot.net> <7E5DCED8-557C-459D-A80F-B47BF3D09998@tzi.org> <a5d724bd-d84d-4207-8bfd-3d18f1018cf0@it.aoyama.ac.jp>
To: "\"Martin J. Dürst\"" <duerst@it.aoyama.ac.jp>
Subject: Re: AD Review of draft-ietf-httpbis-sfbis-05
Archived-At: <https://www.w3.org/mid/751653E8-F2B4-47ED-B4E8-414945EA6D71@mnot.net>

Thanks, Martin. I'm happy to incorporate that approach if others don't have objections.

Cheers,


> On 2 Feb 2024, at 11:38 am, Martin J. Dürst <duerst@it.aoyama.ac.jp> wrote:
> 
> Hello Mark, others,
> 
> On 2024-02-01 17:02, Carsten Bormann wrote:
>> On 2024-02-01, at 08:05, Mark Nottingham <mnot@mnot.net> wrote:
>>> 
>>> Hi Carsten!
>>> 
>>>> On 30 Jan 2024, at 2:38 am, Carsten Bormann <cabo@tzi.org> wrote:
>>>> 
>>>> On 2024-01-29, at 15:41, Francesca Palombini <francesca.palombini@ericsson.com> wrote:
>>>>> 
>>>>> What parts of [I-D.draft-bray-unichars] is the reader supposed to look at? Or if it is the whole document, could we have some context around it?
>>>> 
>>>> It seems that sfbis refers to Unicode codepoints where it should have referred to Unicode scalar values (the values it now calls codepoints need to be encodable in UTF-8, which only applies to Unicode scalar values).
>>> 
>>> People seem to have strong and conflicting beliefs about the correct terminology here -- others have asserted the opposite in my recollection.
> 
> First, the strictly correct answer is that it doesn't matter; both terms would lead to the same result (assuming people read the specs). The reason is that the spec says, in 4.1.11. Serializing a Display String (https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-sfbis-05#name-serializing-a-display-strin), point 2:
> 
> Let byte_array be the result of applying UTF-8 encoding (Section 3 of [UTF8]) to input_sequence. If encoding fails, fail serialization.
> 
> [UTF8] (RFC 3629) then says:
> 
>   The definition of UTF-8 prohibits encoding character numbers between
>   U+D800 and U+DFFF, which are reserved for use with the UTF-16
>   encoding form (as surrogate pairs) and do not directly represent
>   characters.  When encoding in UTF-8 from UTF-16 data, it is necessary
>   to first decode the UTF-16 data to obtain character numbers, which
>   are then encoded in UTF-8 as described above.
> 
> The net result of this is that if there are any non-Unicode scalar value codepoints, serialization will just fail.
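
A minimal Python sketch of the behaviour described above, assuming a strict UTF-8 encoder is used for step 2. The function name `utf8_encode_or_fail` is hypothetical and is not taken from the draft; only the encoding step is modelled, not the full Display String serialization.

```python
def utf8_encode_or_fail(input_sequence: str) -> bytes:
    """Apply UTF-8 encoding to input_sequence; fail if encoding fails.

    Python's strict UTF-8 codec refuses surrogate code points
    (U+D800 to U+DFFF), matching the prohibition in RFC 3629.
    """
    try:
        return input_sequence.encode("utf-8")
    except UnicodeEncodeError as exc:
        # Hypothetical mapping of "fail serialization" onto an exception.
        raise ValueError("serialization failed: not encodable as UTF-8") from exc


# A string made only of Unicode scalar values encodes normally.
encoded = utf8_encode_or_fail("f\u00fc\u00fc")

# A lone surrogate is a code point but not a scalar value, so encoding fails.
try:
    utf8_encode_or_fail("\ud800")
    surrogate_failed = False
except ValueError:
    surrogate_failed = True
```

Under this assumption the outcome is the same whichever term the text uses, because the surrogate range never survives the encoding step.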
> 
> However, not assuming that people will read the specs (always a safe bet), I'd suggest adding a short note, perhaps as follows:
> 
> Please note that [UTF8] prohibits the encoding of codepoints between U+D800 and U+DFFF (surrogates).
> 
> [short aside: It took me a while to figure out that section 3.3.8, entitled "Display Strings", didn't actually specify display strings, but was just a quick intro to display strings. To help future readers, I'd at a minimum change "3. Structured Data Types" to "3. Overview of Structured Data Types" or some such. Also, a pointer to later sections at the start of section 3 would be appreciated.]
> 
> Hope this helps,    Martin.
> 
> 
>>> So I'm afraid that before I'm willing to change the spec (again) I need to see a reference supporting any assertions, and agreement on its interpretation.
>> Hi Mark,
>> as sfbis is based on UTF-8, your main reference should be STD 63, specifically RFC 3629 [1].
>> Obviously, UTF-8 is based on Unicode standardization work, so the other reference is [2].
>> [1]: https://www.rfc-editor.org/rfc/rfc3629.html
>> [2]: https://www.unicode.org/versions/Unicode15.0.0/UnicodeStandard-15.0.pdf
>> Unicode terminology is sometimes confusing, and it doesn’t help that at the time RFC 3629 was written, there wasn’t a term defined for what the Unicode consortium now clumsily calls “Unicode scalar values”: the set of Unicode characters that Unicode encoding forms (née Unicode transformation formats) such as UTF-8 can encode.  See this definition (page 119 of [2]):
>> D76
>>   Unicode scalar value: Any Unicode code point except high-surrogate and low-surrogate code points.
>>   As a result of this definition, the set of Unicode scalar values consists of the ranges 0 to D7FF(16) and E000(16) to 10FFFF(16), inclusive.
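
Definition D76 can be sketched as a simple range check. The helper name `is_unicode_scalar_value` is an assumption made for illustration; the ranges are the ones stated in the definition.

```python
def is_unicode_scalar_value(code_point: int) -> bool:
    """Return True if code_point is a Unicode scalar value per D76.

    Scalar values are all code points except the surrogates,
    i.e. 0..0xD7FF and 0xE000..0x10FFFF, inclusive.
    """
    return 0 <= code_point <= 0xD7FF or 0xE000 <= code_point <= 0x10FFFF


# Surrogates are code points but not scalar values.
high_surrogate_ok = is_unicode_scalar_value(0xD800)   # False
# Noncharacters such as U+FFFE are still scalar values.
noncharacter_ok = is_unicode_scalar_value(0xFFFE)     # True
```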
>> When Unicode created the term “Unicode scalar values”, they thought that they could not use the more natural wording “Unicode characters” because Unicode scalar values include some code values that are called “non-characters” (*) in Unicode…  “Unicode characters” is still what most people would understand and what I therefore tend to use in informal conversation.
>> The term “Unicode code points” encompasses the Unicode scalar values as well as some code points that are used inside UTF-16 only.  Before “Unicode scalar values” was defined, “Unicode code points” was often used in its place because it is the encompassing concept, and it often still is used because “Unicode scalar values” is so clumsy or simply because documentation is created by copying from old sources.
>> This seems all pretty obvious, until you encounter the problem that a number of platforms are living on a legacy character model that was created as a transition strategy from the original pure 16-bit Unicode they adopted early on.  Applications that work in this space tend to leak out UTF-16 internals, causing a lot of pain [3].  For interchange, we could (and should) ignore that, except that there are people who are convinced that we should share that pain.
>> RFC 3629 [1] specifically calls out, on page 5 (middle of Section 3, [4]), that the Unicode code points that are not Unicode scalar values (today’s words) cannot be encoded in UTF-8.
>> To minimize the confusion (and to reduce the number of hooks that the pain-sharers can use to muddy the issue) a standard like yours should try to avoid the generalism “Unicode code points” and talk about “Unicode scalar values” throughout, possibly after copying D76.
>> [3]: https://www.ietf.org/archive/id/draft-bormann-dispatch-modern-network-unicode-03.html#name-history-legacy
>> [4]: https://www.rfc-editor.org/rfc/rfc3629.html#page-5
>> Grüße, Carsten
>> (*) There is lots of structure in the range covered by Unicode scalar values.  A specification that is not intricately bound to those details, but really mostly wants to encode Unicode, is best off simply using the stable term “Unicode scalar values” in its explanations and ignoring those details, which are evolving.


--
Mark Nottingham   https://www.mnot.net/