Re: Determining which fields are structured

Justin Richer <jricher@mit.edu> Fri, 19 November 2021 15:09 UTC

From: Justin Richer <jricher@mit.edu>
In-Reply-To: <773C8621-18CE-49DC-B8F4-1B4311282EDB@mnot.net>
Date: Fri, 19 Nov 2021 10:06:37 -0500
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Content-Transfer-Encoding: quoted-printable
Message-Id: <C79AE57B-690F-4677-B4CF-23A858C56A44@mit.edu>
References: <68C54E31-A5F0-4E67-8FEA-0F555518DE5C@mit.edu> <773C8621-18CE-49DC-B8F4-1B4311282EDB@mnot.net>
To: Mark Nottingham <mnot@mnot.net>
Subject: Re: Determining which fields are structured
Archived-At: <https://www.w3.org/mid/C79AE57B-690F-4677-B4CF-23A858C56A44@mit.edu>

Hi Mark,

> On Nov 17, 2021, at 5:32 PM, Mark Nottingham <mnot@mnot.net> wrote:
> 
> Hi Justin,
> 
>> On 18 Nov 2021, at 3:21 am, Justin Richer <jricher@mit.edu> wrote:
>> 
>> The question at hand: how do you know if a particular field is supposed to be structured or not?
> 
> We made an explicit decision that you'd have to have specific knowledge of the header in some way; you can't recognise whether a field is structured just by looking at it, nor can you tell which top-level type it is (list, dictionary, or item), necessarily. 

Absolutely, and that is why I think an explicit list to pull from would be really useful.
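
To make the ambiguity concrete, here is a minimal Python sketch of why guessing by parsing goes wrong for the Host example quoted below. It is not my actual implementation and not a full Structured Fields parser: the helper name `parse_bare_key_dictionary` is made up for illustration, and it only handles dictionaries whose members are bare keys (no values, no parameters, no empty field). The key character rules are the ones from the Structured Fields dictionary key definition.

```python
import re

# Dictionary key rule from Structured Fields: the first character is a
# lowercase letter or "*"; the rest are lowercase letters, digits,
# "_", "-", "." or "*".
SF_KEY = re.compile(r"[a-z*][a-z0-9_\-.*]*")


def parse_bare_key_dictionary(value: str):
    """Hypothetical helper covering only bare-key dictionaries.

    Returns a dict on success, or None when the value falls outside
    this deliberately tiny subset of the real parsing algorithm.
    """
    members = {}
    for part in value.split(","):
        key = part.strip(" ")
        if not SF_KEY.fullmatch(key):
            return None
        # A key with no "=value" part is interpreted as Boolean true.
        members[key] = True
    return members


print(parse_bare_key_dictionary("example.org"))       # {'example.org': True}
print(parse_bare_key_dictionary("example.org:8080"))  # None
```

So a perfectly ordinary host name satisfies the key rule and comes back as a one-member dictionary, while adding a port makes the same field fail. Nothing in the value itself says which outcome was intended.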

> 
>> While working on an implementation for HTTP Signatures, I put in a bit of code that simply tries to parse any field as a Dictionary, List, or Item, and if it doesn’t throw an error, marks it as whatever kind of structured field worked. This seemed to work for the most part, but I quickly hit one case that surprised me:
>> 
>> 	Host: example.org
>> 
>> This field parsed as sf-dictionary, which I wasn’t expecting at all because it doesn’t look or feel like a dictionary item. However, after talking with Mark Nottingham, it turns out that this fits the ABNF just fine: it’s a valid single-key dictionary with one key of “example.org” and no value, which is interpreted as a boolean “True” value.
> 
> Aside: the ABNF is only illustrative, the algorithms are normative.
> 
>> So with that I’d like to re-assert my support for the “Retrofit of Structured Fields for HTTP” draft: https://www.ietf.org/archive/id/draft-nottingham-http-structure-retrofit-00.html
>> 
>> In particular, if the WG picks up this document, I would also like to see us add a column to the HTTP Field Registry for SF type, and register all of the existing ones that we know work as different fields (leaving unknown or undefined ones as blank, maybe? Or something like “unstructured” if we know it’s specifically not meant to be structured, like Host): https://www.iana.org/assignments/http-fields/http-fields.xhtml
>> 
>> This resource would help code like mine, as I’d be able to pull in that table and have some sense of what to expect when trying to parse a given field.
>> 
>> This would also help push the goal of having any new fields be built using structured field types — new field definitions would be required to fill out that column when they register the field name.
> 
> I think that's reasonable. I could see adding an annotation as to whether the type is "native" (i.e., the field is specified as a structured field), or retrofit (i.e., it's not the normative parsing algorithm, but you might have some luck using it).
> 

I think that makes a lot of sense.
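
For what it's worth, this is roughly how I would expect to consume such a column. It is only a sketch under assumptions: the names `FieldInfo`, `FIELD_TYPES`, and `lookup` are hypothetical, and the entries are illustrative placeholders, not data copied from the registry or from the retrofit draft.

```python
from typing import NamedTuple, Optional


class FieldInfo(NamedTuple):
    sf_type: str  # "list", "dictionary", "item", or "unstructured"
    status: str   # "native", "retrofit", or "n/a"


# Illustrative entries only (assumptions, not registry data).
FIELD_TYPES = {
    "host": FieldInfo("unstructured", "n/a"),
    "cache-control": FieldInfo("dictionary", "retrofit"),
    "example-native-field": FieldInfo("list", "native"),
}


def lookup(name: str) -> Optional[FieldInfo]:
    """Return what the table says about a field, or None if it is unknown.

    Field names are case-insensitive, so the name is lowercased first.
    """
    return FIELD_TYPES.get(name.lower())


info = lookup("Host")
if info is None or info.sf_type == "unstructured":
    print("treat as an opaque string")
else:
    print(f"parse as {info.sf_type} ({info.status})")
```

With a table like this, code would skip structured parsing for Host entirely instead of discovering that it happens to parse as a dictionary.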

 — Justin


> Cheers,
> 
> 
> --
> Mark Nottingham   https://www.mnot.net/
>