Determining which fields are structured

Justin Richer <jricher@mit.edu> Wed, 17 November 2021 16:24 UTC

The question at hand: how do you know if a particular field is supposed to be structured or not?

While working on an implementation of HTTP Signatures, I wrote a bit of code that simply tries to parse every field as a Dictionary, a List, or an Item and, if parsing doesn’t throw an error, marks the field as whichever kind of structured field worked. This seemed to work for the most part, but I quickly hit one case that surprised me:

	Host: example.org

This field parsed as an sf-dictionary, which I wasn’t expecting at all because it doesn’t look or feel like a dictionary. However, after talking with Mark Nottingham, I learned that it fits the ABNF just fine: it’s a valid single-member Dictionary whose key is “example.org” and which has no value, so the member is interpreted as having the Boolean value true.
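The trial-and-error detection described above can be sketched in Python. This is a hypothetical illustration, not the code from the HTTP Signatures implementation: it transcribes the RFC 8941 ABNF into regular expressions and reports the first top-level type (Dictionary, then List, then Item) that matches the whole value. It checks syntax only and builds no parsed value; `PATTERNS` and `guess_sf_type` are names made up for this sketch.

```python
import re

# RFC 8941 (Structured Field Values for HTTP) ABNF, transcribed into regular
# expressions. Inner lists cannot nest, so the grammar is regular and a full
# match against a top-level pattern shows the value is syntactically valid.
# This checks syntax only; it does not build the parsed value.
KEY = r"[a-z*][a-z0-9_.*-]*"
TOKEN = r"[A-Za-z*][A-Za-z0-9!#$%&'*+.^_`|~:/-]*"
STRING = r'"(?:[\x20\x21\x23-\x5b\x5d-\x7e]|\\["\\])*"'
BARE_ITEM = "|".join([
    r"-?[0-9]{1,12}\.[0-9]{1,3}",  # sf-decimal
    r"-?[0-9]{1,15}",              # sf-integer
    STRING,                        # sf-string
    TOKEN,                         # sf-token
    r":[A-Za-z0-9+/=]*:",          # sf-binary (byte sequence)
    r"\?[01]",                     # sf-boolean
])
PARAMS = rf"(?:;[ ]*{KEY}(?:=(?:{BARE_ITEM}))?)*"
ITEM = rf"(?:{BARE_ITEM}){PARAMS}"
INNER_LIST = rf"\([ ]*(?:{ITEM}(?:[ ]+{ITEM})*[ ]*)?\){PARAMS}"
MEMBER = rf"(?:{ITEM}|{INNER_LIST})"
OWS_COMMA = r"[ \t]*,[ \t]*"
# A key with no "=value" is a member whose value is Boolean true.
DICT_MEMBER = rf"{KEY}(?:={MEMBER}|{PARAMS})"

PATTERNS = {
    "dictionary": re.compile(rf"{DICT_MEMBER}(?:{OWS_COMMA}{DICT_MEMBER})*"),
    "list": re.compile(rf"{MEMBER}(?:{OWS_COMMA}{MEMBER})*"),
    "item": re.compile(ITEM),
}


def guess_sf_type(field_value):
    """Return the first SF type whose grammar matches the whole value, or None."""
    value = field_value.strip(" ")  # leading/trailing SP is discarded first
    for kind in ("dictionary", "list", "item"):
        if PATTERNS[kind].fullmatch(value):
            return kind
    return None
```

For example, `guess_sf_type("example.org")` returns `"dictionary"`, reproducing the Host surprise, while `guess_sf_type("Example.org")` returns `"list"` because keys must be lowercase. The order of attempts matters: any valid Item is also a valid one-member List, so the `"item"` branch here can never win, and a value such as `example.org` matches all three patterns. That ambiguity is one reason a per-field declaration of the expected type would help.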

So with that I’d like to re-assert my support for the “Retrofit of Structured Fields for HTTP” draft: https://www.ietf.org/archive/id/draft-nottingham-http-structure-retrofit-00.html

In particular, if the WG picks up this document, I would also like to see us add an SF type column to the HTTP Field Registry and register the type of every existing field that we know works as a structured field (maybe leaving unknown or undefined ones blank? Or marking a field “unstructured” if we know it’s specifically not meant to be structured, like Host): https://www.iana.org/assignments/http-fields/http-fields.xhtml

This resource would help code like mine, as I’d be able to pull in that table and have some sense of what to expect when trying to parse a given field.
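With such a column available, a parser could consult it before falling back to guessing. The sketch below assumes a locally cached copy of a hypothetical “SF type” column; the name `SF_TYPE_COLUMN`, the function `parse_strategy`, and the `"guess"`/`"raw"` return values are all made up for illustration. The Priority and Proxy-Status entries reflect how RFC 9218 and RFC 9209 define those fields; the remaining entries are placeholders.

```python
# HYPOTHETICAL local copy of a proposed "SF type" column in the HTTP Field
# Registry. No such column exists; the entries are illustrative only.
# Keys are lowercase because HTTP field names are case-insensitive.
SF_TYPE_COLUMN = {
    "host": "unstructured",       # known not to be a structured field
    "priority": "dictionary",     # RFC 9218 defines Priority as a Dictionary
    "proxy-status": "list",       # RFC 9209 defines Proxy-Status as a List
    "example-legacy-field": "",   # hypothetical blank entry: type unknown
}


def parse_strategy(field_name: str) -> str:
    """Decide how to treat a field value based on its registered SF type."""
    sf_type = SF_TYPE_COLUMN.get(field_name.lower(), "")
    if not sf_type:
        return "guess"  # unregistered or blank: fall back to trial parsing
    if sf_type == "unstructured":
        return "raw"    # keep the value as an opaque string
    return sf_type      # parse directly as the registered type
```

For instance, `parse_strategy("Host")` returns `"raw"`, so a Host value would never be mistaken for a Dictionary, while an unregistered field name falls back to `"guess"`.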

This would also further the goal of building any new fields from structured field types: new field definitions would be required to fill in that column when they register the field name.

 — Justin