Re: draft-ietf-httpbis-header-structure: handling multiple field values

Ian Clelland <iclelland@google.com> Tue, 12 May 2020 18:48 UTC

References: <f55521dd-e1d3-d925-688c-c472ad67bfb4@gmx.de> <20200512172347.GB4817@1wt.eu> <CAK_TSXJ3o7F9x63MSYyEhr7de0vO1Yu2s8JnjkhT7n4BQiQp+A@mail.gmail.com> <706ee02a-2ecc-6cce-0754-909d6b9f4edd@gmx.de>
In-Reply-To: <706ee02a-2ecc-6cce-0754-909d6b9f4edd@gmx.de>
From: Ian Clelland <iclelland@google.com>
Date: Tue, 12 May 2020 14:45:02 -0400
Message-ID: <CAK_TSXJxex1t32EnfPqYUKhTdqJZFbRf36_FLKJeP2Tqu7RXMg@mail.gmail.com>
To: Julian Reschke <julian.reschke@gmx.de>
Cc: Willy Tarreau <w@1wt.eu>, HTTP Working Group <ietf-http-wg@w3.org>
Subject: Re: draft-ietf-httpbis-header-structure: handling multiple field values
Archived-At: <https://www.w3.org/mid/CAK_TSXJxex1t32EnfPqYUKhTdqJZFbRf36_FLKJeP2Tqu7RXMg@mail.gmail.com>

On Tue, May 12, 2020 at 1:47 PM Julian Reschke <julian.reschke@gmx.de>
wrote:

> On 12.05.2020 19:39, Ian Clelland wrote:
> > This is mentioned in
> > https://httpwg.org/http-extensions/draft-ietf-httpbis-header-structure.html#rfc.section.4.2 --
> > "parsers MUST combine all lines in the same section (header or trailer)
> > that case-insensitively match the field name into one comma-separated
> > field-value", (with the warning given that strings split across multiple
> > field values will have "unpredictable results") -- So I don't think
> > you're allowed to parse them separately. If both exist in the same
> > message, they must be combined before parsing.
> > ...
>
> Indeed. Looking at this again, I realize that a paragraph below then
> confused me:
>
> "Strings split across multiple field lines will have unpredictable
> results, because comma(s) and whitespace inserted upon combination will
> become part of the string output by the parser. Since concatenation
> might be done by an upstream intermediary, the results are not under the
> control of the serializer or the parser."
>
> I read this to mean that errors might be detected early or not, but
> maybe this is just a warning that the actual string used for
> concatenation can vary?
>
> If that's the intent, I'd call that a spec bug. A string value split
> across multiple field instances is very clearly a violation of what HTTP
> says about list-shaped header fields, and not allowing a recipient to
> detect that seems incorrect to me.
>

Definitely a spec bug, though I'm not sure in which spec.
RFC 7230 (Section 3.2.2) reads:

> A sender MUST NOT generate multiple header fields with the same field name
> in a message unless either the entire field value for that header field is
> defined as a comma-separated list [i.e., #(values)] or the header field is
> a well-known exception (as noted below).


Perhaps it should also say that, besides the header being defined as a
comma-separated list, the split must fall between list elements, since a
field value can contain commas with other semantic meanings (inside a
quoted string, for example).
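
To make the difference concrete, here is a minimal sketch in Python. The
helper name combine_field_lines and the ", " separator are my own
assumptions for illustration (RFC 7230 only says "separated by a comma",
so an intermediary could just as well insert "," with no space).

```python
def combine_field_lines(values, separator=", "):
    """Join field-line values in order, as a recipient or intermediary might.

    Hypothetical helper; the exact separator is an assumption.
    """
    return separator.join(v.strip() for v in values)


# Split between list elements: the combined value is still the same list
# of two strings, "a" and "b, c".
between_elements = combine_field_lines(['"a"', '"b, c"'])

# Split inside a quoted string: the sender meant a single string, but the
# comma and space inserted on combination become part of it.
inside_string = combine_field_lines(['"hello', 'world"'])

# A different separator yields a different string for the same input.
inside_string_no_space = combine_field_lines(['"hello', 'world"'], separator=",")

print(between_elements)
print(inside_string)
print(inside_string_no_space)
```

In the second and third cases the combined value is a well-formed quoted
string, so a parser that only sees the combined value has nothing to
reject.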

It goes on to say:

> A recipient MAY combine multiple header fields with the same field name
> into one "field-name: field-value" pair, without changing the semantics of
> the message, by appending each subsequent field value to the combined field
> value in order, separated by a comma.


and maybe the phrase "without changing the semantics of the message" means
that the recipient is only free to join the fields if doing so doesn't
change the semantics (implying indirectly that the field shouldn't have been
split within a quoted string in the first place), but it doesn't really read
that way.
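
For what it's worth, a recipient that still has the individual field lines
could detect the bad split before joining. A rough sketch, assuming
double-quoted strings with backslash escapes and nothing else (the function
names are made up for this example):

```python
def ends_inside_quoted_string(value):
    """Return True if value ends with a double-quoted string still open.

    Assumes backslash escapes are only meaningful inside a quoted string.
    """
    in_string = False
    escaped = False
    for ch in value:
        if escaped:
            escaped = False          # character after a backslash is literal
        elif in_string and ch == "\\":
            escaped = True
        elif ch == '"':
            in_string = not in_string
    return in_string


def find_split_strings(values):
    """Indexes of field lines that leave a quoted string unterminated."""
    return [i for i, v in enumerate(values) if ends_inside_quoted_string(v)]


print(find_split_strings(['"a"', '"b, c"']))       # split between elements
print(find_split_strings(['"hello', 'world"']))    # split inside a string
print(ends_inside_quoted_string('"hello, world"')) # after combination
```

The last call is the problem case: once an upstream intermediary has
already combined the lines, the check no longer sees anything wrong.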

Ian