Re: draft-ietf-httpbis-header-structure: handling multiple field values

Julian Reschke <julian.reschke@gmx.de> Tue, 12 May 2020 19:29 UTC

To: Ian Clelland <iclelland@google.com>
Cc: Willy Tarreau <w@1wt.eu>, HTTP Working Group <ietf-http-wg@w3.org>
References: <f55521dd-e1d3-d925-688c-c472ad67bfb4@gmx.de> <20200512172347.GB4817@1wt.eu> <CAK_TSXJ3o7F9x63MSYyEhr7de0vO1Yu2s8JnjkhT7n4BQiQp+A@mail.gmail.com> <706ee02a-2ecc-6cce-0754-909d6b9f4edd@gmx.de> <CAK_TSXJxex1t32EnfPqYUKhTdqJZFbRf36_FLKJeP2Tqu7RXMg@mail.gmail.com>
From: Julian Reschke <julian.reschke@gmx.de>
Message-ID: <4045931b-06b3-9b76-106f-773499b8374b@gmx.de>
Date: Tue, 12 May 2020 21:25:38 +0200
In-Reply-To: <CAK_TSXJxex1t32EnfPqYUKhTdqJZFbRf36_FLKJeP2Tqu7RXMg@mail.gmail.com>
Archived-At: <https://www.w3.org/mid/4045931b-06b3-9b76-106f-773499b8374b@gmx.de>

On 12.05.2020 20:45, Ian Clelland wrote:
>
>
> On Tue, May 12, 2020 at 1:47 PM Julian Reschke <julian.reschke@gmx.de> wrote:
>
>     On 12.05.2020 19:39, Ian Clelland wrote:
>      > This is mentioned in
>      > https://httpwg.org/http-extensions/draft-ietf-httpbis-header-structure.html#rfc.section.4.2 --
>      > "parsers MUST combine all lines in the same section (header or
>      > trailer) that case-insensitively match the field name into one
>      > comma-separated field-value", (with the warning given that strings
>      > split across multiple field values will have "unpredictable
>      > results") -- So I don't think you're allowed to parse them
>      > separately. If both exist in the same message, they must be
>      > combined before parsing.
>      > ...
>
>     Indeed. Looking at this again, I realize that a paragraph below then
>     confused me:
>
>     "Strings split across multiple field lines will have unpredictable
>     results, because comma(s) and whitespace inserted upon combination will
>     become part of the string output by the parser. Since concatenation
>     might be done by an upstream intermediary, the results are not under the
>     control of the serializer or the parser."
>
>     I read this to mean that errors might be detected early or not, but
>     maybe this is just a warning that the actual string used for
>     concatenation can vary?
>
>     If that's the intent, I'd call that a spec bug. A string value split
>     across multiple field instances is very clearly a violation of what HTTP
>     says about list-shaped header fields, and not allowing a recipient to
>     detect that seems incorrect to me.
>
>
> Definitely a spec bug -- not sure which spec though.
> 7230 reads:
>
>     A sender MUST NOT generate multiple header fields with the same
>     field name in a message unless either the entire field value for
>     that header field is defined as a comma-separated list [i.e.,
>     #(values)] or the header field is a well-known exception (as noted
>     below).
>
>
> Perhaps what it should also mention is that the header must be defined
> as a comma-separated list, *and* the split must be between list
> elements, in cases where the field value can contain commas with other
> semantic meanings.

AFAIU, that was the intent in both RFC 2616 and RFC 7230: each
individual field value, taken on its own, must conform to the header
field's grammar.
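
To illustrate what checking each field value on its own could look
like, here is a minimal sketch in Python. It is not the parsing
algorithm from the Structured Headers draft; it only tests whether
double quotes are balanced within a single field line, and the function
names (has_balanced_quotes, first_malformed_line) are made up for this
example.

```python
def has_balanced_quotes(value):
    """Return True if every double quote opened in value is also closed.

    Backslash escapes are honored inside a quoted string. This is a
    deliberately simplified check, assumed here for illustration only.
    """
    in_string = False
    escaped = False
    for ch in value:
        if escaped:
            # The previous character was a backslash inside a string.
            escaped = False
        elif ch == "\\" and in_string:
            escaped = True
        elif ch == '"':
            in_string = not in_string
    return not in_string


def first_malformed_line(lines):
    """Return the index of the first field line that fails the check, or None."""
    for index, value in enumerate(lines):
        if not has_balanced_quotes(value):
            return index
    return None


# Split between list members: each line is fine on its own.
print(first_malformed_line(['"a"', '"b"']))
# Split inside a quoted string: the first line is already broken.
print(first_malformed_line(['"hello', 'world"']))
```

A recipient that sees the field lines before they are combined could
reject the second message at this point; one that only sees the
combined value cannot.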

> It goes on to say:
>
>     A recipient MAY combine multiple header fields with the same
>     field name into one "field-name: field-value" pair, without changing
>     the semantics of the message, by appending each subsequent field
>     value to the combined field value in order, separated by a comma.
>
>
> and maybe the phrase "without changing the semantics of the message"
> means that the server is only free to join the fields if it doesn't
> change the semantics (implying indirectly that the field shouldn't have
> been split up within a quoted string in the first place), but it doesn't
> really read that way.

No, whoever joins the header fields does not need to know the syntax of
the field (because that would defeat extensibility). IOW, if the input
is garbage, so will be the output.
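
A small Python sketch of that point, assuming the combiner does nothing
more than what RFC 7230 describes (append each value in order,
separated by a comma). The function name combine_field_lines and the
", " separator are choices made for this example; an intermediary might
insert different whitespace.

```python
def combine_field_lines(lines):
    """Join field values in order with ", ", without looking at their syntax."""
    return ", ".join(value.strip() for value in lines)


# Split between list members: the combined value is what the sender meant.
good = combine_field_lines(['"a"', '"b"'])
print(good)

# Split inside a quoted string: the inserted comma and space end up
# inside the string, and the combined value still looks well-formed.
bad = combine_field_lines(['"hello', 'world"'])
print(bad)
```

The second result is a single quoted string containing a comma, so a
parser that only ever sees the combined value has no way to tell that
the sender produced two malformed field lines.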

Going back to the SH spec: I'm afraid that the spec *disallows* failing
early on garbage. Is this *really* the intent?

Best regards, Julian