Re: Clarification regarding URI (RFC3986) spec followed by HTTP (RFC9110)

Julian Reschke <julian.reschke@gmx.de> Wed, 25 January 2023 09:47 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf/0OydNWKPkHCsDE-z1N5rILhQBYo>

On 25.01.2023 10:04, Raghu Saxena wrote:
> To whomever it may concern,
>
> I am writing to seek clarification regarding the URI spec (RFC3986)
> followed by HTTP, specifically about percent-encoding arbitrary octets
> (which do not comprise a valid UTF-8 sequence). In the last paragraph of
> RFC3986 Section 2.5
> (https://www.rfc-editor.org/rfc/rfc3986.html#section-2.5), it says, quote:
>
>  >  When a new URI scheme defines a component that represents textual
>     data consisting of characters from the Universal Character Set [UCS],
>     the data should first be encoded as octets according to the UTF-8
>     character encoding [STD63]; then only those octets that do not
>     correspond to characters in the unreserved set should be percent-
>     encoded.
>
> This implies that URI schemes defined after RFC3986 must follow UTF-8
> encoding in their URIs. However, the original HTTP/1.1 RFC (2616) was
> dated June 1999, and so would not have had to "abide" by the UTF-8 rule.
>
> In fact, many web servers allow and process GET requests with
> percent-encoded octets, which they decode as raw bytes and have the
> application level logic handle how to process them.
>
> However, since HTTP's latest RFC is 9110, dated June 2022 (post
> RFC3986), does it mean the UTF-8 rule now applies to it? I would think
> not, since this would be a breaking change. But some comments on github
> indicate that this is as per the spec ()

Pointer?

> tl;dr - Is it compliant with the HTTP specification to send arbitrary
> bytes, which do not represent a valid UTF-8 sequence, via
> percent-encoding in the URL query parameter?

Yes.

The http scheme was not redefined by RFCs after RFC 2616 (in fact, it
was defined even before that), so the RFC 3986 guidance for new URI
schemes does not apply to it.
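
To illustrate the round trip, here is a minimal sketch using Python's
standard urllib.parse module. It is not taken from either spec; the host
example.com, the parameter name "q", and the sample bytes are made up
for illustration, and a real server would use its own query parser.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

# Hypothetical payload: 0xFF can never appear in valid UTF-8.
raw = bytes([0xFF, 0xFE, 0x00, 0x41])

# Client side: percent-encode every octet outside the unreserved set.
encoded = quote_from_bytes(raw, safe="")
url = "http://example.com/search?q=" + encoded

# Server side (simplified): take the query value and decode it back to
# raw bytes, leaving interpretation to the application.
query_value = url.split("?q=", 1)[1]
decoded = unquote_to_bytes(query_value)

# The payload is not decodable as UTF-8, yet it survives the round trip.
try:
    raw.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False

print(url)
print(decoded == raw, is_valid_utf8)
```

The point of the sketch is only that percent-encoding operates on octets,
so nothing in the encoding step itself requires the data to be UTF-8.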

Best regards, Julian