Re: Clarification regarding URI (RFC3986) spec followed by HTTP (RFC9110)

"Martin J. Dürst" <duerst@it.aoyama.ac.jp> Mon, 30 January 2023 04:03 UTC

Message-ID: <0830aa47-0dbb-5911-dfbe-26ca86f0b04a@it.aoyama.ac.jp>
Date: Mon, 30 Jan 2023 13:03:35 +0900
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.6.1
Subject: Re: Clarification regarding URI (RFC3986) spec followed by HTTP (RFC9110)
Content-Language: en-US
To: Raghu Saxena <poiasdpoiasd@live.com>, ietf@ietf.org
References: <MEYP282MB3564A385B6CECB0E9E92A630A3CE9@MEYP282MB3564.AUSP282.PROD.OUTLOOK.COM> <634ad97a-9081-1831-9c07-999a3c8e1bbf@gmx.de> <MEYP282MB3564CAEFF922DFEEEE32813DA3CE9@MEYP282MB3564.AUSP282.PROD.OUTLOOK.COM>
From: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
Organization: Aoyama Gakuin University
In-Reply-To: <MEYP282MB3564CAEFF922DFEEEE32813DA3CE9@MEYP282MB3564.AUSP282.PROD.OUTLOOK.COM>
Content-Type: text/plain; charset="UTF-8"; format="flowed"
Content-Transfer-Encoding: 8bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf/lgqNGBy2a_LGRjm-dO_pW5pWK7s>
List-Id: "IETF-Discussion. This is the most general IETF mailing list, intended for discussion of technical, procedural, operational, and other topics for which no dedicated mailing lists exist." <ietf.ietf.org>

For the record, here's what I posted to the relevant github issue, for 
those who aren't subscribed to it:

 >>>>
For a generic HTTP library, not enforcing http/https URLs to be UTF-8 is 
the right decision. But such a library should make it easy to use UTF-8 
for URIs. And wherever possible, servers should use UTF-8 for their URIs 
if they contain non-ASCII characters, and should use a suitable baseXX 
encoding for binary data such as digital signatures and the like.
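
As a rough illustration of the "suitable baseXX encoding" suggestion, 
here is a minimal Python sketch that carries binary data in a query 
parameter via URL-safe base64. The parameter name `sig` and the sample 
bytes are made up for this example; they are not from any real service.

```python
import base64
from urllib.parse import urlencode

# Hypothetical binary payload, e.g. a truncated digital signature.
signature = bytes([0xD2, 0xE0, 0xFF, 0x00, 0x80])

# URL-safe base64 uses only A-Z, a-z, 0-9, "-" and "_", all of which are
# unreserved in RFC 3986, so no percent-encoding is needed afterwards.
token = base64.urlsafe_b64encode(signature).rstrip(b"=").decode("ascii")
query = urlencode({"sig": token})  # "sig" is an assumed parameter name

# Receiving side: restore the stripped padding, then decode.
padding = "=" * (-len(token) % 4)
recovered = base64.urlsafe_b64decode(token + padding)

print(query)
print(recovered == signature)
```

The resulting URL stays pure ASCII, so the question of which character 
encoding the percent-encoded octets are in does not arise for this data.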

Btw, contrary to what @brandon93 says at the start of this thread, 
https://www.kinopoisk.ru/community/city/%D2%E0%EB%EB%E8%ED/ is not in 
Windows-1252 (Western Europe), but in Windows-1251 (Russia). This of 
course makes sense because the site has a Russian domain name. The city 
is Таллин, in Latin letters this is Tallin. You can easily check this by 
using the URL in a browser. Using Windows-1252 makes no sense because 
there is no language that contains words like "Òàëëèí" (accented vowels 
only).
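
The comparison can be reproduced with a few lines of Python. This is 
only a sketch using the standard library; the variable names are mine.

```python
from urllib.parse import unquote_to_bytes

# Last path segment of the kinopoisk.ru URL discussed above.
path_segment = "%D2%E0%EB%EB%E8%ED"

# Percent-decoding yields raw octets; it says nothing about the charset.
raw = unquote_to_bytes(path_segment)

# Interpret the same six octets under both candidate encodings.
as_cp1251 = raw.decode("cp1251")  # Windows-1251 (Cyrillic)
as_cp1252 = raw.decode("cp1252")  # Windows-1252 (Western Europe)

print(as_cp1251)  # Таллин
print(as_cp1252)  # Òàëëèí
```

Both decodings succeed without error, which is exactly the problem with 
single-byte regional encodings: nothing in the octets themselves tells 
you which one was intended.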

This shows the advantage of using UTF-8. It avoids the mess of regional 
encodings, and because of its internal structure cannot easily be 
mistaken for some other encoding.
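
To illustrate the point about internal structure, the following Python 
sketch checks whether the octets from the URL above form valid UTF-8, 
and shows what the same city name looks like when percent-encoded from 
UTF-8. The helper name `is_valid_utf8` is my own.

```python
from urllib.parse import quote, unquote_to_bytes


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is a well-formed UTF-8 byte sequence."""
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# The Windows-1251 octets: 0xD2 is a UTF-8 lead byte that must be followed
# by a continuation byte (0x80-0xBF), but 0xE0 is not one.
legacy = unquote_to_bytes("%D2%E0%EB%EB%E8%ED")

# The same name, percent-encoded from UTF-8 (the default for quote()).
utf8_form = quote("Таллин")
utf8_bytes = unquote_to_bytes(utf8_form)

print(is_valid_utf8(legacy))      # False
print(utf8_form)                  # %D0%A2%D0%B0%D0%BB%D0%BB%D0%B8%D0%BD
print(is_valid_utf8(utf8_bytes))  # True
```

Text in a legacy single-byte encoding usually fails a strict UTF-8 
check like this one, so a receiver can detect UTF-8 with good 
confidence instead of guessing by region.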
 >>>>

Regards,   Martin.

On 2023-01-25 19:54, Raghu Saxena wrote:
> 
> On 1/25/23 17:47, Julian Reschke wrote:
>> On 25.01.2023 10:04, Raghu Saxena wrote:
>>> To whomever it may concern,
>>>
>>> I am writing to seek clarification regarding the URI spec (RFC3986)
>>> followed by HTTP, specifically about percent-encoding arbitrary octets
>>> (which do not comprise a valid UTF-8 sequence). In the last paragraph of
>>> RFC3986 Section 2.5
>>> (https://www.rfc-editor.org/rfc/rfc3986.html#section-2.5), it says, quote:
>>>
>>>  >  When a new URI scheme defines a component that represents textual
>>>     data consisting of characters from the Universal Character Set
>>>     [UCS], the data should first be encoded as octets according to the
>>>     UTF-8 character encoding [STD63]; then only those octets that do
>>>     not correspond to characters in the unreserved set should be
>>>     percent-encoded.
>>>
>>> This implies that URI schemes defined after RFC3986 must follow UTF-8
>>> encoding in their URIs. However, the original HTTP/1.1 RFC (2616) was
>>> dated June 1999, and so would not have had to "abide" by the UTF-8 rule.
>>>
>>> In fact, many web servers allow and process GET requests with
>>> percent-encoded octets, which they decode as raw bytes and have the
>>> application level logic handle how to process them.
>>>
>>> However, since HTTP's latest RFC is 9110, dated June 2022 (post
>>> RFC3986), does it mean the UTF-8 rule now applies to it? I would think
>>> not, since this would be a breaking change. But some comments on github
>>> indicate that this is as per the spec ()
>>
>> Pointer?
>>
> My apologies, the comment is here: 
> https://github.com/sindresorhus/got/issues/420#issuecomment-345416645
> 
> 
>>> tl;dr - Is it compliant with the HTTP specification to send arbitrary
>>> bytes, which do not represent a valid UTF-8 sequence, via
>>> percent-encoding in the URL query parameter?
>>
>> Yes.
>>
>> The http scheme was not re-defined by RFCs after RFC 2616 (in fact, it
>> was defined even before that).
>>
>> Best regards, Julian
>>
> Thanks for the clarification regarding schemes not being re-defined. I 
> will ask the library author to reconsider.
> 
> Regards,
> 
> Raghu Saxena
> 
> (P.S. Sorry for the personal reply prior to this - my first time using 
> mailing lists)
> 
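
For readers who want to try the case asked about in the quoted thread 
(arbitrary octets in a query parameter), here is a minimal Python 
sketch. The host `example.com`, the parameter name `q` and the sample 
bytes are assumptions for illustration only.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

# Arbitrary octets that are not a valid UTF-8 sequence.
payload = bytes([0xFF, 0xFE, 0x00, 0x41])

# Client side: percent-encode the raw octets; only unreserved
# characters (here "A") are left as-is.
encoded = quote_from_bytes(payload, safe="")
url = "http://example.com/search?q=" + encoded  # hypothetical URL

# Server side: percent-decode back to raw octets and leave their
# interpretation to application logic.
raw = unquote_to_bytes(url.split("q=", 1)[1])

print(encoded)         # %FF%FE%00A
print(raw == payload)  # True
```

The round trip works at the octet level without any character encoding 
being involved, which is why a generic library need not enforce UTF-8 
here.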

-- 
Prof. Dr.sc. Martin J. Dürst
Department of Intelligent Information Technology
College of Science and Engineering
Aoyama Gakuin University
Fuchinobe 5-1-10, Chuo-ku, Sagamihara
252-5258 Japan