Re: URL, URI and the w3c

"Roy T. Fielding" <fielding@gbiv.com> Wed, 15 June 2022 18:01 UTC

From: "Roy T. Fielding" <fielding@gbiv.com>
In-Reply-To: <80p9o6s5-5627-6311-o95r-457oq42pr9p@unkk.fr>
Date: Wed, 15 Jun 2022 10:58:01 -0700
Cc: Roberto Polli <robipolli@gmail.com>, Daniel Stenberg <daniel@haxx.se>
Message-Id: <1F8B2CCF-2B6D-4449-A912-19E0B08F4F13@gbiv.com>
References: <CAP9qbHVQ7B423jc7tHQo70ZAeXmFHdZo-=JvTSj5L2D6uTTQ9A@mail.gmail.com> <3n57428s-3052-66q7-prp8-118s19q41461@unkk.fr> <A90EB729-EA13-42E1-94F1-4410334E907E@tzi.org> <CAP9qbHUDSGDWyDUzLykFkE_C+GKZGy5SYxNsX=feEnrAmkE1Bw@mail.gmail.com> <80p9o6s5-5627-6311-o95r-457oq42pr9p@unkk.fr>
To: HTTP Working Group <ietf-http-wg@w3.org>
Archived-At: <https://www.w3.org/mid/1F8B2CCF-2B6D-4449-A912-19E0B08F4F13@gbiv.com>

> On Jun 14, 2022, at 2:35 PM, Daniel Stenberg <daniel@haxx.se> wrote:
> 
> On Tue, 14 Jun 2022, Roberto Polli wrote:
> 
>> I am curious then what a specification like OAuth-Somethin which relies on both browsers and generic user agents should adopt...
> 
> There are as many answers to that as there are URL/URI parser authors (= many). URL interop is very poor these days. I say this as someone who works with this challenge on a daily basis.
> 
> The current state of URLs and URIs cannot be described as anything less than a horrible mess, a security nightmare [*] and an infected area that lots of persons will not go near due to the past experiences and personal conflicts.
> 
> [*] = https://daniel.haxx.se/blog/2022/01/10/dont-mix-url-parsers/

It would help immensely if people would just use a common terminology
of references and addresses.

What URI (RFC 3986) defines is a standard naming format that uses
hierarchical name delegation to cover the entire Internet with identifiers.
URL is just another name for URI. It's uniform and restricted to what
will interoperate, like postal addresses, and, as with postal regulations,
this doesn't prevent people from using non-standard references that
can be reinterpreted into some standard form.

URI has one appendix that defines how to parse any reference into the
common components (even when the characters are not allowed in the
standard URI grammar). That's what most implementations interop upon.
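
As a rough illustration of that appendix, here is a minimal sketch in
Python that applies the regular expression given in RFC 3986, Appendix B
to split a reference into scheme, authority, path, query, and fragment.
The helper name split_reference and the sample inputs are assumptions
made for this example; they are not part of any spec or library.

```python
import re

# Regular expression from RFC 3986, Appendix B. Every part is optional,
# so it matches any string, including ones with characters that the
# strict URI grammar does not allow.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_reference(reference):
    """Split a reference into its five generic components.

    Hypothetical helper for illustration only. A component that is
    absent from the reference is returned as None.
    """
    m = URI_REFERENCE.match(reference)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


if __name__ == "__main__":
    # The example used in Appendix B itself.
    print(split_reference("http://www.ics.uci.edu/pub/ietf/uri/#Related"))
    # A relative reference containing characters (space, "<", ">")
    # that are not valid in the URI grammar; it still splits cleanly.
    print(split_reference("../a b?x=<y>"))
```

Note that this only separates the components. It does not validate,
percent-encode, or normalize them, and it does not resolve a relative
reference against a base.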

The WHATWG url spec defines a set of rules for interpreting references
and placing them in a url data structure within browser memory. url != URL.
href != URL. The spec says that this is somehow replacing URI, but it isn't
even defining the same thing. The algorithms are designed to support
1997-era browser compatibility (even when there is no desire for that).

The WHATWG spec uses the same name for (last I checked) five different
concepts with five different sets of rules associated with them, each of
which is very important for browser consistency. The specification is
owned and controlled by four corporations, but isn't fully implemented
by any of them. It aspires to be implemented.

These specs could easily exist in harmony if WHATWG would stop
insisting it is defining the URL standard and instead define an HTML
href standard that makes sense for HTML processors. The deviations
are mostly due to the variances in constructing/interpreting references
that are generated via forms, JavaScript, etc. Well, that and the i18n
hostname processing that keeps changing in weird ways. None of
that impacts the interoperability of URIs.

There are other specs that have been implemented (XML, IRI, URN, etc.)
that also exist, with their own fundamental flaws, that have tried to fix
the perceived limitations of URI by changing the identifiers, but not
actually recognizing that references >> identifiers. Anyway, they exist,
the world has changed several times over, and the URI spec still only
deals with the interoperable standard address, not how to make every
possible reference into a valid URI.

I could write a spec that formally defines Hypertext References
(and whatever other updates are needed for 3986) for the Internet.
Actually, I started to do that and stopped to revise HTTP instead.
It's not an easy thing to do, mostly because everyone wants a little
thing done and there are a lot of people wanting (for a very long time).
The hard part is resisting temptation.

I would not do so at the WHATWG. This is not because I am somehow
antagonistic to WHATWG folks; it's because I find it unethical to take
a public standard and place it under the ownership of four companies
that have shown no interest in supporting the needs of the entire
Web/Internet, even if I like those companies, use their products, and
enjoy working with their employees to make the IETF standards better.

This is not my fault, not a fault in the IETF process, nor any attempt
to "go political" on WHATWG: it's a fact of "https://whatwg.org/policies"
which was entirely and artfully created by those companies even
though they were (and are) fully capable of participating in nonprofits
like the rest of us. Power corrupts. I am happy to work with those same
people within the IETF framework, where they can participate as
individuals and not have veto power over the resulting spec.

....Roy