Re: [dispatch] RFC 3896 and 3987 vs WHATWG URL Living Standard

Alwin Blok <alwinb@gmail.com> Sun, 13 June 2021 12:01 UTC

From: Alwin Blok <alwinb@gmail.com>
In-Reply-To: <BCCD9ABB-9E18-481B-8342-70005966E7E2@mnot.net>
Date: Sun, 13 Jun 2021 14:01:31 +0200
Cc: John C Klensin <john-ietf@jck.com>, dispatch@ietf.org, Larry Masinter <LMM@acm.org>
Content-Transfer-Encoding: quoted-printable
Message-Id: <C1E8FA12-0409-4440-95F5-B42094A5B5BE@gmail.com>
References: <002501d75a5b$08694740$193bd5c0$@acm.org> <FC052CE1D6FD5CD0B69051AE@PSB> <BCCD9ABB-9E18-481B-8342-70005966E7E2@mnot.net>
To: Mark Nottingham <mnot@mnot.net>
X-Mailer: Apple Mail (2.3608.120.23.2.7)
Archived-At: <https://mailarchive.ietf.org/arch/msg/dispatch/UUX_z4_7G2NVDg9EFKArLHTRJqo>
Subject: Re: [dispatch] RFC 3896 and 3987 vs WHATWG URL Living Standard

> On 10 Jun 2021, at 02:39, Larry Masinter <LMM@acm.org> wrote:
> 
> A replacement for RFCs 3986 and 3987 wouldn't be my
> first step. Maybe a short standards-track "UPDATES" that points to/contains
> deltas or a new grammar that is more consistent with (tested)
> implementations.

Since I am new to the IETF process, is there a link that I can follow for more information about this?
Specifically, I am looking for a description of what an "UPDATES" document is in this case.


> On 8 Jun 2021, at 08:37, Mark Nottingham <mnot@mnot.net> wrote:
> 
> The issue that Larry links to seems to be taking tentative steps towards aligning the grammar in 3986/7 with what WHATWG does, which is great. If that's incorporated, I think there's still a significant gap that would need to be filled before we can deprecate. 

I don’t know if it will be incorporated. I sense quite a lot of resistance to it. My impression may be wrong, though, and maybe it will work out just fine.

> although obviously the W3C has come to peace with HTML living there

Oh… They should up their game and publish a much simpler but behaviourally equivalent specification of just the HTML parsing chapter.

> Looking at this from a different angle, we could also ask ourselves whether 3986/7 should be updated to align with WHATWG URL. I think the answer to that is likely to be no -- not only is the delta apparently very small (according to the comments on the linked issue)

It can maybe just be descriptive, as in: this is the grammar and the behaviour that the WHATWG implicitly specifies. 

The delta between RFC 3987 and WHATWG URL is not small. You cannot directly compare the documents, because the style of the WHATWG standard prevents that. But the standard can be characterised (after considerable effort…). That characterisation could be published as an IETF document. That would be great! The diff between that characterisation and RFC 3987 should be relatively small.

> alignment would mean doing things like considering <> to be valid content in URIs. Again, what they're specifying is how to get from an arbitrary string to a URI, not the syntax itself.

The WHATWG explicitly defines valid URLs to always have a scheme. But it then implicitly defines ‘loose URLs’ and ‘loose relative URLs’ via its ‘basic-url-parser’ function.

That function specifies how to get from an arbitrary input string, and optionally a ‘URL record’ used as the base, to either a resolved and normalised URL record or failure.
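
As a rough illustration of that signature: the `URL` constructor available in browsers and Node.js is built on the WHATWG parser, so it can be used to observe the "input string plus optional base, yielding a record or failure" shape. The helper name `parseLoose` is hypothetical and exists only for this sketch; it merely turns the thrown TypeError into a null result.

```typescript
// Hypothetical helper (not part of any standard): wraps the standard
// `URL` constructor so that failure is returned as null instead of
// being thrown as a TypeError.
function parseLoose(input: string, base?: string): URL | null {
  try {
    return new URL(input, base);
  } catch {
    return null;
  }
}

// An absolute input needs no base.
console.log(parseLoose("http://example.com/a/b")?.href);
// → http://example.com/a/b

// A relative input without a base is a failure.
console.log(parseLoose("../c"));
// → null

// The same relative input succeeds once a base is supplied.
console.log(parseLoose("../c", "http://example.com/a/b/d")?.href);
// → http://example.com/a/c
```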

If you break down that function, you see that it implicitly specifies:

- A grammar for valid URLs and one for loose URLs (incorporating valid and invalid)
- Grammars for relative references
- An encoding normal form for loose URLs
- Several slightly different resolution operations
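
Some of these implicit pieces can be observed from the outside. The sketch below again uses the standard `URL` constructor and assumes a runtime that follows the WHATWG standard (a current browser or Node.js). The example inputs are my own, and the comments describe observed behaviour rather than quoting the specification.

```typescript
// Encoding normal form: scheme and host are lowercased, the default
// port is dropped, "." segments are removed, and characters such as
// space, "<" and ">" in the path are percent-encoded.
const normalised = new URL("HTTP://EXAMPLE.com:80/a b/./<c>").href;
console.log(normalised);
// → http://example.com/a%20b/%3Cc%3E

// Loose grammar: for 'special' schemes such as http, a backslash is
// accepted as if it were a forward slash.
const backslashes = new URL("\\x\\y", "http://example.com/a/b").href;
console.log(backslashes);
// → http://example.com/x/y

// A resolution variant: an input that repeats the base's special
// scheme without "//" is resolved as a relative reference, whereas
// strict RFC 3986 resolution would keep "http:foo" as it is.
const sameScheme = new URL("http:foo", "http://example.com/a/b").href;
console.log(sameScheme);
// → http://example.com/a/foo
```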

They don’t seem to be fully aware of that themselves.
The solution was pointed out lucidly by David Sheets in 2012:

<https://mailarchive.ietf.org/arch/msg/ietf/iX5a8Bn3JJO98O1Z6BShkRSApY0/>

Kind regards,
-Alwin Blok