Re: [whatwg] New URL Standard from Anne van Kesteren on 2012-09-24 (public-whatwg-archive@w3.org from September 2012)

"Roy T. Fielding" <fielding@gbiv.com> Wed, 24 October 2012 19:11 UTC

Subject: Re: [whatwg] New URL Standard from Anne van Kesteren on 2012-09-24 (public-whatwg-archive@w3.org from September 2012)
From: "Roy T. Fielding" <fielding@gbiv.com>
In-Reply-To: <5087C57A.80607@gmail.com>
Date: Wed, 24 Oct 2012 12:10:59 -0700
Message-Id: <097534B5-8135-4A20-8275-934A5547ACDB@gbiv.com>
References: <50604C1A.7090901@gmx.de> <5060A964.5060001@stpeter.im> <Pine.LNX.4.64.1210172354500.2478@ps20323.dreamhostps.com> <507F5A7E.6040206@arcanedomain.com> <50856E3C.103@gmail.com> <Pine.LNX.4.64.1210221753010.2471@ps20323.dreamhostps.com> <0DBC8A11-319C-4120-975E-7E40FD5818BF@gbiv.com> <Pine.LNX.4.64.1210222137530.2471@ps20323.dreamhostps.com> <5085C4BA.2030505@gmx.de> <Pine.LNX.4.64.1210222220510.2471@ps20323.dreamhostps.com> <CAHBU6is8LNZ7Rq-vwLuOm+8ThKB9c=QPwbUfQwDQD5bDPjtf7w@mail.gmail.com> <Pine.LNX.4.64.1210222320070.2471@ps20323.dreamhostps.com> <09DC68AA-2DAD-4CB1-9CA9-799AF12B7BE2@mnot.net> <5087C57A.80607@gmail.com>
To: Brian E Carpenter <brian.e.carpenter@gmail.com>
Cc: IETF Discussion <ietf@ietf.org>, Julian Reschke <julian.reschke@gmx.de>, Jan Algermissen <jan.algermissen@nordsc.com>, URI <uri@w3.org>, Noah Mendelsohn <nrm@arcanedomain.com>, Mark Nottingham <mnot@mnot.net>, Tim Bray <tbray@textuality.com>, Ian Hickson <ian@hixie.ch>

On Oct 24, 2012, at 3:39 AM, Brian E Carpenter wrote:
> On 23/10/2012 00:32, Mark Nottingham wrote:
> ...
>> The underlying point that people seem to be making is that there's legitimate need for URIs to be a separate concept from "strings that will become URIs." By collapsing them into one thing, you're doing those folks a disservice. Browser implementers may not care, but it's pretty obvious that lots of other people do.
> 
> Thanks for bringing this point out. It was explained to me in 1993 by TBL and
> Robert Cailliau that URLs (the only term used then, I think)

As a historical footnote, the term URL was created by the same
BOF that created the Uniform Resource Identifiers working group
at the IETF meeting in July 1992.

The early Web protocol specs had used the term "network address".

The term "Document Identifiers" came from Brewster Kahle and was
later used in a call for proposals by the Coalition for Networked
Information's Architectures & Standards Working Group, which in
turn led TimBL to propose Web addresses as Universal Document
Identifiers for a BOF at IETF 24 (Cambridge, MA).  Somewhere
in that BOF discussion, the URI working group was proposed and
TimBL's proposal was renamed Uniform Resource Locators
to distinguish it from other ideas for URNs
[see IETF 24 proceedings, p.184, and the following link].

 ftp://ftp.ietf.org/ietf/92jul/udi-minutes-92jul.txt

TimBL had originally specified that addresses in HREF could be
provided in full or partial form.  The IETF removed the partial
form, leading to all sorts of bad decisions regarding syntax,
and so I revived it in 1994 as Relative URLs [RFC1808].  That
spec is the only one that came close to defining what Anne
is trying to do here -- a single parsing standard for
potentially relative references. 
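
To make "potentially relative references" concrete, here is a minimal
sketch that resolves several relative references against the base URI
used in the worked examples of RFC 3986, Section 5.4. It relies on
Python's urllib.parse.urljoin, which since Python 3.5 is documented as
following RFC 3986 semantics; it illustrates the idea of one resolution
algorithm shared by all contexts and is not the algorithm from RFC 1808
itself.

```python
from urllib.parse import urljoin

# Base URI taken from the examples in RFC 3986, Section 5.4.
BASE = "http://a/b/c/d;p?q"

# A few relative references, from path-relative to fragment-only.
for ref in ["g", "./g", "../g", "../../g", "/g", "//g", "?y", "#s", ""]:
    # Each reference is resolved against the same base by the same algorithm.
    print(f"{ref!r:10} -> {urljoin(BASE, ref)}")
```

For example, "../g" resolves to http://a/b/g and "?y" to
http://a/b/c/d;p?y, matching the results listed in the RFC.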

It is easy to claim that the merging of syntax specs that
created RFC2396 lost some value when the parsing standard was
replaced by a non-normative appendix.  However, it was discussed
extensively at the time, including with the browser developers,
and there was simply nothing common enough to make standard.
The best I could do for 2396 and 3986 was to include a
regular expression that accepts all strings and parses them
into the component parts.
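
The regular expression mentioned above is the one in RFC 3986,
Appendix B. A minimal Python sketch of using it follows; the helper name
split_components is illustrative only, and the re.DOTALL flag is a
Python-specific addition so that "." in the fragment group also matches
newlines. Because every group is optional or may match the empty string,
the expression matches any input, valid or not.

```python
import re

# Regular expression from RFC 3986, Appendix B.
# re.DOTALL (a Python addition) lets the fragment group span newlines.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?", re.DOTALL
)


def split_components(ref: str) -> dict:
    """Split any string into the five generic components (None = undefined)."""
    m = URI_REFERENCE.match(ref)
    # The match cannot fail: every group is optional or can match empty.
    assert m is not None
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


print(split_components("http://www.ics.uci.edu/pub/ietf/uri/#Related"))
```

For the RFC's own example this yields scheme "http", authority
"www.ics.uci.edu", path "/pub/ietf/uri/", an undefined query (None), and
fragment "Related". The regex only splits; it does not validate.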

I have absolutely no problem with writing a proposed standard
for parsing references, particularly if browser developers are
willing to adhere to one.  However, it is not a redefinition of
URLs, nor does it make sense for error-correcting transformations
(like pct-encoding embedded spaces) to be "the standard" for
parsing when there are plenty of applications that parse references
as raw strings for the sake of generating invalid test cases
(e.g., the example attributed to curl).

It is not non-interoperable behavior to parse input data
differently depending on the context in which it is entered.
What matters is that the context be properly documented to
indicate what pre/post-processing is applied, just as we
expect a browser's combined search/location dialog bars to
be documented as not merely URL-entry forms (or be banned
due to the privacy leakage of incremental search results).
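
As a minimal sketch of context-dependent processing, assuming two
hypothetical contexts: a "reference" context that splits the input
exactly as given, and a "location-bar" context that first applies a
documented error correction (trimming whitespace and pct-encoding
embedded spaces). The context names and the helpers
preprocess_location_bar and parse_in_context are illustrative
inventions, not taken from any browser or specification; both contexts
share the RFC 3986 Appendix B split.

```python
import re

# RFC 3986 Appendix B regex; matches every string (re.DOTALL is a Python addition).
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?", re.DOTALL
)


def split_components(ref: str) -> dict:
    """Split a string into generic components without altering it."""
    m = URI_REFERENCE.match(ref)
    return {"scheme": m.group(2), "authority": m.group(4), "path": m.group(5),
            "query": m.group(7), "fragment": m.group(9)}


def preprocess_location_bar(text: str) -> str:
    """Hypothetical, documented correction for typed input: trim surrounding
    whitespace and pct-encode embedded spaces (and nothing else)."""
    return text.strip().replace(" ", "%20")


def parse_in_context(text: str, context: str) -> dict:
    if context == "reference":
        # No correction: invalid input (e.g., a generated test case) is preserved.
        return split_components(text)
    if context == "location-bar":
        # Pre-processing is applied first, then the same shared split.
        return split_components(preprocess_location_bar(text))
    raise ValueError(f"undocumented context: {context!r}")


typed = "http://example.com/a b"
print(parse_in_context(typed, "reference")["path"])     # /a b
print(parse_in_context(typed, "location-bar")["path"])  # /a%20b
```

The same input yields different results, yet each is predictable because
the pre-processing step for each context is stated explicitly. Encoding
only spaces is a deliberate simplification for illustration.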

....Roy