Re: [whatwg] New URL Standard from Anne van Kesteren on 2012-09-24 (public-whatwg-archive@w3.org from September 2012)

Ian Hickson <ian@hixie.ch> Mon, 22 October 2012 23:25 UTC

Date: Mon, 22 Oct 2012 23:25:45 +0000
From: Ian Hickson <ian@hixie.ch>
To: Mark Nottingham <mnot@mnot.net>, Tim Bray <tbray@textuality.com>
Subject: Re: [whatwg] New URL Standard from Anne van Kesteren on 2012-09-24 (public-whatwg-archive@w3.org from September 2012)
In-Reply-To: <CAHBU6is8LNZ7Rq-vwLuOm+8ThKB9c=QPwbUfQwDQD5bDPjtf7w@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.1210222320070.2471@ps20323.dreamhostps.com>
References: <50604C1A.7090901@gmx.de> <5060A964.5060001@stpeter.im> <Pine.LNX.4.64.1210172354500.2478@ps20323.dreamhostps.com> <507F5A7E.6040206@arcanedomain.com> <50856E3C.103@gmail.com> <Pine.LNX.4.64.1210221753010.2471@ps20323.dreamhostps.com> <0DBC8A11-319C-4120-975E-7E40FD5818BF@gbiv.com> <Pine.LNX.4.64.1210222137530.2471@ps20323.dreamhostps.com> <5085C4BA.2030505@gmx.de> <Pine.LNX.4.64.1210222220510.2471@ps20323.dreamhostps.com> <CAHBU6is8LNZ7Rq-vwLuOm+8ThKB9c=QPwbUfQwDQD5bDPjtf7w@mail.gmail.com>
Cc: IETF Discussion <ietf@ietf.org>, Julian Reschke <julian.reschke@gmx.de>, "Roy T. Fielding" <fielding@gbiv.com>, Jan Algermissen <jan.algermissen@nordsc.com>, Noah Mendelsohn <nrm@arcanedomain.com>, URI <uri@w3.org>

On Tue, 23 Oct 2012, Mark Nottingham wrote:
> On 23/10/2012, at 9:35 AM, Ian Hickson <ian@hixie.ch> wrote:
> > 
> > Consensus isn't a value I hold highly, but review of Anne's work is 
> > welcome.
> > 
> > If the IETF community didn't want Anne to do this work, then the IETF 
> > community should have done it. Having not done it, having not even 
> > understood that the problem exists, means the IETF has lost the 
> > credibility it needs to claim that this is in the IETF's domain.
> > 
> > You don't get to claim authority over an area while at the same time 
> > telling someone else "please fix that" for the hard work that comes 
> > with that area. The reality is, he who does the hard work, gets the 
> > authority.
> 
> All very interesting, but please address the point that's now been made 
> repeatedly -- why is it necessary for you to redefine URIs, rather than 
> doing as we suggest?

What exactly do you suggest? 

Doing the work but at the IETF? See my reply to James.

Waiting for the IETF to do the work? We did, and timed out.

Not doing the work? That doesn't lead to interop.

Doing the work as a diff spec? That's what we did for a while, but it 
doesn't work. Having to reference three specs (pre-parse, IRI, URI) just 
to parse and resolve a URL is not what leads to implementors having a good 
time and thus not what leads to interop.

What else do you suggest?


On Mon, 22 Oct 2012, Tim Bray wrote:
> >
> >    $ wget 'http://example.com/a b'
> >    --2012-10-23 00:27:43--  http://example.com/a%20b
> >
> >    # test.cgi returns a 301 with "Location: a b"
> >    $ curl -L http://damowmow.com/playground/demos/url/in-http-headers/test.cgi
> >    This file is: http://damowmow.com/playground/demos/url/in-http-headers/a%20b
> 
> Hmm.  I went to tbray.org and made a file at '$ROOT_DIR/tmp/a b' - note 
> the space.
> 
> Then I did
> 
> curl -I 'http://www.tbray.org/tmp/a%20b'
> curl -I 'http://www.tbray.org/tmp/a b'
> 
> Curl, quite properly, doesn't fuck with what I ask it

Instead it makes an invalid HTTP request. Your offensive language 
notwithstanding, that means wget and curl don't interoperate. This is bad. 
This is what we want to fix.
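
To make the difference concrete, here is a minimal Python sketch of the kind of 
rewriting wget appears to do in the transcript above: percent-encode a literal 
space in the path before sending the request. The function name 
escape_path_spaces and the choice of which characters to leave alone are 
assumptions for illustration; no IETF spec is being quoted here, which is 
rather the point.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def escape_path_spaces(url: str) -> str:
    """Percent-encode unsafe characters (such as spaces) in the path of a URL.

    Hypothetical helper: existing %XX escapes are left untouched because
    '%' is listed as safe, so already-encoded input passes through unchanged.
    """
    parts = urlsplit(url)
    # Keep '/' as the segment separator and '%' so escapes are not doubled.
    encoded_path = quote(parts.path, safe="/%")
    return urlunsplit(
        (parts.scheme, parts.netloc, encoded_path, parts.query, parts.fragment)
    )


print(escape_path_spaces("http://example.com/a b"))
```

A client that skips this step puts the raw space on the request line, which is 
the invalid request described above.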


> and revealed a very interesting fact: That my Apache httpd returns 200 
> for both of these, but, with, uh, interesting variations, amounting to 
> what I think is quite possibly a bug.

How could it be a bug, since there's no spec that says how to handle a URL 
with spaces in it?


> I also pasted the version with the space into the nearest Web browser, 
> and it quite properly auto-corrected to a%20b.

Quite properly according to whom? There's no spec that defines this.


> I think it’s a bug that curl is claiming the 301 pointed at "a%20b" not 
> "a b".

You're wrong, but only because the de facto standard of "most software 
does it that way" says so. No IETF spec does. That's the problem.
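
As a rough illustration of that de facto behaviour, the following Python 
sketch resolves a relative Location value such as "a b" against the URL that 
was requested and then percent-encodes the space, which matches what curl -L 
reported in the transcript quoted earlier. The function name resolve_location 
is hypothetical, and the encoding rule (leave '/' and '%' alone, escape the 
rest) is an assumption for the example, not something taken from a spec.

```python
from urllib.parse import quote, urljoin, urlsplit, urlunsplit


def resolve_location(request_url: str, location: str) -> str:
    """Resolve a Location header value against the request URL.

    Hypothetical helper: after resolving, unsafe path characters such as
    spaces are percent-encoded, leaving existing %XX escapes as they are.
    """
    # Standard relative-reference resolution against the base URL.
    joined = urljoin(request_url, location)
    parts = urlsplit(joined)
    # Escape the path only; query and fragment are passed through as-is.
    return urlunsplit(parts._replace(path=quote(parts.path, safe="/%")))


base = "http://damowmow.com/playground/demos/url/in-http-headers/test.cgi"
print(resolve_location(base, "a b"))
```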


> Because suppose it had pointed at "a%20b" - I don’t want middleware 
> lying to me.

What you want isn't really the issue. Compatibility with deployed code is 
the issue.


> It seems like a good idea to document the steps by which "a b" pasted in 
> becomes "a%20b" in the address bar. But I don’t see the relevance 
> outside human-authored strings.

All the strings in question are human-authored.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'