RE: [whatwg] New URL Standard from Anne van Kesteren on 2012-09-24 (public-whatwg-archive@w3.org from September 2012)

Ian Hickson <ian@hixie.ch> Wed, 24 October 2012 05:05 UTC

Date: Wed, 24 Oct 2012 05:05:44 +0000
From: Ian Hickson <ian@hixie.ch>
To: "Manger, James H" <James.H.Manger@team.telstra.com>, David Sheets <kosmo.zb@gmail.com>
Subject: RE: [whatwg] New URL Standard from Anne van Kesteren on 2012-09-24 (public-whatwg-archive@w3.org from September 2012)
In-Reply-To: <CAAWM5Tz3NdprjqwgyoVoV9qUuiwXb2gTQ49u4a4ePGfjyusDkw@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.1210240451150.2471@ps20323.dreamhostps.com>
References: <50604C1A.7090901@gmx.de> <5060A964.5060001@stpeter.im> <Pine.LNX.4.64.1210172354500.2478@ps20323.dreamhostps.com> <507F5A7E.6040206@arcanedomain.com> <50856E3C.103@gmail.com> <Pine.LNX.4.64.1210221753010.2471@ps20323.dreamhostps.com> <0DBC8A11-319C-4120-975E-7E40FD5818BF@gbiv.com> <Pine.LNX.4.64.1210222137530.2471@ps20323.dreamhostps.com> <CA+9kkMDpEZCvcG1DJd=O1qPNV+=+GTBeN+CGndUe51Xym_A9sg@mail.gmail.com> <Pine.LNX.4.64.1210232115210.2471@ps20323.dreamhostps.com> <15E1D98B-8883-4936-81A9-174E1323683C@nordsc.com> <CAGKvQ5ZV6_GMVgjEezhR-oKqSikxR7GYgacMitbfczmNh725mw@mail.gmail.com> <Pine.LNX.4.64.1210232348110.2471@ps20323.dreamhostps.com> <CAAWM5Tz3NdprjqwgyoVoV9qUuiwXb2gTQ49u4a4ePGfjyusDkw@mail.gmail.com>
Cc: Ted Hardie <ted.ietf@gmail.com>, IETF Discussion <ietf@ietf.org>, Christophe Lauret <clauret@weborganic.com>, Jan Algermissen <jan.algermissen@nordsc.com>, URI <uri@w3.org>, Anne van Kesteren <annevk@annevk.nl>

On Wed, 24 Oct 2012, Manger, James H wrote:
> 
> Currently, I don't think url.spec.whatwg.org distinguishes between 
> strings that are valid URLs and strings that can be interpreted as URLs 
> by applying its standardised error handling. Consequently, error 
> handling cannot be at the option of the software developer as you cannot 
> tell which bits are error handling.

Well first, the whole point of discussions like this is to work out what 
the specs _should_ say; if the specs were perfect then there wouldn't be 
any need for discussion.

But second, I believe it's already Anne's intention to add to the parsing 
algorithm the ability to abort whenever the URL isn't conforming; he just 
hasn't done that yet because he hasn't specced what's conforming in the 
first place.


On Tue, 23 Oct 2012, David Sheets wrote:
> 
> One algorithm? There seem to be several functions...
> 
> - URI reference parsing (parse : scheme -> string -> raw uri_ref)
> - URI reference normalization (normalize : raw uri_ref -> normal uri_ref)
> - absolute URI predicate (absp : normal uri_ref -> absolute uri_ref option)
> - URI resolution (resolve : absolute uri_ref -> _ uri_ref -> absolute uri_ref)

I don't understand what your four algorithms are supposed to be.

There's just one algorithm as far as I can tell -- it takes as input an 
arbitrary string and a base URL object, and returns a normalised absolute 
URL object, where a "URL object" is a conceptual construct consisting of 
the components scheme, userinfo, host, port, path, query, and 
fragment, which can be serialised together into a string form.

(I guess you could count the serialiser as a second algorithm, in which 
case there are two.)
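
To make the "one algorithm plus a serialiser" shape concrete, here is a 
minimal sketch in Python. It is not the algorithm from Anne's draft: it 
leans on the standard library's urljoin and urlsplit, which follow the RFC 
rules and do none of the de facto error handling under discussion. The 
names URLObject, parse and serialize are hypothetical, chosen only for this 
illustration.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urljoin, urlsplit


@dataclass(frozen=True)
class URLObject:
    """Conceptual URL object with the seven components named above."""
    scheme: str
    userinfo: Optional[str]
    host: Optional[str]
    port: Optional[int]
    path: str
    query: Optional[str]
    fragment: Optional[str]


def serialize(url: URLObject) -> str:
    """Second algorithm: turn a URL object back into its string form."""
    out = url.scheme + ":"
    if url.host is not None:
        out += "//"
        if url.userinfo is not None:
            out += url.userinfo + "@"
        out += url.host
        if url.port is not None:
            out += ":" + str(url.port)
    out += url.path
    if url.query is not None:
        out += "?" + url.query
    if url.fragment is not None:
        out += "#" + url.fragment
    return out


def parse(raw: str, base: URLObject) -> URLObject:
    """Single entry point: arbitrary string + base URL object -> absolute URL object."""
    # Resolve against the serialised base, then split the absolute result.
    parts = urlsplit(urljoin(serialize(base), raw.strip()))
    userinfo = None
    if parts.username is not None:
        userinfo = parts.username
        if parts.password is not None:
            userinfo += ":" + parts.password
    return URLObject(
        scheme=parts.scheme.lower(),
        userinfo=userinfo,
        host=parts.hostname,           # urllib lower-cases the host
        port=parts.port,
        path=parts.path or "/",        # simplification: empty path becomes "/"
        query=parts.query or None,     # simplification: empty query is dropped
        fragment=parts.fragment or None,
    )


base = URLObject("http", None, "example.com", None, "/a/b", None, None)
print(serialize(parse("../c?q=1#f", base)))
```

In this sketch parsing, resolution against the base, and a small amount of 
normalisation all happen inside parse, so a caller never sees a relative or 
un-normalised URL object. Bracketed IPv6 hosts are not handled, since 
urlsplit strips the brackets.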


> Anne's current draft increases the space of valid addresses.

No, Anne hasn't finished defining conformance yet. (He just started 
today.)

You may be getting confused by the "invalid flag", which doesn't mean the 
input is non-conforming, but means that the input is uninterpretable.
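
The difference between "non-conforming" and "uninterpretable", together 
with the abort option mentioned earlier in this message, can be sketched as 
follows. This is a toy under stated assumptions, not the draft's parser: 
the three recovery rules (trimming whitespace, treating backslashes as 
slashes, escaping spaces) and the scheme check are hypothetical stand-ins 
for whatever the spec ends up defining, and the names ParseResult, 
NonConformingURL and the strict parameter are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class NonConformingURL(ValueError):
    """Raised in strict mode at the first conformance error."""


@dataclass
class ParseResult:
    url: Optional[str] = None      # interpreted URL, or None if uninterpretable
    invalid: bool = False          # True only when no interpretation exists
    errors: List[str] = field(default_factory=list)  # errors recovered from


def parse(raw: str, strict: bool = False) -> ParseResult:
    result = ParseResult()

    def report(message: str) -> None:
        # Strict callers abort; lenient callers record the error and carry on.
        if strict:
            raise NonConformingURL(message)
        result.errors.append(message)

    text = raw
    # Hypothetical recovery rules, applied in order.
    if text != text.strip():
        report("leading or trailing whitespace")
        text = text.strip()
    if "\\" in text:
        report("backslash used as a separator")
        text = text.replace("\\", "/")
    if " " in text:
        report("unescaped space")
        text = text.replace(" ", "%20")

    # Toy rule: without an ASCII-letter scheme there is nothing to interpret.
    scheme, sep, rest = text.partition(":")
    if not sep or not (scheme.isascii() and scheme.isalpha()):
        result.invalid = True
        return result

    result.url = scheme.lower() + ":" + rest
    return result


sloppy = parse(" http:\\\\example.com\\a b ")
print(sloppy.url, sloppy.invalid, sloppy.errors)
```

Here a sloppy input still yields a URL with invalid left False and the 
recovered errors listed, while an input with no usable scheme sets invalid 
and yields no URL at all. Passing strict=True turns the first conformance 
error into an exception, which is the "abort whenever the URL isn't 
conforming" behaviour.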


> > The de facto parsing rules are already complicated by de facto 
> > requirements for handling errors, so defining those doesn't increase 
> > complexity either (especially if such behaviour is left as optional, 
> > as discussed above.)
> 
> *parse* is separate from *normalize* is separate from checking if a 
> reference is absolute (*absp*) is separate from *resolve*.

No, they don't have to be. That's actually a more complicated way of 
looking at it than necessary, IMHO.


> Why don't we have a discussion about the functions and types involved in 
> URI processing?
>
> Why don't we discuss expanding allowable alphabets and production rules?

Personally I think this kind of open-ended approach is not a good way to 
write specs. Better is to put forward concrete use cases, technical data, 
etc, and let the spec editor take all that into account and turn it into a 
standard. Arguing about what precise alphabets are allowed and whether to 
spec something using prose or production rules is just bikeshedding.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'