Re: [whatwg] New URL Standard from Anne van Kesteren on 2012-09-24 (public-whatwg-archive@w3.org from September 2012)

David Sheets <kosmo.zb@gmail.com> Wed, 24 October 2012 03:49 UTC

Date: Tue, 23 Oct 2012 20:49:52 -0700
Message-ID: <CAAWM5Tz3NdprjqwgyoVoV9qUuiwXb2gTQ49u4a4ePGfjyusDkw@mail.gmail.com>
Subject: Re: [whatwg] New URL Standard from Anne van Kesteren on 2012-09-24 (public-whatwg-archive@w3.org from September 2012)
From: David Sheets <kosmo.zb@gmail.com>
To: Ian Hickson <ian@hixie.ch>
Cc: Christophe Lauret <clauret@weborganic.com>, Jan Algermissen <jan.algermissen@nordsc.com>, Ted Hardie <ted.ietf@gmail.com>, URI <uri@w3.org>, IETF Discussion <ietf@ietf.org>

On Tue, Oct 23, 2012 at 4:51 PM, Ian Hickson <ian@hixie.ch> wrote:
> On Wed, 24 Oct 2012, Christophe Lauret wrote:
>>
>> As a Web developer who's had to write code multiple times to handle URIs
>> in very different contexts, I actually *like* the constraints in STD 66,
>> there are many instances where it is simpler to assume that the error
>> handling has been done prior and simply reject an invalid URI.
>
> I think we can agree that the error handling should be, at the option of
> the software developer, either to handle the input as defined by the
> spec's algorithms, or to abort and not handle the input at all.

Yes, input is handled according to the specs' *algorithms*, plural.

>> But why not do it as a separate spec?
>
> Having multiple specs means an implementor has to refer to multiple specs
> to implement one algorithm, which is not a way to get interoperability.
> Bugs creep in much faster when implementors have to switch between specs
> just in the implementation of one algorithm.

One algorithm? There seem to be several functions...

- URI reference parsing (parse : scheme -> string -> raw uri_ref)
- URI reference normalization (normalize : raw uri_ref -> normal uri_ref)
- absolute URI predicate (absp : normal uri_ref -> absolute uri_ref option)
- URI resolution (resolve : absolute uri_ref -> _ uri_ref -> absolute uri_ref)

Of course, some of these may be composed in any given implementation.
In the case of a/@href and img/@src, it appears that something like
(one_algorithm = (resolve base_uri) . normalize . parse (scheme
base_uri)) is in use.
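
To make the separation concrete, here is a minimal Python sketch of the
four functions and the browser-style composition. It is an illustration
under assumptions, not an implementation of either spec: Python's
urllib.parse stands in for parsing and resolution, normalization is
reduced to lower-casing the scheme and host, the Raw/Normal/Absolute
names are hypothetical stand-ins for the phantom types in the signatures
above, and the scheme argument to parse is accepted but unused.

```python
from typing import NewType, Optional
from urllib.parse import SplitResult, urljoin, urlsplit, urlunsplit

# Hypothetical type names mirroring "raw", "normal" and "absolute" uri_ref.
Raw = NewType("Raw", SplitResult)
Normal = NewType("Normal", SplitResult)
Absolute = NewType("Absolute", Normal)


def parse(scheme: str, text: str) -> Raw:
    """parse : scheme -> string -> raw uri_ref.

    Assumption: a real parser could apply scheme-specific rules here;
    this sketch ignores `scheme` and only splits into five components.
    """
    return Raw(urlsplit(text))


def normalize(ref: Raw) -> Normal:
    """normalize : raw uri_ref -> normal uri_ref (case normalization only)."""
    # Lower-case the host (and port) but leave any userinfo untouched.
    userinfo, at, hostport = ref.netloc.rpartition("@")
    return Normal(ref._replace(scheme=ref.scheme.lower(),
                               netloc=userinfo + at + hostport.lower()))


def absp(ref: Normal) -> Optional[Absolute]:
    """absp : normal uri_ref -> absolute uri_ref option.

    Simplification: a reference counts as absolute if it has a scheme.
    """
    return Absolute(ref) if ref.scheme else None


def resolve(base: Absolute, ref: SplitResult) -> Absolute:
    """resolve : absolute uri_ref -> _ uri_ref -> absolute uri_ref."""
    joined = urljoin(urlunsplit(base), urlunsplit(ref))
    return Absolute(Normal(urlsplit(joined)))


def one_algorithm(base: Absolute, text: str) -> Absolute:
    """(resolve base_uri) . normalize . parse (scheme base_uri)"""
    return resolve(base, normalize(parse(base.scheme, text)))


if __name__ == "__main__":
    base = absp(normalize(parse("", "HTTP://Example.COM/a/b")))
    if base is not None:
        print(urlunsplit(one_algorithm(base, "../c?x=1")))
```

Keeping the stages as separate functions means a system that needs only
parse and normalize, or that resolves against several bases, can reuse
them without going through the composed form.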

A good way to get interop is to thoroughly define each function and
supply implementors with test cases for each processing stage
(one_algorithm's test cases define some tests for parse, normalize,
and resolve as well).
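
As a rough sketch of what per-stage test cases could look like, the
following keeps one table per stage. The resolve vectors are taken from
the examples in RFC 3986 section 5.4.1; the parse vectors and the use of
Python's urlsplit and urljoin as stand-in stages are assumptions made
for illustration only.

```python
from urllib.parse import urljoin, urlsplit

BASE = "http://a/b/c/d;p?q"  # base URI used by RFC 3986 section 5.4

# Stage: parse. Expected (scheme, authority, path, query, fragment).
PARSE_CASES = [
    ("http://a/b/c/d;p?q#f", ("http", "a", "/b/c/d;p", "q", "f")),
    ("//g", ("", "g", "", "", "")),
    ("../g", ("", "", "../g", "", "")),
    ("?y", ("", "", "", "y", "")),
]

# Stage: resolve. Reference -> expected target, relative to BASE.
RESOLVE_CASES = [
    ("g", "http://a/b/c/g"),
    ("./g", "http://a/b/c/g"),
    ("/g", "http://a/g"),
    ("//g", "http://g"),
    ("?y", "http://a/b/c/d;p?y"),
    ("#s", "http://a/b/c/d;p?q#s"),
    ("../g", "http://a/b/g"),
    ("../../g", "http://a/g"),
]


def failures():
    """Return a list of (stage, input) pairs whose result was unexpected."""
    bad = []
    for text, expected in PARSE_CASES:
        if tuple(urlsplit(text)) != expected:
            bad.append(("parse", text))
    for ref, expected in RESOLVE_CASES:
        if urljoin(BASE, ref) != expected:
            bad.append(("resolve", ref))
    return bad


if __name__ == "__main__":
    print(failures())
```

A failure then names the stage that diverged, which is harder to see
when only the composed function is tested.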

Some systems use more than the simple function composition of web browsers...

>> Increasing the space of valid addresses, when the set of addressable
>> resources is not actually increasing only means more complex parsing rules.
>
> I'm not saying we should increase the space of valid addresses.

Anne's current draft increases the space of valid addresses. This
isn't obvious because Anne's draft lacks a grammar and URI component
alphabets. You support Anne's draft and its philosophy, therefore you
are saying the space of valid addresses should be expanded.
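
To illustrate what a component alphabet buys you, here is a small sketch
that checks a string against the query alphabet of RFC 3986 (query =
*( pchar / "/" / "?" )). The function name is hypothetical and the check
covers the query component only.

```python
import re

# Character sets from RFC 3986 section 2.2 and 2.3.
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="

# query = *( pchar / "/" / "?" )
# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
QUERY_RE = re.compile(
    rf"(?:[{UNRESERVED}{SUB_DELIMS}:@/?]|%[0-9A-Fa-f]{{2}})*"
)


def is_std66_query(query: str) -> bool:
    """True if every character of `query` is allowed by the RFC 3986 grammar."""
    return QUERY_RE.fullmatch(query) is not None


if __name__ == "__main__":
    for candidate in ["a=1&b=%20", "a[]=1", "q=100%"]:
        print(candidate, is_std66_query(candidate))
```

With an explicit alphabet, whether a string such as "a[]=1" is a valid
query is a yes-or-no question; without one, validity is whatever the
parsing algorithm happens to accept.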

Here is an example of a grammar extension that STD 66 disallows but
WHATWGRL allows:
<http://www.rfc-editor.org/errata_search.php?rfc=3986&eid=3330>

> The de facto parsing rules are already complicated by de facto requirements for
> handling errors, so defining those doesn't increase complexity either
> (especially if such behaviour is left as optional, as discussed above.)

*parse* is separate from *normalize*, which is separate from checking
whether a reference is absolute (*absp*), which in turn is separate
from *resolve*.

Why don't we have a discussion about the functions and types involved
in URI processing?

Why don't we discuss expanding allowable alphabets and production rules?

David