Re: [apps-discuss] presumption that RFC3986 is correct

Sam Ruby <rubys@intertwingly.net> Sat, 03 January 2015 20:46 UTC

Message-ID: <54A8550A.1020708@intertwingly.net>
Date: Sat, 03 Jan 2015 15:46:02 -0500
From: Sam Ruby <rubys@intertwingly.net>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0
MIME-Version: 1.0
To: Stephen Farrell <stephen.farrell@cs.tcd.ie>
References: <20140926010029.26660.82167.idtracker@ietfa.amsl.com> <EAACE200D9B0224D94BF52CF2DD166A425A68A90@ex10mb6.qut.edu.au> <CACweHNBEYRFAuw9-vfeyd_wf703cvM3ykZoRMqAokRFYG_O7hQ@mail.gmail.com> <DM2PR0201MB09602B351692D424A49C6B0DC3650@DM2PR0201MB0960.namprd02.prod.outlook.com> <CACweHNBN_Bv=jeXQ_VwXi2HzHKNEwZJ1NiF-BJJo_9-mhO60gQ@mail.gmail.com> <54A5730C.8040501@ninebynine.org> <54A583DD.9010602@intertwingly.net> <54A59651.4060306@ninebynine.org> <54A59B26.5000408@intertwingly.net> <54A6AABF.4060406@ninebynine.org> <54A6B6DF.1010206@intertwingly.net> <54A7DC46.2020708@ninebynine.org> <54A7E9F4.80406@intertwingly.net> <54A820EA.20200@ninebynine.org> <54A82CC4.9080606@intertwingly.net> <54A83B72.4010106@cs.tcd.ie>
In-Reply-To: <54A83B72.4010106@cs.tcd.ie>
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Transfer-Encoding: 7bit
Archived-At: http://mailarchive.ietf.org/arch/msg/apps-discuss/FtsMkGHsIWqxCoSbgeCCp3qiEpY
Cc: apps-discuss@ietf.org
Subject: Re: [apps-discuss] presumption that RFC3986 is correct

On 01/03/2015 01:56 PM, Stephen Farrell wrote:
>
> Hi Sam,
>
> On 03/01/15 17:54, Sam Ruby wrote:
>>
>> I intend to work with implementors, providing patches and/or new
>> implementations along the way.  And I'll continue to document and
>> publish findings.  One such place I have published such work is at the
>> W3C:
>>
>>    http://www.w3.org/TR/url/
>
> I have at least one question about how you (or W3C, or any of us)
> plan to head towards some reasonable level of completeness with
> that work. (This may be a bit of an aside in the current discussion,
> or maybe not, I'm not sure.)
>
> The draft at the URL above includes [1], which is a risibly small
> and fixed (?) subset of an IANA registry. [2] What's the plan for
> making that sensible? I would assume pointing at the IANA registry
> is the simple and obvious fix there, but am puzzled as to why that
> hasn't been done in the few years this text has been around.
>
> Is that just an oversight? Or is your work really only covering
> exactly that particular subset of schemes? Or something else?

This is a valid question, and the subject of an open bug:

https://www.w3.org/Bugs/Public/show_bug.cgi?id=27233

So the short answer is: it is a known issue, and suggestions are welcome.

The longer answer isn't all that much longer.  Given that every modern
programming language (and for that matter, every browser) will have as
part of its runtime library a concept of either a URI or a URL, and a
method to parse a string into such a structure, the question you pose is
equivalent to: "how should URI.parse methods handle unknown schemes?"

Possible answers include: treat the content as hierarchical, or treat
the content as opaque.  There may be other answers.

What is probably needed is a sane default, and a way to register
new schemes.  At the moment, the URL Working Draft treats unknown
schemes as opaque.  The bug suggests that hierarchical might be a better
choice.
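
To make the two options concrete, here is a minimal sketch in Python.
It is not the algorithm from the URL Working Draft; the names
HIERARCHICAL_SCHEMES, register_scheme, parse and unknown_default are
all hypothetical, and the splitting logic is deliberately simplistic.
It only illustrates how a default for unknown schemes and a
registration hook could change what a parse method returns.

```python
from typing import NamedTuple, Optional

# Hypothetical registry of schemes treated as hierarchical.
# This is an illustrative set, not the list from any specification.
HIERARCHICAL_SCHEMES = {"http", "https", "ftp", "ws", "wss"}


class Parsed(NamedTuple):
    scheme: str
    kind: str                  # "hierarchical" or "opaque"
    authority: Optional[str]   # only set for hierarchical URLs with "//"
    path: str                  # path, or the raw scheme data when opaque


def register_scheme(scheme: str) -> None:
    """Mark a scheme as hierarchical (hypothetical registration hook)."""
    HIERARCHICAL_SCHEMES.add(scheme.lower())


def parse(text: str, unknown_default: str = "opaque") -> Parsed:
    """Split a string into scheme and remainder, then apply the policy."""
    if unknown_default not in ("opaque", "hierarchical"):
        raise ValueError("unknown_default must be 'opaque' or 'hierarchical'")
    scheme, sep, rest = text.partition(":")
    if not sep or not scheme:
        raise ValueError("no scheme found")
    scheme = scheme.lower()

    # Registered schemes are hierarchical; everything else gets the default.
    kind = "hierarchical" if scheme in HIERARCHICAL_SCHEMES else unknown_default

    if kind == "hierarchical" and rest.startswith("//"):
        authority, slash, path = rest[2:].partition("/")
        return Parsed(scheme, kind, authority, slash + path)
    # Opaque, or hierarchical without an authority: keep the rest as-is.
    return Parsed(scheme, kind, None, rest)
```

With the opaque default, parse("foo://example.com/a/b") keeps
everything after "foo:" as uninterpreted data; with
unknown_default="hierarchical", or after register_scheme("foo"), the
same string is split into an authority and a path.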

As to registration, at the moment that is undefined.  The spec literally 
says "..." at this point:

   http://www.w3.org/TR/url/#url-writing

The hope is to work together with the authors of the following 
Internet-Draft:

   https://tools.ietf.org/html/draft-ietf-appsawg-uri-scheme-reg

This is mentioned in bullet 3 of the following section:

   https://tools.ietf.org/html/draft-ruby-url-problem-00#section-4

Meanwhile, patches are welcome!  It may be that there are certain URI
schemes that defy conventional classification (file: certainly comes to
mind; there may be others) and that need to be specified explicitly in
the specification.

The easiest way to participate is to propose tests in the form of input 
strings, base strings, and expected results.  That data will be added to:

https://github.com/w3c/web-platform-tests/blob/master/url/urltestdata.txt

And I'll use that data to update:

https://url.spec.whatwg.org/interop/test-results/
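
For anyone proposing tests, a case of the kind described above (input
string, base string, expected result) might be sketched in Python as
follows.  The UrlTestCase record and run_cases helper are hypothetical
and do not reflect the actual layout of urltestdata.txt.  Python's
standard urllib.parse.urljoin is used only as a stand-in resolver; it
is not an implementation of the URL Working Draft, and any other
resolver with the same (base, input) signature could be passed in.

```python
from typing import Callable, List, NamedTuple, Tuple
from urllib.parse import urljoin


class UrlTestCase(NamedTuple):
    input: str      # the string to be parsed
    base: str       # the base URL it is resolved against
    expected: str   # the serialized result the proposer expects


def run_cases(
    cases: List[UrlTestCase],
    resolve: Callable[[str, str], str] = urljoin,
) -> List[Tuple[UrlTestCase, str]]:
    """Return (case, actual) pairs for every case whose result differs."""
    failures = []
    for case in cases:
        actual = resolve(case.base, case.input)
        if actual != case.expected:
            failures.append((case, actual))
    return failures


# Illustrative cases only; these are assumptions, not entries from the suite.
cases = [
    UrlTestCase("../baz", "http://example.org/foo/bar", "http://example.org/baz"),
    UrlTestCase("?q=1", "http://example.org/foo/bar", "http://example.org/foo/bar?q=1"),
]
failures = run_cases(cases)
```

An empty failures list means the chosen resolver agrees with every
proposed expectation; a non-empty one is exactly the kind of
difference worth reporting.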

Should you feel inclined to suggest changes to the URL spec, I'd
encourage you to look at the following, which contains an incomplete but
significantly reworked parser:

https://specs.webplatform.org/url/webspecs/develop/

That repository has a bunch of other things, including the evaluation 
scripts and a reference implementation.  More information can be found here:

https://github.com/webspecs/url#the-url-standard

> Thanks,
> S.

- Sam Ruby

> [1] http://www.w3.org/TR/url/#relative-scheme
> [2] https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml