Re: [apps-discuss] Fun with URLs and regex

Sean Leonard <dev+ietf@seantek.com> Thu, 29 January 2015 09:10 UTC

Message-ID: <54C9F914.4090002@seantek.com>
Date: Thu, 29 Jan 2015 01:10:44 -0800
From: Sean Leonard <dev+ietf@seantek.com>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0
MIME-Version: 1.0
To: apps-discuss@ietf.org
References: <C5B10293-E6F6-4348-9782-C9C00A4476CE@mnot.net> <CACweHNBVOrVMesB7HOjPNHe5FtzL1k9XDGAHUXAx5DbOSYv5jA@mail.gmail.com> <A1E5B0EC-FAD5-4178-8C7B-540BEB61DC06@mnot.net> <54AEB660.1020701@intertwingly.net> <F122ADA8-4A96-4F88-BB9F-3C5C6A544067@mnot.net> <54C84872.5040902@intertwingly.net> <EF1E36FA-6A30-4A65-9520-5A31571EE445@mnot.net> <54C95132.2060402@gmx.de> <154ABFBB-AB8C-447A-89A3-D1746EFBF1C6@gbiv.com> <54C95AF7.6030703@gmx.de> <CACweHNBHiEGUwLB3z6YoTexF=b9ApwsUy6-DVCf9vnBSD+L5Rw@mail.gmail.com> <E6AB5A9F-D1DF-45A2-AAEF-FCF2752FD254@gbiv.com> <CACweHNAitEigzDkxOrnR9fkCeMG=ft8g6cVvpmtBrPMMp9xOeA@mail.gmail.com>
In-Reply-To: <CACweHNAitEigzDkxOrnR9fkCeMG=ft8g6cVvpmtBrPMMp9xOeA@mail.gmail.com>
Content-Type: multipart/alternative; boundary="------------080702090703080806090801"
Archived-At: <http://mailarchive.ietf.org/arch/msg/apps-discuss/9qYt5HjZr0Y6yPdpRT9zI24mNDs>
Subject: Re: [apps-discuss] Fun with URLs and regex

On 1/28/2015 9:14 PM, Matthew Kerwin wrote:
>
> I don't know what I've said by leaving off the fragment part. 
> According to RFC 3986 I'm not allowed to touch it, but now I have 
> a collected ABNF for 'file' URIs that doesn't have fragments. I think 
> my way forward is to include the fragment part in the grammar, and 
> then deflect it the way 3986, 7230, etc. do.

More than anything else, this is probably an artifact of the way that 
ABNF is written. It is not possible to "subclass" ABNF expressions; 
however, that is what URI scheme registrations should do. That is, a 
registration should say something like: this URI scheme 
subclasses/profiles the following ABNF productions from RFC 3986, with 
scheme = "file" and hier-part subject to such-and-such restrictions. 
(In the case of file:, there is the further problem that some things 
that are not valid per RFC 3986, like the use of "|", are things you 
want to say something about anyway, so the profile can be restrictive 
in some ways and permissive in others.)
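
To make the "profiling" idea concrete, here is a rough sketch in Python. 
It is only an illustration, not anything the file-scheme draft 
specifies. It splits a URI with the generic regular expression from 
RFC 3986, Appendix B, and then layers scheme-specific rules on top. The 
function names (split_generic, profile_file_uri) and the "legacy_pipe" 
flag are made up for this example, and the particular rules are 
assumptions chosen to show one restrictive rule, one permissive rule, 
and a fragment that is left to the generic syntax.

```python
import re

# Generic URI-reference regex, taken from RFC 3986, Appendix B.
GENERIC_URI = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_generic(uri):
    """Split a URI into the five generic components of RFC 3986."""
    # Every part of the pattern is optional, so it matches any string.
    m = GENERIC_URI.match(uri)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


def profile_file_uri(uri):
    """Hypothetical 'file' profile layered on the generic split.

    Restrictive: the scheme must be "file".
    Permissive: a "|" in the path (not valid per RFC 3986) is
    accepted but flagged, rather than rejected.
    The fragment is passed through untouched, i.e. deflected to the
    generic syntax instead of being defined by the scheme.
    """
    parts = split_generic(uri)
    scheme = parts["scheme"]
    if scheme is None or scheme.lower() != "file":
        raise ValueError("not a file URI: %r" % uri)
    # Illustrative flag only; a real profile would say what "|" means.
    parts["legacy_pipe"] = "|" in parts["path"]
    return parts
```

For example, profile_file_uri("file:///c|/dir/a.txt#sec") yields the 
path "/c|/dir/a.txt" with legacy_pipe set, and the fragment "sec" comes 
back exactly as the generic split found it.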

>
>     What makes you think that dereferenced files don't have a well-defined
>     content type?  The client might not know what it is, but that doesn't
>     mean the content type doesn't exist, and any decision to process the
>     file is basically an assumption of some content type (and its rules
>     for processing fragments).
>
>
> Perhaps I should have said "well known" instead of "well-defined," or 
> that the *means of determining* the content type is not well defined.
>

Sounds about right.

Sean