Re: [apps-discuss] Fun with URLs and regex

Nico Williams <nico@cryptonector.com> Wed, 28 January 2015 23:08 UTC

Date: Wed, 28 Jan 2015 17:08:32 -0600
From: Nico Williams <nico@cryptonector.com>
To: Sam Ruby <rubys@intertwingly.net>
Archived-At: <http://mailarchive.ietf.org/arch/msg/apps-discuss/XTu2W4Cmqhd-l-P9Sq2NVY50sE4>
Cc: Julian Reschke <julian.reschke@gmx.de>, "Roy T. Fielding" <fielding@gbiv.com>, Mark Nottingham <mnot@mnot.net>, IETF Apps Discuss <apps-discuss@ietf.org>

On Wed, Jan 28, 2015 at 04:51:05PM -0500, Sam Ruby wrote:
>
> On 01/28/2015 04:40 PM, Roy T. Fielding wrote:
> >On Jan 28, 2015, at 1:14 PM, Julian Reschke wrote:
> >>On 2015-01-28 06:36, Mark Nottingham wrote:
> >>>... which brings about another interesting observation -- only http and https define fragments in their syntax; the other schemes do not.
> >>>...
> >>
> >>It's because you asked for that in <https://lists.w3.org/Archives/Public/ietf-http-wg/2013AprJun/0187.html>, and apparently were successful in convincing Roy.
> >
> >It is a very very very old debate regarding whether the fragment is part
> >of a URI or something attached to the end of a URI, but that was resolved
> >in RFC3986 (since the only thing that really matters here is that a fragment
> >is going to be parsed as such regardless of the scheme).
> >
> >HTTP was merely updated to reflect what STD66 calls a URI.
> 
> Based on this discussion, I am gathering that the correct way to
> validate a URI with a known scheme is as follows:
> 
>   return new RegExp("^" + known[scheme] + "($|#" + fragment + ")").test(string)
> 
> Anybody care to confirm or deny?

For dereferenceable URI schemes that looks right to me.

It's not really right for mailto: URIs (which aren't dereferenceable).

I suppose data: URIs could have fragments.
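
To make the above concrete, here is a rough sketch of how I'd write that
check. The `known` table, its deliberately crude per-scheme patterns, the
`noFragment` set and the `validate` name are all made up for illustration;
only the `fragment` pattern follows the RFC 3986 grammar
(fragment = *( pchar / "/" / "?" )). Note that, unless `fragment` itself
ends in "$", the quoted expression does not anchor the end of the string
after the fragment, so this version puts "$" after an optional fragment
group instead.

```javascript
// Fragment per RFC 3986: *( pchar / "/" / "?" )
const fragment = "(?:[A-Za-z0-9._~!$&'()*+,;=:@/?-]|%[0-9A-Fa-f]{2})*";

// Hypothetical, deliberately crude patterns for the part before '#'.
// A real table would follow each scheme's own specification.
const known = {
  http: "http://[^\\s#]+",
  https: "https://[^\\s#]+",
  mailto: "mailto:[^\\s#]+",
};

// Assumption for this sketch: schemes listed here get no fragment at all.
const noFragment = new Set(["mailto"]);

function validate(scheme, string) {
  // Unknown scheme: nothing to validate against.
  if (!Object.prototype.hasOwnProperty.call(known, scheme)) return false;
  // Optional fragment, then a hard end-of-string anchor.
  const tail = noFragment.has(scheme) ? "" : "(?:#" + fragment + ")?";
  return new RegExp("^(?:" + known[scheme] + ")" + tail + "$").test(string);
}
```

With this, validate("http", "http://example.com/a#sec-1") passes, while a
second "#" or any fragment on a mailto: URI is rejected. Whether mailto:
should reject a fragment outright or just ignore it is a policy choice;
the sketch picks rejection only to show the per-scheme switch.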

Nico
--