Re: [apps-discuss] Fun with URLs and regex

Sam Ruby <rubys@intertwingly.net> Sun, 18 January 2015 22:52 UTC


On 01/07/2015 04:35 PM, Mark Nottingham wrote:
> I’ve updated my Python script that serves as a translation of ABNF for URIs into regex.
>
> https://gist.github.com/mnot/138549

I've attempted to convert this to JavaScript:

https://url.spec.whatwg.org/reference-implementation/uri-validate.js

I've built a web page that makes use of it:

https://url.spec.whatwg.org/reference-implementation/uri-validate.html

- - -

Issues I encountered in the process:

1) file_auth_path is defined with four instead of three double quotes

2) the last re.match rejects inputs that do not have a hash sign

3) extra, and potentially misleading, output is provided if instr starts 
with the characters "absolute:".
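
To illustrate issues 1 and 2, here is a minimal Python sketch. The patterns are simplified stand-ins, not the definitions from the gist, and every name in it is hypothetical. It assumes the extra quote in issue 1 is on the opening side of the raw string, and that issue 2 comes from a pattern in which the "#" is mandatory.

```python
import re

# Hypothetical, simplified patterns; NOT the definitions used in the gist.
path = r"(?:/[A-Za-z0-9._~-]*)*"
fragment = r"[A-Za-z0-9._~/?-]*"

# Issue 1 (assumed form): four opening quotes are parsed as a triple-quoted
# string whose first character is a literal double quote.
four_quotes = r""""(?:/[a-z]*)*"""
three_quotes = r"""(?:/[a-z]*)*"""

# Issue 2 (assumed form): a mandatory '#' rejects inputs with no fragment;
# wrapping '#fragment' in an optional group accepts both forms.
fragment_required = re.compile(r"^%s#%s$" % (path, fragment))
fragment_optional = re.compile(r"^%s(?:#%s)?$" % (path, fragment))

if __name__ == "__main__":
    print(repr(four_quotes[0]))
    print(bool(re.fullmatch(four_quotes, "/a/b")))
    print(bool(re.fullmatch(three_quotes, "/a/b")))
    print(bool(fragment_required.match("/a/b")))
    print(bool(fragment_optional.match("/a/b")))
    print(bool(fragment_optional.match("/a/b#frag")))
```

With the stray quote in place the pattern can only match input that itself begins with a double quote, so the mistake does not raise an error and is easy to miss.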

- Sam Ruby