[apps-discuss] Fun with URLs and regex

Mark Nottingham <mnot@mnot.net> Wed, 07 January 2015 21:36 UTC

From: Mark Nottingham <mnot@mnot.net>
Date: Wed, 07 Jan 2015 16:35:58 -0500
Message-Id: <C5B10293-E6F6-4348-9782-C9C00A4476CE@mnot.net>
To: IETF Apps Discuss <apps-discuss@ietf.org>
Archived-At: http://mailarchive.ietf.org/arch/msg/apps-discuss/ajTG0cK3f0jJNPcVvkY1qmrohno
Subject: [apps-discuss] Fun with URLs and regex

I’ve updated my Python script that translates the ABNF for URIs into regular expressions.

https://gist.github.com/mnot/138549
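
For anyone who hasn't looked at the gist, here is a minimal sketch of the general idea. It is not the gist's code: the variable names and the `matches` helper are made up for illustration, and it covers only a handful of rules from RFC 3986. Each ABNF rule becomes a regex fragment, and larger rules are built by interpolating smaller ones.

```python
import re

# Core character classes used by the RFC 3986 grammar.
ALPHA = r"[A-Za-z]"
HEXDIG = r"[0-9A-Fa-f]"

# unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
unreserved = r"[A-Za-z0-9\-._~]"
# pct-encoded = "%" HEXDIG HEXDIG
pct_encoded = rf"%{HEXDIG}{HEXDIG}"
# sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
sub_delims = r"[!$&'()*+,;=]"
# pchar       = unreserved / pct-encoded / sub-delims / ":" / "@"
pchar = rf"(?:{unreserved}|{pct_encoded}|{sub_delims}|[:@])"
# scheme      = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
scheme = rf"{ALPHA}[A-Za-z0-9+\-.]*"
# segment     = *pchar
segment = rf"{pchar}*"
# path-abempty = *( "/" segment )
path_abempty = rf"(?:/{segment})*"


def matches(rule, text):
    """Hypothetical helper: True if the whole of `text` matches `rule`."""
    return re.fullmatch(rule, text) is not None
```

With these fragments, `matches(path_abempty, "/a/b%20c")` holds, while a truncated escape such as `"/a/b%2"` is rejected because `pct-encoded` requires two hex digits.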

It now validates the following URI schemes according to their respective specifications (mailto and data only partially):
  - http
  - https
  - file
  - data
  - gopher
  - ws
  - wss
  - mailto

I didn’t finish mailto or data, because both allow quoted-string inside of URLs, and that makes my head hurt.

Would the respective communities review the regexes to make sure they’re faithful to the specs (except for the caveat around quoted strings)?

Cheers,

--
Mark Nottingham   http://www.mnot.net/