[apps-discuss] Fun with URLs and regex

Mark Nottingham <mnot@mnot.net> Wed, 07 January 2015 21:36 UTC

From: Mark Nottingham <mnot@mnot.net>
Date: Wed, 07 Jan 2015 16:35:58 -0500
Message-Id: <C5B10293-E6F6-4348-9782-C9C00A4476CE@mnot.net>
To: IETF Apps Discuss <apps-discuss@ietf.org>
Archived-At: http://mailarchive.ietf.org/arch/msg/apps-discuss/ajTG0cK3f0jJNPcVvkY1qmrohno
Subject: [apps-discuss] Fun with URLs and regex

I’ve updated my Python script that translates the ABNF for URIs into regular expressions.


It now validates the following URI schemes according to their respective specifications:
  - http
  - https
  - file
  - data
  - gopher
  - ws
  - wss
  - mailto
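
To give a rough idea of what translating ABNF into regex looks like, here is a minimal sketch and not the script itself. It assumes the RFC 3986 character rules and a simplified http form: the host is a non-empty reg-name only, with no userinfo and no IP literals. The function name is_http_uri is made up for illustration.

```python
import re

# RFC 3986 character rules, each written as a regex fragment.
unreserved = r"[A-Za-z0-9\-._~]"
pct_encoded = r"%[0-9A-Fa-f]{2}"
sub_delims = r"[!$&'()*+,;=]"

# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
pchar = rf"(?:{unreserved}|{pct_encoded}|{sub_delims}|[:@])"

# segment = *pchar
segment = rf"{pchar}*"

# path-abempty = *( "/" segment )
path_abempty = rf"(?:/{segment})*"

# reg-name = *( unreserved / pct-encoded / sub-delims )
# Assumption: "+" instead of "*" here, so that an empty host is rejected.
reg_name = rf"(?:{unreserved}|{pct_encoded}|{sub_delims})+"

# port = *DIGIT
port = r"[0-9]*"

# query = *( pchar / "/" / "?" ); fragment uses the same production.
query = rf"(?:{pchar}|[/?])*"
fragment = query

# Simplified http URI (assumption: host is a reg-name, no userinfo):
# "http:" "//" host [ ":" port ] path-abempty [ "?" query ] [ "#" fragment ]
http_uri = (
    rf"http://{reg_name}(?::{port})?{path_abempty}"
    rf"(?:\?{query})?(?:\#{fragment})?"
)

HTTP_URI = re.compile(http_uri)


def is_http_uri(candidate):
    """Return True if the whole string matches the simplified http regex."""
    return HTTP_URI.fullmatch(candidate) is not None
```

Each ABNF rule becomes one named fragment, and larger rules are built by interpolating the smaller ones, so the regex stays readable next to the grammar it came from.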

The mailto and data regexes aren’t finished, though, because both schemes allow quoted-string inside of URLs, and that makes my head hurt.

Would the respective communities review the regexes to make sure they’re faithful to the specifications (except for the caveat around quoted strings)?


Mark Nottingham   http://www.mnot.net/