Re: [apps-discuss] Fun with URLs and regex

Mark Nottingham <mnot@mnot.net> Thu, 08 January 2015 15:06 UTC

Content-Type: text/plain; charset="utf-8"
Mime-Version: 1.0 (Mac OS X Mail 8.1 \(1993\))
From: Mark Nottingham <mnot@mnot.net>
In-Reply-To: <vperaa1clvfrj9hajpjhl7h3senipqdam6@hive.bjoern.hoehrmann.de>
Date: Thu, 08 Jan 2015 10:05:59 -0500
Content-Transfer-Encoding: quoted-printable
Message-Id: <CAD0DF41-FADC-4DB9-932A-2C16062F4E83@mnot.net>
References: <C5B10293-E6F6-4348-9782-C9C00A4476CE@mnot.net> <vperaa1clvfrj9hajpjhl7h3senipqdam6@hive.bjoern.hoehrmann.de>
To: Bjoern Hoehrmann <derhoermi@gmx.net>
X-Mailer: Apple Mail (2.1993)
Archived-At: <http://mailarchive.ietf.org/arch/msg/apps-discuss/abB_NyeS_SmEVXn9u0o58_JX9A0>
Cc: IETF Apps Discuss <apps-discuss@ietf.org>
Subject: Re: [apps-discuss] Fun with URLs and regex

On 7 Jan 2015, at 6:23 pm, Bjoern Hoehrmann <derhoermi@gmx.net> wrote:
> 
>> I didn’t finish mailto or data, because they allow quoted-string inside 
>> of URLs, and that makes my head hurt.
> 
> The bigger problem might be that URI scheme grammars typically do not
> account for %xx-encoding. It should be fine to write `dAtA:;B%41se64,`,
> so you cannot use literals like `;base64` directly, unless you apply
> some syntax-preserving pre-processing. RFC 6068 also re-writes the RFC
> 5322 productions it uses in prose, and considering how complex the RFC
> 5322 grammar is, it is probably not wise to attempt to do this manually.

This is exactly the problem I ran into.
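
To make the failure concrete, here is a minimal Python sketch of the point quoted above. It is not the regex from the earlier message: `NAIVE_DATA` is a hypothetical, deliberately simplified pattern, and `decode_unreserved` and `is_base64_data` are illustrative names. The pre-processing step assumes that decoding only those %xx octets that stand for RFC 3986 unreserved characters preserves the syntax, and that the `;base64` token may be compared case-insensitively.

```python
import re

# Characters RFC 3986 calls "unreserved"; decoding their %xx form is safe.
UNRESERVED = re.compile(r"[A-Za-z0-9\-._~]")
PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved(uri: str) -> str:
    """Decode %xx only when it encodes an unreserved character.

    Other escapes are kept, with their hex digits upper-cased.
    """
    def repl(m: re.Match) -> str:
        ch = chr(int(m.group(1), 16))
        return ch if UNRESERVED.fullmatch(ch) else m.group(0).upper()

    return PCT.sub(repl, uri)


# Hypothetical, simplified pattern: NOT the full RFC 2397 grammar
# (it ignores media-type parameters, for instance).
NAIVE_DATA = re.compile(r"data:[^,;]*(;base64)?,.*", re.IGNORECASE)


def is_base64_data(uri: str) -> bool:
    """True if the simplified pattern matches and sees the ;base64 literal."""
    m = NAIVE_DATA.fullmatch(uri)
    return bool(m and m.group(1))


raw = "dAtA:;B%41se64,"
print(is_base64_data(raw))                     # False: the literal is hidden by %41
print(is_base64_data(decode_unreserved(raw)))  # True after pre-processing
```

Escapes for reserved characters such as `%2F` are left encoded, since decoding them could change how the URL parses.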

I think that this really needs to be clearer in the documents; while we all understand that the ABNF needs to be read within the context of the prose, this is stretching credibility IMO.

Cheers,


--
Mark Nottingham   http://www.mnot.net/