Re: [netmod] regular expression flavours (again)

Carsten Bormann <cabo@tzi.org> Fri, 14 June 2019 10:45 UTC

From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <BYAPR11MB26314900335A447BF2AD8F81B5EE0@BYAPR11MB2631.namprd11.prod.outlook.com>
Date: Fri, 14 Jun 2019 12:45:28 +0200
Cc: Juergen Schoenwaelder <j.schoenwaelder@jacobs-university.de>, Robert Varga <nite@hq.sk>, NETMOD WG <netmod@ietf.org>
To: "Rob Wilton (rwilton)" <rwilton@cisco.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/netmod/OL_zn-NpwgK972gq1x2i_J4tDfM>

On Jun 14, 2019, at 12:08, Rob Wilton (rwilton) <rwilton@cisco.com> wrote:
> 
> Or perhaps we could define a regex language that worked with normal implementations without requiring any conversion.

Unlikely, as it is easy to write a regex that inadvertently triggers some “feature” in a specific flavor.  So you would need to list all the current and future idiosyncrasies of all flavors and outlaw them.  Worse, what you have outlawed may be *the* valid way to represent something in some other dialect, so the specifier either needs to jump through hoops to work around that or simply cannot write down the regex needed.  Also, all those features that are subtly different between flavors (starting with ., \s, …) can’t be used.
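
To make the ". and \s" point concrete, here is a small sketch using Python's re module as the example flavor (my choice for illustration only; the same exercise applies to the others).  It assumes the XSD definitions from XML Schema Part 2, where "." is [^\n\r] and \s is [ \t\n\r].  The helper name differs() is made up for this example.

```python
import re

# Assumed XSD meanings (XML Schema Part 2), spelled out as explicit classes.
XSD_DOT = r"[^\n\r]"      # XSD "." excludes both LF and CR
XSD_SPACE = r"[ \t\n\r]"  # XSD "\s" is exactly these four characters


def differs(native: str, xsd_equiv: str, sample: str) -> bool:
    """Hypothetical helper: True if Python's native escape and the
    XSD-equivalent class disagree on whether `sample` matches."""
    native_hit = re.fullmatch(native, sample) is not None
    xsd_hit = re.fullmatch(xsd_equiv, sample) is not None
    return native_hit != xsd_hit


# Python's "." accepts CR; XSD's "." does not.
dot_differs_on_cr = differs(r".", XSD_DOT, "\r")

# Python's "\s" accepts form feed and (for str patterns) NBSP; XSD's does not.
space_differs_on_ff = differs(r"\s", XSD_SPACE, "\f")
space_differs_on_nbsp = differs(r"\s", XSD_SPACE, "\u00a0")

# A plain character class behaves the same either way.
hex_agrees = not differs(r"[A-Fa-f0-9]+", r"[A-Fa-f0-9]+", "c0ffee")
```

All three of the "differs" checks come out true here, which is exactly the kind of silent disagreement a "works everywhere" language would have to rule out.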

The approach probably works for [A-Fa-f0-9]+, but quickly becomes icky for anything more complicated.  (And, actually, XSD regexes are very close to what you would come up with, anyway, except for “features” like character class subtraction or block escapes.)

Oh, and defining “normal implementations” is left as an exercise to the reader :-)

What we could do (and that would be quite useful, I think) is *document* the subset of XSD regexes that actually has the same meaning in PCRE2, Java8, JavaScript, .NET and a few more real-world regex languages after adding the necessary anchors in those languages.
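
As a rough illustration of "adding the necessary anchors", here is what that step might look like for one flavor, Python's re (again just my example; each of the languages above needs its own variant).  XSD patterns implicitly match the whole value, so the pattern has to be wrapped.  The function name xsd_subset_to_python() is hypothetical, and the sketch assumes the input already lies in the shared subset.

```python
import re


def xsd_subset_to_python(xsd_pattern: str) -> "re.Pattern[str]":
    """Hypothetical helper: wrap an XSD pattern (assumed to use only the
    shared subset) so that it must match the entire string.

    The non-capturing group keeps a top-level "|" inside the anchors;
    \\A and \\Z are used instead of ^ and $ because Python's "$" also
    matches just before a trailing newline.
    """
    return re.compile(r"\A(?:" + xsd_pattern + r")\Z")


hex_re = xsd_subset_to_python("[A-Fa-f0-9]+")

whole_value_ok = hex_re.search("c0ffee") is not None
trailing_newline_rejected = hex_re.search("c0ffee\n") is None

# Pitfalls the wrapper avoids:
# 1. No anchors at all: re.match only anchors at the start.
unanchored_accepts_junk = re.match("[A-Fa-f0-9]+", "c0ffee!") is not None
# 2. "^...$": "$" still accepts a trailing newline.
dollar_accepts_newline = re.match("^[A-Fa-f0-9]+$", "c0ffee\n") is not None
# 3. Anchors without grouping: "\Aa|bc\Z" means "(\Aa)|(bc\Z)".
ungrouped_accepts_ab = re.search(r"\Aa|bc\Z", "ab") is not None
grouped_rejects_ab = xsd_subset_to_python("a|bc").search("ab") is None
```

The anchoring itself is the easy part; the documented subset would be what tells you the body of the pattern means the same thing once wrapped.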

Grüße, Carsten