Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

Andy Bierman <andy@yumaworks.com> Wed, 30 August 2017 14:52 UTC

In-Reply-To: <20170830123156.cssrg5kklpo67fie@elstar.local>
References: <599F0991.7020900@tail-f.com> <BN3PR0201MB0867A248887538077CD5D49FF19B0@BN3PR0201MB0867.namprd02.prod.outlook.com> <20170825125254.6nhnzkrar6fhu7zr@elstar.local> <BN3PR0201MB086796F09BFD77FCD718C21BF19E0@BN3PR0201MB0867.namprd02.prod.outlook.com> <20170828154640.pzg7jfy5uepkb22q@elstar.local> <c8de6140-af50-0a4b-a479-b011a8dfbbe7@cisco.com> <CABCOCHRNt3Tkxy8Ffz3JGgPe-rQYwZ3MTLmD43OQi4P6tZQJmg@mail.gmail.com> <f7151a6b-9deb-52ad-62a9-78b29a552540@cisco.com> <20170830102902.2n5q6rgq2x2dxfq2@elstar.local> <e8482a9c-cba3-28e2-9ffa-ec5eb5c1c0a4@cisco.com> <20170830123156.cssrg5kklpo67fie@elstar.local>
From: Andy Bierman <andy@yumaworks.com>
Date: Wed, 30 Aug 2017 07:52:01 -0700
Message-ID: <CABCOCHTtN611FO2ov2kTLtZx-Q3=tzgH7Xk9uGvFUD1WuyMZyw@mail.gmail.com>
To: Juergen Schoenwaelder <j.schoenwaelder@jacobs-university.de>, Robert Wilton <rwilton@cisco.com>, Andy Bierman <andy@yumaworks.com>, Xufeng Liu <Xufeng_Liu@jabil.com>, "netmod@ietf.org" <netmod@ietf.org>
Content-Type: multipart/alternative; boundary="f403045f51a2aaa8d30557f9ab56"
Archived-At: <https://mailarchive.ietf.org/arch/msg/netmod/-_j_syTMKs0CikOkt_EnisHWxuo>
Subject: Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

On Wed, Aug 30, 2017 at 5:31 AM, Juergen Schoenwaelder
<j.schoenwaelder@jacobs-university.de> wrote:

> On Wed, Aug 30, 2017 at 12:48:19PM +0100, Robert Wilton wrote:
> >
> >
> > On 30/08/2017 11:29, Juergen Schoenwaelder wrote:
> > > On Wed, Aug 30, 2017 at 10:16:30AM +0100, Robert Wilton wrote:
> > > > Hi Andy,
> > > >
> > > > What I am suggesting makes it easier for readers, because I am a
> > > > proponent of simpler regular expressions that are easy to read and
> > > > understand.
> > > >
> > > > For example, I wonder how many YANG model readers would immediately
> > > > comprehend what this pattern statement means:
> > > >
> > > > pattern "\p{Sc}\p{Zs}?\p{Nd}+\.\p{Nd}{2}"?
> > > >
> > > > Does allowing such patterns really make it easier for model readers?
> > > This is always difficult to judge but to be fair you have to show how
> > > you express _the same_ (and not a subset) with some other kind of
> > > regular expressions. (My understanding is that \p{Sc} is a currency
> > > symbol.)
> > Yes, the expression would cover a currency amount, along with associated
> > symbol (e.g. "$200.00").
> >
> > If I was writing a module, I would probably use the following pattern
> > statement instead, which I think a lot more people would likely be able
> > to comprehend:
> >
> > pattern "[A-Z]{3}\s?\d+\.\d{2}", using the 3 letter, ISO 4217, currency
> > codes (e.g. "USD 200.00").
>
> But that is not the same. Apples versus oranges. (I expect people to
> tell me that (i) currency is irrelevant and (ii) that three ASCII
> letter currency acronyms are better than currency symbols anyway but
> this is a separate discussion I am not interested in.)
>
> > >
> > > > The proposed guidelines obviously make it easier (or at least no
> > > > harder) for tool makers.
> > > >
> > > > I agree that there is a minor impact on model writers, but really
> > > > only in the sense that the guidelines would be telling them not to
> > > > use the esoteric options of the XML regex syntax that they probably
> > > > don't know about anyway.
> > > What is 'esoteric' largely depends on your language environment. What
> > > you are saying by 'do not use \p{}' is essentially 'do not use any
> > > unicode long live ASCII'.
> > No, that is not my intention, i.e. I'm not suggesting banning all use of
> > \p{}, but instead limiting it to the character classes that seem like
> > they may plausibly be used in standardized YANG modules.
>
> This is entirely subjective. And if you still allow some \p{}, what is
> the point of the exercise?
>
> > I'm not trying to change what 6020/7950 defines the pattern statement as,
> > just give what I perceive as some pragmatic guidance as to what parts of
> > XML RE it makes sense to use in standardized YANG modules, making it
> > easier for readers and implementations.
> >
> > I think that it is fine for companies, vendors, etc to use the full
> > breadth of XML RE if they wish.
>
> Implementations have to be prepared to handle XSD pattern if they
> claim compliance to YANG 1.0 and 1.1. So all this only helps
> non-compliant implementations. This may indeed be a goal - but then we
> should spell this out as such - this helps non-compliant
> implementations (and they may still fail on the first \p{} that
> you still allow).
>
> If implementations do not implement the YANG pattern statement but
> something else, then they should ignore patterns they can't
> understand and treat the pattern as if it had been in a
> description clause - i.e., leave it to humans to write the code that
> implements the pattern correctly. Note that YANG does not say anything
> about how stuff is implemented.
>


This does not work.
There are 3 possible outcomes from the regex compiler:

1) proper syntax was used and accepted; pattern matches correctly
2) improper syntax was used and accepted; pattern matches incorrectly
3) improper syntax was used and rejected; compiler error generated

Case (2) is the really bad one and we have seen it in bug reports.
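
A minimal sketch of the three cases, assuming Python 3.7+ and using its
"re" module as a stand-in for any regex engine that is not an XSD engine.
The classify() helper and the "a$" pattern are made up for illustration;
the other two patterns are the ones quoted earlier in this thread. The
"expected" argument is what an XSD engine should answer: XSD patterns are
implicitly anchored, and "$" is an ordinary character there.

```python
import re


def classify(xsd_pattern: str, value: str, expected: bool) -> str:
    """Feed an XSD pattern unchanged to Python's re and report the outcome.

    'expected' is the result an XSD-compliant engine should give for 'value'.
    """
    try:
        compiled = re.compile(xsd_pattern)
    except re.error:
        return "rejected"  # case 3: compiler error generated
    # fullmatch() emulates the implicit anchoring of XSD patterns.
    matched = compiled.fullmatch(value) is not None
    # case 1 if the answer agrees with XSD semantics, case 2 otherwise
    return "correct" if matched == expected else "incorrect"


# Case 1: the syntax means the same thing in both dialects for this input.
print(classify(r"[A-Z]{3}\s?\d+\.\d{2}", "USD 200.00", True))

# Case 2: "$" is a literal character in XSD but an end anchor in Python,
# so the pattern compiles silently and then gives the wrong answer.
print(classify(r"a$", "a$", True))

# Case 3: Python's re has no \p{...} escapes, so compilation fails loudly.
print(classify(r"\p{Sc}\p{Zs}?\p{Nd}+\.\p{Nd}{2}", "$200.00", True))
```

Only the last case is reported to the user by the engine itself; the
second one is accepted without complaint, which is why it only shows up
later as a bug report.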

This issue was discussed in detail for almost 2 years and the conclusion was
that a YANG extension would be used to specify pattern types other than
the XSD pattern mandated by the standard.


> /js
>
>
Andy

