Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

Robert Wilton <rwilton@cisco.com> Mon, 04 September 2017 14:05 UTC

To: Andy Bierman <andy@yumaworks.com>, Juergen Schoenwaelder <j.schoenwaelder@jacobs-university.de>, "Acee Lindem (acee)" <acee@cisco.com>, "netmod@ietf.org" <netmod@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/netmod/hSCfaBORgRN_sZBl4orkJsIa9TY>

Hi Andy,


On 02/09/2017 17:46, Andy Bierman wrote:
>
>
> On Sat, Sep 2, 2017 at 4:28 AM, Juergen Schoenwaelder 
> <j.schoenwaelder@jacobs-university.de> wrote:
>
>     On Sat, Sep 02, 2017 at 10:39:57AM +0000, Acee Lindem (acee) wrote:
>     >
>     > This is not an effort to change or bifurcate the YANG 1.1. It is
>     simply to
>     > RECOMMEND a proper subset of XSD pattern that is more portable.
>     >
>
>     If you implement YANG as it is defined, patterns are portable. Given
>     this, I do not understand the notion of 'more portable'.
>
>     Anyway, it seems that those who want a more portable subset do not
>     even agree on what that subset is. Perhaps people pushing for this
>     should go and write an I-D that explains why a 'more portable' subset
>     is needed (which problems are we fixing), that defines such a 'more
>     portable subset', and which includes the reasoning how the subset has
>     been determined.
>
>
>
> I do not agree that the YANG pattern contains a string that is both a 
> POSIX and XSD regular expression.
> The RFC is very clear it contains an XSD expression. Pretending it is 
> both is a hack that does not even seem
> to work 100%, so it is not reliable.
I am not suggesting that the YANG pattern is both a POSIX and XSD 
regular expression.

I am only suggesting that the guidelines recommend that authors use a 
subset of XSD, to make it easier to programmatically *convert* the 'XSD 
subset compliant regular expression' into a functionally equivalent 
regular expression for whatever regular expression engine the tooling 
decides to use.
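
A minimal sketch of what such a conversion could look like, assuming the 
target engine is Python's "re" module and assuming the pattern sticks to 
constructs that both flavours share (no \i, \c, \p{...}, or character 
class subtraction). The function name is hypothetical. The two 
differences it handles are that XSD patterns are implicitly anchored at 
both ends, and that '^' and '$' are ordinary characters in XSD when they 
appear outside a character class:

```python
import re


def xsd_subset_to_python(pattern: str) -> str:
    """Translate an XSD-subset pattern into Python `re` syntax.

    Assumption: the pattern uses only constructs common to both
    flavours. The only rewrite performed is escaping '^' and '$'
    outside character classes, where XSD treats them as literals.
    """
    out = []
    in_class = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            # Copy an escape sequence (backslash plus next char) verbatim.
            out.append(pattern[i:i + 2])
            i += 2
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch in "^$":
            # Anchors in Python, plain characters in XSD.
            out.append("\\" + ch)
            i += 1
            continue
        out.append(ch)
        i += 1
    return "".join(out)


def xsd_subset_match(pattern: str, value: str) -> bool:
    # fullmatch() supplies the implicit anchoring of XSD patterns.
    return re.fullmatch(xsd_subset_to_python(pattern), value) is not None
```

For example, xsd_subset_match("[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}", 
"00:11:22:33:44:55") accepts the value, while the same pattern rejects a 
string with a seventh octet appended. The sketch ignores smaller 
differences, such as '.' excluding both \n and \r in XSD but only \n in 
Python.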

E.g. this seems to be the approach used by "libyang", which uses libpcre 
as the backend RE library rather than libxml. Unfortunately, I think 
that the libyang library would currently fail if the pattern statement 
contained "[A-Z-[P-R]]" because it looks like the PCRE2 language does 
not support character class subtraction.  AFAICT, no standard YANG 
modules currently use character class subtraction, so the authors of 
libyang have a choice here:
   (i) write a block of code that most likely nobody is going to use, or
   (ii) document the limitation, spot character class subtraction in the 
regex, and flag that it is not supported (or perhaps just ignore it).
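
Option (ii) could be sketched roughly as below. This is only an 
illustration in Python, not libyang's actual code, and the function name 
is hypothetical. It relies on the fact that in XSD syntax an unescaped 
'[' can only appear inside a character class as the start of a 
subtracted class, directly after an unescaped '-' (as in 
"[A-Z-[P-R]]"):

```python
def has_class_subtraction(pattern: str) -> bool:
    """Return True if the pattern appears to use XSD character class
    subtraction, i.e. an unescaped '-[' inside a character class."""
    depth = 0   # nesting depth of '[' ... ']'
    prev = ""   # previous unescaped character ("" after an escape)
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            # Skip the escaped character; it is never an operator.
            prev = ""
            i += 2
            continue
        if ch == "[":
            if depth > 0 and prev == "-":
                return True
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
        prev = ch
        i += 1
    return False
```

A tool could call this before handing the pattern to its backend engine 
and report an "unsupported pattern" error rather than silently 
mis-compiling it.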


>
> If the community wants to support both XSD and POSIX expressions, then 
> the proper engineering
> solution is to introduce a new statement that is defined to contain a 
> POSIX expression.
> This can be done with a YANG extension now and added to YANG 2.0 later.
I think that this is an inferior solution:
- there are many languages that YANG tools could be written in: C/C++, 
Python, Java, Go, Rust, JavaScript are all reasonably plausible choices.
- they all have similar regular expression flavours, but with small 
differences (according to http://www.regular-expressions.info/reference.html).
- Personally, I see no inherent advantage of the POSIX Extended Regex 
over XML RE.   In fact, given that it doesn't support Unicode at all, it 
would seem to be a somewhat strange choice for a second pattern statement.
- Nor does it seem pragmatic to introduce lots of different flavors of 
pattern statements into YANG each supporting a different regex syntax.

I also don't like the solution where every YANG tool maker has to either 
link against libxml2, or write their own efficient regular expression 
engine.  I'm not convinced that what the world needs is yet more regular 
expression implementations :-)

So, I still think that the better technical solution is to always define 
the pattern statements only in the XML RE language, but to strongly 
encourage folks to use a subset of that language for standards models 
(which they appear to be doing anyway) to make it easier to convert the 
regular expression into compatible versions for other engines.

Thanks,
Rob


>
>     /js
>
>
> Andy
>
>
>     --
>     Juergen Schoenwaelder           Jacobs University Bremen gGmbH
>     Phone: +49 421 200 3587         Campus Ring 1 | 28759 Bremen | Germany
>     Fax:   +49 421 200 3103         <http://www.jacobs-university.de/>
>
>
>