Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

Andy Bierman <andy@yumaworks.com> Mon, 04 September 2017 16:50 UTC

From: Andy Bierman <andy@yumaworks.com>
Date: Mon, 04 Sep 2017 09:50:40 -0700
Message-ID: <CABCOCHRQ2S26pdkd0bWk8NRcr0vpiBBKAe0GOe1=LjzuzVWRqQ@mail.gmail.com>
To: Robert Wilton <rwilton@cisco.com>
Cc: Juergen Schoenwaelder <j.schoenwaelder@jacobs-university.de>, "Acee Lindem (acee)" <acee@cisco.com>, "netmod@ietf.org" <netmod@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/netmod/vopuyG9Fhkq9iDTdWuwCdipjbv4>
Subject: Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

On Mon, Sep 4, 2017 at 9:22 AM, Robert Wilton <rwilton@cisco.com> wrote:

>
>
> On 04/09/2017 16:55, Andy Bierman wrote:
>
>
>
> On Mon, Sep 4, 2017 at 7:05 AM, Robert Wilton <rwilton@cisco.com> wrote:
>
>> Hi Andy,
>>
>> On 02/09/2017 17:46, Andy Bierman wrote:
>>
>>
>>
>> On Sat, Sep 2, 2017 at 4:28 AM, Juergen Schoenwaelder <
>> j.schoenwaelder@jacobs-university.de> wrote:
>>
>>> On Sat, Sep 02, 2017 at 10:39:57AM +0000, Acee Lindem (acee) wrote:
>>> >
>>> > This is not an effort to change or bifurcate YANG 1.1. It is simply to
>>> > RECOMMEND a proper subset of XSD patterns that is more portable.
>>> >
>>>
>>> If you implement YANG as it is defined, patterns are portable. Given
>>> this, I do not understand the notion of 'more portable'.
>>>
>>> Anyway, it seems that those who want a more portable subset do not
>>> even agree on what that subset is. Perhaps people pushing for this
>>> should go and write an I-D that explains why a 'more portable' subset
>>> is needed (which problems are we fixing), that defines such a 'more
>>> portable subset', and which includes the reasoning how the subset has
>>> been determined.
>>>
>>>
>>
>> I do not agree that the YANG pattern contains a string that is both a
>> POSIX and XSD regular expression.
>> The RFC is very clear it contains an XSD expression. Pretending it is
>> both is a hack that does not even seem
>> to work 100%, so it is not reliable.
>>
>> I am not suggesting that the YANG pattern is both a POSIX and XSD regular
>> expression.
>>
>> I am only suggesting that the guidelines recommend that authors use a
>> subset of XSD, to make it easier to programmatically *convert* the 'XSD
>> subset compliant regular expression' into a functionally equivalent regular
>> expression for whatever regular expression engine the tooling decides to
>> use.
>>
>>
> Looks like you want the expression to mean the exact same thing in
> multiple expression languages
> and you want to put the burden of this perfect subset on humans who write
> YANG.
>
> Again, no, that is not what I want.
>
> I would like the rules to recommend that authors of standards based YANG
> modules don't use the bits of the XML RE language that (i) they don't use
> anyway, (ii) don't appear to have any compelling use case in standard YANG
> modules, and (iii) are hard to convert to other RE languages.
>
> These recommendations also have the additional advantage that pattern
> statements that follow these rules are likely to be much easier to
> understand, because they use the aspects of regular expression languages
> that folks are likely to be more familiar with.
>
> This is a really unworkable plan.
>
> Is my proposed 6087bis text really that complicated?
>


Yes -- way too complicated to burden all YANG module writers.
We did a study at Cisco around 2002 to find out why engineers were having
such a hard time learning to write MIB modules.  It turned out that all the
CLRs, the "helpful" arcane rules on top of standard rules, were causing
great pain.
IMO the IETF should not create a new special variant of the definition in
XSD-TYPES, which is what 'SHOULD NOT use' guidelines in 6087bis will do.

The pattern-stmt is like YANG XPath -- it is for describing the constraint.
An implementation is not required to use off-the-shelf tools to enforce the
constraint.
In this case there are widely available tools (libxml2) that could be
leveraged.
If conversion to a different parser is the desired implementation choice,
then that should be the tool-maker's problem. If a YANG module writer can
be told "convert A to B", then so can a tool-maker.
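
As a rough illustration of tool-side conversion, here is a minimal Python
sketch. It assumes a tool that wants to use Python's `re` module instead of
libxml2. The names `xsd_to_python` and `yang_pattern_matches` are
hypothetical and come from no existing library. The sketch handles only two
differences: XSD patterns are implicitly anchored, and one non-nested level
of character class subtraction (written `[A-Z-[P-R]]` in the XSD grammar) is
rewritten as a negative lookahead. Other XSD features, such as `\i`, `\c`,
Unicode block escapes and literal `^` or `$`, are not handled.

```python
import re

# Matches one non-nested XSD character class subtraction, e.g. "[A-Z-[P-R]]".
# Assumption: neither class contains escaped brackets such as "\]".
_SUBTRACTION = re.compile(r"\[([^\[\]]+)-\[([^\[\]]+)\]\]")


def xsd_to_python(pattern: str) -> str:
    """Rewrite simple class subtraction as a negative lookahead.

    "[A-Z-[P-R]]" becomes "(?:(?![P-R])[A-Z])". The non-capturing group
    keeps a following quantifier applied to the whole construct.
    """
    return _SUBTRACTION.sub(r"(?:(?![\2])[\1])", pattern)


def yang_pattern_matches(pattern: str, value: str) -> bool:
    """Check a value against a YANG pattern (hypothetical helper).

    XSD patterns must match the entire value, so fullmatch() is used
    rather than search().
    """
    return re.fullmatch(xsd_to_python(pattern), value) is not None


if __name__ == "__main__":
    print(xsd_to_python("[A-Z-[P-R]]+"))
    print(yang_pattern_matches("[A-Z-[P-R]]+", "ABC"))
    print(yang_pattern_matches("[A-Z-[P-R]]+", "AQC"))
```

The module author keeps writing plain XSD; whether a conversion like this is
complete enough is a decision for the tool, not for the YANG module.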




> Thanks,
> Rob
>
>
Andy


>
>
>
>
>> E.g. this seems to be the approach used by "libyang" that uses libpcre as
>> the backend RE library rather than libxml.  Unfortunately, I think that the
>> libyang library would currently fail if the pattern statement contained
>> "[[A-Z]-[P-R]]" because it looks like the PCRE2 language does not support
>> character class subtraction.  AFAICT, no standard YANG modules currently
>> use character class subtraction, so the authors of libyang have a
>> choice here:
>>   (i) write a block of code that most likely nobody is going to use, or
>>   (ii) document the limitation, spot character class subtraction in the
>> regex, and flag that it is not supported (or perhaps just ignore it).
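
Option (ii) could look roughly like the following Python sketch. It is not
libyang code; `uses_class_subtraction`, `check_pattern` and
`UnsupportedPatternError` are hypothetical names chosen for illustration.
It assumes the XSD spelling of subtraction (`[A-Z-[P-R]]`) and relies on
the fact that an unescaped `-[` inside a bracket expression can only
introduce a subtraction.

```python
class UnsupportedPatternError(ValueError):
    """Raised for XSD regex features this hypothetical tool cannot convert."""


def uses_class_subtraction(pattern: str) -> bool:
    """Return True if the pattern contains an unescaped "-[" inside a class."""
    in_class = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == "[" and not in_class:
            in_class = True
        elif ch == "]" and in_class:
            in_class = False
        elif ch == "-" and in_class and pattern[i + 1 : i + 2] == "[":
            return True
        i += 1
    return False


def check_pattern(pattern: str) -> None:
    """Reject patterns that need class subtraction instead of mis-parsing them."""
    if uses_class_subtraction(pattern):
        raise UnsupportedPatternError(
            f"character class subtraction is not supported: {pattern!r}"
        )
```

A tool would call `check_pattern` on each pattern statement before handing
the string to its own regex engine, so that an unsupported pattern is
reported at module load time rather than silently matching the wrong values.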
>>
>>
>>
>> If the community wants to support both XSD and POSIX expressions, then
>> the proper engineering
>> solution is to introduce a new statement that is defined to contain a
>> POSIX expression.
>> This can be done with a YANG extension now and added to YANG 2.0 later.
>>
>> I think that this is an inferior solution:
>> - there are many languages that YANG tools could be written in: C/C++,
>> Python, Java, Go, Rust, Javascript are all reasonably plausible choices.
>> - they all have similar regular expression flavours, with small differences
>> (according to http://www.regular-expressions.info/reference.html).
>> - Personally, I see no inherent advantage of the POSIX Extended Regex
>> over XML RE.   In fact, given that it doesn't support Unicode at all, it
>> would seem to be a somewhat strange choice for a second pattern statement.
>> - Nor does it seem pragmatic to introduce lots of different flavors of
>> pattern statements into YANG each supporting a different regex syntax.
>>
>>
>
> You seem to be confirming that picking 1 flavor of POSIX would be
> impossible.
> All the more reason to keep the XSD pattern unburdened.
> I see no reason XSD patterns should be constrained because some
> implementors want to
> ignore the RFC and pretend the string is some other expression language.
>
>
>
>> I also don't like the solution that every YANG tool maker has to either
>> link against libxml2,  or write their own efficient regular expression
>> engine.  I'm not convinced that what the world needs is yet more regular
>> expression implementations :-)
>>
>
> Then write your own tools and don't use libxml2.
>
>
>
>> So, I still see that the better technical solution is to always define
>> the pattern statements only in the XML RE language, but to strongly
>> encourage folks to use a subset of that language for standards models
>> (which they appear to be doing anyway) to make it easier to convert the
>> regular expression into compatible versions for other engines.
>>
>> Thanks,
>> Rob
>>
>>
>>
>
> Andy
>
>
>>
>>
>>
>>> /js
>>>
>>
>> Andy
>>
>>
>>>
>>> --
>>> Juergen Schoenwaelder           Jacobs University Bremen gGmbH
>>> Phone: +49 421 200 3587         Campus Ring 1 | 28759 Bremen | Germany
>>> Fax:   +49 421 200 3103         <http://www.jacobs-university.de/>
>>>
>>
>>
>>
>
>