Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

Andy Bierman <andy@yumaworks.com> Wed, 30 August 2017 20:43 UTC

In-Reply-To: <36B35912-1FC1-4B05-A61A-44D21813CC79@juniper.net>
References: <599F0991.7020900@tail-f.com> <BN3PR0201MB0867A248887538077CD5D49FF19B0@BN3PR0201MB0867.namprd02.prod.outlook.com> <20170825125254.6nhnzkrar6fhu7zr@elstar.local> <BN3PR0201MB086796F09BFD77FCD718C21BF19E0@BN3PR0201MB0867.namprd02.prod.outlook.com> <20170828154640.pzg7jfy5uepkb22q@elstar.local> <c8de6140-af50-0a4b-a479-b011a8dfbbe7@cisco.com> <CABCOCHRNt3Tkxy8Ffz3JGgPe-rQYwZ3MTLmD43OQi4P6tZQJmg@mail.gmail.com> <f7151a6b-9deb-52ad-62a9-78b29a552540@cisco.com> <20170830102902.2n5q6rgq2x2dxfq2@elstar.local> <e8482a9c-cba3-28e2-9ffa-ec5eb5c1c0a4@cisco.com> <20170830123156.cssrg5kklpo67fie@elstar.local> <CABCOCHTtN611FO2ov2kTLtZx-Q3=tzgH7Xk9uGvFUD1WuyMZyw@mail.gmail.com> <b13c5e9a-e9f9-96e9-8823-0402fb74af09@cisco.com> <36B35912-1FC1-4B05-A61A-44D21813CC79@juniper.net>
From: Andy Bierman <andy@yumaworks.com>
Date: Wed, 30 Aug 2017 13:43:22 -0700
Message-ID: <CABCOCHRtGxSCxC76T0DgEnr=bdaRE6NomhbY_+eOvPfGyzPp0w@mail.gmail.com>
To: Kent Watsen <kwatsen@juniper.net>
Cc: Robert Wilton <rwilton@cisco.com>, Juergen Schoenwaelder <j.schoenwaelder@jacobs-university.de>, Xufeng Liu <Xufeng_Liu@jabil.com>, "netmod@ietf.org" <netmod@ietf.org>
Content-Type: multipart/alternative; boundary="f403045f51a239e52e0557fe948e"
Archived-At: <https://mailarchive.ietf.org/arch/msg/netmod/QQBQd_1V0wnVHnZf61sCc3oSCkA>
Subject: Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

Hi,

The burden this would place on YANG writers would be excessive.
We learned in SNMP-land about CLRs (clever little rules) and how they need
to be avoided. We learned that special-casing and sub-setting technology has
its own costs, which usually outweigh the problem being solved
(e.g., counter names MUST be in the plural form).


Andy



On Wed, Aug 30, 2017 at 1:03 PM, Kent Watsen <kwatsen@juniper.net> wrote:

>
>
> As Andy says, readability is #1, and it follows that a restricted subset
> would be more understandable.  Standardizing this would require an update
> to RFC 7950 (read: not going to happen anytime soon).  Maybe we could start
> with just having a tool detect when something outside the common-subset is
> used.   Can a "common subset" be well-defined?  - "common" between how many
> engines? - would it be forever evolving?
>
>
>
> K. // contributor
>
>
>
>
>
> On 8/30/17, 12:44 PM, "netmod on behalf of Robert Wilton" <
> netmod-bounces@ietf.org on behalf of rwilton@cisco.com> wrote:
>
>
>
> I actually think that XML RE is a good choice for YANG pattern statements
> (because it is one of the more simple RE languages), I just don't think
> that we need all of it.
>
>
> First question: How many pattern statements in draft and standard IETF
> YANG modules actually use Unicode properties (e.g. \p{})?
> Answer: Just 2.  To add a zone at the end of the IPv4/IPv6 address.
>
> E.g.       pattern
>         '(([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\.){3}'
>       +  '([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])'
>       + '(%[\p{N}\p{L}]+)?';
>
> This could quite possibly have been written just as
> "(\d{1,3}\.){3}\d{1,3}(%\w+)?" and not use Unicode properties at all.
>
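
As a rough illustration of the trade-off, here is a minimal Python sketch that compares the two forms, reading the shorter expression as "(\d{1,3}\.){3}\d{1,3}(%\w+)?". The names OCTET, STRICT, LOOSE and matches are made up for this example, and the zone part of the strict pattern is replaced by an ASCII-only class, which is an assumption and not what the module says.

```python
import re

# ASCII-only stand-in for the module's pattern. The zone part uses
# [0-9A-Za-z] instead of [\p{N}\p{L}] (an assumption made for this sketch).
OCTET = r"([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])"
STRICT = re.compile(r"(" + OCTET + r"\.){3}" + OCTET + r"(%[0-9A-Za-z]+)?")

# The shorter form, with no Unicode properties.
LOOSE = re.compile(r"(\d{1,3}\.){3}\d{1,3}(%\w+)?")


def matches(regex, value):
    # A YANG pattern has to match the whole value, hence fullmatch().
    return regex.fullmatch(value) is not None


for value in ("192.0.2.1", "192.0.2.1%eth0", "999.0.2.1"):
    print(value, matches(STRICT, value), matches(LOOSE, value))
```

Both forms accept the first two values, but only the shorter one accepts 999.0.2.1, so the gain in readability comes with looser validation of the octets.
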
> There are a couple more occurrences of Unicode character classes in the vendor
> models on github, but only to restrict them to the ASCII character set (oh
> the irony), which I believe can be accomplished without resorting to
> Unicode properties.
>
>
> Another question: How often is character class subtraction (e.g.
> [A-Z-[PQ]]) used in standard & the github YANG modules?
> Answer: 0.  AFAICT, it isn't used at all, anywhere ...
>
>
>
> Now, I'm not proposing using a different regex syntax for pattern
> statements, just a sensible subset of XSD RE, such that it is easier for folks
> to read/review pattern statements, and it is easier for client and server
> implementations to translate into other common regex implementations if
> they so wish.
>
> Of course, as part of that translation, I would expect a translation
> function to check and generate an error if the translation cannot handle
> the input regex (e.g. if it uses an obscure unmatched unicode property or a
> unicode block, or character class subtraction syntax).  This really doesn't
> seem hard to me.
>
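
A check of this kind might look roughly like the Python sketch below. The function name check_subset and the choice of what counts as "outside the subset" are assumptions made for illustration; no such subset has been defined. The sketch relies on the XSD rule that "[" must be escaped inside a character class, so an unescaped "[" found inside a class is taken to be class subtraction.

```python
def check_subset(pattern):
    """Return pattern unchanged, or raise ValueError if it uses a
    construct this sketch treats as outside the subset."""
    i, depth = 0, 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            # \p{...} and \P{...} are Unicode property or block escapes.
            if pattern[i + 1:i + 3] in ("p{", "P{"):
                raise ValueError(
                    "Unicode property or block escape at offset %d" % i)
            i += 2  # skip the escaped character
            continue
        if ch == "[":
            # Heuristic: an unescaped "[" inside a class means subtraction.
            if depth > 0:
                raise ValueError(
                    "character class subtraction at offset %d" % i)
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
        i += 1
    return pattern


for candidate in (r"(\d{1,3}\.){3}\d{1,3}(%\w+)?",
                  r"(%[\p{N}\p{L}]+)?",
                  r"[A-Z-[PQ]]"):
    try:
        check_subset(candidate)
        print("ok      ", candidate)
    except ValueError as err:
        print("rejected", candidate, "(%s)" % err)
```

The same scan could back a lint-style tool that only warns, which is closer to the "detect when something outside the common subset is used" idea.
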
> But the XML RE language has stuff in it that I don't think anyone is ever
> going to use in a standardized network management YANG model.   Forcing
> everyone to implement support for this stuff just seems like a complete
> waste of time and effort.  Looking at the regex info website it looks like
> there are about 143 unicode properties and blocks defined (it may be
> incomplete), of which I think that 135+ of these probably have no relevance
> in network management YANG modules, and the benefit of the remaining ones
> is pretty suspect.
>
> I mean, how many network management YANG modules really need a pattern
> statement that only matches Runic characters?  Perhaps someone out there is
> busy defining "middle-earth.yang" ;-)
>
> If I am the only person opposed to making life unnecessarily difficult for
> readers of YANG models, and client/server tool implementors interacting
> with YANG then it is probably time to give up this discussion. ;-)
>
> Python, quite likely a common tool for client side network management,
> also doesn't seem to have any support for unicode properties or blocks.
> Perhaps implementations will hook it up to libxml2 instead, or write a full
> XML RE to Python RE translation tool.  But probably most people
> will just feed the pattern statement into the native Python regex engine,
> and my guess is that this will probably work 95% of the time.  The other 5%
> ... who knows what will happen ... oh well, better to try and fail than to
> not try at all.
>
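
To make the "feed it into the native engine" case concrete, here is a small Python sketch using only the built-in re module. The helper name compiles_natively is hypothetical. It shows one loud failure (the \p{} escape is rejected at compile time) and one quiet difference ("$" is an ordinary character in XSD regular expressions but an anchor in Python).

```python
import re


def compiles_natively(pattern):
    """Return True if Python's built-in re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Loud failure: re has no \p{...}, so the zone part of the address
# pattern does not even compile.
print(compiles_natively(r"(%[\p{N}\p{L}]+)?"))

# Quiet difference: in XSD "a$" matches the two-character string "a$",
# while Python treats "$" as an end-of-string anchor.
print(re.fullmatch(r"a$", "a$") is not None)
print(re.fullmatch(r"a$", "a") is not None)
```

The loud failure is the easy one to deal with. Patterns that compile but match a different set of strings are presumably where most of the unexplained cases would come from.
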
> Apologies if this email comes across as a rant.
>
> Rob
>
>
>
>