Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

Kent Watsen <kwatsen@juniper.net> Wed, 30 August 2017 20:03 UTC

From: Kent Watsen <kwatsen@juniper.net>
To: Robert Wilton <rwilton@cisco.com>, Andy Bierman <andy@yumaworks.com>, Juergen Schoenwaelder <j.schoenwaelder@jacobs-university.de>, Xufeng Liu <Xufeng_Liu@jabil.com>, "netmod@ietf.org" <netmod@ietf.org>
Date: Wed, 30 Aug 2017 20:03:44 +0000
Message-ID: <36B35912-1FC1-4B05-A61A-44D21813CC79@juniper.net>
References: <599F0991.7020900@tail-f.com> <BN3PR0201MB0867A248887538077CD5D49FF19B0@BN3PR0201MB0867.namprd02.prod.outlook.com> <20170825125254.6nhnzkrar6fhu7zr@elstar.local> <BN3PR0201MB086796F09BFD77FCD718C21BF19E0@BN3PR0201MB0867.namprd02.prod.outlook.com> <20170828154640.pzg7jfy5uepkb22q@elstar.local> <c8de6140-af50-0a4b-a479-b011a8dfbbe7@cisco.com> <CABCOCHRNt3Tkxy8Ffz3JGgPe-rQYwZ3MTLmD43OQi4P6tZQJmg@mail.gmail.com> <f7151a6b-9deb-52ad-62a9-78b29a552540@cisco.com> <20170830102902.2n5q6rgq2x2dxfq2@elstar.local> <e8482a9c-cba3-28e2-9ffa-ec5eb5c1c0a4@cisco.com> <20170830123156.cssrg5kklpo67fie@elstar.local> <CABCOCHTtN611FO2ov2kTLtZx-Q3=tzgH7Xk9uGvFUD1WuyMZyw@mail.gmail.com> <b13c5e9a-e9f9-96e9-8823-0402fb74af09@cisco.com>
In-Reply-To: <b13c5e9a-e9f9-96e9-8823-0402fb74af09@cisco.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/netmod/q498kVc766v7V2ged-L393Jaykw>
Subject: Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

As Andy says, readability is #1, and it follows that a restricted subset would be more understandable.  Standardizing this would require an update to RFC 7950 (read: not going to happen anytime soon).  Maybe we could start with just having a tool detect when something outside the common subset is used.

Can a "common subset" be well-defined?
  - "common" between how many engines?
  - would it be forever evolving?

K. // contributor


On 8/30/17, 12:44 PM, "netmod on behalf of Robert Wilton" <netmod-bounces@ietf.org on behalf of rwilton@cisco.com> wrote:

I actually think that XML RE is a good choice for YANG pattern statements (because it is one of the simpler RE languages); I just don't think that we need all of it.


First question: How many pattern statements in draft and standard IETF YANG modules actually use Unicode properties (e.g. \p{})?
Answer: Just 2, both to allow a zone at the end of an IPv4/IPv6 address.

E.g.       pattern
        '(([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\.){3}'
      +  '([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])'
      + '(%[\p{N}\p{L}]+)?';

This could quite possibly have been written just as "(\d{1,3}\.){3}\d{1,3}(%\w+)?" and not use Unicode properties at all.
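To make the trade-off concrete, here is a rough Python sketch comparing the two forms, reading the shorthand as "(\d{1,3}\.){3}\d{1,3}(%\w+)?". It is only an illustration: Python's re module is not an XSD engine, so the zone part of the strict pattern uses [^\W_] as an approximate stand-in for [\p{N}\p{L}], and the names STRICT, LOOSE and accepted are made up for this example.

```python
import re

# Octet alternation copied from the YANG pattern quoted above.
OCTET = r"([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])"

# Assumption: [^\W_] (Python's \w minus the underscore) approximates
# [\p{N}\p{L}]; it is not an exact equivalent.
STRICT = re.compile(r"(" + OCTET + r"\.){3}" + OCTET + r"(%[^\W_]+)?")

# The shorthand form, with balanced grouping.
LOOSE = re.compile(r"(\d{1,3}\.){3}\d{1,3}(%\w+)?")


def accepted(value):
    """Return (strict_ok, loose_ok) for a candidate address string."""
    # fullmatch mirrors the whole-string matching of YANG patterns.
    return (
        STRICT.fullmatch(value) is not None,
        LOOSE.fullmatch(value) is not None,
    )


if __name__ == "__main__":
    for candidate in ["192.0.2.1", "192.0.2.1%eth0", "999.0.2.1", "192.0.2.1%eth_0"]:
        print(candidate, accepted(candidate))
```

The shorthand is easier to read but also looser: it accepts octets above 255 and underscores in the zone, so whether it is an acceptable replacement depends on how much validation the pattern is expected to carry.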

There are a couple more occurrences of Unicode character classes in the vendor models on GitHub, but only to restrict them to the ASCII character set (oh the irony), which I believe can be accomplished without resorting to Unicode properties.


Another question: How often is character class subtraction (e.g. [A-Z-[PQ]]) used in the standard and GitHub YANG modules?
Answer: 0.  AFAICT, it isn't used at all, anywhere ...



Now, I'm not proposing using a different regex syntax for pattern statements, just a sensible subset of XSD RE, such that it is easier for folks to read/review pattern statements, and easier for client and server implementations to translate into other common regex implementations if they so wish.

Of course, as part of that translation, I would expect a translation function to check and generate an error if the translation cannot handle the input regex (e.g. if it uses an obscure unmatched unicode property or a unicode block, or character class subtraction syntax).  This really doesn't seem hard to me.
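A minimal sketch of such a check, in Python, might look like the following. It only detects the three constructs mentioned above and does no actual translation; the names UnsupportedPattern and reject_untranslatable are hypothetical, and a real tool would need to cover more of the XSD syntax.

```python
class UnsupportedPattern(ValueError):
    """Raised when a pattern uses a construct outside the assumed subset."""


def reject_untranslatable(pattern):
    """Return the pattern unchanged, or raise UnsupportedPattern.

    Walks the pattern once, skipping escaped characters, so that an
    escaped backslash followed by a literal 'p' is not misread as a
    Unicode property escape.
    """
    i = 0
    in_class = False  # True while inside a [...] character class
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            nxt = pattern[i + 1:i + 2]
            if nxt in ("p", "P"):
                raise UnsupportedPattern(
                    "Unicode property or block escape at offset %d" % i
                )
            i += 2  # skip the escaped character
            continue
        if ch == "[":
            # In XSD regexes an unescaped '[' inside a class only occurs
            # as part of character class subtraction, e.g. [A-Z-[PQ]].
            if in_class:
                raise UnsupportedPattern(
                    "character class subtraction at offset %d" % i
                )
            in_class = True
        elif ch == "]":
            in_class = False
        i += 1
    return pattern


if __name__ == "__main__":
    for candidate in [r"(\d{1,3}\.){3}\d{1,3}(%\w+)?", r"(%[\p{N}\p{L}]+)?", r"[A-Z-[PQ]]"]:
        try:
            reject_untranslatable(candidate)
            print("ok:      ", candidate)
        except UnsupportedPattern as exc:
            print("rejected:", candidate, "->", exc)
```

A module author could run the same check as a lint step, which would also cover the "tool detects use outside the subset" idea without touching RFC 7950.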

But the XML RE language has stuff in it that I don't think anyone is ever going to use in a standardized network management YANG model.   Forcing everyone to implement support for this stuff just seems like a complete waste of time and effort.  Looking at the regex info website, it looks like there are about 143 Unicode properties and blocks defined (the list may be incomplete), of which I think that 135+ probably have no relevance in network management YANG modules, and the benefit of the remaining ones is pretty suspect.

I mean, how many network management YANG modules really need a pattern statement that only matches Runic characters?  Perhaps someone out there is busy defining "middle-earth.yang" ;-)

If I am the only person opposed to making life unnecessarily difficult for readers of YANG models, and for client/server tool implementors interacting with YANG, then it is probably time to give up this discussion. ;-)

Python, quite likely a common tool for client-side network management, also doesn't seem to have any support for Unicode properties or blocks.  Perhaps implementations will hook it up to libxml2 instead, or write a full XML RE to Python RE translation tool.  But probably most people will just feed the pattern statement into the native Python regex engine, and my guess is that this will probably work 95% of the time.  The other 5% ... who knows what will happen ... oh well, better to try and fail than to not try at all.
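As a rough illustration of the "just feed it to Python" approach, the sketch below hands a pattern straight to the standard re module. The helper name try_native is made up for this example, and the behaviour described assumes a CPython version whose re module has no \p{...} support.

```python
import re


def try_native(pattern, value):
    """Match value against pattern using Python's re module as-is.

    Returns True or False for a match result, or None when Python
    cannot compile the pattern at all.
    """
    try:
        compiled = re.compile(pattern)
    except re.error:
        return None
    # YANG patterns match the whole string, so fullmatch (rather than
    # search or match) is the closest native equivalent.
    return compiled.fullmatch(value) is not None


if __name__ == "__main__":
    # Plain constructs behave as expected.
    print(try_native(r"(\d{1,3}\.){3}\d{1,3}(%\w+)?", "192.0.2.1%eth0"))
    # Unicode property escapes are rejected at compile time.
    print(try_native(r"(%[\p{N}\p{L}]+)?", "%eth0"))
    # Class subtraction compiles, but is read as an ordinary class
    # followed by a literal ']' and so no longer means the same thing.
    print(try_native(r"[A-Z-[PQ]]", "A"))
```

The property escape at least fails loudly. The subtraction case is the more worrying one, since the pattern compiles and quietly matches a different set of strings.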

Apologies if this email comes across as a rant.

Rob