Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

Xufeng Liu <Xufeng_Liu@jabil.com> Fri, 25 August 2017 12:40 UTC

From: Xufeng Liu <Xufeng_Liu@jabil.com>
To: Per Hedeland <per@tail-f.com>, Ladislav Lhotka <lhotka@nic.cz>
CC: "'netmod@ietf.org'" <netmod@ietf.org>
Date: Fri, 25 Aug 2017 12:40:18 +0000
Message-ID: <BN3PR0201MB0867A248887538077CD5D49FF19B0@BN3PR0201MB0867.namprd02.prod.outlook.com>
References: <BN3PR0201MB0867DAD1212DBA2E88570AD5F1850@BN3PR0201MB0867.namprd02.prod.outlook.com> <20170824060900.u5kcffzvwjr7mmob@elstar.local> <152f24b2-7947-9c76-714c-af226ab3fe91@tail-f.com> <8760ddc676.fsf@nic.cz> <599F0991.7020900@tail-f.com>
In-Reply-To: <599F0991.7020900@tail-f.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/netmod/MehoS36BUkSmxrqX7yGTCUDpgpA>
Subject: Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines


> -----Original Message-----
> From: Per Hedeland [mailto:per@tail-f.com]
> Sent: Thursday, August 24, 2017 1:15 PM
> To: Ladislav Lhotka <lhotka@nic.cz>
> Cc: 'netmod@ietf.org' <netmod@ietf.org>; Xufeng Liu <Xufeng_Liu@jabil.com>
> Subject: Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines
> 
> On 2017-08-24 17:54, Ladislav Lhotka wrote:
> > Per Hedeland <per@tail-f.com> writes:
> >
> >> I strongly agree with all of Juergen's statements, and I also disagree
> >> with the suggestion to include the parts of the text that he didn't
> >> specifically disagree with. And I'd like to add that the "lack of XSD
> >> support" argument is pretty weak - there exists at least one freely
> >> available implementation in the form of libxml2, which is actually
> >> present by default in basically all "normal" Linux installations.
> >> It is portable C code, and the parts needed for regexp matching
> >> amount to just above 100 kB of compiled code on an x86_64 CPU.
> >
> > I wouldn't be so strict here. Libxml2 has its share of problems - for
> > one, its "official" bindings do not support Python3, so e.g. in
> > Yangson I had to use the PyXB package instead, and pyang gives up pattern
> > validation in Python 3 entirely.
> 
> I don't really see how claiming that the "lack of XSD support" argument is weak
> amounts to being "strict" - and I suspect that the claim is valid even considering
> the number of pattern-validating server and client implementations written in
> Python3. For a validation/translation tool such as pyang, having a minuscule C
> program that is invoked for validation would seem to be a reasonable
> implementation if no other options exist, though admittedly it is an annoyance.
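As a rough illustration of Per's suggestion, a Python tool could delegate pattern checks to a small external program via `subprocess`. The command-line contract assumed below (pattern and value passed as the last two arguments, exit status 0 for a match, 1 for no match, anything else for an error) is hypothetical, as is the helper name `validate_externally`; no existing validator is implied.

```python
import subprocess
import sys
from typing import Sequence


def validate_externally(validator_cmd: Sequence[str], pattern: str, value: str) -> bool:
    """Ask a (hypothetical) external XSD regex validator whether value matches pattern.

    Assumed contract: pattern and value are appended as the last two arguments;
    exit status 0 means "match", 1 means "no match", anything else is an error.
    """
    result = subprocess.run(
        [*validator_cmd, pattern, value],
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    raise RuntimeError(
        f"validator exited with status {result.returncode}: {result.stderr.strip()}"
    )


if __name__ == "__main__":
    # Demo stand-in only: Python's re.fullmatch is NOT an XSD regex engine.
    stub = [
        sys.executable,
        "-c",
        "import re, sys; sys.exit(0 if re.fullmatch(sys.argv[1], sys.argv[2]) else 1)",
    ]
    print(validate_externally(stub, "[0-9]+", "123"))
    print(validate_externally(stub, "[0-9]+", "12a"))
```

The stub exists only so the sketch runs anywhere; in practice `validator_cmd` would point at an XSD-conformant matcher, for example a small program built on libxml2.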
> 
[Xufeng] Besides the language issue, there are situations where XML is not used at all, so including the libxml2 library is not desirable.

> > That being said, there doesn't seem to be a clearly superior
> > replacement, and some aspects of XSD regexes, such as support for
> > Unicode and the absence of ^ and $ anchors, make a lot of sense in
> > YANG. So I am also not in favour of the proposed change.
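To make the anchoring point concrete: an XSD pattern must match the entire value, while many other APIs look for a match anywhere in the string. The Python sketch below uses the standard `re` module purely as an example of a non-XSD flavor; the helper names are illustrative. It also shows that simply adding `^` and `$` is not quite enough in Python, because `$` also matches just before a trailing newline.

```python
import re

PATTERN = "[0-9]+"  # written like a YANG/XSD pattern, i.e. without ^ or $


def matches_anywhere(value: str) -> bool:
    """Unanchored search, as many non-XSD APIs do by default."""
    return re.search(PATTERN, value) is not None


def matches_whole_value(value: str) -> bool:
    """XSD-like semantics: the entire value must match the pattern."""
    return re.fullmatch(PATTERN, value) is not None


def matches_naively_anchored(value: str) -> bool:
    """Adds ^ and $; in Python, $ also matches just before a final newline."""
    return re.match("^(?:" + PATTERN + ")$", value) is not None


if __name__ == "__main__":
    for v in ["123", "abc123", "123\n"]:
        print(repr(v), matches_anywhere(v), matches_whole_value(v),
              matches_naively_anchored(v))
```

For `"abc123"` only the unanchored search succeeds, and for `"123\n"` the naive `^...$` version accepts a value that the XSD pattern would reject.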
> 
> I did not see a proposed change to the standard YANG specification regarding
> the regexp flavor, only a proposal that module authors SHOULD show
> consideration for implementations that don't comply with the standard.
> 
[Xufeng] This is the point.

> --Per
> 
> > BTW, it is actually a shame that there is no standard regex language
> > that could be easily used in all programming languages. Oh well ...
> >
> > Lada
> >
> >>
> >> --Per
> >>
> >> On 2017-08-24 08:09, Juergen Schoenwaelder wrote:
> >>> On Wed, Aug 23, 2017 at 09:20:36PM +0000, Xufeng Liu wrote:
> >>>> Members of the Routing Area YANG DT have had some discussions about
> >>>> the handling of various variants of regular expressions. The following
> >>>> is the current state, and we are wondering whether this topic can be
> >>>> added to RFC6087bis:
> >>>>
> >>>> 1. Regular Expression Usage
> >>>> YANG uses regular expressions to restrict string values. Such a
> >>>> restriction can be part of a "pattern" statement or a string matching
> >>>> function. [RFC7950] specifies that YANG regular expressions conform to
> >>>> Appendix F in [XSD-TYPES].
> >>>> YANG models have been implemented in many different environments, and
> >>>> the XSD variant of regular expressions is not supported in many of
> >>>> these environments. There are currently more than a dozen popular
> >>>> regular expression variants implemented in various environments. While
> >>>> the usage of the XSD variant of regular expressions described in
> >>>> [RFC7950] remains the preferred standard, a few conventions are
> >>>> prescribed to maximize the portability of YANG models between
> >>>> environments.
> >>>>
> >>>
> >>> I strongly disagree with this statement. The standard format is XSD
> >>> regular expressions. RFC 7950 section 9.4.5:
> >>>
> >>>     The "pattern" statement, which is an optional substatement to the
> >>>     "type" statement, takes as an argument a regular expression string,
> >>>     as defined in [XSD-TYPES].
> >>>
> >>> There is no notion of a 'preferred' standard.
> >>>
> >>>> 1.1. Regular Expression Variant Choice Precedence
> >>>> YANG model designers SHOULD use the most portable syntax whenever
> >>>> possible. When XSD compliance is satisfied and there are multiple
> >>>> choices for a given expression, the following precedence SHOULD be
> >>>> used to choose a regular expression variant:
> >>>>
> >>>> o    POSIX base
> >>>>
> >>>> o    POSIX extended
> >>>>
> >>>> o    BSD
> >>>>
> >>>> o    GNU Regular Expression Extensions
> >>>>
> >>>> o    C++ Regular Expressions with std::regex
> >>>>
> >>>> o    Others
> >>>
> >>> Strongly disagree. You either write YANG or something different.
> >>> There is no way to recognize what kind of regular expressions have
> >>> been used by the model designer. The value of a standard is that
> >>> everybody does the same.
> >>>
> >>>> For example, either \d or [0-9] can be used with equivalent semantics,
> >>>> and both are compliant with [XSD-TYPES]. [0-9] is recommended because
> >>>> it is supported by POSIX base but \d is not.
> >>>>
> >>>> 1.2. Convention Guidelines
> >>>> 1.2.1. Avoid Character Category Escapes
> >>>> For example, in XSD regular expressions, \d is a Character Category
> >>>> Escape denoting the range of digits, i.e., [0-9]. To maximize
> >>>> portability, model designers SHOULD use [0-9] instead of \d.
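Whether `\d` and `[0-9]` really have the same meaning depends on the engine. [XSD-TYPES] Appendix F defines `\d` as `\p{Nd}`, i.e. all Unicode decimal digits, so it accepts more than `[0-9]`. Python's `re` behaves the same way for `str` patterns unless `re.ASCII` is set. The small check below (Python chosen only for illustration; `matches` is a hypothetical helper) shows the difference using U+0663 ARABIC-INDIC DIGIT THREE.

```python
import re

# U+0663 ARABIC-INDIC DIGIT THREE: a Unicode decimal digit (category Nd).
ARABIC_INDIC_THREE = "\u0663"


def matches(pattern: str, value: str, flags: int = 0) -> bool:
    """Whole-value match, mirroring the implicit anchoring of XSD patterns."""
    return re.fullmatch(pattern, value, flags) is not None


if __name__ == "__main__":
    # ASCII digit: both spellings agree.
    print(matches(r"\d", "7"), matches(r"[0-9]", "7"))  # True True
    # Non-ASCII decimal digit: only the Unicode-aware \d accepts it.
    print(matches(r"\d", ARABIC_INDIC_THREE), matches(r"[0-9]", ARABIC_INDIC_THREE))  # True False
    # With re.ASCII, Python's \d shrinks to [0-9].
    print(matches(r"\d", ARABIC_INDIC_THREE, re.ASCII))  # False
```

So replacing `\d` with `[0-9]` (or the reverse) during conversion may change which values a leaf accepts.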
> >>>>
> >>>> 1.2.2. Avoid Unicode Characters
> >>>> Unicode characters are allowed in XSD regular expressions, but are not
> >>>> supported in the POSIX variant. If possible, model designers SHOULD
> >>>> avoid Unicode category escapes such as \p{L} and \p{N}.
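Guidelines 1.2.1 and 1.2.2 could in principle be checked mechanically. The Python sketch below scans an XSD pattern and reports constructs that POSIX-style engines commonly lack: multi-character escapes such as `\d`, `\s`, `\w` and the XML-specific `\i`/`\c` (plus their negations), `\p{..}`/`\P{..}` escapes, and character class subtraction. The helper name and the exact list of flagged constructs are assumptions for illustration, not part of the proposal.

```python
import re
from typing import List, Tuple

# XSD multi-character escapes that POSIX-style engines commonly lack.
# The exact selection is an illustrative assumption; tailor it to your targets.
_MULTI_CHAR_ESCAPES = set("dDsSwWiIcC")


def non_portable_constructs(pattern: str) -> List[Tuple[int, str]]:
    """Return (offset, construct) pairs for XSD features that may not be portable."""
    findings: List[Tuple[int, str]] = []
    depth = 0  # nesting depth of character classes, including subtractions
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\" and i + 1 < n:
            esc = pattern[i + 1]
            if esc in _MULTI_CHAR_ESCAPES:
                findings.append((i, pattern[i:i + 2]))
            elif esc in "pP" and pattern.startswith("{", i + 2):
                end = pattern.find("}", i + 3)
                end = n - 1 if end == -1 else end
                findings.append((i, pattern[i:end + 1]))  # e.g. \p{L}, \P{N}
                i = end + 1
                continue
            i += 2  # skip the whole escape, so "\\d" is a literal "\" then "d"
            continue
        if ch == "[":
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
        elif ch == "-" and depth > 0 and pattern.startswith("[", i + 1):
            findings.append((i, "-["))  # class subtraction, e.g. [a-z-[aeiou]]
        i += 1
    return findings


if __name__ == "__main__":
    for p in [r"[0-9]+", r"\d+", r"\p{L}[\p{L}\p{N}]*", r"[a-z-[aeiou]]"]:
        print(p, non_portable_constructs(p))
    # Python's re is itself one of the engines without \p{..} support.
    try:
        re.compile(r"\p{L}")
    except re.error as exc:
        print("re.compile rejected \\p{L}:", exc)
```

The scan is deliberately conservative: it only reports, it does not rewrite, which keeps the XSD pattern in the module as the single normative form.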
> >>>>
> >>>> 1.3. Conversion Tools
> >>>> Tools can automatically convert regular expressions from one variant
> >>>> to another. When a YANG model is implemented in an environment where
> >>>> XSD regular expressions are not supported, the recommended approach is
> >>>> to use a conversion tool. For example, if needed, anchor position
> >>>> characters, i.e., '^' and '$', can be added by a regular expression
> >>>> conversion tool.
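A conversion tool as described in 1.3 has to do slightly more than add anchors: in XSD, `^` and `$` are ordinary characters outside a character class, so literal ones must be escaped first, and XSD `.` excludes both `\n` and `\r`. The sketch below converts a subset of XSD syntax to Python's `re` (target chosen only for illustration; a POSIX target would need different output, such as no `(?:...)` group). It anchors with `\A` and `\Z` and raises an error for constructs it does not attempt to translate. `xsd_to_python` is a hypothetical name.

```python
import re

# Escapes this sketch does not translate (an assumption; a real tool might).
_UNSUPPORTED_ESCAPES = set("wWiIcCpP")


def xsd_to_python(pattern: str) -> str:
    """Translate a subset of XSD regex syntax into an anchored Python re pattern."""
    out = []
    in_class = False
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\":
            if i + 1 >= n:
                raise ValueError("dangling backslash at end of pattern")
            esc = pattern[i + 1]
            if esc in _UNSUPPORTED_ESCAPES:
                raise ValueError(f"unsupported escape {pattern[i:i + 2]!r} at offset {i}")
            if esc == "s":
                # XSD \s is exactly [#x20\t\n\r]; Python's \s is broader.
                out.append(" \\t\\n\\r" if in_class else "[ \\t\\n\\r]")
            elif esc == "S":
                if in_class:
                    raise ValueError("\\S inside a character class is not supported")
                out.append("[^ \\t\\n\\r]")
            else:
                # Remaining XSD escapes (\d, \D, \n, \r, \t, escaped
                # metacharacters) mean the same in Python str patterns.
                out.append(pattern[i:i + 2])
            i += 2
            continue
        if in_class:
            if ch == "-" and pattern.startswith("[", i + 1):
                raise ValueError("character class subtraction is not supported")
            if ch == "]":
                in_class = False
            out.append(ch)
        elif ch == "[":
            in_class = True
            out.append(ch)
        elif ch in "^$":
            out.append("\\" + ch)  # ordinary characters in XSD, anchors in Python
        elif ch == ".":
            out.append("[^\\n\\r]")  # XSD '.' excludes both \n and \r
        else:
            out.append(ch)
        i += 1
    return "\\A(?:" + "".join(out) + ")\\Z"


if __name__ == "__main__":
    converted = xsd_to_python("[0-9]+$")
    print(converted)
    print(re.match(converted, "12$") is not None)
```

Rejecting unknown constructs rather than guessing keeps a converted pattern from silently accepting a different set of values than the YANG module specifies.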
> >>>
> >>> If conversion tools exist that can convert, then by all means use
> >>> XSD in the YANG model and use tools to convert to whatever format
> >>> your implementation prefers to use.
> >>>
> >>> /js
> >>>
> >>
> >> _______________________________________________
> >> netmod mailing list
> >> netmod@ietf.org
> >> https://www.ietf.org/mailman/listinfo/netmod
> >