Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

Ladislav Lhotka <lhotka@nic.cz> Thu, 24 August 2017 15:53 UTC

From: Ladislav Lhotka <lhotka@nic.cz>
To: Per Hedeland <per@tail-f.com>, Xufeng Liu <Xufeng_Liu@jabil.com>, "'netmod@ietf.org'" <netmod@ietf.org>
Date: Thu, 24 Aug 2017 17:54:05 +0200
Archived-At: <https://mailarchive.ietf.org/arch/msg/netmod/7PTf_7O6r42Qdilq5zub8BcnPig>

Per Hedeland <per@tail-f.com> writes:

> I strongly agree with all of Juergen's statements, and disagree also
> with the suggestion to include the parts of the text that he didn't
> specifically disagree with. And I'd like to add that the "lack of XSD
> support" argument is pretty weak - there exists at least one freely
> available implementation in the form of libxml2, which is actually
> present by default in basically all "normal" Linux installations.
> It is portable C code, and the parts needed for regexp matching amount
> to just above 100 kB of compiled code on an x86_64 CPU.

I wouldn't be so strict here. Libxml2 has its share of problems: for
one, its "official" bindings do not support Python 3, so in Yangson, for
example, I had to use the PyXB package instead, and pyang gives up
pattern validation in Python 3 entirely.

That being said, there doesn't seem to be a clearly superior
replacement, and some aspects of XSD regexes, such as support for
Unicode and the absence of ^ and $ anchors, make a lot of sense in
YANG. So I am also not in favour of the proposed change.
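
To illustrate the two aspects mentioned above, here is a minimal Python
sketch. It is not how Yangson or pyang validate patterns; the helper name
matches_yang_pattern is hypothetical, and it assumes the pattern uses only
syntax that XSD and Python's re module share (Python's re does not
understand XSD constructs such as \p{L}). It shows that a pattern without
^ and $ has to cover the whole value, and that \d is Unicode-aware in
Python 3 just as it is in XSD, where it stands for \p{Nd}.

```python
import re


def matches_yang_pattern(pattern: str, value: str) -> bool:
    """Approximate XSD matching: the pattern must cover the whole value."""
    # Assumption: `pattern` is restricted to syntax common to XSD and `re`.
    # re.fullmatch plays the role of XSD's implicit anchoring.
    return re.fullmatch(pattern, value) is not None


# Implicit anchoring: a partial match does not count.
print(matches_yang_pattern(r"[0-9]+", "123"))       # True
print(matches_yang_pattern(r"[0-9]+", "123abc"))    # False
print(re.search(r"[0-9]+", "123abc") is not None)   # True (unanchored search)

# Unicode: \d accepts any decimal digit, [0-9] only ASCII digits.
arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE
print(matches_yang_pattern(r"\d+", arabic_three))     # True
print(matches_yang_pattern(r"[0-9]+", arabic_three))  # False
```

An engine that only offers unanchored search would need the anchors added
explicitly to get the same result.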

BTW, it is actually a shame that there is no standard regex language
that could be easily used in all programming languages. Oh well ...

Lada

>
> --Per
>
> On 2017-08-24 08:09, Juergen Schoenwaelder wrote:
>> On Wed, Aug 23, 2017 at 09:20:36PM +0000, Xufeng Liu wrote:
>>> Members of the Routing Area YANG DT have had some discussions about the handling of various variants of regular expressions. The following is the current state, and we are wondering whether this topic can be added to RFC6087bis:
>>>
>>> 1. Regular Expression Usage
>>> YANG uses regular expressions to restrict string values. Such a restriction can be a part of a "pattern" statement or a string matching function. [RFC7950] specifies that YANG regular expressions will conform to Appendix F in [XSD-TYPES].
>>> YANG models have been implemented in many different environments and the XSD variant of the regular expressions is not supported in many of these environments. There are currently more than a dozen popular regular expression variants implemented in various environments. While the usage of the XSD variant of regular expression described in [RFC7950] remains the preferred standard, a few conventions are prescribed to maximize the portability of YANG models between environments.
>>>
>> 
>> I strongly disagree with this statement. The standard format is XSD
>> regular expressions. RFC 7950 section 9.4.5:
>> 
>>     The "pattern" statement, which is an optional substatement to the
>>     "type" statement, takes as an argument a regular expression string,
>>     as defined in [XSD-TYPES].
>> 
>> There is no notion of a 'preferred' standard.
>> 
>>> 1.1. Regular Expression Variant Choice Precedence
>>> YANG model designers SHOULD use the most portable syntax whenever possible. Under the condition that XSD compliance is satisfied and there are multiple choices for a given expression, the following precedence SHOULD be used to choose a regular expression variant:
>>>
>>> o    POSIX base
>>>
>>> o    POSIX extended
>>>
>>> o    BSD
>>>
>>> o    GNU Regular Expression Extensions
>>>
>>> o    C++ Regular Expressions with std::regex
>>>
>>> o    Others
>> 
>> Strongly disagree. You either write YANG or something different. There
>> is no way to recognize what kind of regular expressions have been used
>> by the model designer. The value of a standard is that everybody does
>> the same.
>> 
>>> For example, either \d or [0-9] can be used with equivalent semantics and they are both compliant with [XSD-TYPES]. [0-9] is recommended because [0-9] is supported by POSIX base but \d is not.
>>>
>>> 1.2.  Convention Guidelines
>>> 1.2.1. Avoid Character Category Escapes
>>> For example, in XSD regular expressions, \d is a Character Category Escape denoting the range of digits, i.e., [0-9]. To maximize portability, model designers SHOULD use [0-9] instead of \d.
>>>
>>> 1.2.2. Avoid Unicode Characters
>>> Unicode characters are allowed in XSD regular expressions, but are not supported in the POSIX variant. If possible, the model designers SHOULD avoid using Unicode characters, such as: \p{L} and \p{N}.
>>>
>>> 1.3. Conversion Tools
>>> Tools can automatically convert regular expressions from one variant to another. When a YANG model is implemented in an environment where XSD regular expressions are not supported, the recommended approach is to use a conversion tool. For example, if needed, anchor position characters, i.e., '^' and '$', can be added by a regular expression conversion tool.
>> 
>> If conversion tools exist that can convert, then by all means use XSD
>> in the YANG model and use tools to convert to whatever format your
>> implementation prefers to use.
>> 
>> /js
>> 

-- 
Ladislav Lhotka
Head, CZ.NIC Labs
PGP Key ID: 0xB8F92B08A9F76C67