Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

Lou Berger <lberger@labn.net> Fri, 08 September 2017 15:13 UTC

From: Lou Berger <lberger@labn.net>
To: Robert Wilton <rwilton@cisco.com>, Ladislav Lhotka <lhotka@nic.cz>, netmod@ietf.org
References: <f7151a6b-9deb-52ad-62a9-78b29a552540@cisco.com> <20170830102902.2n5q6rgq2x2dxfq2@elstar.local> <e8482a9c-cba3-28e2-9ffa-ec5eb5c1c0a4@cisco.com> <20170830123156.cssrg5kklpo67fie@elstar.local> <CABCOCHTtN611FO2ov2kTLtZx-Q3=tzgH7Xk9uGvFUD1WuyMZyw@mail.gmail.com> <b13c5e9a-e9f9-96e9-8823-0402fb74af09@cisco.com> <1504223854014.55228@Aviatnet.com> <847e5bf9-7b3d-9ff8-9954-970f32a2094c@cisco.com> <20170902073342.xoziwor4tdr5bipw@elstar.local> <D5D00209.C5C67%acee@cisco.com> <20170902112832.ymorfgdthobeio6q@elstar.local> <CABCOCHTC2MhBu0Zu44Z=f+J04HiENjQR+J0Sxy-arjcDmBHb_A@mail.gmail.com> <1e95ba5d-7aa2-e08f-56f9-27aa70822a11@cisco.com> <1504537140.5874.38.camel@nic.cz> <f0ddf7bd-c249-389f-e34b-0b901697307e@cisco.com> <1504629352.7175.40.camel@nic.cz> <8af6041d-7cd5-9608-70b4-7cffc4f884f8@cisco.com> <2a70ce5e-7727-d280-98e4-481d87314d14@labn.net> <cbe34a3e-cf6d-6da7-07fb-ad544892453d@cisco.com> <15e58235480.27d3.9b4188e636579690ba6c69f2c8a0f1fd@labn.net>
Message-ID: <7398f538-d092-429a-30d5-e5609096a48a@labn.net>
Date: Fri, 08 Sep 2017 11:07:06 -0400
Archived-At: <https://mailarchive.ietf.org/arch/msg/netmod/-bN9Ip_bcAxMt_jXWXBB9G2Glpc>
Subject: Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

Kent and I discussed this.  We (as chairs) don't think there is
currently WG consensus on RegEx guidelines.  We do think there is
sufficient interest to continue the discussion, and would like to do so
both on list and in our next meeting in Singapore.

Thank you,

Lou and Kent

On 9/6/2017 1:01 PM, Lou Berger wrote:
> Thanks Rob.  I'll get with Kent and then one of us will get back to the WG
> on next steps.
>
> Lou
>
>
> On September 6, 2017 3:53:33 AM Robert Wilton <rwilton@cisco.com> wrote:
>
>> Hi Lou,
>>
>> This is the addition to 6087bis that I propose.  Note, this is the same
>> text as in my email on the 31st of August.
>>
>> I propose adding the following 2 paragraphs to the 6087bis section on
>> patterns and ranges:
>>
>> NEW:
>> To ensure patterns are easy to read and implement, authors SHOULD
>> restrict themselves to the parts of the XML schema regular expression
>> language that are common across most regular expression languages.  In
>> particular, pattern statements SHOULD avoid using 'character class
>> subtraction' (e.g. '[a-z-[aeiou]]').  They SHOULD avoid matching
>> Unicode properties and blocks (e.g. '\p{L}' or '\p{IsBasic_Latin}').
>> They MAY use the '\d', '\w', '\s' character class shorthands and their
>> negated versions, where appropriate, but SHOULD avoid other character
>> class shorthands.  To match ASCII digits 0-9 the character class
>> '[0-9]' MUST be used instead of the '\d' character class shorthand
>> that also matches Unicode digits in all scripts.
>>
>> Pattern statements do not have to strictly restrict numerical values,
>> and a simple less specific pattern may be preferable over a more
>> complex and precise pattern, e.g. as illustrated in the
>> 'ipv4-address-no-zone' example pattern below.
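
The '[0-9]' versus '\d' point in the proposed text can be checked with a
small sketch. It uses Python's re module purely as a stand-in engine (an
assumption: Python is not an XSD regular expression implementation, but its
'\d' on text strings also matches Unicode decimal digits). The names
ASCII_DIGITS, SHORTHAND_DIGITS and accepted_by are made up for illustration.

```python
import re

# Stand-in only: Python's re is not an XSD regex engine.  fullmatch() is used
# to mimic the implicit anchoring of a YANG pattern statement.
ASCII_DIGITS = re.compile(r"[0-9]+")      # ASCII digits 0-9 only
SHORTHAND_DIGITS = re.compile(r"\d+")     # any Unicode decimal digit

samples = {
    "ascii": "2017",
    "arabic_indic": "\u0662\u0660\u0661\u0667",   # "2017" in Arabic-Indic digits
    "devanagari": "\u0968\u0966\u0967\u096d",     # "2017" in Devanagari digits
}


def accepted_by(pattern, value):
    """Return True if the whole value matches the compiled pattern."""
    return pattern.fullmatch(value) is not None


for name, value in samples.items():
    print(name, accepted_by(ASCII_DIGITS, value), accepted_by(SHORTHAND_DIGITS, value))
```

Only the ASCII sample is accepted by '[0-9]+', while '\d+' accepts all
three, which is the behaviour the proposed MUST is meant to rule out.
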
>>
>>
>> Or, put in the context of the existing 6087bis text:
>>
>> *** Patterns and Ranges
>>
>> For string data types, if a machine-readable pattern
>> can be defined for the desired semantics, then
>> one or more pattern statements SHOULD be present.
>> A single quoted string SHOULD be used to specify the pattern,
>> since a double-quoted string can modify the content.
>>
>> To ensure patterns are easy to read and implement, authors SHOULD
>> restrict themselves to the parts of the XML schema regular expression
>> language that are common across most regular expression languages.  In
>> particular, pattern statements SHOULD avoid using 'character class
>> subtraction' (e.g. '[a-z-[aeiou]]').  They SHOULD avoid matching
>> Unicode properties and blocks (e.g. '\p{L}' or '\p{IsBasic_Latin}').
>> They MAY use the '\d', '\w', '\s' character class shorthands and their
>> negated versions, where appropriate, but SHOULD avoid other character
>> class shorthands.  To match ASCII digits 0-9 the character class
>> '[0-9]' MUST be used instead of the '\d' character class shorthand
>> that also matches Unicode digits in all scripts.
>>
>> Pattern statements do not have to strictly restrict numerical values,
>> and a simple less specific pattern may be preferable over a more
>> complex and precise pattern, e.g. as illustrated in the
>> 'ipv4-address-no-zone' example pattern below.
>>
>> The following typedef from ^RFC6991^ demonstrates the proper
>> use of the "pattern" statement:
>>
>>      typedef ipv4-address-no-zone {
>>        type inet:ipv4-address {
>>          pattern '[0-9\.]*';
>>        }
>>        ...
>>      }
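
As a rough illustration of the "simple, less specific pattern" idea, the
sketch below tests the '[0-9\.]*' pattern on its own, again with Python's re
module as an assumed stand-in for an XSD engine. The names NO_ZONE and
matches_no_zone are hypothetical. Taken alone, the pattern only excludes a
zone suffix; the stricter address syntax is assumed to come from the base
type 'inet:ipv4-address', whose pattern is not reproduced here.

```python
import re

# The added pattern from the typedef above, checked in isolation.
# fullmatch() mimics the implicit anchoring of YANG pattern statements.
NO_ZONE = re.compile(r"[0-9\.]*")


def matches_no_zone(value):
    """Return True if value consists only of ASCII digits and dots."""
    return NO_ZONE.fullmatch(value) is not None


print(matches_no_zone("192.0.2.1"))        # plain dotted quad
print(matches_no_zone("192.0.2.1%eth0"))   # zone suffix present
print(matches_no_zone("999..1"))           # not an address, yet digits and dots
```

The third call shows why the pattern is "less specific": it does not
validate the address by itself and relies on the restrictions inherited from
the base type.
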
>>
>> For string data types, if the length of the string
>> is required to be bounded in all implementations,
>> then a length statement MUST be present.
>>
>> The following typedef from ^RFC6991^ demonstrates the proper
>> use of the "length" statement:
>>
>>      typedef yang-identifier {
>>        type string {
>>          length "1..max";
>>          pattern '[a-zA-Z_][a-zA-Z0-9\-_.]*';
>>          pattern '.|..|[^xX].*|.[^mM].*|..[^lL].*';
>>        }
>>        ...
>>      }
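
The combined effect of the length statement and the two pattern statements
above can be sketched as follows. This assumes the YANG rule that a value
must match every pattern on the type, and it uses Python's re module as a
stand-in for an XSD engine. The names PATTERNS and is_yang_identifier are
invented for this example.

```python
import re

# Both patterns from the typedef above; a value has to satisfy all of them.
PATTERNS = [
    re.compile(r"[a-zA-Z_][a-zA-Z0-9\-_.]*"),        # allowed characters
    re.compile(r".|..|[^xX].*|.[^mM].*|..[^lL].*"),  # must not start with "xml"
]


def is_yang_identifier(value):
    """Apply length "1..max" and then every pattern to the whole value."""
    if len(value) < 1:
        return False
    return all(p.fullmatch(value) is not None for p in PATTERNS)


for candidate in ["interface-name", "xml-data", "XmLfoo", "xm", "9abc", ""]:
    print(repr(candidate), is_yang_identifier(candidate))
```

The second pattern is written as a list of alternatives because the subset
of the regular expression language used here has no negative lookahead, so
"does not start with xml (in any case)" has to be spelled out position by
position.
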
>>
>> For numeric data types, if the values allowed
>> by the intended semantics are different than
>> those allowed by the unbounded intrinsic data
>> type (e.g., 'int32'), then a range statement SHOULD be present.
>>
>> The following typedef from ^RFC6991^ demonstrates the proper
>> use of the "range" statement:
>>
>>      typedef dscp {
>>        type uint8 {
>>           range "0..63";
>>        }
>>        ...
>>      }
>>
>> Thanks,
>> Rob
>>
>>
>> On 05/09/2017 22:37, Lou Berger wrote:
>>> Rob,
>>>
>>> (as chair)
>>> On 9/5/2017 1:17 PM, Robert Wilton wrote:
>>>> However, I have thrown in the towel on my regex crusade.
>>> I'm sorry, I've lost the thread here a bit. In order to gauge consensus
>>> on this topic, it would be helpful to send the latest text that you are
>>> proposing for inclusion in the bis.  If you are willing to do this,
>>> we can poll to see if there is/is not support for inclusion of this
>>> text.  Are you willing, i.e., can you send the current proposed text change?
>>>
>>> Thank you,
>>> Lou
>>>
>>> .
>>>
>>