Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

Robert Wilton <rwilton@cisco.com> Wed, 06 September 2017 07:53 UTC

To: Lou Berger <lberger@labn.net>, Ladislav Lhotka <lhotka@nic.cz>, netmod@ietf.org
References: <f7151a6b-9deb-52ad-62a9-78b29a552540@cisco.com> <20170830102902.2n5q6rgq2x2dxfq2@elstar.local> <e8482a9c-cba3-28e2-9ffa-ec5eb5c1c0a4@cisco.com> <20170830123156.cssrg5kklpo67fie@elstar.local> <CABCOCHTtN611FO2ov2kTLtZx-Q3=tzgH7Xk9uGvFUD1WuyMZyw@mail.gmail.com> <b13c5e9a-e9f9-96e9-8823-0402fb74af09@cisco.com> <1504223854014.55228@Aviatnet.com> <847e5bf9-7b3d-9ff8-9954-970f32a2094c@cisco.com> <20170902073342.xoziwor4tdr5bipw@elstar.local> <D5D00209.C5C67%acee@cisco.com> <20170902112832.ymorfgdthobeio6q@elstar.local> <CABCOCHTC2MhBu0Zu44Z=f+J04HiENjQR+J0Sxy-arjcDmBHb_A@mail.gmail.com> <1e95ba5d-7aa2-e08f-56f9-27aa70822a11@cisco.com> <1504537140.5874.38.camel@nic.cz> <f0ddf7bd-c249-389f-e34b-0b901697307e@cisco.com> <1504629352.7175.40.camel@nic.cz> <8af6041d-7cd5-9608-70b4-7cffc4f884f8@cisco.com> <2a70ce5e-7727-d280-98e4-481d87314d14@labn.net>
From: Robert Wilton <rwilton@cisco.com>
Message-ID: <cbe34a3e-cf6d-6da7-07fb-ad544892453d@cisco.com>
Date: Wed, 06 Sep 2017 08:52:55 +0100
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.3.0
MIME-Version: 1.0
In-Reply-To: <2a70ce5e-7727-d280-98e4-481d87314d14@labn.net>
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Transfer-Encoding: 8bit
Content-Language: en-US
Archived-At: <https://mailarchive.ietf.org/arch/msg/netmod/LkEuHflbY8Ni0oisTNV_uNiGusY>

Hi Lou,

This is the addition to 6087bis that I propose.  Note that this is the
same text as in my email of 31 August.

I propose adding the following two paragraphs to the 6087bis section on
patterns and ranges:

NEW:
To ensure patterns are easy to read and implement, authors SHOULD
restrict themselves to the parts of the XML Schema regular expression
language that are common across most regular expression languages.  In
particular, pattern statements SHOULD avoid using 'character class
subtraction' (e.g., '[a-z-[aeiou]]').  They SHOULD avoid matching
Unicode properties and blocks (e.g., '\p{L}' or '\p{IsBasicLatin}').
They MAY use the '\d', '\w', '\s' character class shorthands and their
negated versions, where appropriate, but SHOULD avoid other character
class shorthands.  To match ASCII digits 0-9 the character class
'[0-9]' MUST be used instead of the '\d' character class shorthand,
which also matches Unicode digits in all scripts.

Pattern statements do not have to strictly restrict numerical values,
and a simple, less specific pattern may be preferable to a more
complex and precise pattern, e.g., as illustrated in the
'ipv4-address-no-zone' example pattern below.
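
As an aside to the proposed text, the '\d' versus '[0-9]' distinction
can be checked with a small sketch.  It uses Python's "re" module only
as a stand-in: Python is not the XML Schema regular expression language,
but its '\d' on text strings likewise matches decimal digits from any
script.  The sample strings and the helper name "matches" are
illustrative assumptions, not taken from any draft.

```python
import re

# Stand-in engine only: YANG patterns use XML Schema regular expressions.
ASCII_DIGITS = re.compile(r"[0-9]+")
ANY_DIGITS = re.compile(r"\d+")

# The same number, 2017, written with digits from three scripts.
samples = {
    "ascii": "2017",
    "arabic-indic": "\u0662\u0660\u0661\u0667",
    "devanagari": "\u0968\u0966\u0967\u096D",
}


def matches(pattern, text):
    # YANG patterns are implicitly anchored, so test the whole string.
    return pattern.fullmatch(text) is not None


# For each sample: (accepted by [0-9]+, accepted by \d+)
results = {
    name: (matches(ASCII_DIGITS, text), matches(ANY_DIGITS, text))
    for name, text in samples.items()
}

for name, outcome in results.items():
    print(name, outcome)
```

Only the ASCII sample is accepted by '[0-9]+', while '\d+' accepts all
three, which is the behaviour the MUST above is guarding against.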


Or, put in the context of the existing 6087bis text:

*** Patterns and Ranges

For string data types, if a machine-readable pattern
can be defined for the desired semantics, then
one or more pattern statements SHOULD be present.
A single-quoted string SHOULD be used to specify the pattern,
since a double-quoted string can modify the content.

To ensure patterns are easy to read and implement, authors SHOULD
restrict themselves to the parts of the XML Schema regular expression
language that are common across most regular expression languages.  In
particular, pattern statements SHOULD avoid using 'character class
subtraction' (e.g., '[a-z-[aeiou]]').  They SHOULD avoid matching
Unicode properties and blocks (e.g., '\p{L}' or '\p{IsBasicLatin}').
They MAY use the '\d', '\w', '\s' character class shorthands and their
negated versions, where appropriate, but SHOULD avoid other character
class shorthands.  To match ASCII digits 0-9 the character class
'[0-9]' MUST be used instead of the '\d' character class shorthand,
which also matches Unicode digits in all scripts.

Pattern statements do not have to strictly restrict numerical values,
and a simple, less specific pattern may be preferable to a more
complex and precise pattern, e.g., as illustrated in the
'ipv4-address-no-zone' example pattern below.

The following typedef from ^RFC6991^ demonstrates the proper
use of the "pattern" statement:

     typedef ipv4-address-no-zone {
       type inet:ipv4-address {
         pattern '[0-9\.]*';
       }
       ...
     }
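
What the loose pattern in this typedef does and does not constrain can
be sketched as follows.  Python's "re" module is again only a stand-in
for the XML Schema regular expression language, and the helper name
"allowed_by_loose_pattern" is a hypothetical one chosen for this
illustration.

```python
import re

# Pattern copied from the ipv4-address-no-zone typedef above.
LOOSE_NO_ZONE = r"[0-9\.]*"


def allowed_by_loose_pattern(text):
    # YANG patterns must match the entire value, hence fullmatch().
    return re.fullmatch(LOOSE_NO_ZONE, text) is not None


checks = {
    value: allowed_by_loose_pattern(value)
    for value in ("192.0.2.1", "192.0.2.1%eth0", "999.999")
}

for value, accepted in checks.items():
    print(value, accepted)
```

The pattern rejects the zone suffix ("%eth0") but on its own accepts
"999.999".  That is acceptable here because, in RFC 6991, the base type
inet:ipv4-address carries its own stricter pattern, so the derived type
only needs to exclude the zone.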

For string data types, if the length of the string
is required to be bounded in all implementations,
then a length statement MUST be present.

The following typedef from ^RFC6991^ demonstrates the proper
use of the "length" statement:

     typedef yang-identifier {
       type string {
         length "1..max";
         pattern '[a-zA-Z_][a-zA-Z0-9\-_.]*';
         pattern '.|..|[^xX].*|.[^mM].*|..[^lL].*';
       }
       ...
     }
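
Because a value has to satisfy every pattern statement on a type, the
two patterns above work together: the first fixes the allowed
characters, and the second excludes identifiers starting with "xml" in
any case.  A minimal sketch of that combination, using Python's "re"
module as a stand-in for the XML Schema regular expression language
(the function name "is_yang_identifier" is hypothetical):

```python
import re

# Both patterns copied from the yang-identifier typedef above.
PATTERNS = [
    r"[a-zA-Z_][a-zA-Z0-9\-_.]*",
    r".|..|[^xX].*|.[^mM].*|..[^lL].*",
]


def is_yang_identifier(value):
    # length "1..max" plus every pattern matching the whole value.
    if len(value) < 1:
        return False
    return all(re.fullmatch(p, value) is not None for p in PATTERNS)


examples = ["interface", "xm", "xmpp", "xml-data", "XmLthing", "9abc", ""]
verdicts = {value: is_yang_identifier(value) for value in examples}

for value, ok in verdicts.items():
    print(repr(value), ok)
```

Each pattern stays simple enough to read on its own, which is in line
with the guidance above on preferring simple patterns.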

For numeric data types, if the values allowed
by the intended semantics are different than
those allowed by the unbounded intrinsic data
type (e.g., 'int32'), then a range statement SHOULD be present.

The following typedef from ^RFC6991^ demonstrates the proper
use of the "range" statement:

     typedef dscp {
       type uint8 {
         range "0..63";
       }
       ...
     }

Thanks,
Rob


On 05/09/2017 22:37, Lou Berger wrote:
> Rob,
>
> (as chair)
> On 9/5/2017 1:17 PM, Robert Wilton wrote:
>> However, I have thrown in the towel on my regex crusade.
> I'm sorry, I've lost the thread here a bit. In order to gauge consensus
> on this topic, it would be helpful to send the latest text that you are
> proposing for inclusion in the bis.  If you are willing to do this,
> we can poll to see if there is/is not support for inclusion of this
> text.  Are you willing, i.e., can you send the current proposed text change?
>
> Thank you,
> Lou