Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

Juergen Schoenwaelder <j.schoenwaelder@jacobs-university.de> Sat, 02 September 2017 08:24 UTC

Date: Sat, 02 Sep 2017 10:24:24 +0200
From: Juergen Schoenwaelder <j.schoenwaelder@jacobs-university.de>
To: Robert Wilton <rwilton@cisco.com>
Cc: Andy Bierman <andy@yumaworks.com>, Xufeng Liu <Xufeng_Liu@jabil.com>, "netmod@ietf.org" <netmod@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/netmod/4pIizRUL8Y-vO4N_U0RiE01mHqY>

On Wed, Aug 30, 2017 at 05:44:01PM +0100, Robert Wilton wrote:
> 
> First question: How many pattern statements in draft and standard IETF YANG
> modules actually use Unicode properties (e.g \p{}).
> Answer: Just 2.  To add a zone at the end of the IPv4/IPv6 address.
> 
> E.g.       pattern
>         '(([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\.){3}'
>       +  '([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])'
>       + '(%[\p{N}\p{L}]+)?';
> 
> This could quite possibly have been written just as
> "\d{1,3}\.{3}\d{1,3)(%\w+)?" and not use Unicode properties at all.

Shorter but less precise. The thread started with a proposal to ban
\d, yet you seem to like it. Note that \d is not the same as [0-9] in
Unicode, as far as I know: \d is defined to be \p{Nd}, and Nd contains
far more than [0-9].

https://www.w3.org/TR/xmlschema-2/#regexs
http://www.fileformat.info/info/unicode/category/Nd/list.htm
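
A rough illustration of the \d versus [0-9] difference, sketched in
Python. This is an assumption-laden stand-in: Python's re module is
not the XSD regex dialect that YANG pattern statements use, but for
str patterns its \d also matches Unicode decimal digits (category Nd).
The helper names below are hypothetical.

```python
import re
import unicodedata

# Arabic-Indic digits one, two, three (U+0661..U+0663); chosen only as
# an example of Nd characters outside the ASCII range.
ARABIC_INDIC = "\u0661\u0662\u0663"


def matches_backslash_d(text: str) -> bool:
    """True if the whole string consists of \\d characters."""
    return re.fullmatch(r"\d+", text) is not None


def matches_ascii_range(text: str) -> bool:
    """True if the whole string consists of characters in [0-9]."""
    return re.fullmatch(r"[0-9]+", text) is not None


def categories(text: str) -> list[str]:
    """Unicode general category of each character."""
    return [unicodedata.category(ch) for ch in text]


if __name__ == "__main__":
    for sample in ("123", ARABIC_INDIC):
        print(repr(sample), categories(sample),
              matches_backslash_d(sample), matches_ascii_range(sample))
```

Both samples consist of Nd characters and match \d+, but only "123"
matches [0-9]+.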

Perhaps the usage of \p{N} and \p{L} above is not quite right. (I
recall trying to find out what exactly the rules for a zone index
are; often you discover that there is no precise definition.) My
standpoint is that the WGs are responsible for working out the
patterns and for deciding how strict they want them to be. The
pattern in RFC6991 rejects an 'IP address' of the form 321.1.2.3 or
01.2.3.4, and I think this is goodness, but it is ultimately the
decision of the WG producing the YANG module what the patterns should
look like and how strict they are.
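
To make the strictness point concrete, here is a minimal sketch in
Python comparing the dotted-quad part of the pattern quoted above with
a looser \d{1,3}-style pattern. Assumptions: the zone suffix is left
out because Python's re module does not support \p{N} or \p{L};
re.fullmatch stands in for the implicit anchoring of YANG patterns;
and the loose pattern is my reading of the intent of the shorter
suggestion, not its literal text. The names STRICT, LOOSE and
accepted_by are hypothetical.

```python
import re

# One octet, 0..255, without leading zeros (taken from the quoted pattern).
OCTET = r"([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])"

# Dotted-quad part of the quoted pattern, zone suffix omitted.
STRICT = re.compile(r"(" + OCTET + r"\.){3}" + OCTET)

# Looser variant: any one to three digits per octet.
LOOSE = re.compile(r"(\d{1,3}\.){3}\d{1,3}")


def accepted_by(pattern: re.Pattern, text: str) -> bool:
    """True if the whole string matches the pattern."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ("192.0.2.1", "321.1.2.3", "01.2.3.4"):
        print(candidate,
              "strict:", accepted_by(STRICT, candidate),
              "loose:", accepted_by(LOOSE, candidate))
```

The strict pattern accepts 192.0.2.1 and rejects the other two; the
loose pattern accepts all three.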

And we should separate the discussion of how strict a pattern should
be from the discussion of using Unicode constructs or other 'more
recent' constructs in patterns.

/js

-- 
Juergen Schoenwaelder           Jacobs University Bremen gGmbH
Phone: +49 421 200 3587         Campus Ring 1 | 28759 Bremen | Germany
Fax:   +49 421 200 3103         <http://www.jacobs-university.de/>