Re: [netmod] Pattern statements [was Re: Query about augmenting module from submodule in YANG 1.0]

Robert Wilton <rwilton@cisco.com> Wed, 23 August 2017 13:23 UTC

To: Vladimir Vassilev <vladimir@transpacket.com>
Cc: "netmod@ietf.org" <netmod@ietf.org>
References: <E3378E0605547F4E854DEE0CB1116AB020865B@gbcdcmbx03.intl.att.com> <85A1FF5A-EF0B-4278-B4FF-3FE431486B2C@tail-f.com> <E3378E0605547F4E854DEE0CB1116AB02102DC@gbcdcmbx03.intl.att.com> <11857e8e-f46e-dc2e-cf99-80224859d221@transpacket.com> <E3378E0605547F4E854DEE0CB1116AB0210631@gbcdcmbx03.intl.att.com> <defe35bb-bb8b-f1f0-d8c4-2d2d0f23731b@transpacket.com> <1502290869.16638.15.camel@nic.cz> <20170809151312.GC42207@elstar.local> <6ef68131-f731-0edc-b731-d7ec85924f03@cisco.com> <E3378E0605547F4E854DEE0CB1116AB021CE2D@gbcdcmbx03.intl.att.com> <D5C05EB3.C2681%acee@cisco.com> <7614040f-9f8f-09c2-1854-63ad9ffb6be1@cisco.com> <5929631c-e51d-ae66-52d1-cbc87ca3506b@transpacket.com>
From: Robert Wilton <rwilton@cisco.com>
Message-ID: <321a45fb-77e1-23c7-184b-d3bff9d41c39@cisco.com>
Date: Wed, 23 Aug 2017 14:23:12 +0100
In-Reply-To: <5929631c-e51d-ae66-52d1-cbc87ca3506b@transpacket.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/netmod/Ki2D6xYslFuvgi09rwWpK4Pz864>
Subject: Re: [netmod] Pattern statements [was Re: Query about augmenting module from submodule in YANG 1.0]


On 23/08/2017 12:52, Vladimir Vassilev wrote:
> On 08/21/2017 05:14 PM, Robert Wilton wrote:
>
>> Hi Acee,
>>
>> That makes sense.
>>
>> The other thing that I think we have got wrong is the way we write
>> regex pattern statements.  I think that it would be much better if
>> these were written to be less exhaustive and much simpler.
>>
>> E.g. the "route distinguisher" pattern in
>> draft-ietf-rtgwg-routing-types-09 is defined as follows:
>>
>>           pattern
>>             '(0:(6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}|'
>>           +     '6[0-4][0-9]{3}|'
>>           +     '[0-5]?[0-9]{0,3}[0-9]):(429496729[0-5]|'
>>           +     '42949672[0-8][0-9]|'
>>           +     '4294967[01][0-9]{2}|429496[0-6][0-9]{3}|'
>>           +     '42949[0-5][0-9]{4}|'
>>           +     '4294[0-8][0-9]{5}|429[0-3][0-9]{6}|'
>>           +     '42[0-8][0-9]{7}|4[01][0-9]{8}|'
>>           +     '[0-3]?[0-9]{0,8}[0-9]))|'
>>           + '(1:((([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|'
>>           +     '25[0-5])\.){3}([0-9]|[1-9][0-9]|'
>>           +     '1[0-9]{2}|2[0-4][0-9]|25[0-5])):(6553[0-5]|'
>>           +     '655[0-2][0-9]|'
>>           +     '65[0-4][0-9]{2}|6[0-4][0-9]{3}|'
>>           +     '[0-5]?[0-9]{0,3}[0-9]))|'
>>           + '(2:(429496729[0-5]|42949672[0-8][0-9]|'
>>           +     '4294967[01][0-9]{2}|'
>>           +     '429496[0-6][0-9]{3}|42949[0-5][0-9]{4}|'
>>           +     '4294[0-8][0-9]{5}|'
>>           +     '429[0-3][0-9]{6}|42[0-8][0-9]{7}|4[01][0-9]{8}|'
>>           +     '[0-3]?[0-9]{0,8}[0-9]):'
>>           +     '(6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}|'
>>           +     '6[0-4][0-9]{3}|'
>>           +     '[0-5]?[0-9]{0,3}[0-9]))|'
>>           + '(6(:[a-fA-F0-9]{2}){6})|'
>>           + '(([3-57-9a-fA-F]|[1-9a-fA-F][0-9a-fA-F]{1,3}):'
>>           +     '[0-9a-fA-F]{1,12})';
>>         }
>>
>> But I think that it would be much easier to read, and quite possibly 
>> more performant to execute, if the pattern regex was written 
>> something like the following:
>>
>>           pattern
>>             '(0:[0-9]{1,5}:[0-9]{1,10})|'
>>           + '(1:([0-9]{1,3}\.){3}[0-9]{1,3}:[0-9]{1,5})|'
>>           + '(2:[0-9]{1,10}:[0-9]{1,5})|'
>>           + '(6(:[a-fA-F0-9]{2}){6})';
>>
>> Of course, this would allow more invalid values, but most servers
>> would be expected to reject those when they convert them into an
>> internal binary format anyway.
>>
>> What do you, and others, think?
> You still need the
> |(([3-57-9a-fA-F]|[1-9a-fA-F][0-9a-fA-F]{1,3}):[0-9a-fA-F]{1,12}) at
> the end so that valid values are not rejected, though.
Sure, OK.

>
> IMO a pattern statement only has value if it exactly defines the set
> of valid strings.
It still has value if it also performs some simple checks and removes 
obvious mistakes.
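As a rough illustration of "simple checks" plus server-side rejection, here is a minimal, hypothetical Python sketch (the names `RD_PATTERN`, `_pack` and `rd_to_bytes` are made up for this example and do not come from any draft or implementation). The regex only checks structure: it uses the simplified route-distinguisher pattern with a three-dot IPv4 part, a plain `2:<asn>:<number>` form, and the generic-type alternative Vladimir mentioned. Numeric ranges are enforced only when packing into an assumed 8-octet encoding (2-octet type plus 6-octet value); for the generic alternative, reading type and value as hex is also an assumption. Python's `re` differs from the XSD regex dialect that YANG uses, but these patterns stay within the common subset, and `fullmatch()` stands in for YANG's implicit anchoring.

```python
import re

# Structure-only route-distinguisher pattern (no numeric range checks).
RD_PATTERN = re.compile(
    r"(0:[0-9]{1,5}:[0-9]{1,10})"
    r"|(1:([0-9]{1,3}\.){3}[0-9]{1,3}:[0-9]{1,5})"
    r"|(2:[0-9]{1,10}:[0-9]{1,5})"
    r"|(6(:[a-fA-F0-9]{2}){6})"
    r"|(([3-57-9a-fA-F]|[1-9a-fA-F][0-9a-fA-F]{1,3}):[0-9a-fA-F]{1,12})"
)


def _pack(value: int, width: int) -> bytes:
    """Big-endian encoding; this is where out-of-range numbers get rejected."""
    if not 0 <= value < 1 << (8 * width):
        raise ValueError(f"{value} does not fit in {width} octet(s)")
    return value.to_bytes(width, "big")


def rd_to_bytes(rd: str) -> bytes:
    """Hypothetical server-side conversion to an 8-octet RD."""
    if not RD_PATTERN.fullmatch(rd):
        raise ValueError(f"malformed route distinguisher: {rd!r}")
    rd_type, rest = rd.split(":", 1)
    if rd_type == "0":  # 2-octet ASN : 4-octet number
        asn, num = rest.split(":")
        return _pack(0, 2) + _pack(int(asn), 2) + _pack(int(num), 4)
    if rd_type == "1":  # IPv4 address : 2-octet number
        addr, num = rest.split(":")
        octets = b"".join(_pack(int(o), 1) for o in addr.split("."))
        return _pack(1, 2) + octets + _pack(int(num), 2)
    if rd_type == "2":  # 4-octet ASN : 2-octet number
        asn, num = rest.split(":")
        return _pack(2, 2) + _pack(int(asn), 4) + _pack(int(num), 2)
    if rd_type == "6":  # six colon-separated hex octets
        return _pack(6, 2) + bytes(int(b, 16) for b in rest.split(":"))
    # Generic alternative: type and value both read as hex (an assumption).
    return _pack(int(rd_type, 16), 2) + _pack(int(rest, 16), 6)
```

For example, `"0:70000:1"` passes the structural regex, but `rd_to_bytes` rejects it because 70000 does not fit in two octets; likewise `"1:192.0.2.300:7"` is rejected when the octet 300 is packed.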

But even if a value passes the regex filter, that still doesn't
guarantee that the value is correct.  Someone could put a typo in there,
perhaps configure a multicast IP address where only unicast addresses
are allowed, put the same IP address on two separate interfaces, or
use an IP address that they don't own, etc.
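To make the point concrete, a hypothetical Python sketch: a structure-only dotted-quad regex accepts `224.0.0.5`, and it takes a semantic check (here the standard-library `ipaddress` module) to reject it as multicast. The duplicate-address check needs the whole configuration, which no per-leaf pattern can see. The function names and the "interface name to address" mapping are assumptions made only for this illustration.

```python
import ipaddress
import re
from typing import Dict, Set

# Structure-only dotted-quad check: four groups of 1-3 digits.
IPV4_SHAPE = re.compile(r"([0-9]{1,3}\.){3}[0-9]{1,3}")


def check_unicast_v4(value: str) -> ipaddress.IPv4Address:
    """Hypothetical check: regex for shape, then semantic rules."""
    if not IPV4_SHAPE.fullmatch(value):
        raise ValueError(f"not a dotted quad: {value!r}")
    # IPv4Address raises AddressValueError (a ValueError) for octets > 255.
    addr = ipaddress.IPv4Address(value)
    if addr.is_multicast:
        raise ValueError(f"{addr} is multicast, but a unicast address is required")
    return addr


def duplicate_addresses(assigned: Dict[str, str]) -> Set[str]:
    """Return addresses configured on more than one interface."""
    seen: Set[str] = set()
    dups: Set[str] = set()
    for addr in assigned.values():
        if addr in seen:
            dups.add(addr)
        seen.add(addr)
    return dups
```

Neither check can be expressed as a simple pattern statement, which is the point being made: passing the regex is a necessary, not a sufficient, condition.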

> In general I do not see the benefit of pattern statements that do not 
> reject all invalid string instances. I prefer the original pattern or 
> none at all.
OK, so some potential counter examples:
1) Email address.  I understand that the full regex to validate all
email addresses is very complex, but checking that the value at least
contains an @ symbol is still beneficial.  It would seem that a short,
imperfect regex is better than a complete, perfect one.
2) A list of VLAN ranges, e.g. we want to allow strings that look like
this: "1-10,20-400,600,2000-3000", but only with non-overlapping values
in ascending order.  It is easy to write a regex to check that the
structure is right, but AFAIK it is hard (impossible?) to write a regex
that ensures that the ranges don't overlap and are specified in
ascending order.
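Both counter examples can be sketched in Python under stated assumptions; this is a hypothetical illustration, not a proposed YANG definition. The email regex only requires "something@something" without whitespace. The VLAN regex only checks the comma/hyphen structure, and the ordering and overlap rules are enforced in code. The usable VLAN ID range of 1 to 4094 and the helper names are assumptions for the example.

```python
import re
from typing import List, Tuple

# Loose email check: one "@" with non-empty, whitespace-free text around it.
EMAIL_SHAPE = re.compile(r"[^@\s]+@[^@\s]+")

# Structure-only VLAN list check: comma-separated IDs or "lo-hi" ranges.
VLAN_LIST_SHAPE = re.compile(r"[0-9]{1,4}(-[0-9]{1,4})?(,[0-9]{1,4}(-[0-9]{1,4})?)*")

# Assumed usable VLAN IDs (0 and 4095 are reserved in IEEE 802.1Q).
VLAN_MIN, VLAN_MAX = 1, 4094


def parse_vlan_ranges(text: str) -> List[Tuple[int, int]]:
    """Parse e.g. "1-10,20-400,600" into [(1, 10), (20, 400), (600, 600)].

    The regex only checks the shape; range, ordering and overlap are
    checked in code, which a regex cannot reasonably do.
    """
    if not VLAN_LIST_SHAPE.fullmatch(text):
        raise ValueError(f"malformed VLAN list: {text!r}")
    ranges: List[Tuple[int, int]] = []
    prev_hi = VLAN_MIN - 1  # highest VLAN ID accepted so far
    for item in text.split(","):
        lo_text, _, hi_text = item.partition("-")
        lo = int(lo_text)
        hi = int(hi_text) if hi_text else lo
        if not VLAN_MIN <= lo <= hi <= VLAN_MAX:
            raise ValueError(f"bad VLAN range: {item!r}")
        if lo <= prev_hi:
            raise ValueError(f"{item!r} overlaps or is out of order")
        ranges.append((lo, hi))
        prev_hi = hi
    return ranges
```

A string such as `"1-10,5-20"` matches `VLAN_LIST_SHAPE` but is rejected by `parse_vlan_ranges` because 5 falls inside the earlier 1-10 range.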

So, I propose that we use regexes to check that the string is
structurally correct, but don't use them to perform numerical range
checks on string-encoded numbers, since that makes the regexes hard to
read and verify, and doesn't improve the readability of the YANG file
either.
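One way to see the verification cost: the hypothetical Python sketch below takes the 16-bit range fragment from the draft's route-distinguisher pattern and brute-forces every 1- to 5-digit string, comparing it against a structural regex plus a plain integer comparison. The helper names are made up for the example; the `(?:...)` wrapper is Python syntax used only to group the alternation as one unit. Note that both checks accept leading zeros such as `00042`.

```python
import itertools
import re
from typing import List

# 16-bit range fragment copied from the draft's route-distinguisher pattern.
UINT16_REGEX = re.compile(
    r"(?:6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}|"
    r"6[0-4][0-9]{3}|[0-5]?[0-9]{0,3}[0-9])"
)


def uint16_by_arithmetic(s: str) -> bool:
    """Structural regex for 1-5 digits, then an ordinary integer comparison."""
    return re.fullmatch(r"[0-9]{1,5}", s) is not None and int(s) <= 65535


def mismatches(max_len: int = 5) -> List[str]:
    """All digit strings of length 1..max_len where the two checks disagree."""
    bad = []
    for n in range(1, max_len + 1):
        for digits in itertools.product("0123456789", repeat=n):
            s = "".join(digits)
            if bool(UINT16_REGEX.fullmatch(s)) != uint16_by_arithmetic(s):
                bad.append(s)
    return bad
```

The two checks agree on all of these strings, but confirming that needed an exhaustive test, whereas `int(s) <= 65535` is correct at a glance.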

Thanks,
Rob


>
> Vladimir
>
>> Thanks,
>> Rob
>