Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

Robert Wilton <rwilton@cisco.com> Wed, 06 September 2017 09:16 UTC

To: "j.schoenwaelder@jacobs-university.de" <j.schoenwaelder@jacobs-university.de>, netmod@ietf.org
References: <847e5bf9-7b3d-9ff8-9954-970f32a2094c@cisco.com> <20170902073342.xoziwor4tdr5bipw@elstar.local> <D5D00209.C5C67%acee@cisco.com> <20170902112832.ymorfgdthobeio6q@elstar.local> <CABCOCHTC2MhBu0Zu44Z=f+J04HiENjQR+J0Sxy-arjcDmBHb_A@mail.gmail.com> <1e95ba5d-7aa2-e08f-56f9-27aa70822a11@cisco.com> <1504537140.5874.38.camel@nic.cz> <f0ddf7bd-c249-389f-e34b-0b901697307e@cisco.com> <1504629352.7175.40.camel@nic.cz> <8af6041d-7cd5-9608-70b4-7cffc4f884f8@cisco.com> <20170905180006.yecbqqdhxtkvosxk@elstar.local>
From: Robert Wilton <rwilton@cisco.com>
Message-ID: <bada0ee6-2861-9b25-32e3-7dbd7cdd1433@cisco.com>
Date: Wed, 06 Sep 2017 10:16:36 +0100
Archived-At: <https://mailarchive.ietf.org/arch/msg/netmod/bYNPzi0MZYxsc2g5u-V6JTBAiQk>


On 05/09/2017 19:00, Juergen Schoenwaelder wrote:
> On Tue, Sep 05, 2017 at 06:17:09PM +0100, Robert Wilton wrote:
>>> I believe that tools intended for general use should follow the YANG spec
>>> literally.
>> I don't fully agree.  I think that they only need to cover the parts of the
>> YANG spec for the models that they are using (or might use). If nobody uses
>> Unicode blocks then it doesn't really matter whether a given tool supports
>> them or not.  It is always possible to caveat and add support for the
>> missing bits later.  E.g. if I was writing a bespoke XPATH implementation
>> for YANG then there is probably quite a lot of the XPATH spec that I would
>> also leave out as well, and just concentrate on the parts that people
>> actually use, or are likely to use.
>>
> If this is your understanding of standards, why do you want to define
> a subset of XSD pattern based on your observation of what is used or
> not used? Simply do not implement what you observe is not used. Why do
> we need guidelines of constructs not to use so that they are not used?
My aims:
1) To make pattern statements in standard YANG models easier to comprehend.
2) To give implementations that are designed to support only standard YANG 
models more confidence that they don't need to support all of the 
Unicode properties and character blocks.

>
> There are multiple contradictions in your posts, one of them was the
> idea of translating unicode matching to ASCII - which simply does not
> work.
This does work if your implementation is willing to be restricted to 
supporting only ASCII.  Some users of YANG seem to think that ASCII is 
sufficient to configure and manage network devices.  My personal opinion 
is that they are probably broadly right, but there are some places where 
supporting a Unicode string is better (e.g. the interface description 
leaf).  However, in these cases I think that either no pattern statement 
is required, or otherwise \w, \s and \d are probably sufficient.

I understand, and agree, that an implementation that restricts pattern 
statement support to ASCII-only strings is non-compliant with the YANG 
spec.
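
To illustrate what I mean by an ASCII-restricted implementation, here is 
a rough sketch.  It uses Python's re module, which is not the XSD regex 
dialect that YANG specifies, so treat it only as an analogy.  The helper 
name ascii_only_match is made up for this example, and (as noted above) 
rejecting non-ASCII input like this is not compliant with the YANG spec.

```python
import re


def ascii_only_match(pattern: str, value: str) -> bool:
    """Hypothetical helper: evaluate a pattern with ASCII-only semantics.

    Assumption: the implementation has chosen to reject any non-ASCII
    input outright, so \\w, \\s and \\d only ever need their ASCII meaning.
    """
    if not value.isascii():
        # Out of scope for an ASCII-only implementation.
        return False
    # re.ASCII makes \w, \s and \d match ASCII characters only.
    return re.fullmatch(pattern, value, flags=re.ASCII) is not None


if __name__ == "__main__":
    print(ascii_only_match(r"\d+", "123"))
    print(ascii_only_match(r"\d+", "\u0661\u0662\u0663"))
    print(ascii_only_match(r"\w+", "eth0"))
```

With this restriction the class escapes collapse to their familiar ASCII 
ranges, at the cost of refusing values that a fully compliant 
implementation would accept.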


>   Or the post where you said \d is OK but then later said \d is
> not OK since it translates to a large number of numeric characters.
Yes, my opinion changed when I found out that '\d' covers more than just 
ASCII.  As per the 6087bis text that I sent out, I think that '\d' can 
be used, but must not be used if the regex is meant to match only ASCII 
0 to 9.  My concern is that many readers/authors/implementors of YANG 
models may not properly understand that '\d' also covers digits in other 
Unicode scripts, and hence I think that it is clearer (and hence better) 
to use '[0-9]' in pattern statements instead, since the interpretation 
of that is entirely unambiguous.
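
As a quick illustration of the difference, here is a small sketch.  It 
uses Python's re module rather than an XSD regex engine, on the 
assumption that the two agree on this point: for Unicode strings, 
Python's '\d' matches any character in Unicode category Nd.  The helper 
name fully_matches and the sample string are made up for the example.

```python
import re
import unicodedata

# Arabic-Indic digits one, two, three (not ASCII, but Unicode category Nd).
ARABIC_INDIC_123 = "\u0661\u0662\u0663"


def fully_matches(pattern: str, value: str) -> bool:
    """Hypothetical helper: True if the whole value matches the pattern."""
    return re.fullmatch(pattern, value) is not None


if __name__ == "__main__":
    # Each sample character is a decimal digit as far as Unicode is concerned.
    print([unicodedata.category(ch) for ch in ARABIC_INDIC_123])
    # '\d' accepts digits from any script; '[0-9]' accepts ASCII digits only.
    print(fully_matches(r"\d+", ARABIC_INDIC_123))
    print(fully_matches(r"[0-9]+", ARABIC_INDIC_123))
    print(fully_matches(r"[0-9]+", "123"))
```

A model author who writes '\d' while intending "0 to 9" would therefore 
accept values they probably did not expect, whereas '[0-9]' leaves no 
room for that misreading.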


> You really need to sort out what you want, what the problem is you are
> trying to solve, how you select the subset of XSD pattern etc. Write
> an I-D.
Do you think that writing an I-D, that would contain the same arguments 
that I've presented here, would sway your opinion at all?

My assumption is that it wouldn't, and hence writing up an I-D would seem 
to be a waste of effort.

>   And at the end, people who only do POSIX regular expressions,
> because they come with the standard C library on POSIX systems or
> whatever the reason really is, still will either have to continue to
> cheat by silently interpreting XSD pattern as POSIX pattern or they
> create a proper new statement to at least properly distinguish
> different pattern languages.
Sure, but I don't regard either of these as good long-term solutions.

Thanks,
Rob

>
> /js
>