Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

Andy Bierman <andy@yumaworks.com> Wed, 06 September 2017 17:27 UTC

From: Andy Bierman <andy@yumaworks.com>
Date: Wed, 06 Sep 2017 10:27:29 -0700
Message-ID: <CABCOCHS1keNOYvE5jf0quLU2KnGijCs2jcysGuXUGRzph+kKGQ@mail.gmail.com>
To: Robert Wilton <rwilton@cisco.com>
Cc: "j.schoenwaelder@jacobs-university.de" <j.schoenwaelder@jacobs-university.de>, "netmod@ietf.org" <netmod@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/netmod/nvGd8dqVi9saeNcVUhBGh90r5KQ>
Subject: Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

On Wed, Sep 6, 2017 at 2:16 AM, Robert Wilton <rwilton@cisco.com> wrote:

>
>
> On 05/09/2017 19:00, Juergen Schoenwaelder wrote:
>
>> On Tue, Sep 05, 2017 at 06:17:09PM +0100, Robert Wilton wrote:
>>
>>>> I believe that tools intended for general use should follow the YANG spec
>>>> literally.
>>>>
>>> I don't fully agree.  I think that they only need to cover the parts of the
>>> YANG spec for the models that they are using (or might use).  If nobody uses
>>> Unicode blocks then it doesn't really matter whether a given tool supports
>>> them or not.  It is always possible to caveat and add support for the
>>> missing bits later.  E.g. if I was writing a bespoke XPath implementation
>>> for YANG then there is probably quite a lot of the XPath spec that I would
>>> leave out as well, and just concentrate on the parts that people
>>> actually use, or are likely to use.
>>>
>> If this is your understanding of standards, why do you want to define
>> a subset of XSD pattern based on your observation of what is used or
>> not used? Simply do not implement what you observe is not used. Why do
>> we need guidelines of constructs not to use so that they are not used?
>>
> My aims:
> 1) To make pattern statements in standard YANG models easier to comprehend.
> 2) So that implementations designed to only support standard YANG models
> can have more confidence that they don't need to support all of the Unicode
> properties and character blocks.
>
>

I do not agree that goal (1) is achieved by limiting the usage of the pattern
expression language.  IMO it is important to achieve the full interoperability
that is possible between tools that conform to the pattern definition language.
This is true whether the language is XSD or some flavor of POSIX or whatever.
It is valuable for readers, writers, and tool-makers to know that all tools
that conform to the standard use the same pattern expression language.

I do not agree that (2) should be a goal of the standard.
If tool-makers have a false expectation that they can use the parser for any
pattern expression language, then tools will be fragile. The


Andy



>> There are multiple contradictions in your posts, one of them was the
>> idea of translating unicode matching to ASCII - which simply does not
>> work.
>>
> This does work if your implementation is willing to be restricted to only
> supporting ASCII.  Some users of YANG seem to think that ASCII is
> sufficient to configure and manage network devices.  My personal opinion is
> that they are probably broadly right, but there are some places where
> supporting a unicode string is better (e.g. the interface description
> leaf).  However, in these cases I think that either no pattern statement is
> required, or otherwise \w,\s,\d are probably sufficient.
>
> I understand, and agree, that an implementation that restricts pattern
> statement support to only ASCII strings makes the implementation
> non-compliant with the YANG spec.
>
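
To make the ASCII-only restriction discussed above concrete, here is a minimal
sketch in Python.  Python's re module is only a stand-in here: it is not an XSD
pattern engine, and its \w class is not identical to the XSD one.  The helper
names and the sample description value are hypothetical.  The re.ASCII flag
narrows \w, \d and \s to ASCII, which roughly mimics an implementation that is
willing to support only ASCII strings.

```python
import re

def accepts_unicode(pattern: str, value: str) -> bool:
    """Whole-string match with Unicode-aware shorthand classes (Python's default for str)."""
    return re.fullmatch(pattern, value) is not None

def accepts_ascii_only(pattern: str, value: str) -> bool:
    """Same match, but re.ASCII limits the shorthand classes to ASCII characters."""
    return re.fullmatch(pattern, value, flags=re.ASCII) is not None

# Hypothetical interface description containing one non-ASCII letter.
description = "Z\u00fcrich uplink"
pattern = r"[\w\s]+"

print(accepts_unicode(pattern, description))         # True
print(accepts_ascii_only(pattern, description))      # False
print(accepts_ascii_only(pattern, "Zurich uplink"))  # True
```

The same pattern text gives different answers for the same value, which is the
interoperability gap at issue when one tool is ASCII-only and another is not.
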
>
>> Or the post where you said \d is OK but then later said \d is
>> not OK since it translates to a large number of numeric characters.
>>
> Yes, my opinion changed when I found out that '\d' covers more than just
> ASCII.  As per the 6087bis text that I sent out, I think that '\d' can be
> used, but must not be used if the regex is meant to only match ASCII 0 to
> 9.  My concern is that many readers/authors/implementors of YANG models may
> not properly understand that '\d' also covers digits in other
> unicode scripts, and hence I think that it is more clear (and hence better)
> to use '[0-9]' in pattern statements instead, since the interpretation of
> that is entirely unambiguous.
>
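
The difference between '\d' and '[0-9]' described above can be sketched as
follows.  This uses Python's re module as an assumed stand-in for an XSD
pattern engine; for str patterns Python's \d matches any character in Unicode
category Nd, which is also how XSD defines \d.  The helper name and sample
values are hypothetical.

```python
import re
import unicodedata

def whole_match(pattern: str, value: str) -> bool:
    """Return True if the pattern matches the entire value."""
    return re.fullmatch(pattern, value) is not None

ascii_digits = "123"
# The digits one, two, three in the Arabic-Indic script (U+0661..U+0663).
arabic_indic_digits = "\u0661\u0662\u0663"

# Every sample character is a decimal digit (category Nd) as far as Unicode is concerned.
print([unicodedata.category(ch) for ch in arabic_indic_digits])  # ['Nd', 'Nd', 'Nd']

print(whole_match(r"\d+", ascii_digits))            # True
print(whole_match(r"\d+", arabic_indic_digits))     # True: \d is not limited to ASCII
print(whole_match(r"[0-9]+", ascii_digits))         # True
print(whole_match(r"[0-9]+", arabic_indic_digits))  # False: only ASCII 0 to 9
```
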
>
>> You really need to sort out what you want, what the problem is you are
>> trying to solve, how you select the subset of XSD pattern etc. Write
>> an I-D.
>>
> Do you think that writing an I-D, that would contain the same arguments
> that I've presented here, would sway your opinion at all?
>
> My assumption is that it wouldn't and hence writing up an I-D would seem to
> be a waste of effort.
>
>> And at the end, people who only do POSIX regular expressions,
>> because they come with the standard C library on POSIX systems or
>> whatever the reason really is, still will either have to continue to
>> cheat by silently interpreting XSD pattern as POSIX pattern or they
>> create a proper new statement to at least properly distinguish
>> different pattern languages.
>>
> Sure, but I don't regard either of these as good long term solutions.
>
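
One concrete way that silently reading an XSD pattern as a POSIX pattern can go
wrong is anchoring: an XSD pattern has to match the entire value, while a
POSIX-style regexec() call reports a match anywhere in the string unless the
pattern is explicitly anchored.  A minimal sketch, assuming Python's
re.fullmatch and re.search as stand-ins for the two behaviours (neither is a
real XSD or POSIX engine; the function names and sample values are
hypothetical):

```python
import re

PATTERN = r"[0-9]+"  # hypothetical pattern statement argument

def xsd_style(value: str) -> bool:
    """Whole-value match, as XSD pattern facets require."""
    return re.fullmatch(PATTERN, value) is not None

def unanchored_search_style(value: str) -> bool:
    """Substring search, as an unanchored POSIX-style match would behave."""
    return re.search(PATTERN, value) is not None

for value in ("42", "eth0"):
    print(value, xsd_style(value), unanchored_search_style(value))
# 42 True True
# eth0 False True
```
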
> Thanks,
> Rob
>
>
>> /js
>>
>>