[Jsonpath] Templating regexps (was: Re: #70 Regexps (was: Re: Draft minutes, consensus points, and actions from IETF 112))

Glyn Normington <glyn.normington.work@gmail.com> Tue, 16 November 2021 17:38 UTC

To: Carsten Bormann <cabo@tzi.org>
Cc: jsonpath@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/jsonpath/Bbf3SQfLvWlqTDPJxh9WtBX-mY0>

> 2. Provide a way to plug in regular expressions (of different flavors)

Let's explore the above option a little further in this separate thread.

If we defined a general way of using jsonpath with a variety of regexp
specs (essentially, just ensuring that a filter's regexp can be parsed
without consuming more of the surrounding jsonpath than it should), then
jsonpath would effectively become a template for concrete specifications
such as jsonpath/iregexp, jsonpath/RE2, etc.
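To make the parsing requirement concrete, here is a minimal Python sketch of how a jsonpath parser might delimit a filter's regexp without understanding its flavor, then hand the pattern to a pluggable engine. The `=~` operator, the `/.../` literal syntax, the rule that every `/` inside the pattern must be backslash-escaped, and all names (`scan_regexp`, `ENGINES`, `filter_match`, the `"python-re"` flavor) are assumptions for illustration only, not anything the working group has agreed.

```python
import re
from typing import Callable, Dict, Tuple

# Hypothetical plug-in signature: does `value` match `pattern`?
RegexpEngine = Callable[[str, str], bool]

def scan_regexp(text: str, start: int) -> Tuple[str, int]:
    """Return (pattern, index just past the closing '/') for a regexp
    literal that begins at text[start].

    Assumed flavor-independent rule: a backslash and the character after
    it are skipped as an opaque pair, and the first unescaped '/' closes
    the literal. The pattern body is handed on verbatim.
    """
    if text[start] != "/":
        raise ValueError("regexp literal must start with '/'")
    i = start + 1
    while i < len(text):
        if text[i] == "\\":
            i += 2  # skip the escape pair without interpreting it
        elif text[i] == "/":
            return text[start + 1 : i], i + 1
        else:
            i += 1
    raise ValueError("unterminated regexp literal")

def python_re_engine(pattern: str, value: str) -> bool:
    # Matching (not searching) semantics: the whole value must match
    return re.fullmatch(pattern, value) is not None

# Hypothetical registry of plugged-in flavors (e.g. iregexp, RE2, ...)
ENGINES: Dict[str, RegexpEngine] = {"python-re": python_re_engine}

def filter_match(flavor: str, pattern: str, value: str) -> bool:
    return ENGINES[flavor](pattern, value)

query = r"$[?(@.id =~ /a\/b+/)]"
pattern, end = scan_regexp(query, query.index("/"))
print(pattern, query[end:])                         # a\/b+ )]
print(filter_match("python-re", pattern, "a/bbb"))  # True
```

The design choice in this sketch is that the delimiting rule belongs to jsonpath, not to the regexp flavor: because backslash pairs are treated as opaque, a flavor that allows an unescaped `/` inside a character class (as ECMAScript does in `/[/]/`) would still need that slash escaped.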

This would break the dependency between the jsonpath spec and the completion
of iregexp, allowing the jsonpath spec to be completed sooner. It would also
allow the iregexp spec to be revised without having to revise the jsonpath
spec. Another advantage is that it would reduce the cost of implementation in
some cases (for example, where an implementation can reuse the regexp engine
it already has) and thereby increase the adoption of the jsonpath spec.

Our ultimate, interoperable recommendation could still end up being
jsonpath/iregexp, once iregexp is finished. We could do this either by
recommending (or mandating) jsonpath/iregexp in a second version of the
jsonpath spec or by publishing a separate spec tying the two together.

The main problem with this template approach is reduced interoperability
among implementations that choose different regexp specs. On the other hand,
implementers could simply choose the most convenient regexp implementation,
avoid having to perform any mapping (which could involve onerous test
writing), and still interoperate well with other jsonpath implementations
that make the same choice. I think this would increase the adoption of
jsonpath among implementers.

Another problem with the template approach is that implementers may be
tempted to plug in a convenient regexp implementation that lacks a decent
spec. This would make that combination harder to use (because users would
have to guess at regexp details) and could further limit interoperation.
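As one concrete instance of this interoperability risk, the same filter regexp can give different answers under two plausible engine choices. The Python sketch below uses two hypothetical flavors, both built on Python's `re` module, that differ only in what the Multi-Character escape `\d` means; the function names are illustrative and not taken from any spec.

```python
import re

# Two hypothetical plug-in "flavors", both built on Python's re module,
# differing only in how the Multi-Character escape \d is interpreted.
def unicode_digits_flavor(pattern: str, value: str) -> bool:
    # For str patterns, Python's \d matches any Unicode decimal digit (Nd)
    return re.fullmatch(pattern, value) is not None

def ascii_digits_flavor(pattern: str, value: str) -> bool:
    # re.ASCII restricts \d to [0-9], which is what ECMAScript always does
    return re.fullmatch(pattern, value, re.ASCII) is not None

ARABIC_INDIC_123 = "\u0661\u0662\u0663"  # ARABIC-INDIC DIGIT ONE, TWO, THREE

for flavor in (unicode_digits_flavor, ascii_digits_flavor):
    # Same pattern, same input: the two flavors disagree
    print(flavor.__name__, flavor(r"\d+", ARABIC_INDIC_123))
```

An implementer who picked either engine without a written spec for the chosen flavor would leave users guessing which of these behaviors to expect.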

Admittedly, this option has several disadvantages, but I don't think we
should necessarily discount it, especially as a stepping-stone to our final
destination.

On Sun, 14 Nov 2021 at 21:15, Carsten Bormann <cabo@tzi.org> wrote:

> On 2021-11-14, at 12:35, James <james.ietf@gmail.com> wrote:
> >
> > * #70 - Have discussion on list with known options
>
> As a reminder, my slides had these options (renumbered for easier
> reference, and expanded a bit):
>
> 1. Select (define) one regular expression flavor
> 2. Provide a way to plug in regular expressions (of different flavors)
> 3. No regexps in base RFC (but keep an extension point)
>
> 1. further splits into:
>
> 1a. Select *a version of* ECMAScript (parsing/searching RE)
> 1b. Select W3C XSD RE (matching RE)
> 1c. Build "modest subset" (e.g., iregexp)
>
> Since we don’t have a consensus or even a majority among the
> implementations, we are free to do the right thing, if we can pull that off.
>
> As you know, I have been exploring 1c.
>
> I submitted an updated version -01 of draft-bormann-jsonpath-iregexp.
> As in -00, I’m using W3C XSD RE as a base, as these are actual regular
> expressions, amenable to implementation techniques that are less
> susceptible to DoS problems than the Perl/PCRE/ECMAScript dialect.
>
> Apart from character class subtraction and the exact semantics of
> Multi-Character escapes (\s \d \w etc., outside and inside of character
> class expressions), W3C XSD RE are pretty much a consensus subset of the
> various regular expression dialects, except that they are in the form of
> matching expressions (no anchors needed) instead of parsing expressions.
>
> I believe the spec should have conversion instructions for implementers
> that just want to use the regexp engine they happen to have handy.  Because
> of the consensus subset nature of W3C XSD RE, these instructions are
> relatively straightforward (copy, and surround by the anchors the target
> flavor happens to use).
>
> I tried to add conversion instructions for Multi-Character escapes (\s \d
> \w and \S \D \W, leaving out the \c and \i that nobody except W3C has).
> As you can see when looking at the diff, the result is not pretty when it
> comes to double negation in character classes.  Maybe a pathological case,
> but a bit of a trap, and maybe not that useful anyway, as the W3C semantics
> are quite different from the PCRE ones.
> So I’m leaning towards a -02 that does not have Multi-Character escapes.
> (None of the regexps I found in RFCs uses them, and not having them also
> happens to be pretty much what the json-schema.org drafts seem to
> converge at.)
>
> Grüße, Carsten
>
> Status:   https://datatracker.ietf.org/doc/draft-bormann-jsonpath-iregexp/
> Html:     https://www.ietf.org/archive/id/draft-bormann-jsonpath-iregexp-01.html
> Diff:     https://www.ietf.org/rfcdiff?url2=draft-bormann-jsonpath-iregexp-01
>
>
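Carsten's point above, that converting a W3C XSD-style matching expression is mostly a matter of copying it and surrounding it with the anchors the target flavor uses, can be sketched in Python as follows. One detail is worth spelling out: the copied pattern should go inside a non-capturing group, because naively anchoring a top-level alternation such as `a|b` as `^a|b$` anchors each branch on only one side. The anchor table (Python `\A`/`\Z`, PCRE `\A`/`\z`, ECMAScript `^`/`$` without the `m` flag) is an assumption of this sketch, and `to_search_pattern` is a hypothetical helper, not part of the draft.

```python
import re
from typing import Dict, Tuple

# Assumed anchor pairs per target flavor (illustrative, not from the draft).
# Each wraps the copied pattern in a non-capturing group so that a top-level
# alternation such as "a|b" is anchored as a whole.
ANCHORS: Dict[str, Tuple[str, str]] = {
    "python": (r"\A(?:", r")\Z"),   # Python's \Z matches only at the very end
    "pcre": (r"\A(?:", r")\z"),     # PCRE's lowercase \z is the strict end
    "ecmascript": ("^(?:", ")$"),   # without the m flag, ^/$ match input ends
}

def to_search_pattern(matching_re: str, flavor: str) -> str:
    """Hypothetical helper: copy a matching-style RE and anchor it."""
    start, end = ANCHORS[flavor]
    return start + matching_re + end

anchored = to_search_pattern("a|b", "python")
print(anchored)                                # \A(?:a|b)\Z
print(re.search(anchored, "xb") is not None)   # False: whole value must match
print(re.search(r"^a|b$", "xb") is not None)   # True: naive anchoring is wrong
print(re.search(anchored, "b\n") is not None)  # False, unlike "$" in Python
```

The last line shows why `\Z` rather than `$` is used for Python: Python's `$` also matches just before a trailing newline, which would let `"b\n"` slip through.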