Re: [Jsonpath] Some Comments ...

Tim Bray <tbray@textuality.com> Sat, 27 February 2021 15:43 UTC

References: <98ed1a4f-82fd-3f94-a707-8569f89a5041@goessner.net> <b028f688-c71a-058b-4948-1a87b4889ffd@goessner.net>
In-Reply-To: <b028f688-c71a-058b-4948-1a87b4889ffd@goessner.net>
From: Tim Bray <tbray@textuality.com>
Date: Sat, 27 Feb 2021 07:43:28 -0800
Message-ID: <CAHBU6isCkGBCG_Sb5B23_0j4byGVVD-xao+2bJ9mO53+kBAJ8w@mail.gmail.com>
To: Stefan Gössner <stefan@goessner.net>
Cc: jsonpath@ietf.org
Content-Type: multipart/alternative; boundary="000000000000a4742d05bc533fb7"
Archived-At: <https://mailarchive.ietf.org/arch/msg/jsonpath/wN1I_c5prWOxaCW69a9IUicqQ_o>
Subject: Re: [Jsonpath] Some Comments ...

On behalf of James and myself, co-chair hats on: First of all, thanks to
Stefan for pulling together that nice list of issues.  It needs to be
broken up into a bunch of separate issues we can argue about and resolve.
If either Stefan or Marko wants to do that, fine; otherwise I will in the
next couple of days.

Secondly, on the subject of WG communication:

The WG mailing list isn't optional: there are those who will want to
participate and not deal with GitHub etc., and they're within their rights.
It's archived and searchable and paid for by IETF.

The use of GitHub is something of an open question, but it has been
successfully applied in multiple WGs, including some of our most
highly-trafficked and contentious, to the extent that there are RFCs [8874
<https://tools.ietf.org/html/rfc8874>, 8875
<https://tools.ietf.org/html/rfc8875>] covering how to do it.  For the
purposes of WG work, James and I would like to combine the mailing list and
GitHub in the way that seems to have proven to work.  We can't forbid
anyone from having Slack conversations, but that's entirely optional; you
don't have to go there to participate in the WG.

On Sat, Feb 27, 2021 at 12:44 AM Stefan Gössner <stefan@goessner.net> wrote:

> I copied these comments forward to GitHub ... for better reading and
> further commenting.
>
> https://github.com/ietf-wg-jsonpath/draft-ietf-jsonpath-jsonpath/issues/54
>
> I would also like to try out that cool GitHub discussion tab Daniel wrote
> about.
>
> thanks
> --
> sg
>
>
> On 26.02.2021 at 19:30, Stefan Gössner wrote:
>
> Hello List,
>
> It has been important to go through this list's threads carefully. In fact,
> I should have done that first. Now I can understand the current draft and
> appreciate the work already done much better.
>
> I collected some quotations (important from my point of view), with my
> comments, already formatted in Markdown.
>
>
> ## Title of the specification
>
> > JSONPath: A query language for JSON data.
> *(Carsten Bormann)*
>
> > I think I’d slightly prefer the term “syntax” to “language” because
> “query language” has a smell of various things that end with the letters
> “Q” and “L”.  But not passionate about that.
> *(Tim Bray)*
>
> > JSONPath: A query syntax for JSON.
> > Another wild-card idea: JSONPath: Query expressions for JSON
> *(Tim Bray)*
>
> > The beauty of this is that the plural form “query expressions” implies a
> set of expressions, so it implies “language”.  It’s indeed more than the
> grammar/syntax of those, so why not talk about the expressions as a whole.
> This also makes it possible to just use “for JSON”, without going into
> detail what these query expressions operate on.
> *(Carsten Bormann)*
>
> There seems to be agreement on "*JSONPath: Query expressions for
> JSON*". I like that too.
>
> ## Terminology
>
> > My own view is that the terminology should stay consistent with RFC
> 8259, and that the word "object" should not be used for items that are
> not JSON objects in the sense of RFC 8259.
> *(Daniel P)*
>
> > To Carsten's point about what we call things, the number of
> distinguished terms per RFC8259 is pretty small: JSON text, value, object,
> array, number, string.  Having spent quite a bit of time specifying JSON
> DSLs, I find that using just those terms doesn't seem to get in the way or
> cause problems, so I'd argue that we should stick to them (and build up to
> higher-level constructs as required for JSONPath).
> >
> > … oh, and I forgot the very useful "member".
> *(Tim Bray)*
>
> > … and “element” (the things in arrays). *(Carsten Bormann)*
>
> > The problem with JSON value is that it also can be quite confusing due
> to the usual use of that term.  Pointing to a tree and saying “the values
> inside that tree” is not going to be felt as equivalent to “the set of all
> subtrees of that tree, including the tree itself”.  But if JSON value is
> the only term we have, it has to be.  Hence my preference to talk about
> data items when I mean the items themselves and not their “value”.
> *(Carsten Bormann)*
>
> > I think the key difficulty is whether each (key, value) pair in an
> object is "a thing" that can be identified and manipulated and potentially
> returned. (If we're talking analogies, then it's analogous to an attribute
> node in the XDM model).
> *(Michael Kay)*
>
> > ECMA-404 uses "name/value pair", which is what I understand the term
> "member" to mean (Douglas Crockford uses "member").
> *(Daniel P)*
>
> > I think the term “union” is poor. If we think of it as concatenation of
> results, then the result is as expected.
> *(Glyn Normington)*
>
> I understand that within RFC8259 we have JSON values of different types.
> How they are structured is not of much interest here.
>
> But when querying that structure with JSONPath, it is vitally important to
> treat the hierarchy as a tree. So in fact we build up a higher-level
> construct here. We also need a name for "the things" in the tree. I was
> able to identify
>
> * "node" or "item" of a tree
> * "member" of an object
> * "name/value" or "key/value" pair, also known as "member"
> * "element" of an array
>
> but could not see agreement here.
>
> I agree with Glyn in calling the term "union" poor (see below).
>
>
> ## Differentiation from JSON Pointer (JSONPath draft charter)
>
> > I anticipate being asked "Why is JSON Pointer not sufficient?" Indeed
> its abstract says:
> >
> >   JSON Pointer defines a string syntax for identifying a specific value
> within a JavaScript Object Notation (JSON) document.
> >
> > ... which sounds awfully similar.  If we could include a sentence about
> that, or a link to an answer, that might be helpful.
> *(Murray S. Kucherawy)*
>
> > No - it's not similar in concept, they're separate things. If you really
> wanted to mention JSON Pointer, you could say something like "Note that
> while JSON Pointer (RFC xxxx) is already standardised, it is designed to
> provide a reference to a single, specific part of a JSON document, whereas
> JSONPath provides the ability to query a document and potentially return
> multiple values."
> *(Mark Nottingham)*
>
> > The short answer is that JSON pointer is good if you already know the
> structure of the JSON data item you want to point into, and you want to
> point to exactly one position in there.  If you need to do something that
> is closer to a “search” (which might also result in multiple positions),
> JSONPath gives you more rope.
> *(Carsten Bormann)*
>
> +1
>
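
The distinction drawn in the quotes above (one known position versus a search that may match many) can be sketched in Python. This is only an illustration under stated assumptions: `resolve_pointer` is a simplified reading of JSON Pointer token handling, `descendants_named` is a hypothetical stand-in for a query like `$..price`, and neither function name nor the sample document comes from any spec or implementation. The result order shown is just what this depth-first walk produces.

```python
def resolve_pointer(doc, pointer):
    """Follow a JSON Pointer style string to exactly one value (simplified)."""
    node = doc
    # "" addresses the whole document; otherwise tokens follow each "/".
    for token in pointer.split("/")[1:]:
        # Undo the two escape sequences: "~1" is "/", "~0" is "~".
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def descendants_named(node, name):
    """Collect every member value called `name` anywhere at or below `node`."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == name:
                found.append(value)
            found.extend(descendants_named(value, name))
    elif isinstance(node, list):
        for item in node:
            found.extend(descendants_named(item, name))
    return found


# Hypothetical sample data, for illustration only.
store = {"store": {"book": [{"price": 8}, {"price": 12}],
                   "bicycle": {"price": 20}}}

one_value = resolve_pointer(store, "/store/book/0/price")  # a single value
many_values = descendants_named(store, "price")            # a list of matches
```

The pointer call needs the full structure up front and fails with a `KeyError` or `IndexError` if the position does not exist, while the search simply returns however many matches it finds.
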
> ## References to XPath
>
> > I wonder if the analogies between XPath and JSONPath are going to be
> helpful, or whether they're actually dangerous by implying equivalences
> between constructs that are in fact somewhat different?
> *(Michael Kay)*
>
> > I tend to agree. Although JSONPath was inspired by XPath, I wouldn't
> want to confuse the JSONPath spec by going into detailed comparisons at
> the risk of contradicting the normative text.
> *(Glyn Normington)*
>
> > Someone on StackOverflow today asked a question about JSONPath; they
> called it (and tagged it) XPath, we really don't want that kind of
> confusion.
> >
> > In addition, the reference to the XPath specification in 6.2 is out of
> date, and the comparison with XPath in Table 2 is very approximate and the
> terminology inaccurate: for example there is a mention of "node sets",
> which exist in XPath 1.0 but not in XPath 2.0, yet the citation is to XPath
> 2.0. For someone who knows the semantics of XPath the comparison raises all
> sorts of questions about sorting of results into document order,
> elimination of duplicates etc, which are complications this spec can well
> do without. (Though some answers are needed, for example if ..store..price
> matches the same price in more than one way, do you get more than one
> result? And if not, what does "the same price" actually mean?)
> *(Michael Kay)*
>
> It seemed important in 2007, when arguing for something like XPath for
> JSON. If the terminology has since changed significantly with XPath 2.0
> and 3.0, we had better leave comparison Table 2 out. I have no strong
> feelings here.
>
> ## Array Slice Operator
>
> > Thanks! The ABNF for an array slice in that reference
> > ```
> > integer = [%x2D] (%x30 / (%x31-39 *%x30-39))
> >
> > array-slice = [ integer ] ws %x3A ws [ integer ]
> >                     [ ws %x3A ws [ integer ] ]
> >                              ; start:end or start:end:step
> > ```
> > is consistent with JMESPath, Python, and my understanding of ECMAScript 4.
> > *(Daniel P)*
>
> > Did anyone else have an opinion on the behaviour of slices such as
> [::0]? The current draft allows this and says it returns an empty array,
> but there is good reason to say it should error so that the slice operation
> is then consistent with Python slicing. See below for more context.
> *(Glyn Normington)*
>
> It's good to have read this thread and thus to understand the current draft
> much better. I like the decision to be consistent with Python, and I also
> like getting an empty selection set with `step=0`.
>
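
A minimal sketch of the `step=0` choice discussed above, assuming a hypothetical helper `slice_select` (the name is not from the draft or any implementation). It delegates to Python's built-in slicing for everything else. Python itself raises `ValueError` for a zero step, so the empty result has to be special-cased to match what the draft is described as saying.

```python
def slice_select(values, start=None, end=None, step=None):
    """Select array elements for start:end:step, Python style.

    Assumption for illustration: a step of 0 yields an empty selection
    instead of an error, as the thread describes the draft.
    """
    if step == 0:
        # Python's own values[::0] raises "slice step cannot be zero".
        return []
    return values[slice(start, end, step)]


example = [10, 20, 30, 40]
first_two = slice_select(example, 0, 2)     # like [0:2]
reversed_all = slice_select(example, step=-1)  # like [::-1]
zero_step = slice_select(example, step=0)   # like [::0]
```

Making `[::0]` an error instead would only require raising in the `step == 0` branch, which is the alternative raised in the quoted question.
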
> FYI: there is a recent proposal for adding slice notation syntax to
> JavaScript, currently at stage 1 of the TC39 process.
>
> https://github.com/tc39/proposal-slice-notation
>
> Interestingly it won't have a step argument ...
>
>
> https://github.com/tc39/proposal-slice-notation#why-doesnt-this-include-a-step-argument-like-python-does
>
> ... because of syntax collision with the new `this-binding` syntax
> proposal `::`
>
> https://github.com/tc39/proposal-bind-operator
>
> However, we should not let this influence us.
>
> ## Unions
>
> > I don't think any implementation would remove duplicates from a path
> such as `"$.store.book"`. I believe this is only somewhat controversial
> in the context of unions [,]. The name "union" suggests that distinct
> values be returned, compare with SQL unions. But Stefan Goessner's
> implementation doesn't do that, it concatenates all results that meet
> each criteria. There are a few JSONPath implementations that produce
> real unions with no duplicates instead of concatenated results, but I
> don't think that's the consensus.
> *(Daniel P)*
>
> > I think the term “union” is poor. If we think of it as concatenation of
> results, then the result is as expected.
> *(Glyn Normington)*
>
> > I agree with that comment, but it's partly because I'm used to SQL
> UNION, which is different. I prefer the JMESPath term for an analogous
> construct, MultiSelect List,
> https://jmespath.org/specification.html#multiselect-list.
>
> *(Daniel P)*
>
> The union operator `[,]` was simply meant as an analogue of XPath's
> operator `'|'`. I cannot tell whether that was a simple combination of
> node sets in XPath 1.0 or a true union without duplicates. I obviously was
> not aware of that subtle (essential?) characteristic of unions.
>
> So I fully agree with Glyn Normington's '... the term “union” is poor'
> statement. Are there some better alternative terms, perhaps 'multi-index
> operator', 'index list', 'subscript list', etc.?
>
> ## Duplicates and Ordering
>
> > It was my impression that we were talking about duplicated nodes not
> duplicated values:
> >
> > Given the array [10,20,30]
> >
> > $..[0,1,0]
> >
> > Would yield only two results [10, 20]
> >
> > (Not that I'm advocating for removing duplicates, personally I think we
> shouldn't)
> *(Marko Mikulicic)*
>
> > You’re framing this as “removing duplicates”.
> Another view is that [10, 20, 10] would be “adding duplicates” (copies of
> the same node). Related are ordering issues:
> >
> > `$..[1,0] ➔ [20, 10] Or [10, 20]`
> >
> > I would expect the spec will leave implementations some leeway here,
> but that should be based on an examination of existing implementations.
> *(Carsten Bormann)*
>
> > The mental model that leads to omitting duplicate nodes in the output is
> "selection": if you take an input array and select nodes with index 0,1 or
> 0, you get only 2 results (since selecting an index twice has no effect).
> >
> > OTOH, if you opt for a "collect" model, whenever you encounter a node
> that matches that query you add it to the result stream, thus the same
> nodes can be present multiple times in the result.
> >
> > I have a slight preference for the "collect" model, because the general
> case in jsonpath is to collect things that appear at various points in the
> json tree. For example:
> >
> > `{"a": {"b": 1, "c": 2}, "d": 3},  $.a.b yields [1] and not
> {"a":{"b":1}}`
> >
> > (i.e. jsonpath is not a filter and view operation but a pick and gather
> operation)
> *(Marko Mikulicic)*
>
> > In implementations that support paths (the majority don't), the query
> function takes a parameter that indicates values or paths. In both
> cases the query returns a JSON array of JSON values, in the latter
> case, a JSON array of normalized paths.
> *(Daniel P)*
>
> I must confess to never having thought about duplicates, let alone wanting
> to eliminate them. So I do like Marko's comparison of 'selection-model' vs.
> 'collection-model' a lot. I would opt for the latter. In this sense the
> result of a 'JSONPath query expression' should be termed a 'collection'.
>
> Regarding ordering, I see something like a 'natural ordering', according
> to which
>
> `$..[0,1] ➔ [10, 20]`
> `$..[1,0] ➔ [20, 10]`
>
> would result from the example above.
>
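
The two mental models and the 'natural ordering' above can be sketched as follows. This is an assumption-laden illustration: `collect` and `select` are hypothetical names, they apply an index list to a single array rather than implementing `$..[...]`, and only non-negative indices are handled.

```python
def collect(array, indices):
    """'Collect' model: one result per index, in query order, repeats kept."""
    return [array[i] for i in indices if 0 <= i < len(array)]


def select(array, indices):
    """'Selection' model: each node (array position) appears at most once."""
    seen = set()
    results = []
    for i in indices:
        if 0 <= i < len(array) and i not in seen:
            seen.add(i)            # track positions, not values
            results.append(array[i])
    return results


data = [10, 20, 30]
collected = collect(data, [0, 1, 0])   # keeps the repeated node
selected = select(data, [0, 1, 0])     # drops the repeated node
swapped = collect(data, [1, 0])        # follows the order of the indices
```

Note that `select` removes duplicate nodes, not duplicate values: for the array `[10, 10]` and indices `[0, 1]` it still returns both elements, which is the node-versus-value distinction raised earlier in this section.
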
> I do understand the use cases for reordering, duplicates removal,
> filtering, etc. But this can always be seen as a postprocessing step on
> the resulting collection by handing it over to accompanying tools (think of
> a pipe operator).
>
> Of course this cannot work on the result collection of values alone (see
> duplicate nodes vs. duplicate values above); rather, it requires a
> collection of (normalized) paths. In this sense, I like this view:
>
> > In my opinion the right balance between powerfulness and enabling
> simple implementations has been so far one of the key factors that
> made JSONPath popular over other alternatives, even if it lacks
> support for aggregation functions.
> *(Davide Bettio)*
>
> ## Filter Expressions
>
> > Related to that, it would be helpful to determine if JSONPath filters
> apply to both JSON objects and arrays, or only to JSON arrays.
> *(Daniel P)*
>
> > I would support restricting filters to arrays, if others agree.
> *(Glyn Normington)*
>
> I tend to let implementations and their "normative force of the factual"
> decide here or, if in doubt, to agree with Glyn's restriction to arrays.
>
> I am very unhappy with the confusing `$..book[(@.length-1)]`, where `'@'`
> addresses the array itself and implies that the array has a `length`
> property. In filter expression examples, `'@'` more consistently addresses
> the current array element.
>
> The invocation of 'the underlying scripting engine' wasn't meant as a
> serious normative aspect, but rather as a quick and dirty solution for
> JavaScript and PHP implementations at that time.
>
>
> ### Corner Case
>
> > Consider this perfectly legal JSON object
> >
> > ```{ "ab": 0,  "a.b": 1,  "a-b": 2, "a": { "b": 3 } }```
> >
> > So `$.ab` is 0, `$.a.b` is 3, `$['a.b']` is 1, `$['a-b']` is 2. You'd
> like to say `$.a-b` but lots of libraries will refuse it because `"a-b"` is
> not a legal JavaScript "name" construct, that's why you have to say
> `$['a-b']`.
> >
> > But suppose your library would accept `$.a-b`.  Then `$.a-b` and
> `$['a-b']` would be synonyms, but `$.a.b` and `$['a.b']` wouldn't.
> *(Tim Bray)*
>
> Hmm ... this seems to be a hint that `'-'` is better excluded from
> dot-child-selector syntax. I think I have read more discussion about that,
> but currently don't know where.
>
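
The corner case can be checked with plain Python, assuming the second member name in the quoted object is meant to be `a.b` (so that `$['a.b']` is 1, as the quote states). The helper `lookup` is hypothetical; it takes member names that have already been split out of a query, so it shows only why the dot form and the bracket form differ, not how a real parser tokenizes.

```python
import json

doc = json.loads('{ "ab": 0, "a.b": 1, "a-b": 2, "a": { "b": 3 } }')


def lookup(value, names):
    """Walk down `value` by one member name per step."""
    for name in names:
        value = value[name]
    return value


dot_a_b = lookup(doc, ["a", "b"])    # $.a.b    : two steps, "a" then "b"
bracket_a_b = lookup(doc, ["a.b"])   # $['a.b'] : one step, the name "a.b"
bracket_dash = lookup(doc, ["a-b"])  # $['a-b'] : one step, the name "a-b"
```

If a library accepted `$.a-b` as a single name, it would map to the same one-step lookup as `$['a-b']`, whereas `$.a.b` and `$['a.b']` always map to different lookups.
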
> ## Respect Implementations
>
> > As I mentioned in the session, I think there's a non-trivial amount of
> risk here that some implementations won't be willing or able to move away
> from their current behaviours, even if interoperability would improve if
> they did so. However, there are ways to mitigate that (e.g., a separate
> 'rfcxxxx compliant' mode). Even so, it will be important to get good
> participation from as many current implementers as possible.
> *(Mark Nottingham)*
>
> > The WG will develop a standards-track JSONPath specification that
> is technically sound and complete, based on the common semantics
> and other aspects of existing implementations.  Where there are
> differences, the working group will analyze those differences and
> make choices that rough consensus considers technically best, with
> an aim toward minimizing disruption among the different JSONPath
> implementations.
> *(Barry Leiba)*
>
> > I'm OK with this, but for context: I've been a pretty intense JSONPath
> user in recent years, and AFAIK the spec, and the implementations, are
> mostly OK, so the choice between "make JSONPath good" and "don't
> invalidate implementations" is unlikely to come up. If it did, my
> predisposition would be to err on the side of not breaking
> implementations, but I don't think that's inconsistent with Barry's text.
> *(Tim Bray)*
>
> +1 to all.
>
> ## Error Handling
>
> > My mental model at the moment is that a JSONPath expression can be valid
> or erroneous; application of a valid expression yields a result (which may
> be empty), but does not raise errors.  That may not be the right model for
> all applications.
> *(Carsten Bormann)*
>
> > The general approach that I've seen several times (including my
> Elixir implementation) is that an error is raised when there is a
> syntax error, therefore an invalid expression (e.g. $.foo[[5]) raises
> an error. Conversely a valid expression applied to a bogus input never
> raises an error (e.g. `$.foo.bar on "test" evals as []`).
> *(Davide Bettio)*
>
> > On the whole I think JSONPath is designed to be "forgiving", i.e. such
> things aren't errors, e.g. I think I read in the spec that filtering a
> non-array isn't an error, it's some kind of no-op. That approach isn't
> always best for everyone, but it's important to be consistent.
> *(Michael Kay)*
>
> > I would expect one component of this policy to be:
> >
> > Whether a JSONPath query is valid or not does not depend on the
> arguments it is applied to.
> >
> > I.e., you can look at the query and find out independently, without
> knowing any data, whether it is valid or not.
> *(Carsten Bormann)*
>
> I like and totally agree with the **forgiving mental model**, i.e. having
> only syntax errors, which do not depend on data.
>
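
A sketch of the forgiving model, under loud assumptions: the tiny grammar below accepts only `$` followed by dotted names, and `parse` and `evaluate` are hypothetical functions, not any implementation's API. The point is only the split: validity is decided from the query text alone, and evaluating a valid query never raises.

```python
import re

# Assumed toy grammar: "$" followed by zero or more ".name" steps.
_QUERY = re.compile(r"\$(?:\.[A-Za-z_][A-Za-z0-9_]*)*")


def parse(query):
    """Return the list of member names, or raise ValueError on bad syntax."""
    if not _QUERY.fullmatch(query):
        raise ValueError(f"invalid query: {query!r}")
    return query.split(".")[1:]


def evaluate(query, data):
    """Apply a query; a valid query yields a (possibly empty) list."""
    names = parse(query)      # depends on the query only, never on the data
    node = data
    for name in names:
        if not isinstance(node, dict) or name not in node:
            return []         # forgiving: no match is not an error
        node = node[name]
    return [node]


empty_result = evaluate("$.foo.bar", "test")       # valid query, bogus input
match_result = evaluate("$.a.b", {"a": {"b": 1}})  # valid query, one match
```

Calling `evaluate("$.foo[[5]", {})` raises `ValueError` whatever the data is, which matches the policy quoted above that validity can be determined without knowing any data.
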
> Thanks
> --
> sg
>
>