Re: [Jsonpath] Some Comments ...

Stefan Gössner <stefan@goessner.net> Sat, 27 February 2021 08:44 UTC

From: Stefan Gössner <stefan@goessner.net>
To: jsonpath@ietf.org
References: <98ed1a4f-82fd-3f94-a707-8569f89a5041@goessner.net>
Message-ID: <b028f688-c71a-058b-4948-1a87b4889ffd@goessner.net>
Date: Sat, 27 Feb 2021 09:44:29 +0100
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.7.1
MIME-Version: 1.0
In-Reply-To: <98ed1a4f-82fd-3f94-a707-8569f89a5041@goessner.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/jsonpath/Pq-BULb9aPiCX2MG_azhryqlxw4>
Subject: Re: [Jsonpath] Some Comments ...

I copied these comments over to GitHub for easier reading and
further commenting.

https://github.com/ietf-wg-jsonpath/draft-ietf-jsonpath-jsonpath/issues/54

I would also like to try out that cool GitHub Discussions tab that Daniel
wrote about.

thanks
--
sg


On 26.02.2021 at 19:30, Stefan Gössner wrote:
> Hello List,
>
> It was important to go through this list's threads carefully. In
> fact, I should have done that first. Now I understand the
> current draft and appreciate the work already done much better.
>
> I collected some citations (the ones important from my point of view)
> and added comments, already in Markdown.
>
>
> ## Title of the specification
>
> > JSONPath: A query language for JSON data.
> *(Carsten Bormann)*
>
> > I think I’d slightly prefer the term “syntax” to “language” because 
> “query language” has a smell of various things that end with the 
> letters “Q” and “L”.  But not passionate about that.
> *(Tim Bray)*
>
> > JSONPath: A query syntax for JSON.
> > Another wild-card idea: JSONPath: Query expressions for JSON
> *(Tim Bray)*
>
> > The beauty of this is that the plural form “query expressions” 
> implies a set of expressions, so it implies “language”.  It’s indeed 
> more than the grammar/syntax of those, so why not talk about the 
> expressions as a whole.  This also makes it possible to just use “for 
> JSON”, without going into detail what these query expressions operate on.
> *(Carsten Bormann)*
>
> There seems to be agreement on "*JSONPath: Query expressions for
> JSON*". I like that, too.
>
> ## Terminology
>
> > My own view is that the terminology should stay consistent with RFC
> 8259, and that the word "object" should not be used for items that are
> not JSON objects in the sense of RFC 8259.
> *(Daniel P)*
>
> > To Carsten's point about what we call things, the number of 
> distinguished
> terms per RFC8259 is pretty small: JSON text, value, object, array, 
> number,
> string.  Having spent quite a bit of time specifying JSON DSLs, I find 
> that
> using just those terms doesn't seem to get in the way or cause 
> problems, so
> I'd argue that we should stick to them (and build up to higher-level
> constructs as required for JSONPath).
> >
> > … oh, and I forgot the very useful "member".
> *(Tim Bray)*
>
> > … and “element” (the things in arrays). *(Carsten Bormann)*
>
> > The problem with JSON value is that it also can be quite confusing 
> due to the usual use of that term.  Pointing to a tree and saying “the 
> values inside that tree” is not going to be felt as equivalent to “the 
> set of all subtrees of that tree, including the tree itself”.  But if 
> JSON value is the only term we have, it has to be.  Hence my 
> preference to talk about data items when I mean the items themselves 
> and not their “value”.
> *(Carsten Bormann)*
>
> > I think the key difficulty is whether each (key, value) pair in an 
> object is "a thing" that can be identified and manipulated and 
> potentially returned. (If we're talking analogies, then it's analogous 
> to an attribute node in the XDM model).
> *(Michael Kay)*
>
> > ECMA-404 uses "name/value pair", which is what I understand the term
> "member" to mean (Douglas Crockford uses "member").
> *(Daniel P)*
>
> > I think the term “union” is poor. If we think of it as concatenation 
> of results, then the result is as expected.
> *(Glyn Normington)*
>
> I understand that RFC 8259 gives us JSON values of different
> types. They are structured in some way, which is not of much interest
> here.
>
> But while querying that structure with JSONPath it is vitally 
> important to identify that hierarchical structure as a tree. So in 
> fact we build up a higher-level construct here. We also need to call 
> "the things" in the tree somehow. I was able to identify
>
> * "node" or "item" of a tree
> * "member" of an object
> * "name/value" or "key/value" pair alias "member"
> * "element" of an array
>
> but could not see an agreement here.
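
To make the tree view above concrete, here is a minimal sketch that enumerates every node of a JSON value together with a bracket-style path. The function name `walk` and the path format are assumptions for illustration only, not terminology or syntax agreed on the list; member names containing quotes are not escaped.

```python
def walk(value, path="$"):
    """Yield (path, node) for a JSON value and all of its descendants."""
    # The value itself is a node of the tree.
    yield path, value
    if isinstance(value, dict):
        # Each member (name/value pair) of an object leads to a child node.
        for name, member_value in value.items():
            yield from walk(member_value, f"{path}['{name}']")
    elif isinstance(value, list):
        # Each element of an array leads to a child node.
        for index, element in enumerate(value):
            yield from walk(element, f"{path}[{index}]")


for node_path, node in walk({"a": [1, 2]}):
    print(node_path, node)
```

Strings, numbers, booleans and null are leaves in this sketch; only objects and arrays have children.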
>
> I agree with Glyn that the term "union" is poor (see below).
>
>
> ## Differentiation from JSON Pointer (JSONPath draft charter)
>
> > I anticipate being asked "Why is JSON Pointer not sufficient?" 
> Indeed its abstract says:
> >
> >   JSON Pointer defines a string syntax for identifying a specific 
> value within a JavaScript Object Notation (JSON) document.
> >
> >... which sounds awfully similar.  If we could include a sentence about
> that, or a link to an answer, that might be helpful.
> *(Murray S. Kucherawy)*
>
> > No - it's not similar in concept, they're separate things. If you 
> really wanted to mention JSON Pointer, you could say something like 
> "Note that while JSON Pointer (RFC xxxx) is already standardised, it 
> is designed to provide a reference to a single, specific part of a 
> JSON document, whereas JSONPath provides the ability to query a 
> document and potentially return multiple values."
> *(Mark Nottingham)*
>
> >The short answer is that JSON pointer is good if you already know the 
> structure of the JSON data item you want to point into, and you want 
> to point to exactly one position in there.  If you need to do 
> something that is closer to a “search” (which might also result in 
> multiple positions), JSONPath gives you more rope.
> *(Carsten Bormann)*
>
> +1
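
The difference quoted above can be sketched in a few lines: a JSON Pointer (RFC 6901) resolves to exactly one value, while a search in the spirit of `$..price` may return several. Both helper names (`resolve_pointer`, `find_members`) and the sample document are hypothetical, chosen only for this illustration.

```python
def resolve_pointer(doc, pointer):
    """Resolve an RFC 6901 JSON Pointer to exactly one value (or raise)."""
    if pointer == "":
        return doc
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    current = doc
    for token in pointer[1:].split("/"):
        # RFC 6901 unescaping: '~1' becomes '/', then '~0' becomes '~'.
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current


def find_members(doc, name):
    """Collect the values of every member called `name`, at any depth."""
    found = []
    if isinstance(doc, dict):
        for key, value in doc.items():
            if key == name:
                found.append(value)
            found.extend(find_members(value, name))
    elif isinstance(doc, list):
        for item in doc:
            found.extend(find_members(item, name))
    return found


store = {"store": {"book": [{"price": 8.95}, {"price": 12.99}],
                   "bicycle": {"price": 19.95}}}
print(resolve_pointer(store, "/store/book/0/price"))  # one known position
print(find_members(store, "price"))                   # a search, many results
```

The pointer requires knowing the structure in advance; the search does not.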
>
> ## References to XPath
>
> > I wonder if the analogies between XPath and JSONPath are going to be 
> helpful, or whether they're actually dangerous by implying 
> equivalences between constructs that are in fact somewhat different?
> *(Michael Kay)*
>
> > I tend to agree. Although JSONPath was inspired by XPath, I wouldn't
> want to confuse the JSONPath spec by going into detailed comparisons at
> the risk of contradicting the normative text.
> *(Glyn Normington)*
>
> > Someone on StackOverflow today asked a question about JSONPath; they 
> called it (and tagged it) XPath, we really don't want that kind of 
> confusion.
> >
> > In addition, the reference to the XPath specification in 6.2 is out 
> of date, and the comparison with XPath in Table 2 is very approximate 
> and the terminology inaccurate: for example there is a mention of 
> "node sets", which exist in XPath 1.0 but not in XPath 2.0, yet the 
> citation is to XPath 2.0. For someone who knows the semantics of XPath 
> the comparison raises all sorts of questions about sorting of results 
> into document order, elimination of duplicates etc, which are 
> complications this spec can well do without. (Though some answers are 
> needed, for example if ..store..price matches the same price in more 
> than one way, do you get more than one result? And if not, what does 
> "the same price" actually mean?)
> *(Michael Kay)*
>
> It seemed important in 2007, when arguing for something
> like XPath for JSON. If the terminology has changed
> significantly with XPath 2.0 and 3.0, we had better leave the comparison
> in Table 2 out. I have no strong feelings here.
>
> ## Array Slice Operator
>
> > Thanks! The ABNF for an array slice in that reference
> > ```
> > integer = [%x2D] (%x30 / (%x31-39 *%x30-39))
> >
> > array-slice = [ integer ] ws %x3A ws [ integer ]
> >                     [ ws %x3A ws [ integer ] ]
> >                              ; start:end or start:end:step
> > ```
> > is consistent with JMESPath, Python, and my understanding of
> ECMASCRIPT 4.
> > *(Daniel P)*
>
> > Did anyone else have an opinion on the behaviour of slices such as 
> [::0]?
> The current draft allows this and says it returns an empty array, but 
> there
> is good reason to say it should error so that the slice operation is then
> consistent with Python slicing. See below for more context.
> *(Glyn Normington)*
>
> Having read this thread, I understand the current
> draft much better. I like the decision to be consistent with Python,
> and also getting an empty selection set with `step=0`.
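
A minimal sketch of the slice behaviour discussed above, assuming Python list slicing as the reference and an empty selection for `step=0`. The function name `slice_select` is hypothetical. Note that plain Python raises `ValueError` for a zero step, which is the difference Glyn points out.

```python
def slice_select(elements, start=None, end=None, step=None):
    """Select array elements like Python slicing, except that step 0 is empty."""
    if step == 0:
        # Plain Python raises ValueError("slice step cannot be zero") here;
        # this sketch returns an empty selection instead.
        return []
    return elements[start:end:step]


print(slice_select([10, 20, 30], 0, 2))     # start:end
print(slice_select([10, 20, 30], step=-1))  # ::-1
print(slice_select([10, 20, 30], step=0))   # ::0
```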
>
> FYI: there is a recent proposal for adding slice notation syntax to 
> JavaScript, currently at stage 1 of the TC39 process.
>
> https://github.com/tc39/proposal-slice-notation
>
> Interestingly it won't have a step argument ...
>
> https://github.com/tc39/proposal-slice-notation#why-doesnt-this-include-a-step-argument-like-python-does 
>
>
> ... because of syntax collision with the new `this-binding` syntax 
> proposal `::`
>
> https://github.com/tc39/proposal-bind-operator
>
> However, we should not let this influence us.
>
> ## Unions
>
> > I don't think any implementation would remove duplicates from a path
> such as `"$.store.book"`. I believe this is only somewhat controversial
> in the context of unions [,]. The name "union" suggests that distinct
> values be returned, compare with SQL unions. But Stefan Goessner's
> implementation doesn't do that, it concatenates all results that meet
> each criteria. There are a few JSONPath implementations that produce
> real unions with no duplicates instead of concatenated results, but I
> don't think that's the consensus.
> *(Daniel P)*
>
> > I think the term “union” is poor. If we think of it as concatenation 
> of results, then the result is as expected.
> *(Glyn Normington)*
>
> > I agree with that comment, but it's partly because I'm used to SQL 
> UNION,
> which is different. I prefer the JMESPath term for an analogous 
> construct,
> MultiSelect List, 
> https://jmespath.org/specification.html#multiselect-list.
> *(Daniel P)*
>
> The union operator `[,]` was simply meant as an analogue to
> XPath's operator `'|'`. I cannot tell whether that was a simple combination
> of node sets in XPath 1.0 or a true union without duplicates. I
> obviously was not aware of that subtle (essential?) union
> characteristic.
>
> So I fully agree with Glyn Normington's '... the term “union” is poor'
> statement. Are there better alternative terms, perhaps
> 'multi-index operator', 'index list', 'subscript list', etc.?
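
The two readings of `[,]` discussed above can be sketched as follows, assuming non-negative array indices only. Both function names are hypothetical; `index_list` borrows one of the candidate terms, and `true_union` stands for the set-like behaviour that a few implementations are reported to have.

```python
def index_list(elements, indices):
    """Concatenate the selections for each index, keeping duplicates and order."""
    return [elements[i] for i in indices if 0 <= i < len(elements)]


def true_union(elements, indices):
    """Select each position at most once, like a set union of positions."""
    seen = set()
    result = []
    for i in indices:
        if i not in seen and 0 <= i < len(elements):
            seen.add(i)
            result.append(elements[i])
    return result


print(index_list([10, 20, 30], [0, 1, 0]))  # concatenation keeps the repeat
print(true_union([10, 20, 30], [0, 1, 0]))  # union drops it
```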
>
> ## Duplicates and Ordering
>
> > It was my impression that we were talking about duplicated nodes not
> duplicated values:
> >
> > Given the array [10,20,30]
> >
> > $..[0,1,0]
> >
> > Would yield only two results [10, 20]
> >
> > (Not that I'm advocating for removing duplicates, personally I think we
> shouldn't)
> *(Marko Mikulicic)*
>
> > You’re framing this as “removing duplicates”.
> Another view is that [10, 20, 10] would be “adding duplicates” (copies 
> of the same node). Related are ordering issues:
> >
> > `$..[1,0] ➔ [20, 10] Or [10, 20]`
> >
> > I would expect the spec will leave implementations some leeway 
> here, but that should be based on an examination of existing 
> implementations.
> *(Carsten Bormann)*
>
> > The mental model that leads to omitting duplicate nodes in the 
> output is
> "selection": if you take an input array and select nodes with index 
> 0,1 or
> 0, you get only 2 results (since selecting an index twice has no effect).
> >
> >OTOH, if you opt for a "collect" model, whenever you encounter a node 
> that
> matches that query you add it to the result stream, thus the same 
> nodes can
> be present multiple times in the result.
> >
> >I have a slight preference for the "collect" model, because the general
> case in jsonpath is to collect things that appear at various points in 
> the
> json tree. For example:
> >
> >`{"a": {"b": 1, "c": 2}, "d": 3},  $.a.b yields [1] and not 
> {"a":{"b":1}}`
> >
> >(i.e. jsonpath is not a filter and view operation but a pick and gather
> operation)
> *(Marko Mikulicic)*
>
> > In implementations that support paths (the majority don't), the query
> function takes a parameter that indicates values or paths. In both
> cases the query returns a JSON array of JSON values, in the latter
> case, a JSON array of normalized paths.
> *(Daniel P)*
>
> I must confess to never having thought about duplicates, let alone 
> wanting to eliminate them. So I do like Marko's comparison of 
> 'selection-model' vs. 'collection-model' a lot. I would opt for the 
> latter. In this sense the result of a 'JSONPath query expression' 
> should be termed a 'collection'.
>
> Regarding ordering I see something like a 'natural ordering', 
> according to which
>
> `$..[0,1] ➔ [10, 20]`
> `$..[1,0] ➔ [20, 10]`
>
> would result for the example above.
>
> I do understand the use cases for reordering, duplicate removal,
> filtering, etc. But this can always be seen as a postprocessing step
> on the resulting collection, done by handing it over to accompanying tools
> (think of a pipe operator).
>
> Of course this cannot work on the result collection of values alone
> (see duplicate nodes vs. duplicate values above); it rather requires a
> collection of (normalized) paths. In this sense, I like this view:
>
> > In my opinion the right balance between powerfulness and enabling
> simple implementations has been so far one of the key factors that
> made JSONPath popular over other alternatives, even if it lacks
> support for aggregation functions.
> *(Davide Bettio)*
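
The post-processing idea above (removing duplicate nodes rather than duplicate values) can be sketched under the assumption that a query returns `(normalized_path, value)` pairs. That result shape and both function names are assumptions made for this illustration only.

```python
def dedupe_nodes(results):
    """Drop repeated nodes: entries whose normalized path was already seen."""
    seen, unique = set(), []
    for path, value in results:
        if path not in seen:
            seen.add(path)
            unique.append((path, value))
    return unique


def dedupe_values(results):
    """Drop repeated values, even when they come from different nodes."""
    unique = []
    for path, value in results:
        if all(value != kept for _, kept in unique):
            unique.append((path, value))
    return unique


# The same node selected twice, e.g. by an index list such as [0,1,0]:
print(dedupe_nodes([("$[0]", 10), ("$[1]", 20), ("$[0]", 10)]))
# Two different nodes that happen to hold equal values:
print(dedupe_nodes([("$[0]", 10), ("$[1]", 10)]))
print(dedupe_values([("$[0]", 10), ("$[1]", 10)]))
```

With values alone, the last two cases cannot be told apart, which is why the paths are needed.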
>
> ## Filter Expressions
>
> > Related to that, it would be helpful to determine if JSONPath filters
> apply to both JSON objects and arrays, or only to JSON arrays.
> *(Daniel P)*
>
> > I would support restricting filters to arrays, if others agree.
> *(Glyn Normington)*
>
> I tend to let implementations and their "normative force of the
> factual" decide here or, if in doubt, to agree with Glyn's restriction to arrays.
>
> I am very unhappy with the confusing `$..book[(@.length-1)]`, where `'@'`
> addresses the array itself and implies that the array has a `length`
> property. In filter expression examples, `'@'` more consistently
> addresses the current array element.
>
> The invocation of 'the underlying scripting engine' wasn't meant as a
> serious normative aspect, but rather as a quick and dirty solution for
> JavaScript and PHP implementations at that time.
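
The two meanings of `'@'` described above can be shown side by side in a small sketch. Python callables stand in for the script and filter expressions, and both function names are hypothetical; this is not how any particular implementation evaluates them.

```python
def script_select(array, index_of):
    """`[(...)]` style: the expression sees the whole array and returns an index."""
    index = index_of(array)
    return [array[index]] if 0 <= index < len(array) else []


def filter_select(array, predicate):
    """`[?(...)]` style: the expression sees each array element in turn."""
    return [element for element in array if predicate(element)]


books = [{"price": 8.95}, {"price": 22.99}]
# Here '@' is the array itself, as in $..book[(@.length-1)]:
print(script_select(books, lambda at: len(at) - 1))
# Here '@' is the current element, as in $..book[?(@.price<10)]:
print(filter_select(books, lambda at: at["price"] < 10))
```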
>
>
> ### Corner Case
>
> > Consider this perfectly legal JSON object
> >
> > ```{ "ab": 0,  "a.b": 1,  "a-b": 2, "a": { "b": 3 } }```
> >
> >So `$.ab` is 0, `$.a.b` is 3, `$['a.b']` is 1, `$['a-b']` is 2. You'd 
> like to say `$.a-b` but lots of libraries will refuse it because 
> `"a-b"` is not a legal JavaScript "name" construct, that's why you 
> have to say `$['a-b']`.
> >
> > But suppose your library would accept `$.a-b`.  Then `$.a-b` and 
> `$['a-b']` would be synonyms, but `$.a.b` and `$['a.b']` wouldn't.
> *(Tim Bray)*
>
> Hmm ... this seems to be a hint that we had better exclude `'-'` from
> dot-child-selector syntax. I think I have read more discussion about
> that, but I currently don't know where.
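
Tim's corner case can be replayed with a minimal sketch that accepts only two selector forms: a dot followed by an identifier-like name, and a single-quoted name in brackets. The grammar (including the exclusion of `'-'` from dot names), the regular expression and the function names are assumptions for illustration, not the draft's syntax.

```python
import re

# Dot-child: .name (letters, digits, underscore); bracket-child: ['any text']
_SEGMENT = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\['([^']*)'\]")


def parse_names(path):
    """Split a path such as $.a.b or $['a.b'] into a list of member names."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    names, pos = [], 1
    while pos < len(path):
        match = _SEGMENT.match(path, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}: {path[pos:]!r}")
        names.append(match.group(1) if match.group(1) is not None else match.group(2))
        pos = match.end()
    return names


def lookup(doc, path):
    """Follow the member names of `path` through nested objects."""
    current = doc
    for name in parse_names(path):
        current = current[name]
    return current


doc = {"ab": 0, "a.b": 1, "a-b": 2, "a": {"b": 3}}
for p in ("$.ab", "$.a.b", "$['a.b']", "$['a-b']"):
    print(p, lookup(doc, p))
```

Under this grammar `$.a-b` is rejected as a syntax error, so only the bracket form can address the member `"a-b"`.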
>
> ## Respect Implementations
>
> > As I mentioned in the session, I think there's a non-trivial amount 
> of risk here that some implementations won't be willing or able to 
> move away from their current behaviours, even if interoperability 
> would improve if they did so. However, there are ways to mitigate that 
> (e.g., a separate 'rfcxxxx compliant' mode). Even so, it will be 
> important to get good participation from as many current implementers 
> as possible.
> *(Mark Nottingham)*
>
> > The WG will develop a standards-track JSONPath specification that
> is technically sound and complete, based on the common semantics
> and other aspects of existing implementations.  Where there are
> differences, the working group will analyze those differences and
> make choices that rough consensus considers technically best, with
> an aim toward minimizing disruption among the different JSONPath
> implementations.
> *(Barry Leiba)*
>
> > I'm OK with this, but for context: I've been a pretty intense 
> JSONPath user
> in recent years, and AFAIK the spec, and the implementations, are mostly
> OK, so the choice between "make JSONPath good" and "don't invalidate
> implementations" is unlikely to come up. If it did, my predisposition 
> would
> be to err on the side of not breaking implementations, but I don't think
> that's inconsistent with Barry's text.
> *(Tim Bray)*
>
> +1 to all.
>
> ## Error Handling
>
> > My mental model at the moment is that a JSONPath expression can be 
> valid or erroneous; application of a valid expression yields a result 
> (which may be empty), but does not raise errors. That may not be the 
> right model for all applications.
> *(Carsten Bormann)*
>
> > The  general approach that I've seen several times (including my
> Elixir implementation) is that an error is raised when there is a
> syntax error, therefore an invalid expression (e.g. $.foo[[5]) raises
> an error. Conversely a valid expression applied to a bogus input never
> raises an error (e.g. `$.foo.bar on "test" evals as []`).
> *(Davide Bettio)*
>
> > On the whole I think JSONPath is designed to be "forgiving", i.e. 
> such things aren't errors, e.g. I think I read in the spec that 
> filtering a non-array isn't an error, it's some kind of no-op. That 
> approach isn't always best for everyone, but it's important to be 
> consistent.
> *(Michael Kay)*
>
> > I would expect one component of this policy to be:
> >
> > Whether a JSONPath query is valid or not does not depend on the 
> arguments it is applied to.
> >
> > I.e., you can look at the query and find out independently, without 
> knowing any data, whether it is valid or not.
> *(Carsten Bormann)*
>
> I like and totally agree with the *forgiving mental model*, i.e. having
> only syntax errors, which do not depend on the data.
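
A minimal sketch of the forgiving model described above, assuming a tiny grammar of `.name` and `[index]` steps: validity is decided from the expression alone, and applying a valid expression never raises but may yield an empty result. The names `compile_path` and `query` are hypothetical.

```python
import re

_SEGMENT = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def compile_path(path):
    """Validate a path without looking at any data; raise on syntax errors."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    steps, pos = [], 1
    while pos < len(path):
        match = _SEGMENT.match(path, pos)
        if match is None:
            raise ValueError(f"syntax error at position {pos}: {path[pos:]!r}")
        steps.append(match.group(1) if match.group(1) is not None else int(match.group(2)))
        pos = match.end()
    return steps


def query(doc, path):
    """Apply a valid path; data that does not match yields [] instead of an error."""
    current = doc
    for step in compile_path(path):
        if isinstance(step, str) and isinstance(current, dict) and step in current:
            current = current[step]
        elif isinstance(step, int) and isinstance(current, list) and step < len(current):
            current = current[step]
        else:
            return []
    return [current]


print(query("test", "$.foo.bar"))            # valid query, unsuitable input
print(query({"foo": [1, 2]}, "$.foo[1]"))    # valid query, matching input
```

An invalid expression such as `$.foo[[5]` fails in `compile_path` regardless of the document it is applied to.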
>
> Thanks
> -- 
> sg