[Jsonpath] Some Comments ...

Stefan Gössner <stefan@goessner.net> Fri, 26 February 2021 18:30 UTC

Hello List,

It has been important to go through the threads of this list carefully; 
in fact, I should have done that first. Now I understand the current 
draft much better and appreciate the work already done.

I have collected some quotations (important from my point of view), 
together with my comments, in Markdown.


## Title of the specification

 > JSONPath: A query language for JSON data.
*(Carsten Bormann)*

 > I think I’d slightly prefer the term “syntax” to “language” because 
“query language” has a smell of various things that end with the letters 
“Q” and “L”.  But not passionate about that.
*(Tim Bray)*

 > JSONPath: A query syntax for JSON.
 > Another wild-card idea: JSONPath: Query expressions for JSON
*(Tim Bray)*

 > The beauty of this is that the plural form “query expressions” 
implies a set of expressions, so it implies “language”.  It’s indeed 
more than the grammar/syntax of those, so why not talk about the 
expressions as a whole.  This also makes it possible to just use “for 
JSON”, without going into detail what these query expressions operate on.
*(Carsten Bormann)*

There seems to be agreement on "*JSONPath: Query expressions for 
JSON*". I like that, too.

## Terminology

 > My own view is that the terminology should stay consistent with RFC
8259, and that the word "object" should not be used for items that are
not JSON objects in the sense of RFC 8259.
*(Daniel P)*

 > To Carsten's point about what we call things, the number of distinguished
terms per RFC8259 is pretty small: JSON text, value, object, array, number,
string.  Having spent quite a bit of time specifying JSON DSLs, I find that
using just those terms doesn't seem to get in the way or cause problems, so
I'd argue that we should stick to them (and build up to higher-level
constructs as required for JSONPath).
 >
 > … oh, and I forgot the very useful "member".
*(Tim Bray)*

 > … and “element” (the things in arrays). *(Carsten Bormann)*

 > The problem with JSON value is that it also can be quite confusing 
due to the usual use of that term.  Pointing to a tree and saying “the 
values inside that tree” is not going to be felt as equivalent to “the 
set of all subtrees of that tree, including the tree itself”.  But if 
JSON value is the only term we have, it has to be.  Hence my preference 
to talk about data items when I mean the items themselves and not their 
“value”.
*(Carsten Bormann)*

 > I think the key difficulty is whether each (key, value) pair in an 
object is "a thing" that can be identified and manipulated and 
potentially returned. (If we're talking analogies, then it's analogous 
to an attribute node in the XDM model).
*(Michael Kay)*

 > ECMA-404 uses "name/value pair", which is what I understand the term
"member" to mean (Douglas Crockford uses "member").
*(Daniel P)*

 > I think the term “union” is poor. If we think of it as concatenation 
of results, then the result is as expected.
*(Glyn Normington)*

I understand that RFC 8259 defines JSON values of different types. How 
they are structured is not of much interest here.

But when querying that structure with JSONPath, it is vitally important 
to treat the hierarchical structure as a tree. So in fact we are building 
up a higher-level construct here. We also need names for "the things" in 
the tree. I was able to identify the following terms:

* "node" or "item" of a tree
* "member" of an object
* "name/value" or "key/value" pair, also known as "member"
* "element" of an array

but could not see any agreement on them.
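
To make "the set of all subtrees of that tree, including the tree itself" concrete, here is a minimal Python sketch, assuming the JSON text has already been parsed into dicts and lists. The function name `nodes` and the bracket-style path strings are illustrative assumptions, not terms from the draft; member names are not escaped.

```python
from typing import Any, Iterator, Tuple

def nodes(value: Any, path: str = "$") -> Iterator[Tuple[str, Any]]:
    """Yield (path, value) for the value itself and for every member value
    (of objects) and element (of arrays) below it, depth-first."""
    yield path, value
    if isinstance(value, dict):
        for name, member_value in value.items():
            # Bracket notation with a quoted member name (no escaping here).
            yield from nodes(member_value, f"{path}['{name}']")
    elif isinstance(value, list):
        for index, element in enumerate(value):
            yield from nodes(element, f"{path}[{index}]")

doc = {"a": {"b": 1, "c": 2}, "d": [3, 4]}
for p, v in nodes(doc):
    print(p, v)
```

Every line printed is one "node" in the tree sense; the paths show which nodes are reached through a member and which through an array element.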

I agree with Glyn that the term "union" is poor (see below).


## Differentiation from JSON Pointer (JSONPath draft charter)

 > I anticipate being asked "Why is JSON Pointer not sufficient?" Indeed 
its abstract says:
 >
 >   JSON Pointer defines a string syntax for identifying a specific 
value within a JavaScript Object Notation (JSON) document.
 >
 >... which sounds awfully similar.  If we could include a sentence about
that, or a link to an answer, that might be helpful.
*(Murray S. Kucherawy)*

 > No - it's not similar in concept, they're separate things. If you 
really wanted to mention JSON Pointer, you could say something like 
"Note that while JSON Pointer (RFC xxxx) is already standardised, it is 
designed to provide a reference to a single, specific part of a JSON 
document, whereas JSONPath provides the ability to query a document and 
potentially return multiple values."
*(Mark Nottingham)*

 >The short answer is that JSON pointer is good if you already know the 
structure of the JSON data item you want to point into, and you want to 
point to exactly one position in there.  If you need to do something 
that is closer to a “search” (which might also result in multiple 
positions), JSONPath gives you more rope.
*(Carsten Bormann)*

+1
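
The distinction drawn by Mark and Carsten can be shown side by side in a minimal Python sketch. `resolve_pointer` follows the JSON Pointer (RFC 6901) evaluation rules in simplified form; `descendant_member` is a hypothetical helper that only approximates a query like `$..price` and is not a JSONPath implementation.

```python
from typing import Any, List

def resolve_pointer(doc: Any, pointer: str) -> Any:
    """Resolve a JSON Pointer (RFC 6901): exactly one value, or an error.
    Simplified: array tokens go straight to int() without RFC 6901's
    leading-zero and '-' checks."""
    if pointer == "":
        return doc  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    current = doc
    for token in pointer[1:].split("/"):
        # Unescape per RFC 6901: first '~1' -> '/', then '~0' -> '~'.
        token = token.replace("~1", "/").replace("~0", "~")
        current = current[int(token)] if isinstance(current, list) else current[token]
    return current

def descendant_member(doc: Any, name: str) -> List[Any]:
    """Gather every value of member `name` anywhere in the tree, roughly
    like $..name: zero, one, or many results."""
    results: List[Any] = []

    def walk(value: Any) -> None:
        if isinstance(value, dict):
            if name in value:
                results.append(value[name])
            for child in value.values():
                walk(child)
        elif isinstance(value, list):
            for child in value:
                walk(child)

    walk(doc)
    return results

store = {"store": {"book": [{"price": 8.95}, {"price": 12.99}],
                   "bicycle": {"price": 19.95}}}
print(resolve_pointer(store, "/store/book/0/price"))  # 8.95
print(descendant_member(store, "price"))               # [8.95, 12.99, 19.95]
```

The pointer needs the exact structure up front and yields one position; the search needs only a member name and may yield several.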

## References to XPath

 > I wonder if the analogies between XPath and JSONPath are going to be 
helpful, or whether they're actually dangerous by implying equivalences 
between constructs that are in fact somewhat different?
*(Michael Kay)*

 > I tend to agree. Although JSONPath was inspired by XPath, I wouldn't
want to confuse the JSONPath spec by going into detailed comparisons at
the risk of contradicting the normative text.
*(Glyn Normington)*

 > Someone on StackOverflow today asked a question about JSONPath; they 
called it (and tagged it) XPath, we really don't want that kind of 
confusion.
 >
 > In addition, the reference to the XPath specification in 6.2 is out 
of date, and the comparison with XPath in Table 2 is very approximate 
and the terminology inaccurate: for example there is a mention of "node 
sets", which exist in XPath 1.0 but not in XPath 2.0, yet the citation 
is to XPath 2.0. For someone who knows the semantics of XPath the 
comparison raises all sorts of questions about sorting of results into 
document order, elimination of duplicates etc, which are complications 
this spec can well do without. (Though some answers are needed, for 
example if ..store..price matches the same price in more than one way, 
do you get more than one result? And if not, what does "the same price" 
actually mean?)
*(Michael Kay)*

The comparison seemed important in 2007, when arguing for something 
like XPath for JSON. If the terminology has since changed significantly 
with XPath 2.0 and 3.0, we had better leave comparison Table 2 out. I 
don't feel strongly about this.

## Array Slice Operator

 > Thanks! The ABNF for an array slice in that reference
 > ```
 > integer = [%x2D] (%x30 / (%x31-39 *%x30-39))
 >
 > array-slice = [ integer ] ws %x3A ws [ integer ]
 >                     [ ws %x3A ws [ integer ] ]
 >                              ; start:end or start:end:step
 > ```
 > is consistent with JMESPath, Python, and my understanding of
ECMASCRIPT 4.
 > *(Daniel P)*

 > Did anyone else have an opinion on the behaviour of slices such as [::0]?
The current draft allows this and says it returns an empty array, but there
is good reason to say it should error so that the slice operation is then
consistent with Python slicing. See below for more context.
*(Glyn Normington)*

It was good to read this thread, and I now understand the current draft 
much better. I like the decision to be consistent with Python while 
still getting an empty selection with `step=0` (where Python raises an 
error).
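
The slice behaviour discussed above can be sketched in Python. The regex is a transcription of the quoted ABNF, assuming `ws` means optional whitespace since its definition is not quoted; `parse_slice` and `apply_slice` are hypothetical names. Apart from `step=0`, the sketch simply defers to Python's own slice semantics.

```python
import re
from typing import Any, List, Optional, Tuple

# Transcription of the quoted ABNF; "ws" is assumed to be optional
# whitespace (approximated by \s*), as its definition is not quoted.
INTEGER = r"-?(?:0|[1-9][0-9]*)"
SLICE_RE = re.compile(
    rf"(?P<start>{INTEGER})?\s*:\s*(?P<end>{INTEGER})?"
    rf"(?:\s*:\s*(?P<step>{INTEGER})?)?"
)

Slice = Tuple[Optional[int], Optional[int], Optional[int]]

def parse_slice(text: str) -> Slice:
    """Parse 'start:end' or 'start:end:step'; every integer is optional."""
    match = SLICE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not an array slice: {text!r}")

    def to_int(part: Optional[str]) -> Optional[int]:
        return None if part is None else int(part)

    return to_int(match["start"]), to_int(match["end"]), to_int(match["step"])

def apply_slice(array: List[Any], text: str) -> List[Any]:
    start, end, step = parse_slice(text)
    if step == 0:
        # The draft's choice as discussed: an empty selection, where
        # Python's own slicing would raise ValueError.
        return []
    # Otherwise defer to Python (negative indices, clamping, negative step).
    return array[slice(start, end, step)]

print(apply_slice([10, 20, 30, 40, 50], "1:3"))   # [20, 30]
print(apply_slice([10, 20, 30, 40, 50], "::-2"))  # [50, 30, 10]
print(apply_slice([10, 20, 30], "::0"))           # []
```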

FYI: there is a recent proposal for adding slice notation syntax to 
JavaScript, currently at stage 1 of the TC39 process.

https://github.com/tc39/proposal-slice-notation

Interestingly, it won't have a step argument ...

https://github.com/tc39/proposal-slice-notation#why-doesnt-this-include-a-step-argument-like-python-does

... because of a syntax collision with `::`, the operator from the new 
this-binding syntax proposal

https://github.com/tc39/proposal-bind-operator

However, we should not let this influence us.

## Unions

 > I don't think any implementation would remove duplicates from a path
such as `"$.store.book"`. I believe this is only somewhat controversial
in the context of unions [,]. The name "union" suggests that distinct
values be returned, compare with SQL unions. But Stefan Goessner's
implementation doesn't do that, it concatenates all results that meet
each criteria. There are a few JSONPath implementations that produce
real unions with no duplicates instead of concatenated results, but I
don't think that's the consensus.
*(Daniel P)*

 > I think the term “union” is poor. If we think of it as concatenation 
of results, then the result is as expected.
*(Glyn Normington)*

 > I agree with that comment, but it's partly because I'm used to SQL UNION,
which is different. I prefer the JMESPath term for an analogous construct,
MultiSelect List, https://jmespath.org/specification.html#multiselect-list.
*(Daniel P)*

The union operator `[,]` was simply meant as an analogue of XPath's 
`|` operator. I cannot tell whether that was a simple combination of 
node sets in XPath 1.0 or a true union without duplicates. I obviously 
was not aware of that subtle (essential?) characteristic of unions.

So I fully agree with Glyn Normington's statement that '... the term 
“union” is poor'. Are there better alternative terms, perhaps 
'multi-index operator', 'index list', 'subscript list', etc.?
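
The difference between the concatenation Daniel describes and an SQL-UNION-style reading (distinct values) can be sketched in Python. Both function names are hypothetical, the indices are assumed to be plain integers already parsed from `[,]`, and out-of-range indices are assumed to select nothing.

```python
import json
from typing import Any, List

def concat_results(array: List[Any], indices: List[int]) -> List[Any]:
    """[i,j,...] as concatenation: what each index selects, in the order
    written; out-of-range indices select nothing."""
    n = len(array)
    return [array[i] for i in indices if -n <= i < n]

def distinct_value_union(array: List[Any], indices: List[int]) -> List[Any]:
    """SQL-UNION-like reading: equal JSON values are returned only once."""
    seen = set()
    results: List[Any] = []
    for value in concat_results(array, indices):
        key = json.dumps(value, sort_keys=True)  # hashable key for any JSON value
        if key not in seen:
            seen.add(key)
            results.append(value)
    return results

prices = [8.95, 8.95, 22.99]
print(concat_results(prices, [0, 1, 2]))        # [8.95, 8.95, 22.99]
print(distinct_value_union(prices, [0, 1, 2]))  # [8.95, 22.99]
```

With concatenation, two different books that happen to have the same price both appear; a "true" union on values would merge them, which is one reason the name "union" can mislead.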

## Duplicates and Ordering

 > It was my impression that we were talking about duplicated nodes not
duplicated values:
 >
 > Given the array [10,20,30]
 >
 > $..[0,1,0]
 >
 > Would yield only two results [10, 20]
 >
 > (Not that I'm advocating for removing duplicates, personally I think we
shouldn't)
*(Marko Mikulicic)*

 > You’re framing this as “removing duplicates”.
Another view is that [10, 20, 10] would be “adding duplicates” (copies 
of the same node). Related are ordering issues:
 >
 > `$..[1,0] ➔ [20, 10] Or [10, 20]`
 >
 > I would expect the spec will leave implementations some leeway here, 
but that should be based on an examination of existing implementations.
*(Carsten Bormann)*

 > The mental model that leads to omitting duplicate nodes in the output is
"selection": if you take an input array and select nodes with index 0,1 or
0, you get only 2 results (since selecting an index twice has no effect).
 >
 >OTOH, if you opt for a "collect" model, whenever you encounter a node that
matches that query you add it to the result stream, thus the same nodes can
be present multiple times in the result.
 >
 >I have a slight preference for the "collect" model, because the general
case in jsonpath is to collect things that appear at various points in the
json tree. For example:
 >
 >`{"a": {"b": 1, "c": 2}, "d": 3}`: `$.a.b` yields `[1]` and not `{"a":{"b":1}}`
 >
 >(i.e. jsonpath is not a filter and view operation but a pick and gather
operation)
*(Marko Mikulicic)*

 > In implementations that support paths (the majority don't), the query
function takes a parameter that indicates values or paths. In both
cases the query returns a JSON array of JSON values, in the latter
case, a JSON array of normalized paths.
*(Daniel P)*

I must confess I never thought about duplicates, let alone wanting to 
eliminate them. So I like Marko's comparison of the 'selection model' 
vs. the 'collection model' a lot, and I would opt for the latter. In 
this sense, the result of a 'JSONPath query expression' should be 
termed a 'collection'.

Regarding ordering, I see something like a 'natural ordering', according 
to which the example above would result in

`$..[0,1] ➔ [10, 20]`

`$..[1,0] ➔ [20, 10]`
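
Marko's two models, together with the 'natural ordering' above, can be contrasted in a minimal Python sketch. It applies the index list directly to the array; for `[10,20,30]` this gives the same result as `$..[...]`, since the numbers reached by `..` have no elements. `collect` and `select` are hypothetical names, and returning selected nodes in document order is an assumption made here for the selection model.

```python
from typing import Any, List

def collect(array: List[Any], indices: List[int]) -> List[Any]:
    """'Collect' model: append every match in the order of the index list,
    so the same node can appear more than once."""
    n = len(array)
    return [array[i] for i in indices if -n <= i < n]

def select(array: List[Any], indices: List[int]) -> List[Any]:
    """'Selection' model: each array position is selected at most once.
    Assumption: selected nodes come back in document order."""
    n = len(array)
    positions = {i % n for i in indices if -n <= i < n}
    return [array[p] for p in sorted(positions)]

data = [10, 20, 30]
print(collect(data, [0, 1, 0]), select(data, [0, 1, 0]))  # [10, 20, 10] [10, 20]
print(collect(data, [1, 0]), select(data, [1, 0]))        # [20, 10] [10, 20]
```

The collect model keeps both duplicates and the written order of the index list, which matches the natural ordering described above.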

I do understand the use cases for reordering, duplicate removal, 
filtering, etc. But these can always be treated as a post-processing 
step, handing the resulting collection over to accompanying tools (think 
of a pipe operator).

Of course, this cannot work on a result collection of values alone (see 
duplicate nodes vs. duplicate values above); it requires a collection of 
(normalized) paths instead. In this sense, I like this view:

 > In my opinion the right balance between powerfulness and enabling
simple implementations has been so far one of the key factors that
made JSONPath popular over other alternatives, even if it lacks
support for aggregation functions.
*(Davide Bettio)*
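
The point that duplicate removal as post-processing needs paths rather than values can be shown with a small Python sketch. It assumes the query has already returned `(normalized path, value)` pairs in the collect model; the path strings and function names are illustrative, not taken from the draft.

```python
from typing import Any, List, Tuple

Entry = Tuple[str, Any]  # (normalized path, value)

def dedupe_nodes(entries: List[Entry]) -> List[Entry]:
    """Post-processing step: keep the first occurrence of each node,
    identified by its normalized path."""
    seen = set()
    kept: List[Entry] = []
    for path, value in entries:
        if path not in seen:
            seen.add(path)
            kept.append((path, value))
    return kept

def dedupe_values(values: List[Any]) -> List[Any]:
    """All a values-only result allows: equal values are merged even when
    they come from different nodes."""
    kept: List[Any] = []
    for value in values:
        if value not in kept:
            kept.append(value)
    return kept

# Assumed collect-model result of $[0,1,0,2] applied to [10, 20, 10].
collected = [("$[0]", 10), ("$[1]", 20), ("$[0]", 10), ("$[2]", 10)]
print([v for _, v in dedupe_nodes(collected)])   # [10, 20, 10]
print(dedupe_values([v for _, v in collected]))  # [10, 20]
```

Only the path-based variant keeps `$[2]`, a distinct node that merely has the same value as `$[0]`.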

## Filter Expressions

 > Related to that, it would be helpful to determine if JSONPath filters
apply to both JSON objects and arrays, or only to JSON arrays.
*(Daniel P)*

 > I would support restricting filters to arrays, if others agree.
*(Glyn Normington)*

I tend to let the implementations and their "normative force of the 
factual" decide here, or, if in doubt, agree with Glyn's restriction to 
arrays.

I am very unhappy with the confusing `$..book[(@.length-1)]`, where `@` 
addresses the array itself and implies that the array has a `length` 
property. In the filter expression examples, `@` more consistently 
addresses the current array element.

The invocation of 'the underlying scripting engine' was not meant as a 
serious normative aspect, but rather as a quick and dirty solution for 
the JavaScript and PHP implementations at that time.
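
A filter restricted to arrays, with `@` always bound to the current element, can be sketched in Python. This assumes Glyn's restriction to arrays and the forgiving model (no errors from data); the predicate is a plain Python callable standing in for a filter expression, and `apply_filter` is a hypothetical name, not the draft's evaluation rule.

```python
from typing import Any, Callable, List

def apply_filter(value: Any, predicate: Callable[[Any], bool]) -> List[Any]:
    """Sketch of a filter restricted to arrays: the predicate is called with
    each element in the role of '@'. A non-array selects nothing."""
    if not isinstance(value, list):
        return []  # forgiving: no error, just an empty result
    results: List[Any] = []
    for element in value:  # '@' is the current element, never the array
        try:
            if predicate(element):
                results.append(element)
        except (KeyError, TypeError):
            continue  # missing member or type mismatch counts as "no match"
    return results

books = [{"title": "A", "price": 8.95}, {"title": "B", "price": 22.99}, {"title": "C"}]
# Roughly $..book[?(@.price < 10)] applied to this list:
print(apply_filter(books, lambda at: at["price"] < 10))         # [{'title': 'A', 'price': 8.95}]
print(apply_filter({"price": 5}, lambda at: at["price"] < 10))  # []
```

Because `@` never denotes the array itself here, the last element would be selected with a slice such as `$..book[-1:]` rather than `$..book[(@.length-1)]`.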


### Corner Case

 > Consider this perfectly legal JSON object
 >
 > `{ "ab": 0, "a.b": 1, "a-b": 2, "a": { "b": 3 } }`
 >
 >So `$.ab` is 0, `$.a.b` is 3, `$['a.b']` is 1, `$['a-b']` is 2. You'd 
like to say `$.a-b` but lots of libraries will refuse it because `"a-b"` 
is not a legal JavaScript "name" construct, that's why you have to say 
`$['a-b']`.
 >
 > But suppose your library would accept `$.a-b`.  Then `$.a-b` and 
`$['a-b']` would be synonyms, but `$.a.b` and `$['a.b']` wouldn't.
*(Tim Bray)*

Hmm ... this seems to be a hint that we had better exclude `-` from the 
dot-child-selector syntax. I think I have read more discussion about 
that, but I currently don't know where.
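
Tim's corner case, and the idea of excluding `-` from dot notation, can be sketched in Python. The identifier-like rule for dot names (`DOT_NAME`) is an assumption for illustration, not the draft's grammar; `bracket_path` and `dot_path` are hypothetical helpers.

```python
import re
from typing import Any, List

# Assumed rule for names allowed after '.', roughly an ASCII identifier.
DOT_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def bracket_path(value: Any, names: List[str]) -> List[Any]:
    """Follow $['n1']['n2']...: each bracketed string is one literal member
    name, so '.' or '-' inside it has no special meaning."""
    current = [value]
    for name in names:
        current = [c[name] for c in current if isinstance(c, dict) and name in c]
    return current

def dot_path(value: Any, expr: str, strict: bool = True) -> List[Any]:
    """Follow a dot-notation path such as '$.a.b' by splitting on '.'.
    With strict=True, names like 'a-b' are rejected as a syntax error."""
    if not expr.startswith("$."):
        raise ValueError(f"expected '$.' at the start of {expr!r}")
    names = expr[2:].split(".")
    for name in names:
        if strict and not DOT_NAME.fullmatch(name):
            raise ValueError(f"{name!r} needs bracket notation, e.g. $['{name}']")
    return bracket_path(value, names)

doc = {"ab": 0, "a.b": 1, "a-b": 2, "a": {"b": 3}}
print(dot_path(doc, "$.a.b"), bracket_path(doc, ["a.b"]))                # [3] [1]
print(dot_path(doc, "$.a-b", strict=False), bracket_path(doc, ["a-b"]))  # [2] [2]
```

A lenient parser makes `$.a-b` and `$['a-b']` synonyms while `$.a.b` and `$['a.b']` differ; the strict rule avoids that asymmetry by requiring bracket notation for such names.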

## Respect Implementations

 > As I mentioned in the session, I think there's a non-trivial amount 
of risk here that some implementations won't be willing or able to move 
away from their current behaviours, even if interoperability would 
improve if they did so. However, there are ways to mitigate that (e.g., 
a separate 'rfcxxxx compliant' mode). Even so, it will be important to 
get good participation from as many current implementers as possible.
*(Mark Nottingham)*

 > The WG will develop a standards-track JSONPath specification that
is technically sound and complete, based on the common semantics
and other aspects of existing implementations.  Where there are
differences, the working group will analyze those differences and
make choices that rough consensus considers technically best, with
an aim toward minimizing disruption among the different JSONPath
implementations.
*(Barry Leiba)*

 > I'm OK with this, but for context: I've been a pretty intense 
JSONPath user
in recent years, and AFAIK the spec, and the implementations, are mostly
OK, so the choice between "make JSONPath good" and "don't invalidate
implementations" is unlikely to come up. If it did, my predisposition would
be to err on the side of not breaking implementations, but I don't think
that's inconsistent with Barry's text.
*(Tim Bray)*

+1 to all.

## Error Handling

 > My mental model at the moment is that a JSONPath expression can be 
valid or erroneous; application of a valid expression yields a result 
(which may be empty), but does not raise errors.  That may not be the 
right model for all applications.
*(Carsten Bormann)*

 > The  general approach that I've seen several times (including my
Elixir implementation) is that an error is raised when there is a
syntax error, therefore an invalid expression (e.g. $.foo[[5]) raises
an error. Conversely a valid expression applied to a bogus input never
raises an error (e.g. `$.foo.bar on "test" evals as []`).
*(Davide Bettio)*

 > On the whole I think JSONPath is designed to be "forgiving", i.e. 
such things aren't errors, e.g. I think I read in the spec that 
filtering a non-array isn't an error, it's some kind of no-op. That 
approach isn't always best for everyone, but it's important to be 
consistent.
*(Michael Kay)*

 > I would expect one component of this policy to be:
 >
 > Whether a JSONPath query is valid or not does not depend on the 
arguments it is applied to.
 >
 > I.e., you can look at the query and find out independently, without 
knowing any data, whether it is valid or not.
*(Carsten Bormann)*

I like and totally agree with the *forgiving mental model*: the only 
errors are syntax errors, which do not depend on the data.
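
The forgiving model described by Carsten and Davide can be sketched in Python: validity is decided from the query text alone, and evaluation never raises because of the data. The tiny grammar (`$` followed by `.name` steps) and all names here are hypothetical simplifications, not the draft's syntax.

```python
import re
from typing import Any, List

# Deliberately tiny grammar for illustration: "$" then zero or more ".name".
QUERY_RE = re.compile(r"\$(\.[A-Za-z_][A-Za-z0-9_]*)*")

class JSONPathSyntaxError(ValueError):
    """Raised only for malformed queries, never because of the data."""

def compile_query(query: str) -> List[str]:
    # Validity depends on the query text alone, not on any input document.
    if not QUERY_RE.fullmatch(query):
        raise JSONPathSyntaxError(f"invalid query: {query!r}")
    return query.split(".")[1:]

def evaluate(names: List[str], doc: Any) -> List[Any]:
    # Forgiving evaluation: a step that does not fit the data selects nothing.
    current = [doc]
    for name in names:
        current = [v[name] for v in current if isinstance(v, dict) and name in v]
    return current

print(evaluate(compile_query("$.foo.bar"), "test"))                # []
print(evaluate(compile_query("$.foo.bar"), {"foo": {"bar": 1}}))   # [1]
```

With this split, `compile_query("$.foo[[5]")` raises a syntax error before any data is seen, while `$.foo.bar` applied to `"test"` simply yields an empty result.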

Thanks
--
sg