Re: [Json] JSON Content Rules

Pete Cordell <petejson@codalogic.com> Wed, 28 February 2018 16:02 UTC

To: Daniel P <danielaparker@gmail.com>
Cc: JSON WG <json@ietf.org>
References: <CA+mwktJU4xVHxRzgd=dcCKvUv3Om3qeBEhqTaW2sniLQ95+QDA@mail.gmail.com> <a36fc644-d3be-201e-b044-ed371fe7e52b@codalogic.com> <CA+mwktJ-YZBGExPeCTxCcwo6F1Ln5ZaDajRMOnm=RimUxnFqnQ@mail.gmail.com> <dbca1021-72ed-8c5f-7849-33f12bc420eb@codalogic.com>
From: Pete Cordell <petejson@codalogic.com>
Message-ID: <068ce4bd-4d12-1724-5b2d-8ee0f761bd70@codalogic.com>
Date: Wed, 28 Feb 2018 16:02:46 +0000
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0
MIME-Version: 1.0
In-Reply-To: <dbca1021-72ed-8c5f-7849-33f12bc420eb@codalogic.com>
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Language: en-GB
Content-Transfer-Encoding: 8bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/json/XbjjJsDCTa8lDabUs5nYW5e63Bw>
Subject: Re: [Json] JSON Content Rules

Quick update on this issue...

Andy and I discussed this, and we've gone along with the suggestion to 
drop the distinct syntax.  In other words, to allow:

    $string-rule = "a string"
    $member-rule = "name" : $string-rule

In my code, I check after reading a string or regular expression not 
preceded by a member name to see if it is followed by a colon.  If it 
is, it's a member-name, otherwise it's a string type.  This isn't 
obvious in the ABNF, but does the job.
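
A minimal sketch of that one-token lookahead, written in Python for brevity.  It is not the code from the actual parser: the simplified tokeniser, the token kinds and the function names are all assumptions made for illustration, and only double-quoted strings, /regex/ literals, colons and bare words are recognised.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, text); the kinds are hypothetical names

# Deliberately simplified tokeniser. Real JCR has many more token types.
_TOKEN_RE = re.compile(r'''
    \s*(?:
        (?P<string>"(?:[^"\\]|\\.)*")
      | (?P<regex>/(?:[^/\\]|\\.)*/)
      | (?P<colon>:)
      | (?P<word>[^\s:"/]+)
    )''', re.VERBOSE)


def tokenize(text: str) -> List[Token]:
    """Split a rule body into (kind, text) tokens."""
    tokens: List[Token] = []
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        m = _TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


def classify_rule_body(body: str) -> str:
    """Return 'member' or 'value' for the text to the right of '='."""
    tokens = tokenize(body)
    if not tokens:
        raise ValueError("empty rule body")
    if tokens[0][0] in ("string", "regex"):
        # The lookahead: a colon straight after the string or regex
        # means it was a member name rather than a value.
        if len(tokens) > 1 and tokens[1][0] == "colon":
            return "member"
    return "value"
```

With this sketch, `classify_rule_body('"a string"')` gives `'value'` and `classify_rule_body('"name" : $string-rule')` gives `'member'`, matching the two rules shown above.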

That's about the only significant 'gotcha' in JCR.  I'm not sure whether 
it's legitimate to capture such insights in an RFC, or whether it's just 
left up to implementers to rediscover them via testing etc.

Thanks for everyone's input on this.

Pete Cordell
Codalogic Ltd
Read & write XML in C++, http://www.xml2cpp.com

On 24/01/2018 16:09, Pete Cordell wrote:
> Hi Daniel,
> 
> On 24/01/2018 14:26, Daniel P wrote:
>> A problem with both the original specification and the suggested 
>> revisions is that they both involve adding inconsistent bits of syntax 
>> to resolve ambiguities, "=" if such and such and "=": if so and so, or 
>> a double quote if such and such and a single one if so and so. A 
>> user's mind rebels against that.
>>
>> I don't understand why it's necessary to have a distinction between 
>> name rule specifications and value rule specifications at all. The 
>> user may express something impossible by writing
>>
>> { $bar : $foo }
>>
>> $bar = [ integer, integer, integer ]
>> $foo = "foo"
> 
> It looks like there is a misunderstanding here (possibly of minor 
> consequence, but worth clearing up).
> 
> JCR works in terms of rules.  So `$bar = [ integer, integer, integer ]` 
> is a named rule (called 'bar' for those less familiar with JCR).  It 
> happens to be a value rule.  There are also member rules, which consist 
> of a member name and a value rule.
> 
> JCR doesn't do macro substitution.  So the `$name = ...` syntax is not 
> specifying a macro that can be dropped into anywhere.  So, currently, 
> you can't do `{ $bar : $foo }` in JCR.
> 
> Personally, I don't see this as a problem.  I can see many cases where 
> types need to be specified independent of the member names that use 
> them.  But I don't think there are many practical cases where a member 
> name needs to be specified independent of its type.
> 
> That said, that clarification probably doesn't change your main point.
> 
>> but who cares? it's going to get caught.
> 
> My background is C++.  One aspect of that language is that it's hard to 
> parse correctly.  The consequence is that its tooling lags well behind 
> that of many other languages that are easier to parse.  I don't want that 
> to be the case for JCR.  My theory is that the easier it is to parse, 
> the better the tools will be, with more consistency etc.
> 
> Also, I want to make it easy to produce accurate, helpful error reports 
> that will reduce the time developers need to work out what's wrong with 
> their JCR.
> 
> The structure of a rule can be pedantic, and with that parsers have a 
> much better chance of saying "your error is here".  Using a macro 
> approach potentially means that errors are detected well away from 
> their source, which makes it harder to give helpful error messages 
> for them, and makes them harder to fix.
> 
> Normally I would push any complexities into the tool, and relieve the 
> user of them as much as possible.  But I don't see JCR as a 
> multi-million dollar industry.  More of a cottage industry akin to 
> Bill's ABNF parser (https://tools.ietf.org/tools/bap/abnf.cgi).  So it's 
> a balance.  Given the choice of better tools, and remembering to use 
> single quotes; or worse tooling and not having to remember to use single 
> quotes, I think developers would go for the former.
> 
> Single quoted strings also help the human developer.  They can look at 
> 'foo' and know it's a value.  They don't have to look into the context 
> of where it's used to work out whether it's a name or value.
> 
> I hope that helps,
> 
> Pete.
> http://www.xml2cpp.com
> 
> ---------------- old stuff for possible context -----------------------
>>
>> On Wed, Jan 24, 2018 at 7:19 AM, Pete Cordell <petejson@codalogic.com 
>> <mailto:petejson@codalogic.com>> wrote:
>>
>>     Hi Daniel,
>>
>>     As Andy said, it's been bugging us too.
>>
>>     The fundamental problem is (was!) that a quoted string (e.g. "foo")
>>     can either be a member name or a string value.  Also, a regex (e.g.
>>     /bar/) could either be a member name regex or a value regex.
>>
>>     The ambiguity could potentially be resolved with back tracking in
>>     the parser, but I've wanted to try to make JCR as simple to parse as
>>     possible in the hope that this will facilitate its adoption.  (I
>>     think what I've been promoting is called an LR(1) grammar.)  Hence
>>     the approach of adding the extra ':' or 'type' token to do the
>>     disambiguation.
>>
>>     The new change linked to by Andy at
>>     https://github.com/arineng/jcrvalidator/issues/112
>>     <https://github.com/arineng/jcrvalidator/issues/112> essentially
>>     disambiguates this by adopting slightly different syntax for the
>>     various uses.  So we've ended up with:
>>
>>         "foo" - A member name (but see caveat below)
>>         'bar' - A string value (new syntax, but also see below)
>>         /^baz\d+$/ - A regex value
>>         `biff\d+` - A member name regex (new syntax)
>>
>>     Now the various usages are unambiguous, and we don't need the colon
>>     to differentiate whether a quoted string is a member name or
>>     value, etc.
>>
>>     Now to the caveat...
>>
>>     We wanted to still allow JCR to be a superset of JSON syntax (to
>>     facilitate easy creation of JCR from example JSON).  So we wanted to
>>     allow:
>>
>>          { "name" : "Fred" }
>>
>>     Hence, in situations where the parser has read a member name and
>>     colon, and knows that what follows is a value, a string value can
>>     either be single quoted or double quoted; 'Fred' or "Fred".
>>
>>     The result is that the following are all valid:
>>
>>          $r1 = "name" : "Fred" ; A member rule
>>          $r2 = "name" : 'Fred' ; Also a member rule
>>          $r3 = 'Fred'          ; A string value
>>          $r4 = /^Fred/         ; A regex value
>>          $r5 = `p[0-9]` : integer ; Member rule with regex name
>>          $r6 = string          ; The big win - Colon no longer needed
>>
>>     The following would be an error:
>>
>>          $e1 = "Fred" $e2 = string
>>
>>     because when parsing $e1 and seeing "Fred", the parser would
>>     interpret "Fred" as a member name, and therefore complain when
>>     encountering $e2 without first seeing a colon.  (You'd have to do
>>     the following instead: $e1 = 'Fred' $e2 = string)
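
The quoting scheme described in this quoted message (later dropped, per the update at the top of the thread) can be sketched as follows, in Python.  This is an illustration only, not code from any JCR implementation: the function name is hypothetical, escape sequences inside the member name are ignored, and the value after the colon is not validated.  The point it shows is that the opening delimiter alone decides the interpretation, so a double-quoted string with no colon after it is an error.

```python
def classify_rule_body_interim(body: str) -> str:
    """Classify the text to the right of '=' under the interim quoting scheme.

    Hypothetical helper: returns 'member' or 'value', and raises ValueError
    when a member name is not followed by a colon.
    """
    body = body.strip()
    if not body:
        raise ValueError("empty rule body")

    opener = body[0]
    if opener in ('"', '`'):
        # A double quote or backtick commits the parser to a member name
        # (a literal name or a member-name regex) with no lookahead.
        end = body.find(opener, 1)
        if end == -1:
            raise ValueError("unterminated member name")
        rest = body[end + 1:].lstrip()
        if not rest.startswith(":"):
            raise ValueError("member name must be followed by ':'")
        return "member"

    # Anything else ('single quoted', /regex/, a type keyword, ...) is a value.
    return "value"
```

Under these assumptions `classify_rule_body_interim("'Fred'")` is a value, while `classify_rule_body_interim('"Fred" $e2 = string')` raises `ValueError`, which corresponds to the $e1 error described above.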
>>
>>     Personally I think the revision is a lot neater than what we had,
>>     and I hope it's not too difficult for developers to grok.  I look
>>     forward to your comments.
>>
>>     Thanks,
>>
>>     Pete.
>>     --
>>     ---------------------------------------------------------------------
>>     Pete Cordell
>>     http://www.xml2cpp.com
>>     ---------------------------------------------------------------------
>>     On 23/01/2018 22:13, Daniel P wrote:
>>
>>         Hello everyone,
>>
>>         I would like to solicit feedback from members of this forum on
>>         one feature of the JSON Content Rules specification, draft 09
>>         <https://datatracker.ietf.org/doc/draft-newton-json-content-rules/?include_text=1>,
>>         as I'm considering building an implementation.
>>
>>         The specification states: "There are two forms of rule name
>>         assignments: assignments of  primitive types and assignments of
>>         all other types.  Rule name assignments to primitive type
>>         specifications [e.g. string] separate the rule name from the
>>         type specification with the character sequence '=:', whereas
>>           rule name assignments for all other type specifications [e.g.
>>         array] only require the separation using the '=' character ...
>>         This syntax is necessary so  that JCR parsers may readily
>>         distinguish between rule name assignments involving string and
>>         regular expressions primitive types and member names of member
>>         specifications."
>>
>>         An example (I hope I have this right):
>>
>>         { $bar-name : $bar-val, "foo" : $foo-val }
>>
>>         ; member name specification
>>         $bar-name = /^bar[0-9]$/
>>
>>         ; primitive type specification
>>         $foo-val =: "foo"
>>
>>         ; non primitive type specification
>>         $bar-val = [ integer, integer, integer ]
>>
>>         In what otherwise appears to me to be a fairly clean
>>         specification, I'm having some difficulty digesting this syntax,
>>         with "=" if such and such, and "=:" if so and so. I would be
>>         interested if anyone else on this list has any thoughts about
>>         this.
>>
>>         Thanks,
>>         Daniel Parker
>>         https://github.com/danielaparker/jsoncons
>>         <https://github.com/danielaparker/jsoncons>
>>
>>
>>
>>         _______________________________________________
>>         json mailing list
>>         json@ietf.org <mailto:json@ietf.org>
>>         https://www.ietf.org/mailman/listinfo/json
>>         <https://www.ietf.org/mailman/listinfo/json>
>>
>>
> 