Re: [Json] JSON Content Rules

Pete Cordell <petejson@codalogic.com> Wed, 24 January 2018 12:19 UTC

To: Daniel P <danielaparker@gmail.com>, JSON WG <json@ietf.org>
References: <CA+mwktJU4xVHxRzgd=dcCKvUv3Om3qeBEhqTaW2sniLQ95+QDA@mail.gmail.com>
From: Pete Cordell <petejson@codalogic.com>
Message-ID: <a36fc644-d3be-201e-b044-ed371fe7e52b@codalogic.com>
Date: Wed, 24 Jan 2018 12:19:14 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/json/F1RvsVSVFOH2tq5cONrxkOYXU0Y>

Hi Daniel,

As Andy said, it's been bugging us too.

The fundamental problem is (was!) that a quoted string (e.g. "foo") can 
either be a member name or a string value.  Also, a regex (e.g. /bar/) 
could either be a member name regex or a value regex.

The ambiguity could potentially be resolved with backtracking in the 
parser, but I've wanted to make JCR as simple to parse as possible in 
the hope that this will facilitate its adoption.  (I think what I've 
been promoting is called an LR(1) grammar.)  Hence the approach of 
adding the extra ':' or 'type' token to do the disambiguation.

The new change linked to by Andy at 
https://github.com/arineng/jcrvalidator/issues/112 essentially 
disambiguates this by adopting slightly different syntax for the various 
uses.  So we've ended up with:

    "foo" - A member name (but see caveat below)
    'bar' - A string value (new syntax, but also see below)
    /^baz\d+$/ - A regex value
    `biff\d+` - A member name regex (new syntax)

Now the various usages are unambiguous, and we don't need the colon to 
differentiate whether a quoted string is a member name or value, etc.
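
As a rough illustration (not taken from the draft or from jcrvalidator), 
the four forms above can be told apart from the opening delimiter alone. 
The function name and the returned labels are made up for this sketch, 
and it deliberately ignores the caveat about double-quoted values:

```python
def classify_token(token: str) -> str:
    """Classify a token by its delimiter alone (illustrative sketch only)."""
    # Hypothetical labels; the mapping mirrors the four forms listed above.
    kinds = {
        '"': "member name",
        "'": "string value",
        "/": "regex value",
        "`": "member name regex",
    }
    # A token must open and close with the same known delimiter.
    if len(token) < 2 or token[0] != token[-1] or token[0] not in kinds:
        raise ValueError(f"not a quoted or regex token: {token!r}")
    return kinds[token[0]]
```

The point is only that no lookahead past the first character is needed 
to pick the interpretation.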

Now to the caveat...

We wanted to still allow JCR to be a superset of JSON syntax (to 
facilitate easy creation of JCR from example JSON).  So we wanted to allow:

     { "name" : "Fred" }

Hence, in situations where the parser has read a member name and colon, 
and knows that what follows is a value, a string value can either be 
single quoted or double quoted; 'Fred' or "Fred".
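
A minimal sketch of that context rule, assuming the parser tracks a 
flag saying whether it has just consumed a member name and colon (the 
function and parameter names are hypothetical):

```python
def classify_quoted(token: str, after_member_colon: bool) -> str:
    """Interpret a quoted string given the parser's position (sketch only)."""
    if token.startswith("'"):
        # Single quotes always mean a string value.
        return "string value"
    if token.startswith('"'):
        # Double quotes are a value only once a member name and colon
        # have been read; otherwise they introduce a member name.
        return "string value" if after_member_colon else "member name"
    raise ValueError(f"not a quoted string: {token!r}")
```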

The result is that the following are all valid:

     $r1 = "name" : "Fred" ; A member rule
     $r2 = "name" : 'Fred' ; Also a member rule
     $r3 = 'Fred'          ; A string value
     $r4 = /^Fred/         ; A regex value
     $r5 = `p[0-9]` : integer ; Member rule with regex name
     $r6 = string          ; The big win - Colon no longer needed

The following would be an error:

     $e1 = "Fred" $e2 = string

because when parsing $e1 and seeing "Fred", the parser would interpret 
"Fred" as a member name, and therefore complain when encountering $e2 
without first seeing a colon.  (You'd have to do the following instead: 
$e1 = 'Fred' $e2 = string)
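
To make the failure concrete, here is a toy parser for just the rule 
forms shown in this message.  It is a sketch under my own assumptions, 
not the JCR grammar: the token pattern, the function names and the 
"member rule" / "value rule" labels are all invented for illustration, 
and comments, arrays, objects and so on are not handled.  It never 
backtracks; it commits to "member name" as soon as it sees a 
double-quoted string or a backtick regex after '='.

```python
import re

# Hypothetical token pattern covering only the forms used in this message.
TOKEN = re.compile(r"""
    \s*(?:
        (?P<rulename>\$[A-Za-z][\w-]*)
      | (?P<dq>"[^"]*")
      | (?P<sq>'[^']*')
      | (?P<name_re>`[^`]*`)
      | (?P<value_re>/[^/]*/)
      | (?P<word>[A-Za-z]+)
      | (?P<punct>[=:])
    )""", re.VERBOSE)

def tokenize(text):
    """Split text into (kind, lexeme) pairs."""
    text = text.rstrip()
    pos, out = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected input at offset {pos}")
        out.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return out

def parse_rules(text):
    """Return (rule_name, description) pairs without any backtracking."""
    toks = tokenize(text)
    rules, i = [], 0

    def expect(kind, value=None):
        nonlocal i
        if (i >= len(toks) or toks[i][0] != kind
                or (value is not None and toks[i][1] != value)):
            found = toks[i][1] if i < len(toks) else "end of input"
            raise SyntaxError(f"expected {value or kind}, found {found}")
        i += 1
        return toks[i - 1][1]

    while i < len(toks):
        name = expect("rulename")
        expect("punct", "=")
        kind, _ = toks[i] if i < len(toks) else ("eof", "")
        if kind in ("dq", "name_re"):
            # Committed to a member name, so a colon and a value must follow.
            i += 1
            expect("punct", ":")
            if i >= len(toks) or toks[i][0] not in ("dq", "sq", "value_re", "word"):
                raise SyntaxError("expected a value after ':'")
            i += 1
            rules.append((name, "member rule"))
        elif kind in ("sq", "value_re", "word"):
            i += 1
            rules.append((name, "value rule"))
        else:
            raise SyntaxError(f"expected a rule body after '{name} ='")
    return rules
```

With this sketch, parse_rules("$e1 = 'Fred' $e2 = string") yields two 
value rules, whereas the double-quoted variant raises a SyntaxError at 
$e2 because the expected colon is missing.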

Personally I think the revision is a lot neater than what we had, and I 
hope it's not too difficult for developers to grok.  I look forward to 
your comments.

Thanks,

Pete.
-- 
---------------------------------------------------------------------
Pete Cordell
http://www.xml2cpp.com
---------------------------------------------------------------------
On 23/01/2018 22:13, Daniel P wrote:
> Hello everyone,
> 
> I would like to solicit feedback from members of this forum on one 
> feature of the JSON Content Rules specification, draft 09 
> <https://datatracker.ietf.org/doc/draft-newton-json-content-rules/?include_text=1>, 
> as I'm considering building an implementation.
> 
> The specification states: "There are two forms of rule name assignments: 
> assignments of  primitive types and assignments of all other types.  
> Rule name assignments to primitive type specifications [e.g. string] 
> separate the rule name from the type specification with the character 
> sequence '=:', whereas  rule name assignments for all other type 
> specifications [e.g. array] only require the separation using the '=' 
> character ... This syntax is necessary so  that JCR parsers may readily 
> distinguish between rule name assignments involving string and regular 
> expressions primitive types and member names of member specifications."
> 
> An example (I hope I have this right):
> 
> { $bar-name : $bar-val, "foo" : $foo-val }
> 
> ; member name specification
> $bar-name = /^bar[0-9]$/
> 
> ; primitive type specification
> $foo-val =: "foo"
> 
> ; non primitive type specification
> $bar-val = [ integer, integer, integer ]
> 
> In what otherwise appears to me to be a fairly clean specification, I'm 
> having some difficulty digesting this syntax, with "=" if such and such, 
> and "=:" if so and so. I would be interested if anyone else on this list 
> has any thoughts about this.
> 
> Thanks,
> Daniel Parker
> https://github.com/danielaparker/jsoncons