Re: [Json] Schema Requirements (Was: Re: Nudging the English-language vs. formalisms discussion forward)

Phillip Hallam-Baker <hallam@gmail.com> Thu, 20 February 2014 18:31 UTC

From: Phillip Hallam-Baker <hallam@gmail.com>
To: Nico Williams <nico@cryptonector.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/json/kfELKdN3qXfFB2lMkwN_xPP_brA
Cc: Carsten Bormann <cabo@tzi.org>, Pete Cordell <petejson@codalogic.com>, JSON WG <json@ietf.org>

On Thu, Feb 20, 2014 at 12:22 PM, Nico Williams <nico@cryptonector.com> wrote:

> On Thu, Feb 20, 2014 at 10:55 AM, Pete Cordell <petejson@codalogic.com>
> wrote:
> > My position is that, having recognised that Dates represent a case where
> > microformats are useful, perhaps we should not assume that these are the
> > only cases.  IP addresses?  Crypto OIDs?  Dates on Mars?
>
> There are two ways to deal with alternative representations:
>
>  - convert to/from a canonical representation and use that one for
> interchange
>  - use a discriminated union (XDR) / CHOICE (ASN.1, same thing)
>
> I think any decent schema will need to allow for the latter, and not
> just because of types that have multiple possible representations.
>

In the code as it is right now it is possible to specify different formats
as different tags:

ConvolutedTime Structure
    DateTime          Date
    Format RFC1123    Date1
    Format RFC822     Date2

This would allow any of the following:
    {"Date":"2002-10-02T10:00:00Z"}
    {"Date1":"Thu, 20 Feb 2014 18:14:16 GMT"}
    {"Date2":"Thu, 20 Feb 14 18:14:16 GMT"}
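On the receiving side, each tag can be converted to one canonical value. A
minimal sketch in Python, assuming "Date" carries a UTC timestamp with a
literal "Z" suffix and that "Date1" and "Date2" carry RFC 1123 / RFC 822
style strings. The function name decode_time is hypothetical and is not part
of Protogen:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def decode_time(obj):
    """Return a UTC datetime from whichever date tag is present in obj.

    Hypothetical helper: the tag names follow the ConvolutedTime example,
    the function itself is an illustration only.
    """
    if "Date" in obj:
        # Assumes the ISO-style form always ends in a literal "Z" (UTC).
        parsed = datetime.strptime(obj["Date"], "%Y-%m-%dT%H:%M:%SZ")
        return parsed.replace(tzinfo=timezone.utc)
    for tag in ("Date1", "Date2"):
        if tag in obj:
            # The standard library parser accepts two- and four-digit years.
            return parsedate_to_datetime(obj[tag]).astimezone(timezone.utc)
    raise ValueError("no recognised date tag")
```

The application then only ever sees one representation, whichever tag was
used on the wire.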

What I don't support at the moment is a constraint that says that only one
of the "Date", "Date1" and "Date2" tags is permitted. This could be
specified as:

ConvolutedTime Choice
    DateTime          Date
    Format RFC1123    Date1
    Format RFC822     Date2
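
What a validator would have to enforce for such a Choice can be sketched in a
few lines of Python. This assumes "Choice" means exactly one of the listed
tags must be present; the names CHOICE_TAGS and check_choice are hypothetical
and do not come from Protogen:

```python
# Tags taken from the ConvolutedTime example; the helper is illustrative only.
CHOICE_TAGS = ("Date", "Date1", "Date2")


def check_choice(obj, tags=CHOICE_TAGS):
    """Return the single tag from `tags` present in obj, else raise ValueError."""
    present = [tag for tag in tags if tag in obj]
    if len(present) != 1:
        raise ValueError(
            "expected exactly one of %s, found %s" % (list(tags), present)
        )
    return present[0]
```

If "at most one" is the intended meaning instead, the test becomes
len(present) > 1.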

[Protogen allows braces to be used instead of indentation to denote blocks,
but it does not need end-of-statement markers or statement separators. If
people really insist on semicolons, I can add a production to the lexer so
that they are treated as whitespace.]

>  - which grammar parsing algorithms we want to support: LR, LALR(1),
> LALR(k), GLR, ...
>

I would hope we make the syntax so simple that we don't need the power of
LR parsers. They are models of human languages rather than computer
languages.



>  - the basic metaphor:
>     - types! (a-la ASN.1, only without the tags, no IOS, ...)
>     - pattern matching rules! (something like collections of
> XPath-like expressions)
>     - something else
>

Since the target languages are likely to be C#, JS and Java, I suggest that
we use dot separated tags and array indexes as path extractors.

Given:

{"first" : {"second" : {"third" : [{ "fourth": 1}, { "fourth": 42}, {
"fourth": 666}]}}}

first.second.third[1].fourth = 42
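
A path of this kind needs very little machinery to evaluate. A minimal sketch
in Python, assuming tags are plain identifiers and array indexes are
non-negative decimal integers. The function name extract and the step pattern
are hypothetical, not an agreed syntax:

```python
import json
import re

# One path step: either an identifier-like tag or a bracketed array index.
_STEP = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def extract(value, path):
    """Walk parsed JSON along a path such as 'first.second.third[1].fourth'."""
    pos = 0
    while pos < len(path):
        if path[pos] == ".":
            pos += 1  # skip the separator between steps
        step = _STEP.match(path, pos)
        if step is None:
            raise ValueError("bad path at offset %d: %r" % (pos, path))
        name, index = step.groups()
        # Tags index into objects (dicts), bracketed numbers into arrays (lists).
        value = value[name] if name is not None else value[int(index)]
        pos = step.end()
    return value


doc = json.loads(
    '{"first": {"second": {"third": '
    '[{"fourth": 1}, {"fourth": 42}, {"fourth": 666}]}}}'
)
print(extract(doc, "first.second.third[1].fourth"))
```

A missing tag or an out-of-range index surfaces as the usual KeyError or
IndexError, which is probably what a caller wants.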



> The metaphor thing gets to, in part, the purpose of the schema:
>
>  - documentation for sure, validation no doubt, but, code generation
> (into C, Java, JS, C#, ...)?  (IMO: yes)
>

I have code generation for C# and C. I can add Java without difficulty.

It is not clear to me that code generation is necessary or particularly
useful for scripting languages with late binding like JS. In those
situations you can just say

X = JSON.parse(text)

Answer = X.first.second.third[1].fourth



> Finally:
>
>  - extensibility (meta: to have it or not; if yes, how)
>

Hopefully formats are enough.


>  - modularity (meta: to have it or not; if yes, how)
>

I am adding a Using directive right now. It seems to do the trick.



> Oh, and:
>
>  - one schema language, or more?  (IMO: inevitably we'll end up with more)
>

There will be multiple schema languages. I think the progression will work
like it did for JSON itself. People will choose the one they like and there
will eventually be a convergence.

-- 
Website: http://hallambaker.com/