[Json] Schema Requirements (Was: Re: Nudging the English-language vs. formalisms discussion forward)

"Pete Cordell" <petejson@codalogic.com> Thu, 20 February 2014 16:55 UTC

To: Phillip Hallam-Baker <hallam@gmail.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/json/YtFvpz7xUwp16suaJnkOibvtWFA
Cc: Carsten Bormann <cabo@tzi.org>, JSON WG <json@ietf.org>

----- Original Message From: "Phillip Hallam-Baker" <hallam@gmail.com>
> On Thu, Feb 20, 2014 at 10:22 AM, Pete Cordell 
> <petejson@codalogic.com> wrote:
>
>> ----- Original Message From: "Phillip Hallam-Baker"
>>
>>
>>> IETF has not decided how to make JSON protocols composable.
>>>
>>
>> Isn't that why we're here?
>
>
> It is why the group is here. But my point is that the group should decide
> how to compose protocols and the people developing schemas should support
> those tropes that are chosen.
>
> So it is a separate problem. If people decide on how to do this, I will be
> happy to extend my tool. But I don't think how this problem is solved
> should be a way to decide between tools.

As it's framed at the moment, nobody has to decide between tools (at least 
not in the JSON working group).

>> Yes, QNames are broken, so don't do that.  Think more packages in Java,
>> namespaces in C#.  As simple as doing something like:
>>
>>    int port;
>>    com.ietf.sip.contact contact;
>>
>> yields JSON of:
>>
>>    "port": 25, "contact": "...whatever..."
>>
>> We don't have to make things complicated!
>
>
> That is a model that I could very easily support as my tool is written in
> C#. In fact it already supports that kind of type label. All I would have
> to do is add in 'Using' specifiers at the top.
>
> The only thing it doesn't support are those semicolons. I find a carriage
> return is enough.

The details of how it's done are not important right now.  I only presented 
an example syntax as a way of illustrating what I'm proposing.
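
As an illustration only, here is a minimal Python sketch of the idea that the 
qualified type name lives in the schema and never appears in the JSON on the 
wire.  The SCHEMA table and the encode() helper are hypothetical names made up 
for this example, and the type label is treated as an opaque string; this is 
not how any existing tool works.

```python
import json

# Hypothetical schema: member name -> type label.  The package-qualified
# label only tells a tool where the type is defined.
SCHEMA = {
    "port": "int",
    "contact": "com.ietf.sip.contact",
}

def encode(values):
    """Serialise the members named in SCHEMA; type labels never reach the wire."""
    unknown = set(values) - set(SCHEMA)
    if unknown:
        raise ValueError(f"members not in schema: {sorted(unknown)}")
    return json.dumps({name: values[name] for name in SCHEMA if name in values})

wire = encode({"port": 25, "contact": "...whatever..."})
print(wire)
```

The printed object carries only the member names "port" and "contact", which 
is the point: composition happens in the schema, not in the instance data.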

> Incidentally, this is what I meant when I said that discussion of the
> schema ideas and merger might be more useful than more proposals.
>
>
>>> I disagree, I don't think the schema should address sub-syntax at all
>>> except for a small number of encodings that are RFCs such as RFC3339
>>> timestamps, URIs and DNS labels
>>
>>
>
>
>> I disagree with this.  Dates alone illustrate that there are
>> 'microformats' that can result in a much more compact, meaningful and
>> useful format than say:
>>
>>    { "date": 25, "month": 12, "year": 2015, "hour": 12, "min": 0 }
>>
>
> But we already have an RFC for Date - RFC 3339. (Actually we have others
> but we will forget those).
>
> So this is one of the microformats that I define in the tool as intrinsic.
> This also allows for the use of Integer encodings for DateTimes using
> seconds since the start of some epoch.

My position is that, having recognised that Dates represent a case where 
microformats are useful, perhaps we should not assume that these are the 
only cases.  IP addresses?  Crypto OIDs?  Dates on Mars?
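
To make the date case concrete, here is a small Python sketch that collapses 
the verbose member-per-field object quoted above into a single RFC 3339 
string and parses it back.  The helper names are hypothetical, and UTC is an 
assumption on my part because the verbose form carries no time zone.

```python
from datetime import datetime, timezone

verbose = {"date": 25, "month": 12, "year": 2015, "hour": 12, "min": 0}

def to_rfc3339(d):
    """Collapse the verbose form into one RFC 3339 string (UTC assumed)."""
    moment = datetime(d["year"], d["month"], d["date"], d["hour"], d["min"],
                      tzinfo=timezone.utc)
    return moment.isoformat()

def from_rfc3339(text):
    """Parse the string back into a timezone-aware datetime."""
    return datetime.fromisoformat(text)

compact = to_rfc3339(verbose)
print(compact)  # 2015-12-25T12:00:00+00:00
```

One string replaces five members, and a reader from the domain recognises it 
at a glance.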

>> Another location format might be:
>>
>>    56°14'23.45"N,18°5'16.65"W
>>
>
> { [56,14,23,45], [-18,-5,-16,-65]}
>
>> Forcing such formats to be JSONified may cause as much error as forcing
>> IEEE 754 numbers to be decimalised.
>>
>
> It is certainly possible to do the job wrong but the above does not result
> in any loss of precision.
>
> The only case where I would feel comfortable with the microformat above is
> if the protocol was interfacing to some application where that format was
> already defined and translation between the formats would be needed.

Exactly.  That's the use-case - where the format is already standard for the 
domain in question.
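
As a rough sketch of what consuming such a domain format could look like, the 
following Python parses the degrees/minutes/seconds string from my earlier 
example into decimal degrees.  The regular expression and the 
parse_location() name are my own assumptions for illustration; a real domain 
format would come with its own published grammar.

```python
import re

# One coordinate: degrees, minutes, seconds, then a hemisphere letter.
_COORD = re.compile(r"""(\d+)°(\d+)'(\d+(?:\.\d+)?)"([NSEW])""")

def parse_location(text):
    """Return (latitude, longitude) in decimal degrees; south and west are negative."""
    parts = text.split(",")
    if len(parts) != 2:
        raise ValueError("expected 'latitude,longitude'")
    result = []
    for part, allowed in zip(parts, ("NS", "EW")):
        m = _COORD.fullmatch(part.strip())
        if m is None or m.group(4) not in allowed:
            raise ValueError(f"bad coordinate: {part!r}")
        degrees = int(m.group(1)) + int(m.group(2)) / 60 + float(m.group(3)) / 3600
        result.append(-degrees if m.group(4) in "SW" else degrees)
    return tuple(result)

lat, lon = parse_location("56°14'23.45\"N,18°5'16.65\"W")
print(lat, lon)
```

The JSON itself still carries just one string; the parsing sits in the 
application that already understands the format.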

>> So let's learn from the past and recognise that we can't build them all in
>> upfront and design our format accordingly.
>
>
> There may be a need for a microformat. But that should be rare and should
> probably be specified in ABNF rather than the JSON schema.

Exactly.  Marking a value as a microformat is the schema saying there's 
something special here that is beyond my (the JSON schema's) ability to 
define.  It's also admitting that it's a rare case and that the JSON schema 
doesn't want to be able to define it.  However, just because the schema 
doesn't want to define it doesn't mean it wants to prevent you from defining 
something if it's useful to you.  EBNF with narrative in an RFC would be a 
suitable way to define its format.
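
One way to picture this, purely as a sketch: the schema names a format and 
hands the string to whatever checker has been registered for that name, and 
treats anything unregistered as an opaque string.  FORMAT_CHECKERS, 
check_string() and the format names below are all hypothetical; the only real 
API used is Python's standard ipaddress.ip_address().

```python
import ipaddress

# Hypothetical registry: format name -> checker that raises ValueError on bad
# input.  Each checker stands in for a format specified outside the schema.
FORMAT_CHECKERS = {
    "ip-address": ipaddress.ip_address,
}

def check_string(value, format_name=None):
    """Validate a JSON string against a named external format, if registered.

    Unregistered formats pass through as opaque strings: the schema neither
    defines them nor forbids them.
    """
    if not isinstance(value, str):
        raise TypeError("microformat values are carried as JSON strings")
    checker = FORMAT_CHECKERS.get(format_name)
    if checker is not None:
        checker(value)  # raises ValueError if the text does not match
    return value

check_string("192.0.2.1", "ip-address")                        # checked
check_string("56°14'23.45\"N,18°5'16.65\"W", "dms-location")  # opaque, allowed
```

The schema stays small, and a protocol that needs a new format only has to 
supply its definition and, optionally, a checker.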

>>> In most real world examples you are going to get far more leverage by
>>> sticking to only encoding in JSON data model and then finding an efficient
>>> way to encode JSON rather than inventing ad-hoc microformats that are
>>> neither JSON nor JSON data model, are going to require a custom parser and
>>> are not going to compress.
>>>
>>
>> I'm not interested in compression beyond zip and co.  It may sound harsh
>> of me, but my feeling is if this or the errors in floating point numbers is
>> critical to you, then use something else.  We don't need something that is
>> all things, to all people.
>
>
> If someone is saying 'I only want text, space isn't important' then fine,
> just use text.
>
> But if someone is saying 'let's use JSON but then let's avoid making it too
> verbose using these microformats to shave a few bytes' then I would much
> prefer to just use an efficient encoding.

Microformats are nothing to do with minimising byte count.  They are about 
representing data in a natural form for the application domain.

Pete Cordell
Codalogic Ltd
C++ tools for C++ programmers, http://codalogic.com
Read & write XML in C++, http://www.xml2cpp.com