Re: [Json] JSON for Internet messages

Tatu Saloranta <tsaloranta@gmail.com> Wed, 03 July 2013 20:07 UTC

In-Reply-To: <CAO1wJ5TO8DRCvXctk1D5_LjK8me2vUX4adJmLrmJWzi=RNj_eg@mail.gmail.com>
References: <CAHBU6it55C5vCNLBki1LvjpWd4fANY8LdC4fzxj3a2G_+q=qSA@mail.gmail.com> <CAO1wJ5TO8DRCvXctk1D5_LjK8me2vUX4adJmLrmJWzi=RNj_eg@mail.gmail.com>
Date: Wed, 03 Jul 2013 13:07:02 -0700
Message-ID: <CAGrxA26FXTWnJ7WNyuuZfU0aE-EQmTae32-0Or3Az8k_=ApM3Q@mail.gmail.com>
From: Tatu Saloranta <tsaloranta@gmail.com>
To: Jacob Davies <jacob@well.com>
Cc: Tim Bray <tbray@textuality.com>, "json@ietf.org" <json@ietf.org>
Subject: Re: [Json] JSON for Internet messages
List-Id: "JavaScript Object Notation \(JSON\) WG mailing list" <json.ietf.org>

On Wed, Jul 3, 2013 at 12:29 PM, Jacob Davies <jacob@well.com> wrote:

> On Wed, Jul 3, 2013 at 9:34 AM, Tim Bray <tbray@textuality.com> wrote:
> > So I care a lot about JSON, but I don’t care in the slightest about usages
> > where the JSON isn’t being used for application-level message protocol
> > payloads (are there such usages? I'm curious).
>
> There are a number of practical use cases for JSON that don't fall
> into the neat "externally delimited stream of bytes described by an
> application/json media-type" category. That is a separate question from
> whether the RFC needs to do much to accommodate them.
>
> 1. Files named ".json" and containing a single JSON object or array.
> 2. Text documents containing embedded JSON - informally, as in
> documentation or books, or formally, as in Javascript in HTML or .js
> files.
> 3. Database records where text or binary columns store embedded JSON
> representing complex nested objects.
>
> Case 1 is handled the same as a message described as application/json,
> no big deal. Cases 2 & 3 may require accommodating non-Unicode
> encodings, and may have some special security concerns (e.g.
> additional escaping for HTML or Javascript embedding). It would be
> nice if the spec separated format from encoding, but it's unlikely to
> be the source of any significant problems since in most cases the
> handling of encoding will have happened by the time you parse the
> text, and the security concerns can be addressed in best-practices.
>
> > Also, I’ve never encountered a scenario where the messages were of
> > sufficient size that anyone gave a rat’s ass about streaming.
>
> Me either, and even if there were such cases I find the concerns about
> maintaining state unconvincing - maintaining a seen-keys dictionary
> for each level of nesting you're in is not all that much of an
> imposition or incompatible with a streaming API.
>
>
Believe it or not, this would actually add major overhead at the streaming
level, as well as for data-binding tools in statically typed languages
(Java, C#).
I am sure there are clever ways to limit the overhead, but a naive
implementation in Java would increase processing time by 30-50%, which
generally adds no value (assuming "last value wins" is adhered to and/or
the generator can avoid producing duplicates).
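To make the per-level bookkeeping concrete, here is a minimal sketch of the naive approach: one hash set of seen keys per open JSON object, checked on every field name. The class and its event methods (DupKeyTracker, startObject, fieldName, endObject) are hypothetical and are not Jackson's streaming API; the sketch only shows where the per-token cost comes from, not how a production parser would do it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical naive duplicate-key check driven by streaming events.
class DupKeyTracker {
    // One "seen keys" set per currently open JSON object, innermost on top.
    private final Deque<Set<String>> openObjects = new ArrayDeque<>();

    void startObject() {
        openObjects.push(new HashSet<>()); // one allocation per object
    }

    void endObject() {
        openObjects.pop();
    }

    // Called for every field name; throws if the innermost object already has it.
    void fieldName(String name) {
        Set<String> seen = openObjects.peek();
        if (seen == null) {
            throw new IllegalStateException("field name outside of an object");
        }
        if (!seen.add(name)) { // hash lookup + insert for every field name
            throw new IllegalArgumentException("duplicate key: " + name);
        }
    }

    public static void main(String[] args) {
        // Events for {"a": {"a": 1}, "b": 2, "a": 3}; scalar values need no tracking.
        DupKeyTracker t = new DupKeyTracker();
        t.startObject();
        t.fieldName("a");
        t.startObject();
        t.fieldName("a"); // allowed: different nesting level
        t.endObject();
        t.fieldName("b");
        try {
            t.fieldName("a"); // duplicate within the outer object
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // duplicate key: a
        }
        t.endObject();
    }
}
```

Arrays need no entry on the stack, since field names only ever appear directly inside an object. Every object, however, allocates a set and every name costs a hash insert, which is the kind of work being weighed above.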

A common use case for me is as follows:

1. Input comes as a typed object.
2. The serializer accesses its properties, guaranteeing uniqueness of keys,
and calls the generator.
3. The generator simply outputs tokens as requested (maintaining a basic
stack to ensure syntactically valid output).
4. The receiving end uses a parser to expose the JSON as tokens, without
duplicate checks.
5. The deserializer maps properties onto the local objects it creates: if
duplicates were encountered, the last value would stick due to overwrites.
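The five steps above can be sketched roughly as follows. This is an illustrative assumption, not Jackson code: the parser's token stream is stood in for by a list of (name, value) pairs, and the typed object is a hypothetical Point class. Keys are unique on output because each property is visited once, and a duplicate on input is simply overwritten, so the last value wins without any explicit check.

```java
import java.util.List;
import java.util.Map;

class LastValueWins {
    // Hypothetical typed object (step 1).
    static final class Point {
        int x;
        int y;
    }

    // Steps 2-3: each property is written exactly once, so keys are unique by construction.
    static List<Map.Entry<String, Integer>> serialize(Point p) {
        return List.of(Map.entry("x", p.x), Map.entry("y", p.y));
    }

    // Steps 4-5: properties are assigned as they arrive; no duplicate check is made.
    static Point deserialize(List<Map.Entry<String, Integer>> fields) {
        Point p = new Point();
        for (Map.Entry<String, Integer> f : fields) {
            switch (f.getKey()) {
                case "x": p.x = f.getValue(); break; // a later "x" overwrites an earlier one
                case "y": p.y = f.getValue(); break;
                default: break; // unknown properties are ignored in this sketch
            }
        }
        return p;
    }

    public static void main(String[] args) {
        // Stand-in for the parsed tokens of {"x": 1, "y": 2, "x": 3}
        Point p = deserialize(List.of(Map.entry("x", 1), Map.entry("y", 2), Map.entry("x", 3)));
        System.out.println(p.x + "," + p.y); // 3,2
    }
}
```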

In such a case, duplicates are not problematic, and this holds without
explicit checks.
Having to check for duplicates on both the sending and the receiving end
(at the parser/generator level) would most likely add 50% overhead to
processing time, because the relative overhead on the generator is higher
(JSON can be written about twice as fast as it is decoded).
Duplicate detection could work more efficiently at the deserializer level,
but would require separate null markers to distinguish between JSON nulls
and a missing property.
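Here is a minimal sketch of what deserializer-level detection could look like, assuming a hypothetical target type with a fixed set of properties. Each property slot starts out holding a private MISSING sentinel rather than null, so that a property explicitly set to JSON null is still recognized as "already seen". The names (DeserializerDupCheck, bind, field, MISSING, PROPERTIES) are illustrative only.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

class DeserializerDupCheck {
    // Sentinel meaning "property not seen yet"; distinct from JSON null (Java null).
    private static final Object MISSING = new Object();

    // Hypothetical property layout of the target type: index 0 = "name", 1 = "email".
    private static final List<String> PROPERTIES = List.of("name", "email");

    // Helper to build a (name, value) pair; SimpleEntry permits null values.
    static Map.Entry<String, Object> field(String name, Object value) {
        return new SimpleEntry<>(name, value);
    }

    // Returns one slot per known property; throws on a duplicate key.
    static Object[] bind(List<Map.Entry<String, Object>> fields) {
        Object[] slots = new Object[PROPERTIES.size()];
        Arrays.fill(slots, MISSING);
        for (Map.Entry<String, Object> f : fields) {
            int i = PROPERTIES.indexOf(f.getKey());
            if (i < 0) {
                continue; // unknown property: ignored in this sketch
            }
            if (slots[i] != MISSING) { // also catches a previous explicit null
                throw new IllegalArgumentException("duplicate key: " + f.getKey());
            }
            slots[i] = f.getValue(); // may legitimately be null (JSON null)
        }
        return slots;
    }

    static boolean isMissing(Object slot) {
        return slot == MISSING;
    }

    public static void main(String[] args) {
        // {"name": null}: "name" is present but null, "email" is absent
        Object[] ok = bind(List.of(field("name", null)));
        System.out.println(isMissing(ok[0]) + " " + isMissing(ok[1])); // false true

        try {
            bind(List.of(field("name", null), field("name", "Ann")));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // duplicate key: name
        }
    }
}
```

Without the sentinel, {"name": null, "name": "Ann"} would pass unnoticed, because the slot would still look unassigned after the first (null) value.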

The alternate case of using a tree representation ("JsonNode" etc.) has a
similar solution to the duplicate problem: the tree serializer would not
produce duplicates (since a tree cannot contain them), and the tree
deserializer would either use the last value or (if preferred) signal an
error.
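For the tree case, a sketch under similar assumptions: the object node is modelled as a LinkedHashMap, which by construction cannot hold duplicate keys, and a hypothetical DupPolicy flag chooses between keeping the last value and signalling an error. This is not the JsonNode API, just an illustration of the two policies.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TreeObjectBuilder {
    // Hypothetical policy switch for duplicates encountered while building the tree.
    enum DupPolicy { LAST_WINS, FAIL }

    // Helper to build a (name, value) pair standing in for parsed tokens.
    static Map.Entry<String, Object> field(String name, Object value) {
        return new SimpleEntry<>(name, value);
    }

    // A Map cannot hold duplicate keys, so serializing the result never emits duplicates.
    static Map<String, Object> build(List<Map.Entry<String, Object>> fields, DupPolicy policy) {
        Map<String, Object> node = new LinkedHashMap<>();
        for (Map.Entry<String, Object> f : fields) {
            if (policy == DupPolicy.FAIL && node.containsKey(f.getKey())) {
                throw new IllegalArgumentException("duplicate key: " + f.getKey());
            }
            node.put(f.getKey(), f.getValue()); // overwrite keeps the last value
        }
        return node;
    }

    public static void main(String[] args) {
        // Stand-in for the parsed tokens of {"a": 1, "b": 2, "a": 3}
        List<Map.Entry<String, Object>> parsed = List.of(field("a", 1), field("b", 2), field("a", 3));
        System.out.println(build(parsed, DupPolicy.LAST_WINS)); // {a=3, b=2}
        try {
            build(parsed, DupPolicy.FAIL);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // duplicate key: a
        }
    }
}
```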

-+ Tatu +-