Re: [Json] fun with streaming, was The names within an object SHOULD be unique.

Tatu Saloranta <tsaloranta@gmail.com> Tue, 30 July 2013 17:56 UTC

In-Reply-To: <CAHBU6it7vJZ7XXj2yy=VBLXVXAueNVf0EZb+CR9rCKn+hTLdcw@mail.gmail.com>
References: <20130730142623.GB17809@mercury.ccil.org> <20130730160719.3203.qmail@joyce.lan> <CAHBU6it7vJZ7XXj2yy=VBLXVXAueNVf0EZb+CR9rCKn+hTLdcw@mail.gmail.com>
Date: Tue, 30 Jul 2013 10:56:36 -0700
Message-ID: <CAGrxA27ut1MoGLO-kdH1LXjA9Ct7jmvh0G5XDzfaV6AgtaOv5Q@mail.gmail.com>
From: Tatu Saloranta <tsaloranta@gmail.com>
To: Tim Bray <tbray@textuality.com>
Cc: John Levine <johnl@taugh.com>, "json@ietf.org" <json@ietf.org>
Subject: Re: [Json] fun with streaming, was The names within an object SHOULD be unique.

On Tue, Jul 30, 2013 at 9:39 AM, Tim Bray <tbray@textuality.com> wrote:

> On Tue, Jul 30, 2013 at 9:07 AM, John Levine <johnl@taugh.com> wrote:
>
>> >> I've never seen an implementation ...  Better, can you give an example
>> >> of such an implementation?
>> >
>> >I haven't either, but I could see the uses of an implementation
>>
>
> I don’t think “I could see the uses of” is a good reason for writing an
> Internet Standard.  Once again as with naked surrogates, there’s an
> invisible class of alleged implementors of JSON sending/receiving code for
> whom dupe-checking is intractable; except for we haven’t heard from anyone
> whose application actually depends on this behavior.
>
> Has anyone any personal knowledge of an app whose correct function depends
> on the use of duplicate keys?  -T
>
>
You are conflating two separate issues here.

If you are asking whether anyone depends on the use of duplicate keys: I
don't, and I have never seen a use I consider valid. The one use case that
has been mentioned, "comments", does not seem valid to me.

But your comment about "alleged" intractability is a different matter (and
its tone is unnecessary).

Streaming is a common approach for processing large JSON files, whether
they are written out as JSON arrays or simply as sequences of
whitespace-separated objects. Typical data sets are log output and the
output of map/reduce-style jobs and batch jobs.
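As a rough illustration of reading such a sequence one value at a time,
here is a minimal Python sketch using only the standard library. It assumes
the input is a text stream of whitespace-separated JSON values and relies on
`json.JSONDecoder.raw_decode()`, which decodes one value from the start of a
string and returns the index where that value ended. The function name
`iter_json_values` and the chunk size are illustrative choices, not part of
Jackson or any other library. Python's `json` module is not an incremental
parser, so each single value must fit in memory; only the data set as a
whole is never loaded at once.

```python
import io
import json
from typing import Any, Iterator, TextIO

JSON_WS = " \t\n\r"  # the only whitespace characters JSON allows between values


def iter_json_values(stream: TextIO, chunk_size: int = 65536) -> Iterator[Any]:
    """Yield each top-level value from a stream of whitespace-separated JSON
    values, without reading the whole stream up front.

    Illustrative sketch (hypothetical helper), not any library's streaming API.
    """
    decoder = json.JSONDecoder()
    buf, eof = "", False

    def read_more() -> None:
        nonlocal buf, eof
        chunk = stream.read(chunk_size)
        if chunk:
            buf += chunk
        else:
            eof = True

    while True:
        buf = buf.lstrip(JSON_WS)  # raw_decode() rejects leading whitespace
        if not buf:
            if eof:
                return
            read_more()
            continue
        try:
            value, end = decoder.raw_decode(buf)
        except json.JSONDecodeError:
            if eof:
                raise  # malformed or truncated input
            read_more()  # the value may just be incomplete so far
            continue
        if end == len(buf) and not eof:
            # A scalar such as 12 might continue in the next chunk: look ahead.
            read_more()
            continue
        yield value
        buf = buf[end:]  # discard the consumed value


if __name__ == "__main__":
    log = io.StringIO('{"level": "info", "n": 1}\n{"level": "warn", "n": 2}')
    for record in iter_json_values(log, chunk_size=8):
        print(record)  # each record is built and handled on its own
```

Because an incomplete value and an invalid one look the same to
`raw_decode()`, this sketch only reports malformed input once the stream is
exhausted; a production reader would use a true incremental tokenizer.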

In those cases, an object mapper or tree builder can use a streaming parser
to build a representation of a single object at a time, reading only that
subset of the content. Reading the whole data set at once would be
prohibitive and unnecessary.
Duplicate name detection at that higher level is doable. Doing it at the
streaming parser level is unnecessary, and expensive relative to the parts
of decoding that are actually necessary.
Keeping the streaming layer separate from the higher layers gives good
separation of concerns, and is also a practical matter of being able to
reuse components.
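Python's standard `json` module is not a streaming parser like Jackson's,
but it separates scanning from object construction in a way that
illustrates the point: the scanner does no duplicate checking (by default
the last value for a repeated name wins), and each object's members are
handed, in document order, to an optional `object_pairs_hook`. The minimal
sketch below puts the duplicate check in that hook, i.e. in the layer that
builds the representation. `reject_duplicates` and `DuplicateNameError` are
hypothetical names chosen for illustration.

```python
import json
from typing import Any, Dict, List, Tuple


class DuplicateNameError(ValueError):
    """Raised when one JSON object contains the same member name twice."""


def reject_duplicates(pairs: List[Tuple[str, Any]]) -> Dict[str, Any]:
    # Hypothetical hook: the decoder calls it once per object (nested ones
    # included) with the members in document order, so the check lives in
    # the object-building layer, not in the code that scans the text.
    obj: Dict[str, Any] = {}
    for name, value in pairs:
        if name in obj:
            raise DuplicateNameError(f"duplicate name: {name!r}")
        obj[name] = value
    return obj


strict_decoder = json.JSONDecoder(object_pairs_hook=reject_duplicates)

if __name__ == "__main__":
    # Default behaviour: no check, the last value for a repeated name wins.
    print(json.loads('{"a": 1, "a": 2}'))
    print(strict_decoder.decode('{"a": 1, "b": {"c": 2}}'))
    try:
        strict_decoder.decode('{"a": {"x": 1, "x": 2}}')
    except DuplicateNameError as err:
        print("rejected:", err)
```

Only callers that opt in to the hook pay for the check, and the scanning
code is the same either way.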

I use such processing pipelines regularly, and judging by the number of
questions on various forums (the Jackson user/dev mailing lists,
StackOverflow), it is a common use case.

-+ Tatu +-