Re: [Json] Call for real-world examples of how parsers deal with duplicate keys

Tatu Saloranta <tsaloranta@gmail.com> Thu, 06 June 2013 23:12 UTC

On Thu, Jun 6, 2013 at 3:35 PM, Nico Williams <nico@cryptonector.com> wrote:

> On Thu, Jun 6, 2013 at 2:37 PM, Tatu Saloranta <tsaloranta@gmail.com>
> wrote:
> > On Java, Jackson library:
> >
> > - Exposes both entries (key/value pairs) at streaming parsing level
>
> I don't think we should disqualify that sort of streaming parser
> implementation.
>
> This leads me to conclude that we should distinguish between streaming
> and stateful parsers (my terms; please suggest better ones).  Stateful
> parsers MUST accept only the last value, while streaming parsers MAY
> (and probably always will) accept all duplicate keys' values.
>
> Nico
> --
>

I agree with this.

It also goes to terminology: the term "parser" is being used quite liberally,
meaning anything from a tokenizer, to the thing that builds a higher-level
object representation (JSON-centric trees, or host-language objects), or a
combination of the two.
In this respect, the related thread that tries to divide the specification
into different sections makes sense: physical structure and serialization
relate mostly to low-level tokenization/generation, and the optional
logical model(s) relate more to builders/serializers.
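
To illustrate the two levels with duplicate keys, here is a rough sketch using
Python's standard json module rather than Jackson. It is not a true streaming
parser; object_pairs_hook merely stands in for the low-level view that exposes
every key/value pair, while the default dict builder plays the role of the
higher-level representation, where the last duplicate wins. The sample
document is made up for illustration.

```python
import json

# Made-up sample document with a duplicate key "a".
doc = '{"a": 1, "b": 2, "a": 3}'

# Low-level view: keep every key/value pair, in document order.
pairs = json.loads(doc, object_pairs_hook=list)

# Higher-level view: the default dict builder keeps only the last value
# seen for a repeated key.
tree = json.loads(doc)

print(pairs)  # [('a', 1), ('b', 2), ('a', 3)]
print(tree)   # {'a': 3, 'b': 2}
```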

The same is true for encoding aspects: at the logical model level, the
underlying character encoding is irrelevant. But for low-level tokenization
it matters: questions like how the encoding is obtained or, if no encoding
information is available, what the possible encodings are (only UTF-8? UTF-8
and UTF-16, since the two can be auto-detected?).
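
As a sketch of what such auto-detection could look like at the tokenizer
level: the helper below (guess_json_encoding is a hypothetical name) assumes
only UTF-8 and UTF-16 are possible, and relies on the fact that a JSON text
starts with an ASCII character, so in UTF-16 exactly one of the first two
bytes is zero. UTF-32 is deliberately ignored here.

```python
def guess_json_encoding(data: bytes) -> str:
    """Guess UTF-8 vs UTF-16 from the leading bytes (hypothetical helper)."""
    # A byte order mark, if present, settles the question.
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"
    # No BOM: the first character is ASCII, so a UTF-16 stream has a zero
    # byte in one of the first two positions.
    if len(data) >= 2:
        if data[0] == 0 and data[1] != 0:
            return "utf-16-be"
        if data[0] != 0 and data[1] == 0:
            return "utf-16-le"
    # Assumption: anything else is treated as UTF-8.
    return "utf-8"


raw = '{"a": 1}'.encode("utf-16-le")
text = raw.decode(guess_json_encoding(raw))
```

Once the bytes are decoded, everything above the tokenizer works on
characters and does not need to know which encoding was used.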

To me, separation of the physical and logical layers makes sense. Even if most
users are not aware of the distinction, implementors cannot ignore it.

-+ Tatu +-