Re: [Json] Call for real-world examples of how parsers deal with duplicate keys

Tatu Saloranta <tsaloranta@gmail.com> Thu, 06 June 2013 23:12 UTC

Date: Thu, 06 Jun 2013 16:12:12 -0700
From: Tatu Saloranta <tsaloranta@gmail.com>
To: Nico Williams <nico@cryptonector.com>
Cc: Paul Hoffman <paul.hoffman@vpnc.org>, "json@ietf.org" <json@ietf.org>

On Thu, Jun 6, 2013 at 3:35 PM, Nico Williams <nico@cryptonector.com> wrote:

> On Thu, Jun 6, 2013 at 2:37 PM, Tatu Saloranta <tsaloranta@gmail.com>
> wrote:
> > On Java, Jackson library:
> >
> > - Exposes both entries (key/value pairs) at streaming parsing level
>
> I don't think we should disqualify that sort of streaming parser
> implementation.
>
> This leads me to conclude that we should distinguish between streaming
> and stateful parsers (my terms; please suggest better ones).  Stateful
> parsers MUST accept only the last value, while streaming parsers MAY
> (and probably always will) accept all duplicate keys' values.
>
> Nico
> --
>

I agree with this.

It also goes to terminology: the term "parser" is being used quite
liberally, to mean anything from a tokenizer, to the thing that builds a
higher-level object representation (JSON-centric trees, or host-language
objects), to a combination of the two.
In this respect, the related thread that tries to divide the specification
into different sections makes sense: physical structure and serialization
are most closely related to low-level tokenization/generation, while the
optional logical model(s) relate more to builders/serializers.
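
To make the two levels concrete for duplicate keys, here is a minimal
sketch. It uses Python's standard json module rather than Jackson, purely as
an illustration. object_pairs_hook is not a true streaming tokenizer, but it
does expose every key/value pair before any mapping is built, which is
roughly the "exposes both entries" behavior; the default call stands in for
a builder that keeps only the last value.

```python
import json

text = '{"a": 1, "a": 2}'

# Pair-level view: the hook receives every key/value pair of an object,
# in document order, so both entries for "a" stay visible.
pairs = json.loads(text, object_pairs_hook=list)

# Builder-level view: the default result is a dict, and a later duplicate
# key overwrites an earlier one.
obj = json.loads(text)

print(pairs)
print(obj)
```

The same input gives two different answers depending on the layer asked,
which is why it matters what "parser" refers to in the spec text.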

The same is true for encoding: at the logical-model level, the underlying
character encoding is irrelevant. But for low-level tokenization it
matters: there are questions such as how the encoding is obtained and, if
no encoding information is available, which encodings are possible (only
UTF-8? UTF-8 and UTF-16, since the two can be auto-detected?).
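
As a rough sketch of what auto-detection can look like at the tokenizer
level, the helper below applies the null-octet heuristic described in RFC
4627, section 3, restricted to UTF-8 and UTF-16. The name
guess_json_encoding is hypothetical. It assumes there is no byte order mark
and that the first two characters of the text are ASCII; UTF-32 is
deliberately left out of scope.

```python
import json


def guess_json_encoding(data: bytes) -> str:
    """Guess UTF-8 vs UTF-16 from the first four octets (hypothetical helper).

    Assumes no BOM and that the first two characters are ASCII.
    UTF-32 input is out of scope and is not detected here.
    """
    if len(data) >= 4:
        b0, b1, b2, b3 = data[0], data[1], data[2], data[3]
        # 00 xx 00 xx: two ASCII characters encoded as UTF-16, big-endian
        if b0 == 0 and b1 != 0 and b2 == 0 and b3 != 0:
            return "utf-16-be"
        # xx 00 xx 00: two ASCII characters encoded as UTF-16, little-endian
        if b0 != 0 and b1 == 0 and b2 != 0 and b3 == 0:
            return "utf-16-le"
    # No null octets in the expected positions: treat the input as UTF-8
    return "utf-8"


raw = '{"k": "v"}'.encode("utf-16-le")
encoding = guess_json_encoding(raw)
value = json.loads(raw.decode(encoding))
```

Once the octets are decoded, everything above the tokenizer sees only
characters, so the logical model never needs to know which encoding was
used.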

To me, separating the physical and logical layers makes sense, even if most
users are not aware of the separation: implementors cannot ignore it.

-+ Tatu +-
