Re: [Json] Proposed minimal change for duplicate names in objects
Tatu Saloranta <tsaloranta@gmail.com> Sun, 07 July 2013 17:57 UTC
Return-Path: <tsaloranta@gmail.com>
X-Original-To: json@ietfa.amsl.com
Delivered-To: json@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix)
with ESMTP id 2606821F9A35 for <json@ietfa.amsl.com>;
Sun, 7 Jul 2013 10:57:52 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.599
X-Spam-Level:
X-Spam-Status: No, score=-2.599 tagged_above=-999 required=5
tests=[BAYES_00=-2.599, HTML_MESSAGE=0.001, NO_RELAYS=-0.001]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com
[127.0.0.1]) (amavisd-new, port 10024) with ESMTP id r49hglEHuKWk for
<json@ietfa.amsl.com>; Sun, 7 Jul 2013 10:57:50 -0700 (PDT)
Received: from mail-wg0-x22f.google.com (mail-wg0-x22f.google.com
[IPv6:2a00:1450:400c:c00::22f]) by ietfa.amsl.com (Postfix) with ESMTP id
3DB3621F9A2A for <json@ietf.org>; Sun, 7 Jul 2013 10:57:50 -0700 (PDT)
Received: by mail-wg0-f47.google.com with SMTP id l18so3148680wgh.14 for
<json@ietf.org>; Sun, 07 Jul 2013 10:57:49 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
h=mime-version:in-reply-to:references:date:message-id:subject:from:to
:cc:content-type; bh=sFfNV4iAr1DqQpDZMTlEauPhbht94noOwiluq1CofLk=;
b=zXHxe+Pe4R8R1iSzn3zhiyWAib5GARqDCrT9kr5IeKuH4VXmBCpzeObqjdzLHhE4Tp
rZnfVSPjagR4MUoKKNk4qUKctKmTaaoK5jerAMk8CYMn9CCxv5h8/tJZ1SoESiagU6ej
BFQZka9CROfNZE2gnRDQ2GG7J16E8HjesR/k3nhzmrfm+H6q4k6u9aInJpTcENhSMJ0E
v6UFU3Ouk7+BL3+g5fk6L3azO/jzgDfPuuxv3Ub5BGzfYsdaexJ5E16jKz8YDW2Fg9mi
y26Pp0bgKcNSADkgun24C8WeMPlFyGrY7vqcbg6n8Xy1bftR+Z2JeMTC7ptGxO/yEovz xj9g==
MIME-Version: 1.0
X-Received: by 10.180.208.17 with SMTP id ma17mr9866811wic.7.1373219868251;
Sun, 07 Jul 2013 10:57:48 -0700 (PDT)
Received: by 10.227.34.199 with HTTP; Sun, 7 Jul 2013 10:57:47 -0700 (PDT)
In-Reply-To: <CAHBU6iuWLcXv0QKR=Ow8gkzoZLmoZjqYCqXDXR8FLVb7w7M2Tw@mail.gmail.com>
References: <B86E1D4B-1DC8-4AD6-B8B3-E989599E0537@vpnc.org>
<CAK3OfOj3MNNhjwo2bMa5CgoqynzMRVvviBXC8szxt5D17Z7FDg@mail.gmail.com>
<51D3C63C.5030703@cisco.com> <51D48023.1020008@qti.qualcomm.com>
<20130703201143.GL32044@mercury.ccil.org>
<00cd01ce7a9f$19adeaa0$4d09bfe0$@augustcellars.com>
<00d701ce7aa6$cc5fe700$651fb500$@augustcellars.com>
<CAK3OfOiWrWCvNQneokyycV1Jb98M=UR-U7z0dhxUjzVdf+PwDw@mail.gmail.com>
<CAHBU6itdi3B1rWv2TiOYhL1QuOVxrFKt7OTWRoG+6TgV8Bc_uw@mail.gmail.com>
<CAK3OfOgOYA5fas0oomF5amjP1bR5F=0+uve7mFD4=FMoEV7sWg@mail.gmail.com>
<CAGrxA24y4D62XY-YnbDvKVwNKUickcEFxv1FUhc_yqG4KP-m-w@mail.gmail.com>
<CAHBU6iuWLcXv0QKR=Ow8gkzoZLmoZjqYCqXDXR8FLVb7w7M2Tw@mail.gmail.com>
Date: Sun, 7 Jul 2013 10:57:47 -0700
Message-ID: <CAGrxA252xregfKViPOH7ustTSwOoorrffwdbUuCrryVxmqHUNw@mail.gmail.com>
From: Tatu Saloranta <tsaloranta@gmail.com>
To: Tim Bray <tbray@textuality.com>
Content-Type: multipart/alternative; boundary=001a11c3808e721f9504e0efaaa4
Cc: Nico Williams <nico@cryptonector.com>, Jim Schaad <ietf@augustcellars.com>,
"json@ietf.org" <json@ietf.org>
Subject: Re: [Json] Proposed minimal change for duplicate names in objects
X-BeenThere: json@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: "JavaScript Object Notation \(JSON\) WG mailing list" <json.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/json>,
<mailto:json-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/json>
List-Post: <mailto:json@ietf.org>
List-Help: <mailto:json-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/json>,
<mailto:json-request@ietf.org?subject=subscribe>
X-List-Received-Date: Sun, 07 Jul 2013 17:57:52 -0000
On Sat, Jul 6, 2013 at 9:58 PM, Tim Bray <tbray@textuality.com> wrote: > I’ll assume you’re right when you say dupe detection has been measured as > expensive at run time, but I’m thinking that if I were writing a reader I'd > implement a hash table with a test-and-set method, so I admit I'm surprised > by the finding. I think I’d need to see a little more research before I’d > accept that as a given. -T > > Point is that no hash table need be created at all. All one has is simple tokenizer (lexer). No tree is built, just parent stack with information to match close markers, report error locations. And bit of state information to validate well-formedness wrt separators. I agree that if a per-document symbol table/map is maintained, extra cost would probably be negligible and not worth worrying too much about. Hash table will typically be used at higher level, if building an object representation, such as tree model, or native objects: either as structure to build (tree), or mapping to object fields (name to method/field). At this level catching duplicates is easier and more efficient to handle, and I would not be worried about mandating such handling. So why separation to layers like this? While writing a naive tokenizer is an easy task, it takes much more effort to create an efficient one. But there are not many variations there, so once an efficient one exists, it is easy to use that as a building block, save time and have some confidence that simple encoding/decoding works correctly and efficiently. Conversely, higher level components vary a lot regarding functionality. This is similar to division between low-level SAX or Stax APIs, and higher-level DOM (tree) and JAXB (data binding) for XML on Java. I think C#/dotnet has similar separation between pull-parsing and higher-level abstractions, but it seems absent from most scripting languages. As to research, I don't know if this -- https://github.com/eishay/jvm-serializers/wiki -- helps, but contrasting these two entries: create ser deser total size +dfl json/jackson/manual 62 1137 1519 2656 468 253 json/jackson/tree 63 2045 2650 4695 485 259 gives difference between manually written data-binding (streaming parser, hand-written code to match names to fields), and building a HashMap-based simple tree model (built on streaming parser) as 2-to-1 performance difference (2656 nsec vs 4695 nsec). Part of this is with overhead of Maps vs. plain java objects, but not all. I could prototype with actual detection too (in fact, there is an RFE for streaming parser to do just that -- and I consider it a legitimate thing to do as optional settings) and see how much actual overhead a highly-tuned minimal-state parser will have on Java platform. -+ Tatu +- > > On Sat, Jul 6, 2013 at 9:37 PM, Tatu Saloranta <tsaloranta@gmail.com>wrote;wrote: > >> On Sat, Jul 6, 2013 at 7:57 PM, Nico Williams <nico@cryptonector.com>wrote;wrote: >> >>> On Sat, Jul 6, 2013 at 8:44 PM, Tim Bray <tbray@textuality.com> wrote: >>> > This feels like a no-brainer to me, but that’s probably because (as >>> I’ve >>> > said before) I’m an API guy, and the only use for JSON objects in my >>> world >>> > is to transfer a hash table or database record or whatever from here to >>> > there, and there, and in such a situation dupes can never be useful or >>> > intended and can only be a symptom of breakage (or, in the JOSE case, a >>> > symptom of a malicious attack on my crypto). >>> >>> I agree. As a security guy I would prefer if one way or another we >>> end up with no dup names, but as an "API guy" myself I think of the >>> streaming parsers (they offer an API after all). Just say the magic >>> words: "to hell with minimal state streaming parsers" or perhaps >>> something to the effect that *some* component of a layered application >>> MUST reject objects with dup names. It's either or. Let's choose. >>> >>> I'm happy with "some component of a layered application MUST reject >>> objects with duplicate names" -- I prefer this to the "no minimal >>> state streaming parsers" alternative. >>> >>> I will assume that in general objects rarely have lots of names, so >>> that parsers need not keep that much state in order to check for dups. >>> Requiring parsers to reject objects with dup names is my second >>> choice. >>> >> >> Just to make sure: I also do not have any use for duplicates, and >> consider them flaws in processing. I have never used duplicates for >> anything, nor find that interesting approach. >> The only concern really is that of mandating (or not) detection and/or >> prevention at lowest level components of commonly used processing stacks >> (low-level push/pull parser, higher-level library or app code that builds >> full representations), since this is significant cost, based on extensive >> profiling I have done at this level. >> >> Case of application code directly using streaming parser/generators is >> not nearly as common as that of frameworks using them to produce higher >> level abstractions. >> These higher level abstraction (JSON tree representation, binding to >> native objects) do either report errors such as duplicates, and at very >> least can detect it and use consistent handling. They can do it much more >> efficiently than low-level components since they have to build >> representations that then serve as data structures for detecting/preventing >> duplicates. >> >> Specific concern for me is this: if specification mandates detection >> and/or prevention for parser, generators, without any mention that 'parser' >> and 'generator' are logical concepts (thing that reads JSON, thing that >> writes JSON), I will have lots of users who promptly demand low-level >> components to force checks. >> And then I get to spend lots of time discussing why such checks can (and >> IMO should) still be pushed to higher level of processing. It is amazing >> how much FUD can be generated based on cursory reading of specifications. >> >> This concern extends to suggested "Internet JSON messages" specification. >> >> I would like simple suggestion that some component of the processing >> should/must detect and report duplicates; and prevent producing of same; >> or, lengthier explanation of what "parser" and "generator" mean (parser is >> such a horrible misnomer -- there is very very little parsing involved, >> it's just a lexer, and optional object builder -- but I digress). >> >> -+ Tatu +- >> >> >
- [Json] Proposed minimal change for duplicate name… Paul Hoffman
- Re: [Json] Proposed minimal change for duplicate … Vinny A
- Re: [Json] Proposed minimal change for duplicate … John Cowan
- Re: [Json] Proposed minimal change for duplicate … Nico Williams
- Re: [Json] Proposed minimal change for duplicate … Stefan Drees
- Re: [Json] Proposed minimal change for duplicate … Eliot Lear
- Re: [Json] Proposed minimal change for duplicate … Nico Williams
- Re: [Json] Proposed minimal change for duplicate … Eliot Lear
- Re: [Json] Proposed minimal change for duplicate … Stefan Drees
- Re: [Json] Proposed minimal change for duplicate … Markus Lanthaler
- Re: [Json] Proposed minimal change for duplicate … Paul Hoffman
- Re: [Json] Proposed minimal change for duplicate … Nico Williams
- Re: [Json] Proposed minimal change for duplicate … John Cowan
- Re: [Json] Proposed minimal change for duplicate … John Cowan
- Re: [Json] Proposed minimal change for duplicate … John Cowan
- Re: [Json] Proposed minimal change for duplicate … Eliot Lear
- Re: [Json] Proposed minimal change for duplicate … John Cowan
- Re: [Json] Proposed minimal change for duplicate … Paul Hoffman
- Re: [Json] Proposed minimal change for duplicate … Bjoern Hoehrmann
- Re: [Json] Proposed minimal change for duplicate … Tatu Saloranta
- Re: [Json] Proposed minimal change for duplicate … Pete Resnick
- Re: [Json] Proposed minimal change for duplicate … John Cowan
- Re: [Json] Proposed minimal change for duplicate … Eliot Lear
- Re: [Json] Proposed minimal change for duplicate … Tim Bray
- Re: [Json] Proposed minimal change for duplicate … Nico Williams
- Re: [Json] Proposed minimal change for duplicate … Nico Williams
- Re: [Json] Proposed minimal change for duplicate … Tim Bray
- Re: [Json] Proposed minimal change for duplicate … John Cowan
- Re: [Json] Proposed minimal change for duplicate … Stefan Drees
- Re: [Json] Proposed minimal change for duplicate … Matthew Morley
- Re: [Json] Proposed minimal change for duplicate … Tatu Saloranta
- Re: [Json] Proposed minimal change for duplicate … Nico Williams
- Re: [Json] Proposed minimal change for duplicate … Jorge
- Re: [Json] Proposed minimal change for duplicate … Bjoern Hoehrmann
- Re: [Json] Proposed minimal change for duplicate … Jim Schaad
- Re: [Json] Proposed minimal change for duplicate … Jim Schaad
- Re: [Json] Proposed minimal change for duplicate … Nico Williams
- Re: [Json] Proposed minimal change for duplicate … Tim Bray
- Re: [Json] Proposed minimal change for duplicate … Nico Williams
- Re: [Json] Proposed minimal change for duplicate … Tatu Saloranta
- Re: [Json] Proposed minimal change for duplicate … Tim Bray
- Re: [Json] Proposed minimal change for duplicate … Jim Schaad
- [Json] Duplicate names: are they erroneous or not? John Cowan
- Re: [Json] Proposed minimal change for duplicate … Nico Williams
- Re: [Json] Proposed minimal change for duplicate … Nico Williams
- Re: [Json] Duplicate names: are they erroneous or… Stefan Drees
- Re: [Json] Duplicate names: are they erroneous or… Bjoern Hoehrmann
- Re: [Json] Proposed minimal change for duplicate … Bjoern Hoehrmann
- Re: [Json] Proposed minimal change for duplicate … Carsten Bormann
- Re: [Json] Proposed minimal change for duplicate … Tatu Saloranta
- Re: [Json] Proposed minimal change for duplicate … Tatu Saloranta
- Re: [Json] Proposed minimal change for duplicate … Stephan Beal
- Re: [Json] Proposed minimal change for duplicate … John Cowan
- Re: [Json] Proposed minimal change for duplicate … John Cowan
- Re: [Json] Proposed minimal change for duplicate … Carsten Bormann
- Re: [Json] Proposed minimal change for duplicate … John Cowan
- Re: [Json] Proposed minimal change for duplicate … Bjoern Hoehrmann
- Re: [Json] Proposed minimal change for duplicate … Carsten Bormann
- Re: [Json] Proposed minimal change for duplicate … Joe Hildebrand (jhildebr)
- Re: [Json] Proposed minimal change for duplicate … Vinny A
- Re: [Json] Proposed minimal change for duplicate … Eliot Lear
- Re: [Json] Proposed minimal change for duplicate … Markus Lanthaler
- Re: [Json] Proposed minimal change for duplicate … Stephan Beal
- Re: [Json] Proposed minimal change for duplicate … Eliot Lear
- Re: [Json] Proposed minimal change for duplicate … John Cowan
- Re: [Json] Proposed minimal change for duplicate … Tatu Saloranta
- Re: [Json] Proposed minimal change for duplicate … Tatu Saloranta
- Re: [Json] Proposed minimal change for duplicate … John Cowan