Re: [Json] Proposed minimal change for duplicate names in objects

Tim Bray <tbray@textuality.com> Sun, 07 July 2013 04:58 UTC

From: Tim Bray <tbray@textuality.com>
To: Tatu Saloranta <tsaloranta@gmail.com>
Cc: Nico Williams <nico@cryptonector.com>, Jim Schaad <ietf@augustcellars.com>, "json@ietf.org" <json@ietf.org>

I'll assume you're right when you say dupe detection has been measured as
expensive at run time, but if I were writing a reader I'd implement a hash
table with a test-and-set method, so I admit I'm surprised by the finding.
I think I'd need to see a little more research before I'd accept that as a
given.  -T
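For what it's worth, the kind of test-and-set check I have in mind can be
sketched in Python using the standard library's `object_pairs_hook` (the
hook name `reject_duplicates` is mine, not from any spec):

```python
import json

def reject_duplicates(pairs):
    """object_pairs_hook: build the object, failing fast on duplicate keys."""
    seen = set()
    obj = {}
    for key, value in pairs:
        if key in seen:        # test: one hash lookup per key
            raise ValueError("duplicate key: %r" % key)
        seen.add(key)          # set: record the key as used
        obj[key] = value
    return obj

# Unique keys parse normally; a duplicate raises ValueError.
print(json.loads('{"a": 1, "b": 2}', object_pairs_hook=reject_duplicates))
```

With this hook, `json.loads('{"a": 1, "a": 2}', ...)` raises ValueError
instead of silently keeping the last value, at the cost of one extra hash
lookup per member.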


On Sat, Jul 6, 2013 at 9:37 PM, Tatu Saloranta <tsaloranta@gmail.com> wrote:

> On Sat, Jul 6, 2013 at 7:57 PM, Nico Williams <nico@cryptonector.com> wrote:
>
>> On Sat, Jul 6, 2013 at 8:44 PM, Tim Bray <tbray@textuality.com> wrote:
>> > This feels like a no-brainer to me, but that's probably because (as I've
>> > said before) I'm an API guy, and the only use for JSON objects in my
>> > world is to transfer a hash table or database record or whatever from
>> > here to there, and in such a situation dupes can never be useful or
>> > intended and can only be a symptom of breakage (or, in the JOSE case, a
>> > symptom of a malicious attack on my crypto).
>>
>> I agree.  As a security guy I would prefer that one way or another we
>> end up with no dup names, but as an "API guy" myself I think of the
>> streaming parsers (they offer an API, after all).  Just say the magic
>> words -- "to hell with minimal-state streaming parsers" -- or say
>> something to the effect that *some* component of a layered application
>> MUST reject objects with dup names.  It's one or the other; let's choose.
>>
>> I'm happy with "some component of a layered application MUST reject
>> objects with duplicate names" -- I prefer this to the "no minimal
>> state streaming parsers" alternative.
>>
>> I will assume that in general objects rarely have lots of names, so
>> that parsers need not keep that much state in order to check for dups.
>> Requiring parsers to reject objects with dup names is my second
>> choice.
>>
>
> Just to make sure: I also do not have any use for duplicates, and consider
> them flaws in processing. I have never used duplicates for anything, nor do
> I find that an interesting approach.
> My only real concern is that of mandating (or not) detection and/or
> prevention in the lowest-level components of commonly used processing
> stacks (low-level push/pull parsers, higher-level libraries, or app code
> that builds full representations), since this carries significant cost,
> based on extensive profiling I have done at this level.
>
> The case of application code directly using streaming parsers/generators
> is not nearly as common as that of frameworks using them to produce
> higher-level abstractions.
> These higher-level abstractions (JSON tree representations, binding to
> native objects) either report errors such as duplicates or at the very
> least can detect them and apply consistent handling. They can do this much
> more efficiently than low-level components, since they already have to
> build representations that then serve as the data structures for
> detecting/preventing duplicates.
>
> My specific concern is this: if the specification mandates detection
> and/or prevention for parsers and generators, without any mention that
> 'parser' and 'generator' are logical concepts (the thing that reads JSON,
> the thing that writes JSON), I will have lots of users who promptly demand
> that the low-level components enforce the checks.
> And then I get to spend lots of time explaining why such checks can (and
> IMO should) still be pushed to a higher level of processing. It is amazing
> how much FUD can be generated from a cursory reading of specifications.
>
> This concern extends to the suggested "Internet JSON messages" specification.
>
> I would like a simple statement that some component of the processing
> should/must detect and report duplicates, and prevent producing them; or,
> failing that, a lengthier explanation of what "parser" and "generator"
> mean ("parser" is such a horrible misnomer -- there is very, very little
> parsing involved; it's just a lexer plus an optional object builder -- but
> I digress).
>
> -+ Tatu +-
>
>