Re: [Json] The names within an object SHOULD be unique.

Stefan Drees <stefan@drees.name> Sun, 09 June 2013 10:58 UTC

Message-ID: <51B45FAA.4070900@drees.name>
Date: Sun, 09 Jun 2013 12:57:46 +0200
From: Stefan Drees <stefan@drees.name>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:17.0) Gecko/20130509 Thunderbird/17.0.6
MIME-Version: 1.0
To: Stephen Dolan <stephen.dolan@cl.cam.ac.uk>
References: <51AF8479.5080002@crockford.com> <CAK3OfOgtYoPRZ-Gj5G8AnNipDyxYs=6_KD=rQTxKbhDPX6FZNA@mail.gmail.com> <51b1168c.e686440a.5339.5fc4SMTPIN_ADDED_BROKEN@mx.google.com> <CAK3OfOhL3zXHfg9EEDWLXhjLQ1aBvvxikKAiR+nUpDHJaVh+Qg@mail.gmail.com> <51B1B47C.9060009@drees.name> <C86A9758-5BEF-415C-BD17-DC5E757FAA7E@yahoo.com> <51B1E909.2010402@drees.name> <CA+mHimN9=VZu4RRWcnk2F_uMi-+E-LDN2stb1MFNDP+o1R0WSg@mail.gmail.com> <51B1FE6A.80409@drees.name> <CA+mHimNuDwTF96v0PnEvFusCw-KEFT6QF4R9UeZ+8nbETB7oBw@mail.gmail.com>
In-Reply-To: <CA+mHimNuDwTF96v0PnEvFusCw-KEFT6QF4R9UeZ+8nbETB7oBw@mail.gmail.com>
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 7bit
Cc: Vinny A <jsontest@yahoo.com>, Markus Lanthaler <markus.lanthaler@gmx.net>, "json@ietf.org" <json@ietf.org>
Subject: Re: [Json] The names within an object SHOULD be unique.

On 2013-06-08 23:07, Stephen Dolan wrote:
> On Fri, Jun 7, 2013 at 4:38 PM, Stefan Drees <stefan@drees.name> wrote:
>> I do think, that forcing the consumer to "keep" the last parsed (whatever
>> this means from the perspective of the source of the message) name as "the
>> one" is **not** "good", nor "more secure" nor "doesn't break anything in the
>> wild".
>

If this is too long to read, **please** at least skip ahead to "TL;DR",
as there is an updated proposal there. Thanks.

> It is more secure, and I'm not aware of anything in the wild that it
> breaks. Perhaps my previous example of a security hole caused by this
> was overly terse, so here's a longer description.
>
> Consider a system that takes requests from the outside world. Each
> request is represented as a JSON object, and must include a "password"
> field and a "command" field.
>
> Some commands (like "call-reception") may be performed by anyone by
> passing a null password. Some commands (like "launch-missiles")
> require a password.
>
> The command-processing system, upon receiving a request, forwards the
> request to the authorization system, which determines whether the
> command is authorized. The attacker sends:
>
>      {"command": "launch-missiles",
>       "password":null,
>       "command": "call-reception"}
>
> The authorization system, using the standard JSON parser from Python
> (or Ruby or PHP or JS or whatever), parses this as having a "command"
> field which says "phone-reception", and authorizes the command.

I think you mean "call-reception", right?

>
> The command-processing system, using the hypothetical parser jsDumb,
> parses this as having a "command" field which says "launch-missiles".
> Since the authorization system has OKed the command, this ends badly.

I really have a hard time imagining a use case where someone would
come up with such an implementation of "separation of concerns".
But I do have a good old name for it: "Broken phone" :-)
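
For readers who want to see the disagreement from the quoted scenario, here is a minimal sketch in Python. The `first_wins` hook is my own hypothetical stand-in for "jsDumb" (no such parser is known to exist); the other component relies on the standard `json` module, which keeps the last value for a repeated name.

```python
import json

# The attacker's request from the quoted scenario.
REQUEST = ('{"command": "launch-missiles", '
           '"password": null, '
           '"command": "call-reception"}')


def first_wins(pairs):
    """Hypothetical "jsDumb" behaviour: keep the first value seen per name."""
    result = {}
    for name, value in pairs:
        result.setdefault(name, value)  # later duplicates are ignored
    return result


# Authorization component: standard library parser, last value wins.
seen_by_authorization = json.loads(REQUEST)

# Command-processing component: the hypothetical first-wins parser.
seen_by_processor = json.loads(REQUEST, object_pairs_hook=first_wins)

print(seen_by_authorization["command"])  # call-reception
print(seen_by_processor["command"])      # launch-missiles
```

Both components read the same text and come away with contradictory commands, which is exactly the "broken phone" effect.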

> Luckily for us, I do not believe jsDumb exists. A reasonably extensive
> survey in another thread has failed to find such a parser. There are
> of course streaming parsers such as Java's Jackson, which return all
> of the keys and values - this is fine, as if the command-processing
> system were to use such a parser it could simply error on requests
> having multiple "command" fields.
>
> Unfortunately, the hypothetical jsDumb is considered a compliant
> parser according to the current RFC, leading to the above security
> hole where two compliant parsers can return entirely contradictory
> results on the same document.

I bet that, if such a hypothetical system existed, it would be better
off if the authorization part yielded a transform as the result of the
check, since that would contain (besides additional transaction-relevant
data) as the only remaining user input:

        {"command":"call-reception"}

so that the command-processing system of your plot would only
communicate through the authorization component.
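
A minimal sketch of that idea, in Python, under my own assumptions: `PUBLIC_COMMANDS`, `authorize` and `process` are hypothetical names, and the password check for protected commands is left out. The point is only that the command processor never sees the attacker's original text, just the freshly generated one.

```python
import json

# Hypothetical policy: commands anyone may run with a null password.
PUBLIC_COMMANDS = {"call-reception"}


def authorize(raw_text):
    """Return a newly generated JSON text with only the vetted command,
    or None if the request is refused."""
    request = json.loads(raw_text)
    command = request.get("command")
    password = request.get("password")
    if command in PUBLIC_COMMANDS and password is None:
        # Re-serialize: whatever duplicates the input had are gone now.
        return json.dumps({"command": command}, separators=(",", ":"))
    return None  # protected commands are out of scope for this sketch


def process(vetted_text):
    """Command processor: consumes only what authorize() produced."""
    return json.loads(vetted_text)["command"]


attack = ('{"command": "launch-missiles", "password": null, '
          '"command": "call-reception"}')
vetted = authorize(attack)
print(vetted)           # {"command":"call-reception"}
print(process(vetted))  # call-reception
```

Since the second component only parses a text with a single "command" name, its duplicate-handling policy no longer matters.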

If only the "channel" were secured, anything goes ...

> The issue is not two parsers returning different amounts of
> information: Jackson returns more information about the document than
> JSON.parse, for instance. The issue is two parsers returning *entirely
> contradictory* information about the same document.
>
> It may be that jsDumb already exists and is deployed on millions of
> machines, and JSON is doomed to insecurity forever. I don't think this
> has happened yet - I don't know of any parser with jsDumb's behaviour.
> In that case, the updated RFC should make clear that this behaviour is
> wrong, to prevent future implementations having this hole.

Ahem, the "hole" appears to be two system components playing broken 
phone. The rest of that scenario is quite exchangeable, isn't it?

By the way: I have for sure invented far dumber parsers than that
imaginary "jsDumb"; trust me, I just can't remember which ... ;-)

The RFC should IMO stay within the naturally fuzzy, but nevertheless
distinguishable realm of being a very general format that has been in
use for a long time and by many; thus I would opt for REALLY SHOULD but
not for MUST.

For detailed reasons for this please see my updated proposal and the
relevant notes below (go to "TL;DR").

> Various phrases have been proposed, here's a couple:
>
> [Douglas Crockford]
> If duplicate names are encountered, the parse MAY fail (because some
> implementations correctly do that), or it MAY succeed by accepting the
> last duplicated key:value pair.
>
> [Alan Wirfs-Brock]
> The names within an object SHOULD be unique.  If a key is duplicated,
> a parser MUST take  <<use?? interpret??>> only the last of the
> duplicated key pairs.
>
> [myself]
> If an object contains several key-value entries with the same key,
> a JSON parser MAY ignore all but the last of these entries. A JSON
> parser MUST NOT ignore the last such entry.
>
> None of these disallow streaming parsers or other parsers that keep
> all of the key-value pairs of an object. None of them disallow objects
> with duplicate keys. The only thing disallowed is a parser which
> interprets `{a:1, a:2}` as equivalent to `{a:1}`.

Updating my proposal for both the parser and generator sections yields:

TL;DR:

NEW:
"""
4.  Parsers

    A JSON parser transforms JSON text into another representation,
    MUST accept all texts that conform to the JSON grammar and MUST be
    prepared to either accept duplicate names in objects or reject the
    complete JSON text containing these.
    If the JSON parser transforms name/value pairs of JSON objects
    into maps inside the target representation by using the received
    names as keys, then it REALLY SHOULD always yield the values of the
    last encountered occurrences of the name/value pairs, so as to
    satisfy the principle of least surprise.
    A JSON parser MAY accept non-JSON forms or extensions.

    An implementation may set limits on any of the following: the size
    of texts that it accepts, the maximum depth of nesting, the range
    and precision of numbers, and the length and character contents of
    strings.

5.  Generators

    A JSON generator produces JSON text.  The resulting text MUST
    strictly conform to the JSON grammar.

    Generators REALLY SHOULD NOT duplicate names in objects if they can
    avoid or detect such duplication.

"""
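
To illustrate the two behaviours the proposed parser text allows (reject the whole text, or map names to the last encountered value), here is a small sketch built on Python's `json.loads` and its `object_pairs_hook` parameter. The `parse` wrapper and its `strict` flag are hypothetical names of mine, not part of any proposal.

```python
import json


def reject_duplicates(pairs):
    """Refuse any object that repeats a name."""
    names = [name for name, _ in pairs]
    if len(names) != len(set(names)):
        raise ValueError("duplicate name in object")
    return dict(pairs)


def parse(text, strict=False):
    """strict=True: reject the complete text on duplicate names.
    strict=False: build maps, yielding the last value per name."""
    hook = reject_duplicates if strict else dict
    return json.loads(text, object_pairs_hook=hook)


print(parse('{"a": 1, "a": 2}'))  # {'a': 2}

try:
    parse('{"a": 1, "a": 2}', strict=True)
except ValueError as error:
    print("rejected:", error)     # rejected: duplicate name in object
```

Either mode gives the consumer a predictable answer; what neither mode does is silently keep the first value.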

Notes:

1. The Section 4 part of the proposal builds upon [1], where I
    introduced the possible number precision constraint as suggested by
    John Levine.

2. Inserted REALLY to form REALLY SHOULD NOT in the last paragraph of
    the generator section, as suggested by Joe Hildebrand in [2].

3. The newly inserted text accommodating the "if not keep all, please
    keep last" proposal takes into account that there are, of course,
    nice JSON parsers that separate the transform of JSON to target
    storage representations from the parsing steps of the JSON message
    itself, e.g. by providing callbacks (for instance YAJL, cf. [3]);
    others might offer method overrides, decorations, or the like.
    Also, saying that a parser ignores a token (or not) does not fully
    describe the problem we want to address, as we mix the scanning and
    parsing (where it would be applicable) with the selected target
    storage representation.
    I used "yield", as a parser yields something (storage-agnostically,
    this remains true).
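
As a rough illustration of keeping parsing apart from the choice of storage, here is a sketch that uses Python's `object_pairs_hook` as a stand-in for a callback interface. This is not YAJL's API; the hook names are hypothetical and only show that the same parse can yield either all pairs or a last-wins map.

```python
import json


def keep_all_pairs(pairs):
    """Storage-agnostic: hand over every name/value pair in document order."""
    return list(pairs)


def last_wins(pairs):
    """Map storage: a later pair with the same name replaces an earlier one."""
    return dict(pairs)


text = '{"a": 1, "b": {"c": 2, "c": 3}, "a": 4}'

everything = json.loads(text, object_pairs_hook=keep_all_pairs)
as_map = json.loads(text, object_pairs_hook=last_wins)

print(everything)  # [('a', 1), ('b', [('c', 2), ('c', 3)]), ('a', 4)]
print(as_map)      # {'a': 4, 'b': {'c': 3}}
```

The parser itself ignores nothing here; what gets dropped is decided only by the storage the consumer plugs in.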

4. The formulation (as in 3.) also tries to spell out that, of course,
    we do not have duplicate key:value pairs or the like (name/value
    pairs would be the correct intra-JSON lingo), because then we would
    have no problem at all, right :?)
    We have multiple pairs sharing the same name but not necessarily
    a common value.

5. Also (referring to the text as in 3.), we do not need a special
    recipe for the parser based on multiplicity of names, as keeping all
    or keeping the last occurrence in the JSON text works for singleton
    cases alike.

6. (ibid) But we need to constrain on a single object (as my proposal
    does)! We really do not want to keep the last in this situation:

         {"foo":{"foo":{"bar":42}},"bar":"baz"}        sample 6.a

    But of course in this:

         {"foo":{"foo":{"bar":42}},"foo":"baz"}        sample 6.b

    where the result (keep last branch) would be indistinguishable from:

         {"foo":"baz"}                                 sample 6.c
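
The three samples can be checked against a last-wins map parser; the sketch below uses Python's standard `json` module, whose default dict-based result keeps the last value per name within each object. The variable names are mine.

```python
import json

sample_6a = '{"foo":{"foo":{"bar":42}},"bar":"baz"}'
sample_6b = '{"foo":{"foo":{"bar":42}},"foo":"baz"}'
sample_6c = '{"foo":"baz"}'

parsed_6a = json.loads(sample_6a)
parsed_6b = json.loads(sample_6b)
parsed_6c = json.loads(sample_6c)

# 6.a: the inner "foo" lives in a different object, so nothing is replaced.
print(parsed_6a)  # {'foo': {'foo': {'bar': 42}}, 'bar': 'baz'}

# 6.b: two "foo" names in the same (outer) object, the last one is kept.
print(parsed_6b)  # {'foo': 'baz'}

# 6.b and 6.c cannot be told apart after parsing.
print(parsed_6b == parsed_6c)  # True
```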

7. Hopefully we do not forget that, besides the serialized ordering
    of names in a JSON object, this depends on the ordering the
    generator used. In the current RFC this is completely
    implementation-dependent. We all hope for at least a somewhat
    "dependable ordering", i.e. that reusing sample 6.a, after its
    insertion and a deletion on the generator input side of, say, the
    "inner" foo's value (setting it to null), yields:

         {"foo":{"foo":null},"bar":"baz"}              sample 7.a

    And not (which would be completely OK):

         {"bar":"baz","foo":{"foo":null}}              sample 7.b

    where inner changes have an impact on the outer ordering.
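
Both orderings are easy to produce with one and the same generator, which may help to see why nothing should be assumed here. The sketch uses Python's `json.dumps` and assumes Python 3.7 or later, where dicts keep insertion order; that is a property of this one implementation, not something the RFC promises.

```python
import json

# Start from sample 6.a and "delete" the inner foo's value.
document = json.loads('{"foo":{"foo":{"bar":42}},"bar":"baz"}')
document["foo"]["foo"] = None

# Insertion order is preserved: this gives sample 7.a.
in_insertion_order = json.dumps(document, separators=(",", ":"))

# Same data, names sorted by the generator: this gives sample 7.b.
sorted_by_name = json.dumps(document, separators=(",", ":"), sort_keys=True)

print(in_insertion_order)  # {"foo":{"foo":null},"bar":"baz"}
print(sorted_by_name)      # {"bar":"baz","foo":{"foo":null}}
```

Both outputs are equally valid JSON texts for the same object, so a consumer relying on "the last one" needs to know which ordering the generator actually applied.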


References:

[1]: http://www.ietf.org/mail-archive/web/json/current/msg00688.html
[2]: http://www.ietf.org/mail-archive/web/json/current/msg00428.html
[3]: https://github.com/lloyd/yajl

All the best,
Stefan.