Re: [Json] Limitations on number size?

Tatu Saloranta <tsaloranta@gmail.com> Wed, 10 July 2013 18:55 UTC

On Wed, Jul 10, 2013 at 3:07 AM, Peter F. Patel-Schneider <
pfpschneider@gmail.com> wrote:

>
> On 07/10/2013 02:47 AM, Stephan Beal wrote:
>
>> On Wed, Jul 10, 2013 at 1:44 AM, Peter F. Patel-Schneider <
>> pfpschneider@gmail.com <mailto:pfpschneider@gmail.com>> wrote:
>>
>>     That's a very unhappy situation.   My interest in JSON is to consume
>>     data in JSON documents (mostly to use as input into representation
>>     systems that also use the W3C semantic web languages RDF and OWL). If
>>     JSON is ambiguous (e.g., as to whether 0.0 and 0 encode/stand
>>     for/represent the same thing) then JSON isn't very suitable for
>>     transmitting data, at least for me.
>>
>>
>> While i think we will all agree that, at a technically pedantic level,
>> you're absolutely right, JSON has been in heavy use for about 10(?) years
>> now with _relatively_ few instances of this causing a problem.
>>
>
> Relatively is one of these weasel words that we all use.  I certainly
> agree that JSON is useful for transmitting certain kinds of data.
>

The same can be said of any data format in wide use.

This whole discussion could equally be about XML: XML does not define
any numeric types at the format level. Additional specifications (schema
languages, and specs that require one of them) have been defined to handle
the mapping of potentially unbounded numbers. If you think of JSON as the
counterpart of the XML specification, you may be less surprised that
physical limits are not spelled out -- those may well belong to another
layer. XML Schema, for its part, has a wide variety of numeric types,
including unlimited-precision ones.
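
As a rough illustration of that layering, here is a minimal Python sketch.
It assumes Python's standard json module; the field names "big" and
"precise" are made up for the example. The same JSON text yields different
numeric values depending on the mapping the consuming layer chooses, which
is the sense in which the limits sit above the format.

```python
import json
from decimal import Decimal

# Hypothetical document; the field names are illustrative only.
text = '{"big": 12345678901234567890123, "precise": 0.1000000000000000000001}'

# Default mapping: JSON integers become arbitrary-precision int, while
# fractions become IEEE 754 doubles, so the trailing digits are lost.
default = json.loads(text)

# Mapping chosen by the application layer: keep fractions as Decimal.
exact = json.loads(text, parse_float=Decimal)

print(default["big"])      # 12345678901234567890123
print(default["precise"])  # 0.1
print(exact["precise"])    # 0.1000000000000000000001
```

Nothing in the JSON text changed between the two calls; only the
consumer's choice of numeric types did.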


>
> However, the working group mailing archives contain evidence that there
> are indeed significant problems when using JSON to portably interchange
> data, particularly  binary data.


Why so? Base64 encoding is the de facto standard for this and is very
widely used.
Of all the things discussed, this seems to be among the lesser problems.
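
For illustration, a minimal round trip in Python, assuming only the
standard base64 and json modules; the field name "data" is a hypothetical
choice for this sketch:

```python
import base64
import json

# Arbitrary binary data, including bytes that are not valid UTF-8.
payload = bytes(range(256))

# Encode: bytes -> Base64 ASCII text -> JSON string value.
doc = json.dumps({"data": base64.b64encode(payload).decode("ascii")})

# Decode: JSON string value -> Base64 text -> original bytes.
restored = base64.b64decode(json.loads(doc)["data"])

print(restored == payload)  # True
```

The Base64 alphabet contains no characters that need escaping inside a
JSON string, so the encoded text can be embedded as-is.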

And yes, even when transferring large amounts of binary data it is
possible to add incremental (i.e. streaming) processing. I use this for
synchronizing gigabyte-sized blocks on a distributed storage system
(usually with Smile, a binary encoding of JSON, but plain JSON also
works functionally and is used for system tests).
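
A sketch of what incremental writing can look like, in Python. This is an
illustration under stated assumptions, not the code used in the system
mentioned above: the function name write_base64_json and the field name
"data" are hypothetical. The only trick is to encode in multiples of 3
bytes, so that the Base64 pieces concatenate into one valid string and
padding appears only at the very end.

```python
import base64
import io
import json


def write_base64_json(src, dst, chunk_size=64 * 1024):
    """Stream the bytes of `src` into `dst` as {"data": "<base64>"}.

    Only about `chunk_size` bytes are held in memory at a time.
    """
    dst.write('{"data": "')
    pending = b""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Encode only a multiple of 3 bytes; carry the remainder forward.
        usable = len(pending) - len(pending) % 3
        dst.write(base64.b64encode(pending[:usable]).decode("ascii"))
        pending = pending[usable:]
    # The final 0-2 leftover bytes are the only place padding can occur.
    dst.write(base64.b64encode(pending).decode("ascii"))
    dst.write('"}')


source = io.BytesIO(bytes(range(256)) * 100)
out = io.StringIO()
write_base64_json(source, out, chunk_size=1000)

# For checking only, the result is read back in one piece here.
print(base64.b64decode(json.loads(out.getvalue())["data"]) == source.getvalue())  # True
```

Only the writing side streams in this sketch; a streaming reader would
need an incremental JSON parser, which the standard json module does not
provide.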



>
>
>> Implementors tend to use whatever default limits the platform provides
>> (e.g. 32-bit on 32-bit platforms and 64 on 64-bit, and 6-digit precision in
>> doubles seems to be conventional in C libraries).
>> People using high-precision/very large/very small numbers are certainly
>> aware of the limitations/portability problems, and will (possibly after
>> falling on their face with JSON) pick a different format.
>>
>
> Are they really aware of all the potential problems?   And just what
> counts as high-precision/very large/very small?  Does 0. belong to any of
> these categories?
>
>
>> That's all fine and good - i haven't seen anyone here argue that JSON
>> needs to be _the_ data format. It needs to be a _useful_ format for a wide
>> range of applications, and it is that even if it's hard-coded to be limited
>> to 31-bit integer ranges. In my implementations i have had to be very aware
>> of system-level precision limits, but i simply document them, add build
>> options to use, e.g. 64-bit integers if available, and leave it at that.
>> Those details fall comfortably into the normal range of "implementation
>> defined" details, IMO, and do _not_ (IMO) fall into JSON's realm of
>> authority (JSON just needs to tell me the BNF for reading a number, though
>> one could argue that the BNF should/does also imply certain limits). It
>> would be impossible to enforce that arbitrary implementations must support
>> arbitrarily long numbers, just as it would be silly to arbitrarily limit
>> JSON to, say, 20-bit precision.
>>
>> Using your case of 0 vs 0.0. The vast, vast majority of JSON consumers
>> are JavaScript, and JS doesn't differentiate between doubles and integers,
>> so 0 is, in effect equivalent to 0.0. In fact, there are few real-world
>> applications using JSON where the two are _not_ equivalent (barring
>> scientific, high-precision, math-centric apps, of course, and those should
>> probably be looking for a different format which guarantees them their
>> desired ranges/limits).
>>
>
> I would appreciate some evidence to back up the claim that the vast, vast
> majority of JSON is handled in an environment where the JSON numbers 0 and
> 0.0 do indeed represent the same thing. The RDF W3C working group is in
> the last stages of putting its stamp of approval on JSON-LD, which presents
> the JSON numbers 0 and 0.0 to RDF as being
>

I would rather argue that the cases where equality matters are minority
use cases.
So whether the two are equal at the format level is completely irrelevant.

You seem to be implying that this is a common concern -- I do not buy that
claim.
The concern does exist, but it is raised far more often by theoretical
thinkers than by actual system implementors.
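
To make the 0 versus 0.0 point concrete, a small Python sketch, assuming
the standard json module. Whether the two come out as the same thing is
decided by the consumer's mapping, not by the JSON text:

```python
import json

a = json.loads("0")
b = json.loads("0.0")

# This parser happens to map the two texts to different types ...
print(type(a).__name__, type(b).__name__)  # int float
# ... yet the values still compare equal.
print(a == b)  # True

# A consumer that maps every JSON number to a single type (as JavaScript
# does) cannot tell the two apart at all.
c = json.loads("0", parse_int=float)
print(type(c) is type(b) and c == b)  # True
```

An application that does need to distinguish them has to say so in its
own layer, for example by inspecting the parsed type as above.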

-+ Tatu +-