Re: [Json] JSON Schema Language: extensibility and unspecified properties

Ulysse Carion <ulysse@segment.com> Sun, 18 August 2019 01:35 UTC

From: Ulysse Carion <ulysse@segment.com>
Date: Sat, 17 Aug 2019 18:34:58 -0700
Message-ID: <CAJK=1Rgqek+rh+dj2xNWD7WKS48oQoHiqhj5dDT2D3dD7OZs1Q@mail.gmail.com>
To: Carsten Bormann <cabo@tzi.org>
Cc: JSON WG <json@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/json/WwOspaHIh2uVPbNySPDdR6XE3xA>

Carsten, I think you are bringing up an important point. I should
clarify what is and isn't a type in the spec, and what the principle
is for deciding that.

I'd like to add a brief discussion of that reasoning in the
introduction of the next draft. Does this principle make sense to
folks here?

"Only include in the spec things which are (1) commonly (and portably)
used in the JSON ecosystem, and (2) which have a clear mapping to
programming languages in widespread use today."

By that standard, int64/uint64 fail (1): they are either unportable
(i.e., not I-JSON) or implemented in a bunch of mutually incompatible
ways. int53 fails (2) because no mainstream language I know of has a
native integer type that covers exactly the range up to 2^53.
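
To make both failures concrete, here is a minimal Python sketch. It is
not taken from the draft, and the helper name fits_int53 is made up for
illustration. It shows an int64-sized value being silently rounded by a
parser that stores every number as a float64 (simulated here with
parse_int=float), and the kind of bounds check a serializer would need
for a hypothetical int53 type.

```python
import json

# I-JSON (RFC 7493) recommends keeping integers within
# [-(2**53)+1, (2**53)-1], the range a float64 represents exactly.
MAX_SAFE = 2**53 - 1


def fits_int53(n: int) -> bool:
    """Hypothetical bounds check a serializer would need for 'int53'."""
    return -MAX_SAFE <= n <= MAX_SAFE


big = 2**53 + 1  # 9007199254740993, a perfectly valid int64

text = json.dumps(big)

# Python's json module keeps large integers exact...
exact = json.loads(text)

# ...but a parser that stores all numbers as float64 rounds the value.
# parse_int=float simulates such a parser.
lossy = json.loads(text, parse_int=float)

print(exact == big)        # the exact parse round-trips
print(int(lossy) == big)   # the float64 parse does not
print(fits_int53(big))     # so the value is outside the int53 range
```

The bounds check is the extra, fallible step that a plain int64_t/long
field would need before it could be written out as int53.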

I disagree with you about uint8. I think we agree it satisfies (1); it
seems we disagree on whether it passes (2)? Perhaps you were thinking
of Java's strictly-signed "byte". If so: by convention, Java folks
ignore the signed-ness of byte when uint8 is needed. This is perhaps a
bit sketchy (since you can so easily mix and match signed and unsigned
8-bit numbers), but it has gotten the job done in many applications.
So the spec's uint8 and int8 both correspond to Java's byte, in my
view.
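
As a rough illustration of that convention, sketched in Python rather
than Java, and assuming a single octet with the uint8 value 200: the
same eight bits read as a signed byte come out negative, and masking
with 0xFF recovers the unsigned value. The Java counterpart of the mask
would be (b & 0xFF) or Byte.toUnsignedInt(b).

```python
import struct

raw = bytes([200])  # one octet holding the uint8 value 200

# Read the octet as a signed 8-bit integer, which is what a Java
# byte would hold after decoding it.
(signed,) = struct.unpack("b", raw)

# Mask with 0xFF to reinterpret the same bits as unsigned.
unsigned = signed & 0xFF

print(signed, unsigned)
```

Nothing in the type system stops the signed and unsigned readings from
being mixed up, which is the sketchy part.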

On the principle above, I think I should remove {"type": "number"}
from the spec. It's unclear what "number" means. Does it mean
BigDecimal? If so, it's not portable and so fails (1). Does it mean
float64? Then just use {"type": "float64"} instead.
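
A small Python sketch of that ambiguity, assuming a made-up document
with a "price" field: the same JSON text yields different values
depending on whether the reader treats "number" as a float64 or as an
arbitrary-precision decimal.

```python
import json
from decimal import Decimal

# Hypothetical input with more digits than a float64 can hold.
text = '{"price": 0.1000000000000000000001}'

# Reading "number" as float64: the extra digits are rounded away.
as_float64 = json.loads(text)["price"]

# Reading "number" as a BigDecimal-like type: every digit is kept.
as_decimal = json.loads(text, parse_float=Decimal)["price"]

print(as_float64)
print(as_decimal)
```

Two conforming readers disagree about the value, which is why a schema
that says only "number" does not pin down the behavior.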

On Mon, Aug 12, 2019 at 10:31 PM Carsten Bormann <cabo@tzi.org> wrote:
>
> On Aug 13, 2019, at 07:03, Ulysse Carion <ulysse@segment.com> wrote:
> >
> >> However, it seems bizarre to support int8, int16, and int32, but not JSON’s generic interoperable integers.
> >
> > Supposing we add int53 to JSL, do you picture code generators
> > producing int64_t/long for {"type": "int53" }? Does that mean that,
> > before serializing an int64_t to something marked as int53 in JSL, the
> > application must first do an extra bounds check? Today, it's fairly
> > easy to generate code from JSL where serializing is an infallible
> > affair. But with int53, that property would be lost, because most type
> > systems cannot express int53.
>
> Many type systems don’t have a uint8, either.
>
> > I'm inclined to think this is something better handled by extensions.
> > Perhaps someone can define an "int53" property to do something like:
> >
> > { "type": "number", "int53": true }
> >
> > The person writing the extension would document that the "int53"
> > property indicates whether a number is meant to represent a number in
> > the I-JSON range. Applications which don't understand this keyword
> > will still do something reasonable -- validate for some sort of
> > number, and code generate a "double" -- but applications which care
> > about this can handle this case specially. It also signals an intent
> > about how the number will be used.
>
> Again, the same is true for uint8.
>
> (This opens the whole “types vs. subtypes” discussion.)
>
> > I expect this sort of approach is how JSL may need to handle
> > big-number encoding schemes. There are so many different ways to approach
> > that problem, and I think JSL is most useful if it establishes an
> > uncontroversial foundation, and then lets additional, out-of-spec
> > keywords tighten the schema further in a way which most folks don't
> > care about, and hence don't want to have to implement.
> >
> > Does that seem like a reasonable approach?
> >
> >> Actually, that’s something JSON can’t deal with.
> >
> > By this you mean that JSON prescribes "all numbers are doubles", and
> > so integers aren't really a good thing to try to foist onto JSON's
> > syntax?
>
> Integer is a mathematical concept.  JSON does not have a problem with that.
>
> The problem comes in when applications arbitrarily restrict the syntax.
>
> Not allowing a fractional or exponent part is akin to requiring two spaces of indentation before or a newline after a number.
>
> It gets worse when syntax variations are assigned different application semantics.
> (E.g., I’m aware of at least one “JSON-based” syntax where a string that starts with “\u0073” has semantics different from a string that starts with “s”.)
>
> You can’t do that to a format like JSON and still be part of the JSON ecosystem, because generic JSON decoders discard these syntax features, and generic JSON encoders generally can’t produce them.
>
> >> MS-Excel has repeatedly taught me that my university phone number is 2.1863921e7, but people still manage to call me :-)
> >
> > Could you expand on what you mean here? I realize no joke is funny enough
> > to survive explanation, but I’m afraid I'm perhaps missing your point.
>
> Well, my phone number is 21863921, but some spreadsheets consider this a large number and turn it into an NR3 number…
>
> Grüße, Carsten
>