Re: [Json] JSON Schema Language: extensibility and unspecified properties

Ulysse Carion <ulysse@segment.com> Sun, 18 August 2019 01:56 UTC

MIME-Version: 1.0
References: <CAJK=1RhXp85cz-pOAQPw2JM=CYHgGSygj4Hw0spht56jbzQE2g@mail.gmail.com> <53094378-B559-49E1-B42B-54FBA8BC35AA@tzi.org> <CAJK=1Rj6Q3CvpF9aYML=47SF_XP49=O2hLhcBo8gZCb73C0RAw@mail.gmail.com> <FDB93E41-9D7D-4BF2-8D01-F4D075774848@tzi.org> <CAJK=1RiE_+nHkeB77DericN498w1v9mf2hsBgnQtgsZTVM9N9A@mail.gmail.com> <118F844A-453D-497D-8107-CF2BD05AC313@tzi.org> <CAJK=1Rgqek+rh+dj2xNWD7WKS48oQoHiqhj5dDT2D3dD7OZs1Q@mail.gmail.com>
In-Reply-To: <CAJK=1Rgqek+rh+dj2xNWD7WKS48oQoHiqhj5dDT2D3dD7OZs1Q@mail.gmail.com>
From: Ulysse Carion <ulysse@segment.com>
Date: Sat, 17 Aug 2019 18:55:46 -0700
Message-ID: <CAJK=1RjTwDN5vS6fi=GjQ5TwyGYMMRuj4f=Fz+Xhe4eQB0RY+Q@mail.gmail.com>
To: Carsten Bormann <cabo@tzi.org>
Cc: JSON WG <json@ietf.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: <https://mailarchive.ietf.org/arch/msg/json/StbJ5n-0UxHZ4EpV_mOy_Rg3EDY>
Subject: Re: [Json] JSON Schema Language: extensibility and unspecified properties

Also: from off-list discussion, and from talking with folks in
industry about their experiences taking legacy systems and
retrofitting schemas onto them, I think the spec's current approach to
"unexpected/unspecified/additional" properties is inadequate.

In the current draft, whether to tolerate properties not mentioned in
a schema is a global option on the schema. This is not good enough,
because oftentimes folks only progressively lock down parts of their
schema -- some parts are better documented and can be locked down, but
other parts may be inscrutable or just too risky to lock down.

So in the next draft, I'm thinking of letting any schema -- root schema
or sub-schema -- specify whether it allows properties beyond those
mentioned in properties/optionalProperties. Also, since "strict" is
super vague, I'll rename it "additionalProperties". It'll still take
true/false, but its meaning is flipped relative to "strict" (true now
means extra properties are allowed), and the default is "false".
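
A minimal sketch of how a per-schema "additionalProperties" might
behave, assuming the semantics described above (default false, settable
on any schema or sub-schema). The validate() helper, its error-path
format, and the tiny set of supported types are hypothetical and only
for illustration; this is not the draft's reference algorithm.

```python
def validate(schema, instance, path=""):
    """Return a list of paths that fail validation (empty list = valid).

    Hypothetical helper: supports only "type" (string/boolean),
    "properties", "optionalProperties" and "additionalProperties".
    """
    errors = []

    if "type" in schema:
        expected = {"string": str, "boolean": bool}[schema["type"]]
        if not isinstance(instance, expected):
            errors.append(path or "/")
        return errors

    if "properties" in schema or "optionalProperties" in schema:
        if not isinstance(instance, dict):
            return [path or "/"]

        required = schema.get("properties", {})
        optional = schema.get("optionalProperties", {})

        # Required properties must be present and valid.
        for key, sub in required.items():
            if key not in instance:
                errors.append(f"{path}/{key}")
            else:
                errors.extend(validate(sub, instance[key], f"{path}/{key}"))

        # Optional properties are validated only when present.
        for key, sub in optional.items():
            if key in instance:
                errors.extend(validate(sub, instance[key], f"{path}/{key}"))

        # Assumed default: additionalProperties is false, decided per schema.
        if not schema.get("additionalProperties", False):
            for key in instance:
                if key not in required and key not in optional:
                    errors.append(f"{path}/{key}")

    # A schema with none of the keywords above accepts anything.
    return errors


# The root is locked down; only the poorly understood "legacy" part is open.
schema = {
    "properties": {
        "id": {"type": "string"},
        "legacy": {
            "additionalProperties": True,
            "properties": {"name": {"type": "string"}},
        },
    },
}

ok = {"id": "1", "legacy": {"name": "x", "mystery": 42}}
bad = {"id": "1", "extra": True, "legacy": {"name": "x"}}
```

Here validate(schema, ok) reports no errors, because the unknown
"mystery" property sits inside the sub-schema that opted in, while
validate(schema, bad) flags "/extra" at the locked-down root.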

On Sat, Aug 17, 2019 at 6:34 PM Ulysse Carion <ulysse@segment.com> wrote:
>
> Carsten, I think you are bringing up an important point. I should
> clarify what is and isn't a type in the spec, and what the principle
> is for deciding that.
>
> I'd like to add a brief discussion of that reasoning in the
> introduction of the next draft. Does this principle make sense to
> folks here?
>
> "Only include in the spec things which are (1) commonly (and portably)
> used in the JSON ecosystem, and (2) which have a clear mapping to
> programming languages in widespread use today."
>
> By that standard, int64/uint64 fails (1), as it's unportable (i.e.,
> not I-JSON) or implemented in a bunch of mutually incompatible ways.
> int53 fails (2) because no mainstream language I know of has support
> for integers strictly up to 2^53.
>
> I disagree with you about uint8. I think we agree it satisfies (1); it
> seems we disagree on whether it passes (2)? Perhaps you were thinking
> of Java's strictly-signed "byte". If so: by convention, Java folks
> ignore the signed-ness of byte when uint8 is needed. This is perhaps a
> bit sketchy (since you can so easily mix and match signed and unsigned
> 8-bit numbers), but it has gotten the job done in many applications.
> So the spec's uint8 and int8 both correspond to Java's byte, in my
> view.
>
> On the principle above, I think I should remove {"type": "number"}
> from the spec. It's unclear what "number" means. Does it mean
> BigDecimal? If so, it's not portable and so fails (1). Does it mean
> float64? Then just use {"type": "float64"} instead.
>
> On Mon, Aug 12, 2019 at 10:31 PM Carsten Bormann <cabo@tzi.org> wrote:
> >
> > On Aug 13, 2019, at 07:03, Ulysse Carion <ulysse@segment.com> wrote:
> > >
> > >> However, it seems bizarre to support int8, int16, and int32, but not JSON’s generic interoperable integers.
> > >
> > > Supposing we add int53 to JSL, do you picture code generators
> > > producing int64_t/long for {"type": "int53" }? Does that mean that,
> > > before serializing an int64_t to something marked as int53 in JSL, the
> > > application must first do an extra bounds check? Today, it's fairly
> > > easy to generate code from JSL where serializing is an infallible
> > > affair. But with int53, that property would be lost, because most type
> > > systems cannot express int53.
> >
> > Many type systems don’t have a uint8, either.
> >
> > > I'm inclined to think this is something better handled by extensions.
> > > Perhaps someone can define an "int53" property to do something like:
> > >
> > > { "type": "number", "int53": true }
> > >
> > > The person writing the extension would document that the "int53"
> > > property indicates whether a number is meant to represent a number in
> > > the I-JSON range. Applications which don't understand this keyword
> > > will still do something reasonable -- validate for some sort of
> > > number, and code generate a "double" -- but applications which care
> > > about this can handle this case specially. It also signals an intent
> > > about how the number will be used.
> >
> > Again, the same is true for uint8.
> >
> > (This opens the whole “types vs. subtypes” discussion.)
> >
> > > I expect this sort of approach is how JSL may need to handle
> > > big-number encoding schemes. There are so many different ways to approach
> > > that problem, and I think JSL is most useful if it establishes an
> > > uncontroversial foundation, and then lets additional, out-of-spec
> > > keywords tighten the schema further in a way which most folks don't
> > > care about, and hence don't want to have to implement.
> > >
> > > Does that seem like a reasonable approach?
> > >
> > >> Actually, that’s something JSON can’t deal with.
> > >
> > > By this you mean that JSON prescribes "all numbers are doubles", and
> > > so integers aren't really a good thing to try to foist onto JSON's
> > > syntax?
> >
> > Integer is a mathematical concept.  JSON does not have a problem with that.
> >
> > The problem comes in when applications arbitrarily restrict the syntax.
> >
> > Not allowing a fractional or exponent part is akin to requiring two spaces of indentation before or a newline after a number.
> >
> > It gets worse when syntax variations are assigned different application semantics.
> > (E.g., I’m aware of at least one “JSON-based” syntax where a string that starts with “\u0073” has semantics different from a string that starts with “s”.)
> >
> > You can’t do that to a format like JSON and still be part of the JSON ecosystem, because generic JSON decoders discard these syntax features, and generic JSON encoders generally can’t produce them.
> >
> > >> MS-Excel has repeatedly taught me that my university phone number is 2.1863921e7, but people still manage to call me :-)
> > >
> > > Could you expand what you mean here? I realize no joke is funny enough
> > > to survive explanation, but I’m afraid I'm perhaps missing your point.
> >
> > Well, my phone number is 21863921, but some spreadsheets consider this a large number and turn it into an NR3 number…
> >
> > Grüße, Carsten
> >
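
For reference, the extra bounds check discussed in the quoted thread
(serializing a 64-bit integer into something marked as int53) could be
sketched as follows. The range used is the interoperable integer range
from I-JSON (RFC 7493), [-(2**53)+1, (2**53)-1]; the function names are
hypothetical and not part of any draft.

```python
import json

# Interoperable integer range per I-JSON (RFC 7493).
I_JSON_MAX = 2**53 - 1
I_JSON_MIN = -(2**53) + 1


def in_int53_range(n: int) -> bool:
    """True if n survives a round trip through an IEEE 754 double."""
    return I_JSON_MIN <= n <= I_JSON_MAX


def serialize_int53(n: int) -> str:
    """Serialize n as JSON, failing if it is outside the int53 range.

    This is the fallible step: a type system with only int64 cannot
    rule out the error case at compile time.
    """
    if not in_int53_range(n):
        raise ValueError(f"{n} is outside the I-JSON integer range")
    return json.dumps(n)
```

Under this sketch serialize_int53(2**53 - 1) succeeds, while
serialize_int53(2**53) raises ValueError, which is the loss of
"infallible" serialization mentioned in the thread.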