Re: [Json] JSON and int64s - any change in current best practice since I-JSON

Tim Bray <tbray@textuality.com> Tue, 23 January 2024 00:16 UTC

From: Tim Bray <tbray@textuality.com>
Date: Mon, 22 Jan 2024 16:15:56 -0800
Message-ID: <CAHBU6is4qU7yMp+osJQ7VGSwH7meOz6CQ4T+zXwv1bqoGx4uXw@mail.gmail.com>
To: Joe Hildebrand <hildjj@cursive.net>
Cc: "json@ietf.org" <json@ietf.org>, cbor@ietf.org, Carsten Bormann <cabo@tzi.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/json/-53WtYyPATGFl90qDrKcpniy4XI>

Thanks for doing that. If there were a basis to have this discussion, I
would suggest shortening that list quite a bit.  But I really don’t see any
let’s-fix-JSON energy out there, much and all as it’d be satisfying. I
could easily be wrong but would need to see some evidence.

On Jan 22, 2024 at 3:54:22 PM, Joe Hildebrand <hildjj@cursive.net> wrote:

> I put together a quick requirements doc:
> https://github.com/hildjj/draft-hildebrand-eson-requirements
> This is just a starting point for discussion, any of it can be changed.
>
> —
> Joe Hildebrand
>
> On Jan 18, 2024, at 6:57 PM, Joe Hildebrand <hildjj@cursive.net> wrote:
>
>
> > On Jan 18, 2024, at 8:56 AM, Carsten Bormann <cabo@tzi.org> wrote:
>
> >
>
> > Hi Joe,
>
> >
>
> > thanks for the list!
>
> > I’m CCing cbor@, as we are just finishing the extended diagnostic
> > notation (EDN) document, and it is worth considering these points.
>
>
> Context for folks there: on the JSON list, there is a little bit of loose
> talk about a text-based format that would be a superset of JSON (perhaps?),
> would have defined numeric processing (perhaps including big integers),
> might take some of the features from JSON5 (https://json5.org/), might make
> commas optional (including trailing commas), etc.
>
>
> >> We would need to rework that grammar to make it suitable for
> >> interchange:
>
> >
>
> > [context: as a JSON extension]
>
>
> Yes.
>
>
> >> - Remove encoding indicators as mentioned
>
> >
>
> > Right.  When you say “remove”, you mean “do not include in JSON subset”?
>
> > (ABNF doesn’t have the “.feature” feature of CDDL…)
>
>
> I think by "JSON subset" you mean "subset of EDN used to define a text
> format".  That was confusing, because I was thinking "a text format that is
> a superset of JSON, perhaps influenced by EDN".  As such, "subset" and
> "superset" are referring to more or less the same thing.  I'm fine using
> your wording for the moment, with the understanding that the format we're
> talking about is not a subset of JSON, but a subset of EDN.
>
>
> I'm not tied to ABNF, but I don't think CDDL would work to describe the
> syntax of the notation itself.  It could be used to describe information
> encoded in the notation, though.
>
>
> >> - Reference IEEE754 for the 0x1.2p3 format, or remove it
>
> >
>
> > Good point.  Section 5.12.3 of IEEE Std 754-2019?
>
>
> I think so?  I can't find my copy of 754.
>
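For readers unfamiliar with the `0x1.2p3` notation discussed above: it is the hexadecimal-significand form, with hex digits for the significand and a power-of-two exponent after the `p`. A minimal sketch in Python, whose built-in `float.fromhex` and `float.hex` happen to read and write this form; the variable names are illustrative only and nothing here is taken from the EDN draft:

```python
# 0x1.2p3 means (1 + 2/16) * 2**3, i.e. 1.125 * 8.
value = float.fromhex("0x1.2p3")
print(value)  # 9.0

# The conversion is exact in both directions, since no decimal rounding
# is involved.
round_trip = float.fromhex(value.hex())
print(round_trip == value)  # True

# A value with no short exact decimal form still has an exact hex form.
tenth = 0.1
print(tenth.hex())  # 0x1.999999999999ap-4
```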
>
> >> - Remove embedded notation
>
> >
>
> > (I.e., not in JSON subset.)
>
>
> Yes.  This is only needed for expository text, not for interchange.
>
>
> >> - Remove ellipsis processing
>
> >
>
> > (I.e., not in JSON subset.)
>
>
> Yes.  This is only needed for expository text, not for interchange.
>
>
> >> - Discuss whether results should always be an array, as sequence is
> >> currently top-level
>
> >
>
> > Good question.  I gather the pickup of JSON sequences (RFC 7464) is not
> > that great?  EDN is not entirely compatible with that as we don’t do the RS
> > character, but do an explicit comma.
>
> > The JSON subset could simply use “item” as the entry point.
>
>
> Maybe I'm misunderstanding the ABNF in the EDN doc, but it looks like the
> top-level rule is `seq`, which is comma-separated.  I'm open to designing a
> streaming form of this notation at the same time, but that's a topic for
> discussion.
>
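To make the difference between the two top-level styles concrete, here is a rough sketch in Python. RFC 7464 frames each JSON text with a leading RS (0x1E) and a trailing LF; the comma-separated variant is only an approximation of a `seq`-style top level, and all function names are hypothetical:

```python
import json

RS = "\x1e"  # ASCII Record Separator, the RFC 7464 framing character


def encode_json_seq(items):
    """Frame each value as RS + JSON text + LF, as RFC 7464 describes."""
    return "".join(RS + json.dumps(item) + "\n" for item in items)


def decode_json_seq(text):
    """Split on RS and parse each non-empty chunk as one JSON text."""
    return [json.loads(chunk) for chunk in text.split(RS) if chunk.strip()]


def decode_comma_seq(text):
    """Assumed comma-separated top level: wrap in brackets, parse as an array."""
    return json.loads("[" + text + "]")


items = [1, "two", {"three": 3}]
framed = encode_json_seq(items)
print(repr(framed))
print(decode_json_seq(framed) == items)
print(decode_comma_seq('1, "two", {"three": 3}') == items)
```

The RS framing lets a reader resynchronise after a damaged record, while the comma form needs no special character but cannot be split without tokenising.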
>
> >> - Discuss adding other JSON5 features
>
> >
>
> > Do you have some examples for what you would like to add?
>
> > What are we missing out for EDN?
>
>
> I like basically everything that JSON5 adds, except maybe escaping
> newlines; I think I'd prefer newlines to just be valid in strings without
> needing to be escaped.  I don't care about explicit `+` before numbers, but
> they don't bother me.  Reproduced here with comments:
>
>
> > Objects
>
> >    Object keys may be an ECMAScript 5.1 IdentifierName.
>
>
> Strong yes.  No need for quoting most keys in a world that has
> /\p{ID_Start}\p{ID_Continue}*/u
>
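A rough way to see which keys could go unquoted under such a rule, sketched in Python. This assumes `str.isidentifier()` as a stand-in for the pattern above, which it only approximates; `can_be_unquoted` is a hypothetical helper name:

```python
def can_be_unquoted(key: str) -> bool:
    """Approximate check for whether an object key could be written bare.

    Assumption: str.isidentifier() (based on XID_Start / XID_Continue, plus
    a leading "_") stands in for the ID_Start / ID_Continue pattern. It is
    close but not identical: it accepts a leading underscore, which
    ID_Start does not include, and it rejects "$", which ECMAScript
    identifier names allow.
    """
    return key.isidentifier()


for key in ["name", "naïve", "ключ", "_private", "2fast", "has space", "a-b", ""]:
    print(repr(key), can_be_unquoted(key))
```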
>
> >    Objects may have a single trailing comma.
>
>
> Strong yes.  I want to have the same coding standard in this format as I
> do for other languages.  I'm fine with all commas being optional for those
> that feel differently.
>
>
> > Arrays
>
> >    Arrays may have a single trailing comma.
>
>
> Yes.  See above.
>
>
> > Strings
>
> >    Strings may be single quoted.
>
>
> Same coding standard argument, plus there are times when I want to use "
> or ' without having to backslash-escape it.
>
>
> >    Strings may span multiple lines by escaping new line characters.
>
>
> This is for:
>
>
> ```json5
> "foo\
> bar"
> ```
>
>
> I'd prefer:
>
>
> ```eson
> "foo
> bar"
> ```
>
>
> >    Strings may include character escapes.
>
>
> This is for "\u{1F4A9}" in addition to "\uD83D\uDCA9", which is what you
> need to do in JSON.
>
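For anyone who has not had to do the surrogate arithmetic by hand, a small Python sketch of how U+1F4A9 turns into the pair that JSON requires; `to_surrogate_pair` is a hypothetical helper, not part of any library:

```python
import json


def to_surrogate_pair(code_point: int) -> str:
    """Return the JSON \\uXXXX\\uXXXX escape for a code point above U+FFFF."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary-plane code points need a pair")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return "\\u{:04X}\\u{:04X}".format(high, low)


escape = to_surrogate_pair(0x1F4A9)
print(escape)  # \uD83D\uDCA9

# A JSON parser recombines the pair into the single code point.
decoded = json.loads('"' + escape + '"')
print(decoded == "\U0001F4A9")  # True
```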
>
> > Numbers
>
> >    Numbers may be hexadecimal.
>
>
> Fine.  Particularly nice for config files.  I don't think octal or binary
> is needed, but if someone else feels strongly, it's fine.
>
>
> >    Numbers may have a leading or trailing decimal point.
>
>
> Sure.  As long as we fix everything about numbers at least as well as
> I-JSON.
>
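The int64 problem that started this thread is easy to reproduce. I-JSON (RFC 7493) treats integers in the range [-(2**53)+1, (2**53)-1] as interoperable, because a receiver that maps every number to an IEEE 754 double cannot represent all integers outside that range. A minimal Python sketch; `is_ijson_safe_integer` is a hypothetical helper, and `parse_int=float` is used only to imitate a double-only receiver:

```python
import json

IJSON_MAX = 2**53 - 1
IJSON_MIN = -(2**53) + 1


def is_ijson_safe_integer(n: int) -> bool:
    """True if n lies in the integer range I-JSON calls interoperable."""
    return IJSON_MIN <= n <= IJSON_MAX


big = 2**53 + 1  # fits easily in an int64, but is outside the I-JSON range
print(is_ijson_safe_integer(big))  # False

# Python's json module keeps integers exact...
print(json.loads(str(big)) == big)  # True

# ...but a receiver that turns every number into a double does not.
as_double = json.loads(str(big), parse_int=float)
print(int(as_double) == big)  # False
print(float(big) == float(2**53))  # True
```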
>
> >    Numbers may be IEEE 754 positive infinity, negative infinity, and NaN.
>
>
> Yes.
>
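Some implementations already blur this line. As an illustration only, Python's standard `json` module emits and accepts `Infinity`, `-Infinity` and `NaN` by default, although the JSON grammar in RFC 8259 does not permit them, and it needs `allow_nan=False` to refuse:

```python
import json
import math

# Default behaviour: JavaScript-style tokens that are not valid JSON.
print(json.dumps(float("inf")))   # Infinity
print(json.dumps(float("-inf")))  # -Infinity
print(json.dumps(float("nan")))   # NaN

# The same tokens are accepted on input.
print(math.isnan(json.loads("NaN")))  # True

# Strict mode refuses to serialise them.
try:
    json.dumps(float("nan"), allow_nan=False)
    strict_ok = True
except ValueError:
    strict_ok = False
print(strict_ok)  # False
```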
>
> >    Numbers may begin with an explicit plus sign.
>
>
> Shrug.
>
>
> > Comments
>
> >    Single and multi-line comments are allowed.
>
>
> Strong yes, and possibly the reason to do the work in the first place.
> Absolutely required for config files.
>
>
> > White Space
>
> >    Additional white space characters are allowed.
>
>
> Not important to me, but I don't speak any of the languages whose
> whitespace is now legal.  May as well include everything in Zs for
> future-proofing.
>
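To see what "everything in Zs" would cover, the Space_Separator category can be listed from the Unicode database shipped with Python. A small sketch; the result depends on the Unicode version of the Python build, and `space_separators` is a hypothetical helper:

```python
import sys
import unicodedata


def space_separators():
    """Return every code point whose general category is Zs."""
    return [chr(cp) for cp in range(sys.maxunicode + 1)
            if unicodedata.category(chr(cp)) == "Zs"]


zs = space_separators()
print(len(zs))
for ch in zs:
    print("U+{:04X} {}".format(ord(ch), unicodedata.name(ch)))
```

Note that tab, line feed and carriage return are in category Cc rather than Zs, so a rule based on Zs would still have to list those separately.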
>
> >> We wouldn't get Tim's desired property of one character look-ahead.
>
> >
>
> > On constrained systems, you could always use CBOR…
>
>
> Tim can make his own argument here.  I think I'm on record as believing
> that CBOR is a good thing.
>
>
> >> Other than that, it's as good a place to start as anywhere else.
>
> >
>
> > Indeed.
>
> > I could imagine we finish the EDN-literal document for its CBOR target
> > audience, and then write another document about the JSON subset (which
> > would be less of a delta and more of a free-standing specification that is
> > just anchored in the full version for CBOR).
>
>
> I think "inspired by" is as far as I'd commit to at the moment, until we
> have some agreement on requirements.
>
>
> > I just noticed that we do have a required comma between items in arrays
> > and sequences.  I just confused this with CDDL (which I’m prone to do),
> > where that comma is not required.  Leaving off the comma conflicts with
> > implicit string concatenation (which is one way to address the 2D string
> > problem).  So maybe there is a bit more work to do...
>
>
> I don't think this format needs string concat.
>
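The conflict between optional commas and implicit string concatenation can be seen in any language that already has the latter. Python is used here purely as an illustration: adjacent string literals are concatenated, so a dropped comma silently changes the number of elements rather than producing an error.

```python
with_commas = ["alpha", "beta", "gamma"]
missing_comma = ["alpha" "beta", "gamma"]  # adjacent literals are concatenated

print(len(with_commas))  # 3
print(missing_comma)     # ['alphabeta', 'gamma']
```

A notation with both features would have to decide whether `"alpha" "beta"` inside an array is one element or two.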
>
> --
>
> Joe Hildebrand
>
>
>
>