Re: [Json] JSON and int64s - any change in current best practice since I-JSON

Joe Hildebrand <hildjj@cursive.net> Fri, 19 January 2024 01:57 UTC

From: Joe Hildebrand <hildjj@cursive.net>
In-Reply-To: <29BD1557-59A1-4578-901B-C626ABBE9A78@tzi.org>
Date: Thu, 18 Jan 2024 20:57:00 -0500
Cc: "json@ietf.org" <json@ietf.org>, cbor@ietf.org, Tim Bray <tbray@textuality.com>
Content-Transfer-Encoding: quoted-printable
Message-Id: <B25E10D2-17CF-4B3D-B04B-BABE3A209B90@cursive.net>
References: <87527a42-aaac-4f39-b320-05f18a2808c1@codalogic.com> <C31BF4C8-9E6C-48F8-BF7B-D2C379273B3F@tzi.org> <CAHBU6it4SaLawSiBgK9ySkbxjtHE6CX-P3r=hzcVy4ksoQo-Cg@mail.gmail.com> <CAChr6SxHfLW-A1asAndKJz-AiyJv5QP18bi=_bNdKXw7zYHThw@mail.gmail.com> <CAChr6SweYdCWxSABZ7g20Zd-xBFzcK0Ritq53S7WtjSwc-vLmw@mail.gmail.com> <E5A68370-CC2F-4618-AB39-39A382656616@cursive.net> <807fea1b-a22b-4d6b-aa5d-720c9b12023c@codalogic.com> <09233A73-3A6B-4E6F-AEB8-596AC6442E24@cursive.net> <869950DC-647B-4481-AEF8-9E092384E99F@tzi.org> <CBD32B58-8328-4602-89C6-BC2A7A875A0D@cursive.net> <994E2C0A-4AE0-4720-8C67-913BBF033E11@tzi.org> <0BB09B30-B606-44CC-85DC-95A47E485316@cursive.net> <B22EDB2D-0AD1-4582-9191-EFB40E163F19@tzi.org> <F6EB02CA-C240-4FA1-92A8-C5BB883929C7@cursive.net> <29BD1557-59A1-4578-901B-C626ABBE9A78@tzi.org>
To: Carsten Bormann <cabo@tzi.org>
X-Mailer: Apple Mail (2.3774.300.61.1.2)
Archived-At: <https://mailarchive.ietf.org/arch/msg/json/JIMpOfyX63KgLdJbB-Uci6lqFgI>
Subject: Re: [Json] JSON and int64s - any change in current best practice since I-JSON

> On Jan 18, 2024, at 8:56 AM, Carsten Bormann <cabo@tzi.org> wrote:
> 
> Hi Joe,
> 
> thanks for the list!
> I’m CCing cbor@, as we are just finishing the extended diagnostic notation (EDN) document, and it is worth considering these points.

Context for folks there: on the JSON list, there has been some loose talk about a text-based format that would perhaps be a superset of JSON.  It would have defined numeric processing (possibly including big integers), might adopt some of the features from JSON5 (https://json5.org/), might make commas optional (including trailing commas), etc.

>> We would need to rework that grammar to make it suitable for interchange:
> 
> [context: as a JSON extension]

Yes.

>> - Remove encoding indicators as mentioned
> 
> Right.  When you say “remove”, you mean “do not include in JSON subset”?
> (ABNF doesn’t have the “.feature” feature of CDDL…)

I think by "JSON subset" you mean "subset of EDN used to define a text format".  That confused me, because I was thinking of "a text format that is a superset of JSON, perhaps influenced by EDN".  So "subset" and "superset" here refer to more or less the same thing.  I'm fine using your wording for the moment, with the understanding that the format we're talking about is not a subset of JSON, but a subset of EDN.

I'm not tied to ABNF, but I don't think CDDL would work to describe the syntax of the notation itself.  It could be used to describe information encoded in the notation, though.

>> - Reference IEEE754 for the 0x1.2p3 format, or remove it
> 
> Good point.  Section 5.12.3 of IEEE Std 754-2019?

I think so?  I can't find my copy of 754.
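
For anyone who wants to poke at the 0x1.2p3 form, here is a minimal sketch in Python, whose built-in `float.fromhex` happens to accept this hexadecimal-significand syntax.  It only illustrates the notation; it says nothing about how an EDN parser would be written.

```python
# Hexadecimal-significand notation: 0x1.2p3 means (1 + 2/16) * 2**3.
value = float.fromhex("0x1.2p3")
print(value)  # 9.0

# The same notation round-trips any finite double exactly,
# which is the main attraction over decimal text.
tenth_as_hex = (0.1).hex()
round_tripped = float.fromhex(tenth_as_hex)
print(tenth_as_hex, round_tripped == 0.1)
```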

>> - Remove embedded notation
> 
> (I.e., not in JSON subset.)

Yes.  This is only needed for expository text, not for interchange.

>> - Remove ellipsis processing
> 
> (I.e., not in JSON subset.)

Yes.  This is only needed for expository text, not for interchange.

>> - Discuss whether results should always be an array, as sequence is currently top-level
> 
> Good question.  I gather the pickup of JSON sequences (RFC 7464) is not that great?  EDN is not entirely compatible with that as we don’t do the RS character, but do an explicit comma.
> The JSON subset could simply use “item” as the entry point.

Maybe I'm misunderstanding the ABNF in the EDN doc, but it looks like the top-level rule is `seq`, which is comma-separated.  I'm open to designing a streaming form of this notation at the same time, but that's a topic for discussion.
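
For comparison with a comma-separated top level, a rough Python sketch of the RFC 7464 framing: each item is prefixed with RS (0x1E) and terminated with a line feed.  The helper names are made up for illustration, and the decoder skips the truncation handling that the RFC discusses.

```python
import json

RS = "\x1e"  # ASCII Record Separator, the RFC 7464 item prefix


def encode_json_seq(items):
    """Frame each item as RS + JSON text + LF (hypothetical helper)."""
    return "".join(f"{RS}{json.dumps(item)}\n" for item in items)


def decode_json_seq(text):
    """Split on RS and parse each non-empty chunk (no truncation recovery)."""
    return [json.loads(chunk) for chunk in text.split(RS) if chunk.strip()]


encoded = encode_json_seq([{"a": 1}, [2, 3], "four"])
decoded = decode_json_seq(encoded)
print(decoded)
```

A streaming reader can emit an item as soon as it sees the next RS, without needing a closing bracket; that is the property a comma-separated `seq` would have to provide some other way.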

>> - Discuss adding other JSON5 features
> 
> Do you have some examples for what you would like to add?
> What are we missing out for EDN?

I like basically everything that JSON5 adds, except maybe escaping newlines; I think I'd prefer newlines to just be valid in strings without needing to be escaped.  I don't care about explicit `+` before numbers, but they don't bother me.  Reproduced here with comments:

> Objects
>     Object keys may be an ECMAScript 5.1 IdentifierName. 

Strong yes.  No need for quoting most keys in a world that has /\p{ID_Start}\p{ID_Continue}*/u
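
A small sketch of the "quote only when needed" rule, in Python.  Big assumption: `str.isidentifier()` follows Python's identifier rules (XID_Start/XID_Continue plus a leading underscore), which are close to but not the same as the regex above or an ECMAScript IdentifierName; for example, `$` is rejected here.  `format_key` is a made-up name.

```python
import json


def format_key(key):
    """Emit the key bare if it looks like an identifier, else as a JSON string.

    Approximation: uses Python's identifier rules, not ECMAScript's.
    """
    return key if key.isidentifier() else json.dumps(key)


for key in ["name", "content-type", "2fast", "naïve"]:
    print(format_key(key))
```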

>     Objects may have a single trailing comma.

Strong yes.  I want to have the same coding standard in this format as I do for other languages.  I'm fine with all commas being optional for those that feel differently.

> Arrays
>     Arrays may have a single trailing comma.

Yes.  See above.
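
To make the coding-standard point concrete, a quick Python check: the standard `json` module rejects a trailing comma, while Python's own literal syntax accepts it.  `strict_json_accepts` is just a helper name for this sketch.

```python
import ast
import json


def strict_json_accepts(text):
    """Return True if Python's json module parses the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(strict_json_accepts("[1, 2, 3]"))    # True
print(strict_json_accepts("[1, 2, 3,]"))   # False
print(strict_json_accepts('{"a": 1,}'))    # False

# The host language is happy with the same text.
python_list = ast.literal_eval("[1, 2, 3,]")
python_dict = ast.literal_eval('{"a": 1,}')
print(python_list, python_dict)
```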

> Strings
>     Strings may be single quoted.

Same coding standard argument, plus there are times when I want to use " or ' without having to backslash-escape it.

>     Strings may span multiple lines by escaping new line characters.

This is for:

```json5
"foo\
bar"
```

I'd prefer:

```eson
"foo
bar"
```
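
As it happens, Python's `json` module can already be told to accept the second form: with `strict=False` it allows control characters, including a literal newline, inside strings.  A minimal sketch:

```python
import json

text = '"foo\nbar"'  # a literal U+000A between the quotes, not a \n escape

try:
    json.loads(text)
    strict_result = "accepted"
except json.JSONDecodeError:
    strict_result = "rejected"

lenient_value = json.loads(text, strict=False)

print(strict_result)        # rejected
print(repr(lenient_value))  # 'foo\nbar'
```

Note the difference in meaning: here the newline is part of the string value, whereas in ECMAScript (where the backslash-newline form comes from) a line continuation contributes no characters to the string.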

>     Strings may include character escapes.

This is for "\u{1F4A9}" in addition to "\uD83D\uDCA9", which is what you need to do in JSON.
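
For reference, the arithmetic that turns U+1F4A9 into the surrogate pair JSON requires, as a Python sketch (`to_surrogate_pair` is a made-up helper name):

```python
import json


def to_surrogate_pair(code_point):
    """Split a supplementary-plane code point into UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x1F4A9)
json_text = f'"\\u{high:04X}\\u{low:04X}"'
print(json_text)  # "\uD83D\uDCA9"

# A JSON parser recombines the pair into the single code point.
decoded = json.loads(json_text)
print(decoded == "\U0001F4A9")  # True
```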

> Numbers
>     Numbers may be hexadecimal.

Fine.  Particularly nice for config files.  I don't think octal or binary is needed, but if someone else feels strongly, it's fine.

>     Numbers may have a leading or trailing decimal point.

Sure.  As long as we fix everything about numbers at least as well as I-JSON.
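
Since this thread started with int64s, a short Python sketch of the problem I-JSON (RFC 7493) addresses.  Python's `json` keeps integers exact, so the double conversion below stands in for what a parser that maps every number to an IEEE 754 double would do.  `is_ijson_safe_integer` is a made-up helper name.

```python
import json

MAX_SAFE = 2**53 - 1  # largest magnitude I-JSON treats as exactly interoperable


def is_ijson_safe_integer(n):
    """True if n survives a round trip through an IEEE 754 double."""
    return -MAX_SAFE <= n <= MAX_SAFE


int64_max = 2**63 - 1

# Python parses the digits exactly...
exact = json.loads(str(int64_max))

# ...but a double-based parser would land on the nearest double, 2**63.
as_double = int(float(int64_max))

print(is_ijson_safe_integer(int64_max))  # False
print(exact == int64_max)                # True
print(as_double == int64_max)            # False

# The I-JSON recommendation for such values: carry them as strings.
as_string = json.dumps({"id": str(int64_max)})
print(as_string)
```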

>     Numbers may be IEEE 754 positive infinity, negative infinity, and NaN.

Yes.
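
There is some running code for this already: Python's `json` module reads and writes `NaN`, `Infinity`, and `-Infinity` by default, even though strict JSON has no such tokens.  A quick sketch:

```python
import json
import math

# Serializing non-finite floats produces bare tokens.
inf_text = json.dumps(float("inf"))
neg_inf_text = json.dumps([float("-inf")])
nan_text = json.dumps(float("nan"))
print(inf_text, neg_inf_text, nan_text)

# The same tokens are accepted on input.
parsed_nan = json.loads("NaN")
parsed_neg_inf = json.loads("-Infinity")
print(math.isnan(parsed_nan), parsed_neg_inf)

# allow_nan=False restores strict behavior on output.
try:
    json.dumps(float("nan"), allow_nan=False)
    strict_output = "accepted"
except ValueError:
    strict_output = "rejected"
print(strict_output)  # rejected
```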

>     Numbers may begin with an explicit plus sign.

Shrug.

> Comments
>     Single and multi-line comments are allowed.

Strong yes, and possibly the reason to do the work in the first place.  Absolutely required for config files.
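
What people do today for config files is roughly the following: strip comments in a pre-pass, then hand the result to a stock JSON parser.  This is a rough Python sketch with a made-up helper name; it only understands double-quoted strings, and it is not a proposal for how the format should be parsed.

```python
import json


def strip_comments(text):
    """Remove // and /* */ comments that appear outside double-quoted strings."""
    out = []
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch == '"':
            # Copy the whole string literal, stepping over backslash escapes.
            j = i + 1
            while j < n and text[j] != '"':
                j += 2 if text[j] == "\\" else 1
            out.append(text[i:j + 1])
            i = j + 1
        elif text.startswith("//", i):
            # Skip to end of line; the newline itself is kept.
            while i < n and text[i] not in "\r\n":
                i += 1
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated block comment")
            out.append(" ")  # keep tokens on either side separated
            i = end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


config_text = """{
  // port to listen on
  "port": 8080, /* block
  comment */
  "url": "http://example.com/*not-a-comment*/"
}"""

config = json.loads(strip_comments(config_text))
print(config)
```

The fact that the pre-pass has to understand string syntax anyway is a decent argument for putting comments in the grammar.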

> White Space
>     Additional white space characters are allowed.

Not important to me, but I don't speak any of the languages whose whitespace is now legal.  May as well include everything in Zs for future-proofing.
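
To see what "everything in Zs" means in practice, a Python sketch that lists the Space_Separator characters known to the interpreter's Unicode database and confirms that a strict JSON parser accepts only its own four whitespace characters.  The exact set depends on the Unicode version the interpreter ships with.

```python
import json
import sys
import unicodedata

JSON_WHITESPACE = {" ", "\t", "\n", "\r"}  # the only whitespace JSON allows

# Every code point whose General_Category is Zs (Space_Separator).
zs_chars = [
    chr(cp)
    for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)) == "Zs"
]

for ch in zs_chars:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# A no-break space before the value is not JSON whitespace.
try:
    json.loads("\u00a0[1]")
    nbsp_result = "accepted"
except json.JSONDecodeError:
    nbsp_result = "rejected"
print(nbsp_result)  # rejected
```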

>> We wouldn't get Tim's desired property of one character look-ahead.
> 
> On constrained systems, you could always use CBOR…

Tim can make his own argument here.  I think I'm on record as believing that CBOR is a good thing.

>> Other than that, it's as good a place to start as anywhere else.
> 
> Indeed.
> I could imagine we finish the EDN-literal document for its CBOR target audience, and then write another document about the JSON subset (which would be less of a delta and more of a free-standing specification that is just anchored in the full version for CBOR).

I think "inspired by" is as far as I'd commit to at the moment, until we have some agreement on requirements.

> I just noticed that we do have a required comma between items in arrays and sequences.  I just confused this with CDDL (which I’m prone to do), where that comma is not required.  Leaving off the comma conflicts with implicit string concatenation (which is one way to address the 2D string problem).  So maybe there is a bit more work to do...

I don't think this format needs string concat.
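
Python is a handy demonstration of the conflict, since it has implicit string concatenation: leave out one comma and the array quietly changes length.  A minimal sketch:

```python
import ast

with_comma = ast.literal_eval('["foo", "bar"]')
without_comma = ast.literal_eval('["foo" "bar"]')

print(with_comma)     # ['foo', 'bar']
print(without_comma)  # ['foobar']
```

A format cannot both make commas optional and read adjacent strings as one string; it has to pick.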

-- 
Joe Hildebrand