Re: [Cbor] [Json] JSON and int64s - any change in current best practice since I-JSON

Joe Hildebrand <hildjj@cursive.net> Mon, 22 January 2024 23:54 UTC

From: Joe Hildebrand <hildjj@cursive.net>
In-Reply-To: <B25E10D2-17CF-4B3D-B04B-BABE3A209B90@cursive.net>
Date: Mon, 22 Jan 2024 16:54:22 -0700
Cc: "json@ietf.org" <json@ietf.org>, cbor@ietf.org, Tim Bray <tbray@textuality.com>
Message-Id: <6A73993B-B54D-480D-AF79-081EE9D2E1C8@cursive.net>
To: Carsten Bormann <cabo@tzi.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/eGeEQtAER0-1t-f6DWsqU3qak2A>
Subject: Re: [Cbor] [Json] JSON and int64s - any change in current best practice since I-JSON

I put together a quick requirements doc: https://github.com/hildjj/draft-hildebrand-eson-requirements
This is just a starting point for discussion; any of it can be changed.

-- 
Joe Hildebrand

> On Jan 18, 2024, at 6:57 PM, Joe Hildebrand <hildjj@cursive.net> wrote:
> 
>> On Jan 18, 2024, at 8:56 AM, Carsten Bormann <cabo@tzi.org> wrote:
>> 
>> Hi Joe,
>> 
>> thanks for the list!
>> I’m CCing cbor@, as we are just finishing the extended diagnostic notation (EDN) document, and it is worth considering these points.
> 
> Context for folks there: on the JSON list, there has been some loose talk about a text-based format that would perhaps be a superset of JSON, would have defined numeric processing (perhaps including big integers), might adopt some of the features from JSON5 (https://json5.org/), might make commas optional (including trailing commas), etc.
> 
>>> We would need to rework that grammar to make it suitable for interchange:
>> 
>> [context: as a JSON extension]
> 
> Yes.
> 
>>> - Remove encoding indicators as mentioned
>> 
>> Right.  When you say “remove”, you mean “do not include in JSON subset”?
>> (ABNF doesn’t have the “.feature” feature of CDDL…)
> 
> I think by "JSON subset" you mean "subset of EDN used to define a text format".  That was confusing, because I was thinking "a text format that is a superset of JSON, perhaps influenced by EDN".  As such, "subset" and "superset" are referring to more or less the same thing.  I'm fine using your wording for the moment, with the understanding that the format we're talking about is not a subset of JSON, but a subset of EDN.
> 
> I'm not tied to ABNF, but I don't think CDDL would work to describe the syntax of the notation itself.  It could be used to describe information encoded in the notation, though.
> 
>>> - Reference IEEE754 for the 0x1.2p3 format, or remove it
>> 
>> Good point.  Section 5.12.3 of IEEE Std 754-2019?
> 
> I think so?  I can't find my copy of 754.
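For readers unfamiliar with the `0x1.2p3` notation: the significand is hexadecimal and the decimal exponent after `p` is a power of two, so `0x1.2p3` is (1 + 2/16) * 2^3 = 9. The TypeScript below is a minimal sketch of evaluating such a literal; `parseHexFloat` is a hypothetical helper, and it ignores signs and the rounding issues that arise with very long significands.

```typescript
// Minimal sketch: evaluate an IEEE 754 / C99-style hexadecimal floating-point
// literal such as "0x1.2p3". Assumes the form 0x<hex>[.<hex>]p[+-]<decimal>;
// signs and correct rounding of very long significands are out of scope.
function parseHexFloat(literal: string): number {
  const m = /^0[xX]([0-9a-fA-F]+)(?:\.([0-9a-fA-F]*))?[pP]([+-]?\d+)$/.exec(literal);
  if (m === null) {
    throw new SyntaxError(`not a hex float: ${literal}`);
  }
  const [, intPart, fracPart = "", exp] = m;
  // Read all significand digits as one hex integer, then move the binary
  // point left by 4 bits for every fractional hex digit.
  const significand = parseInt(intPart + fracPart, 16);
  const exponent = parseInt(exp, 10) - 4 * fracPart.length;
  return significand * Math.pow(2, exponent);
}

console.log(parseHexFloat("0x1.2p3")); // 9
```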
> 
>>> - Remove embedded notation
>> 
>> (I.e., not in JSON subset.)
> 
> Yes.  This is only needed for expository text, not for interchange.
> 
>>> - Remove ellipsis processing
>> 
>> (I.e., not in JSON subset.)
> 
> Yes.  This is only needed for expository text, not for interchange.
> 
>>> - Discuss whether results should always be an array, as sequence is currently top-level
>> 
>> Good question.  I gather the pickup of JSON sequences (RFC 7464) is not that great?  EDN is not entirely compatible with that as we don’t do the RS character, but do an explicit comma.
>> The JSON subset could simply use “item” as the entry point.
> 
> Maybe I'm misunderstanding the ABNF in the EDN doc, but it looks like the top-level rule is `seq`, which is comma-separated.  I'm open to designing a streaming form of this notation at the same time, but that's a topic for discussion.
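To make the contrast concrete: an RFC 7464 JSON text sequence prefixes every element with RS (U+001E) and ends it with LF, whereas EDN's top-level `seq` separates items with commas. A minimal TypeScript sketch of a reader for the RFC 7464 form follows; `parseJsonSeq` is a hypothetical name, and it simply throws on a malformed element rather than trying to recover from truncated input.

```typescript
// Minimal sketch of reading an RFC 7464 JSON text sequence: each JSON text
// is preceded by RS (U+001E) and followed by LF.
const RS = "\u001e";

function parseJsonSeq(input: string): unknown[] {
  const [before, ...chunks] = input.split(RS);
  // Nothing but whitespace may appear before the first RS.
  if (before.trim() !== "") {
    throw new SyntaxError("data before the first RS");
  }
  // JSON.parse tolerates the trailing LF, since it is JSON whitespace.
  return chunks.map((chunk) => JSON.parse(chunk));
}

// The comma-separated EDN top level would instead look like: 1, "two", [3]
console.log(JSON.stringify(parseJsonSeq(`${RS}1\n${RS}"two"\n${RS}[3]\n`))); // [1,"two",[3]]
```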
> 
>>> - Discuss adding other JSON5 features
>> 
>> Do you have some examples for what you would like to add?
>> What are we missing out for EDN?
> 
> I like basically everything that JSON5 adds, except maybe escaping newlines; I think I'd prefer newlines to just be valid in strings without needing to be escaped.  I don't care about explicit `+` before numbers, but they don't bother me.  Reproduced here with comments:
> 
>> Objects
>>    Object keys may be an ECMAScript 5.1 IdentifierName.
> 
> Strong yes.  No need for quoting most keys in a world that has /\p{ID_Start}\p{ID_Continue}*/u
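A minimal TypeScript sketch of the unquoted-key rule, using the regular expression quoted above with `^`/`$` anchors added; `formatKey` is a hypothetical serializer helper. The regex is a simplification: ECMAScript's IdentifierName also admits `$`, a leading `_`, and ZWNJ/ZWJ after the first character, so a real implementation would need a slightly wider pattern.

```typescript
// Sketch: write an object key bare when it matches the identifier pattern
// from the message above, otherwise fall back to a JSON string literal.
// Assumption: this pattern, not full ECMAScript IdentifierName.
const BARE_KEY = /^\p{ID_Start}\p{ID_Continue}*$/u;

function formatKey(key: string): string {
  return BARE_KEY.test(key) ? key : JSON.stringify(key);
}

console.log(formatKey("name"));       // name
console.log(formatKey("first-name")); // "first-name"
```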
> 
>>    Objects may have a single trailing comma.
> 
> Strong yes.  I want to have the same coding standard in this format as I do for other languages.  I'm fine with all commas being optional for those that feel differently.
> 
>> Arrays
>>    Arrays may have a single trailing comma.
> 
> Yes.  See above.
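Trailing-comma support can be prototyped without a new grammar. The TypeScript sketch below (`parseWithTrailingCommas` is a hypothetical helper, not part of any proposal) deletes at most one comma that sits directly before `]` or `}` outside a string literal and then defers to `JSON.parse`, so `[1, 2, 3,]` is accepted while `[1,,]` and `[,]` are still rejected.

```typescript
// Sketch: allow a single trailing comma in arrays and objects by removing it
// in a pre-pass, then parsing as plain JSON.
function parseWithTrailingCommas(text: string): unknown {
  let out = "";
  let i = 0;
  while (i < text.length) {
    const ch = text[i];
    if (ch === '"') {
      // Copy a whole string literal, skipping over backslash escapes.
      let j = i + 1;
      while (j < text.length && text[j] !== '"') {
        j += text[j] === "\\" ? 2 : 1;
      }
      out += text.slice(i, j + 1);
      i = j + 1;
      continue;
    }
    if (ch === ",") {
      // Find the next non-whitespace character after the comma.
      let j = i + 1;
      while (j < text.length && /\s/.test(text[j])) j++;
      const prev = out.trimEnd().slice(-1);
      const closes = text[j] === "]" || text[j] === "}";
      // Drop it only if it follows a value (not "[", "{" or another ",").
      if (closes && prev !== "[" && prev !== "{" && prev !== ",") {
        i++;
        continue;
      }
    }
    out += ch;
    i++;
  }
  return JSON.parse(out);
}

console.log(JSON.stringify(parseWithTrailingCommas("[1, 2, 3,]"))); // [1,2,3]
```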
> 
>> Strings
>>    Strings may be single quoted.
> 
> Same coding standard argument, plus there are times when I want to use " or ' without having to backslash-escape it.
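One way a serializer could take advantage of single-quoted strings is to pick whichever quote character occurs less often in the value, minimizing backslash escapes. This TypeScript sketch (`quote` is a hypothetical helper) escapes only the chosen quote and backslashes; escaping of control characters such as newlines is deliberately left out.

```typescript
// Sketch: choose the quote character that needs fewer escapes.
// Ties go to the double quote. Control characters are NOT escaped here.
function quote(value: string): string {
  const doubles = (value.match(/"/g) ?? []).length;
  const singles = (value.match(/'/g) ?? []).length;
  const q = singles < doubles ? "'" : '"';
  let out = q;
  for (const ch of value) {
    out += ch === q || ch === "\\" ? "\\" + ch : ch;
  }
  return out + q;
}

console.log(quote('say "hi"')); // 'say "hi"'
console.log(quote("it's"));     // "it's"
```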
> 
>>    Strings may span multiple lines by escaping new line characters.
> 
> This is for:
> 
> ```json5
> "foo\
> bar"
> ```
> 
> I'd prefer:
> 
> ```eson
> "foo
> bar"
> ```
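The two spellings above do not produce the same value. In JSON5, as in ECMAScript 5.1, a backslash followed by a line terminator is a line continuation that contributes nothing, so the first example decodes to `foobar`; under the proposed alternative, the second would decode to `foo`, a newline, then `bar`. Plain JSON accepts neither form, because unescaped control characters are not allowed in strings. A small TypeScript sketch (`json5Body` and `jsonAcceptsRawNewline` are hypothetical helpers; only newline handling is modelled):

```typescript
// JSON5 line continuation: backslash + line terminator is removed entirely.
// `body` is the raw text between the quotes; other escapes are not handled.
function json5Body(body: string): string {
  return body.replace(/\\(?:\r\n|[\n\r\u2028\u2029])/g, "");
}

// Plain JSON rejects a raw newline inside a string literal.
function jsonAcceptsRawNewline(): boolean {
  try {
    JSON.parse('"foo\nbar"');
    return true;
  } catch {
    return false;
  }
}

console.log(JSON.stringify(json5Body("foo\\\nbar"))); // "foobar"
console.log(jsonAcceptsRawNewline()); // false
```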
> 
>>    Strings may include character escapes.
> 
> This is for "\u{1F4A9}" in addition to "\uD83D\uDCA9", which is what you need to do in JSON.
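The surrogate-pair arithmetic that JSON forces on writers is mechanical but easy to get wrong. A minimal TypeScript sketch (`jsonEscape` is a hypothetical helper) that turns a code point into the `\u` escape sequence plain JSON needs:

```typescript
// Sketch: produce the JSON \u escape(s) for one Unicode code point,
// splitting supplementary-plane code points into a UTF-16 surrogate pair.
function jsonEscape(codePoint: number): string {
  if (codePoint < 0 || codePoint > 0x10ffff) {
    throw new RangeError(`not a code point: ${codePoint}`);
  }
  const hex = (n: number) => "\\u" + n.toString(16).toUpperCase().padStart(4, "0");
  if (codePoint < 0x10000) {
    return hex(codePoint);
  }
  const v = codePoint - 0x10000;     // 20 bits remain
  const high = 0xd800 + (v >> 10);   // top 10 bits
  const low = 0xdc00 + (v & 0x3ff);  // bottom 10 bits
  return hex(high) + hex(low);
}

console.log(jsonEscape(0x1f4a9)); // \uD83D\uDCA9
```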
> 
>> Numbers
>>    Numbers may be hexadecimal.
> 
> Fine.  Particularly nice for config files.  I don't think octal or binary is needed, but if someone else feels strongly, it's fine.
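Hex literals also touch the int64 question that started this thread: a parser that reads them into IEEE 754 doubles loses precision above 2^53. This TypeScript sketch (`parseHexInt` is a hypothetical helper; `BigInt` assumes an ES2020 target) reads an optionally signed hex integer exactly. The sign is handled separately because `BigInt()` does not accept a sign in front of a `0x` prefix.

```typescript
// Sketch: parse [+-]0x<hex> exactly, so full 64-bit values survive.
function parseHexInt(literal: string): bigint {
  const m = /^([+-]?)0[xX]([0-9a-fA-F]+)$/.exec(literal);
  if (m === null) {
    throw new SyntaxError(`not a hex integer: ${literal}`);
  }
  const magnitude = BigInt("0x" + m[2]);
  return m[1] === "-" ? -magnitude : magnitude;
}

console.log(parseHexInt("0xFFFFFFFFFFFFFFFF").toString()); // 18446744073709551615
```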
> 
>>    Numbers may have a leading or trailing decimal point.
> 
> Sure.  As long as we fix everything about numbers at least as well as I-JSON.
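For reference, I-JSON (RFC 7493, Section 2.2) advises against integers outside [-(2**53)+1, (2**53)-1], because beyond that range IEEE 754 doubles can no longer represent every integer exactly; in JavaScript, for instance, `JSON.parse("9007199254740993")` yields 9007199254740992. A minimal TypeScript sketch of that check (`isIJsonSafeInteger` is a hypothetical helper that only accepts JSON integer syntax):

```typescript
// Sketch: is this JSON integer literal inside the I-JSON interoperable range?
function isIJsonSafeInteger(literal: string): boolean {
  if (!/^-?(?:0|[1-9]\d*)$/.test(literal)) {
    return false; // not a JSON integer literal
  }
  const value = BigInt(literal); // exact, unlike Number(literal)
  const limit = BigInt(Number.MAX_SAFE_INTEGER); // 2**53 - 1
  return value >= -limit && value <= limit;
}

console.log(isIJsonSafeInteger("9007199254740991")); // true
console.log(isIJsonSafeInteger("9007199254740993")); // false
console.log(JSON.parse("9007199254740993"));         // 9007199254740992
```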
> 
>>    Numbers may be IEEE 754 positive infinity, negative infinity, and NaN.
> 
> Yes.
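This matters because `JSON.stringify` has no spelling for non-finite numbers and silently writes `null`. A TypeScript sketch of a JSON5-style number formatter (`formatNumber` is a hypothetical helper) that emits the ECMAScript identifiers instead:

```typescript
// Sketch: format a number, using JSON5's Infinity/-Infinity/NaN tokens for
// non-finite values and ordinary JSON output for everything else.
function formatNumber(n: number): string {
  if (Number.isNaN(n)) return "NaN";
  if (n === Infinity) return "Infinity";
  if (n === -Infinity) return "-Infinity";
  return JSON.stringify(n);
}

console.log(JSON.stringify([NaN, Infinity]));  // [null,null]
console.log([NaN, Infinity, -Infinity, 1.5].map(formatNumber).join(",")); // NaN,Infinity,-Infinity,1.5
```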
> 
>>    Numbers may begin with an explicit plus sign.
> 
> Shrug.
> 
>> Comments
>>    Single and multi-line comments are allowed.
> 
> Strong yes, and possibly the reason to do the work in the first place.  Absolutely required for config files.
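Comments are also the easiest feature to prototype on top of an existing JSON parser. A minimal TypeScript sketch (`stripComments` is a hypothetical helper) that removes `//` and `/* */` comments outside double-quoted strings before calling `JSON.parse`; it does not handle single-quoted strings or any other JSON5 feature.

```typescript
// Sketch: replace comments outside "..." strings with a single space so
// that neighbouring tokens are not glued together.
function stripComments(text: string): string {
  let out = "";
  let i = 0;
  while (i < text.length) {
    const ch = text[i];
    const next = text[i + 1];
    if (ch === '"') {
      // Copy a whole string literal, skipping over backslash escapes.
      let j = i + 1;
      while (j < text.length && text[j] !== '"') {
        j += text[j] === "\\" ? 2 : 1;
      }
      out += text.slice(i, j + 1);
      i = j + 1;
    } else if (ch === "/" && next === "/") {
      const end = text.indexOf("\n", i);
      i = end === -1 ? text.length : end; // keep the newline itself
      out += " ";
    } else if (ch === "/" && next === "*") {
      const end = text.indexOf("*/", i + 2);
      if (end === -1) throw new SyntaxError("unterminated block comment");
      i = end + 2;
      out += " ";
    } else {
      out += ch;
      i++;
    }
  }
  return out;
}

const config = `{
  // listen address
  "host": "example.com", /* port omitted */ "path": "/a//b"
}`;
console.log(JSON.parse(stripComments(config)).path); // /a//b
```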
> 
>> White Space
>>    Additional white space characters are allowed.
> 
> Not important to me, but I don't speak any of the languages whose whitespace is now legal.  May as well include everything in Zs for future-proofing.
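For comparison, a TypeScript sketch of the two white-space sets (helper names are hypothetical): RFC 8259 allows only space, tab, LF, and CR, while JSON5 adds VT, FF, NBSP, U+2028, U+2029, the BOM, and everything in Unicode category Zs.

```typescript
// Sketch: JSON vs. JSON5 white space, one character at a time.
const JSON_WS = new Set(["\t", "\n", "\r", " "]);
const JSON5_EXTRA = new Set(["\u000b", "\u000c", "\u00a0", "\u2028", "\u2029", "\ufeff"]);

function isJsonWhitespace(ch: string): boolean {
  return JSON_WS.has(ch);
}

function isJson5Whitespace(ch: string): boolean {
  // Zs covers NBSP, the EM/EN spaces, IDEOGRAPHIC SPACE, and future additions.
  return JSON_WS.has(ch) || JSON5_EXTRA.has(ch) || /^\p{Zs}$/u.test(ch);
}

console.log(isJsonWhitespace("\u3000"), isJson5Whitespace("\u3000")); // false true
```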
> 
>>> We wouldn't get Tim's desired property of one character look-ahead.
>> 
>> On constrained systems, you could always use CBOR…
> 
> Tim can make his own argument here.  I think I'm on record as believing that CBOR is a good thing.
> 
>>> Other than that, it's as good a place to start as anywhere else.
>> 
>> Indeed.
>> I could imagine we finish the EDN-literal document for its CBOR target audience, and then write another document about the JSON subset (which would be less of a delta and more of a free-standing specification that is just anchored in the full version for CBOR).
> 
> I think "inspired by" is as far as I'd commit to at the moment, until we have some agreement on requirements.
> 
>> I just noticed that we do have a required comma between items in arrays and sequences.  I just confused this with CDDL (which I’m prone to do), where that comma is not required.  Leaving off the comma conflicts with implicit string concatenation (which is one way to address the 2D string problem).  So maybe there is a bit more work to do...
> 
> I don't think this format needs string concat.
> 
> -- 
> Joe Hildebrand