[Cbor] Re: I-D Action: draft-mcnally-deterministic-cbor-10.html

Anders Rundgren <anders.rundgren.net@gmail.com> Sun, 16 June 2024 14:50 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/I0ZDbsrQJ7LDcyXBtuskxHGhBsU>

Hi CBOR WG,

Quoting LL: "dCBOR use should be very very rare. Like 100 times more rare than CDE."
Right or wrong, the draft gives no clue as to the characteristics of such esoteric applications. Here is an attempt to rectify this situation.

THE CORE ISSUE

2.3  Numeric Reduction
    The purpose of determinism is to ensure that semantically equivalent data items
    are encoded into identical byte streams. Numeric reduction ensures that semantically
    equal numeric values (e.g. 2 and 2.0) are encoded into identical byte streams (e.g. 0x02)
    by encoding "Integral floating point values" (floating point values with a zero fractional part)
    as integers when possible.

This text can be read as claiming that numeric reduction is more deterministic ("better") than its CDE counterpart.
However, if we stick to the *Encoding Format* only, the two schemes should be comparable [*].

What remains are *External Factors* that may motivate numeric reduction.

Revised text proposal:

2.3 Numeric Reduction
    To provide deterministic encoding in platforms that do not separate integer
    and floating point values (like JavaScript), numbers must be canonicalized.
    Numeric reduction ensures that semantically equal numeric values (e.g. 2 and 2.0)
    are encoded into identical byte streams (e.g. 0x02) by encoding "Integral floating point values"
    (floating point values with a zero fractional part) as integers when possible.
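
To make the difference concrete, here is a minimal Python sketch (not taken from the draft or from any library) that encodes a number first CDE-style, where integers and floats keep separate encodings, and then with numeric reduction. All function names are hypothetical, and the integer range used for reduction (-2^63 to 2^64-1) is an assumption about dCBOR.

```python
import math
import struct


def encode_int(value: int) -> bytes:
    """Encode an integer with the shortest CBOR head (major type 0 or 1)."""
    major, arg = (0, value) if value >= 0 else (1, -1 - value)
    if arg < 24:
        return bytes([(major << 5) | arg])
    for additional, fmt, limit in (
        (24, ">B", 1 << 8),
        (25, ">H", 1 << 16),
        (26, ">I", 1 << 32),
        (27, ">Q", 1 << 64),
    ):
        if arg < limit:
            return bytes([(major << 5) | additional]) + struct.pack(fmt, arg)
    raise ValueError("integer outside the 64-bit CBOR range")


def encode_float_shortest(value: float) -> bytes:
    """Encode a float as the shortest of half/single/double that keeps its value."""
    if math.isnan(value):
        return b"\xf9\x7e\x00"  # single canonical NaN
    for initial, fmt in ((0xF9, ">e"), (0xFA, ">f")):
        try:
            packed = struct.pack(fmt, value)
        except OverflowError:
            continue  # too large for this width, try the next one
        if struct.unpack(fmt, packed)[0] == value:
            return bytes([initial]) + packed
    return b"\xfb" + struct.pack(">d", value)


def encode_number_cde(value) -> bytes:
    """CDE-style: the caller's type (int or float) decides the encoding."""
    if isinstance(value, int):
        return encode_int(value)
    return encode_float_shortest(value)


def encode_number_reduced(value) -> bytes:
    """Numeric reduction: integral floats in range are encoded as integers."""
    if isinstance(value, float) and math.isfinite(value) and value.is_integer():
        as_int = int(value)
        # Assumed dCBOR integer range; adjust if the draft says otherwise.
        if -(1 << 63) <= as_int < (1 << 64):
            return encode_int(as_int)
    return encode_number_cde(value)


print(encode_number_cde(2).hex())        # 02
print(encode_number_cde(2.0).hex())      # f94000
print(encode_number_reduced(2.0).hex())  # 02
```

Both schemes map a given typed input to exactly one byte string; they differ only in whether 2 and 2.0 are treated as the same value, which matters on platforms that cannot tell them apart.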

Since this is the essence of dCBOR, I would drop all other parts of it, because similar restrictions and limitations can be found in just about any other CBOR-using application.

In discussions on the mailing list, other dCBOR features, such as support for advanced transformations, have been mentioned as well. However, they are unrelated to the dCBOR encoding scheme.

END OF UPDATE

The following section is just another view of the core issue.

Using CDE with JavaScript is not a major problem, as shown by this fairly full-featured CBOR encoder in about 300 lines:
https://gist.github.com/cyberphone/d0f7a99da6afb397955645baad578bea#file-cde-js-L316

If wrapping floating point values feels too awkward, application developers can use *Object Encapsulation* to get away from CBOR altogether. CBOR-related APIs would then be left for system and library developers to deal with:
https://github.com/cyberphone/CBOR.js/blob/main/test/xyz-encoder.js#L38
https://github.com/cyberphone/CBOR.js/blob/main/test/xyz-decoder.js#L42
In the examples, strict type checking ensures (when encoding) and verifies (when decoding) that the CBOR data conforms to the "XYZ" specification.

Thanx,
Anders

[*] Numeric reduction does, however, break the self-describing nature of CBOR (a serialized 2.0 ends up as 2).
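
The footnote can be illustrated with a small decoding sketch in Python. The `decode_number` helper is hypothetical and handles only integers and floats. It shows that a decoder gets a float back from the CDE encoding of 2.0, but only an integer from the reduced encoding, so the original type is no longer recoverable from the bytes alone.

```python
import struct


def decode_number(data: bytes):
    """Decode a single CBOR integer or float from the start of data."""
    initial = data[0]
    major, additional = initial >> 5, initial & 0x1F
    if major in (0, 1):
        if additional < 24:
            arg = additional
        else:
            size = {24: 1, 25: 2, 26: 4, 27: 8}[additional]
            arg = int.from_bytes(data[1:1 + size], "big")
        return arg if major == 0 else -1 - arg
    if major == 7 and additional in (25, 26, 27):
        fmt = {25: ">e", 26: ">f", 27: ">d"}[additional]
        size = struct.calcsize(fmt)
        return struct.unpack(fmt, data[1:1 + size])[0]
    raise ValueError("not a CBOR integer or float")


cde_bytes = bytes.fromhex("f94000")  # 2.0 encoded as a half-precision float
reduced_bytes = bytes.fromhex("02")  # 2.0 after numeric reduction

print(type(decode_number(cde_bytes)).__name__)      # float
print(type(decode_number(reduced_bytes)).__name__)  # int
```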