Re: [Multiformats] Multiformats Considered Harmful

bumblefudge von CASA <bumblefudge@learningproof.xyz> Thu, 07 September 2023 12:10 UTC

Date: Thu, 07 Sep 2023 12:09:59 +0000
To: "multiformats@ietfa.amsl.com" <multiformats@ietfa.amsl.com>
From: bumblefudge von CASA <bumblefudge@learningproof.xyz>
Archived-At: <https://mailarchive.ietf.org/arch/msg/multiformats/WBYzcng10B9Tqfs__nktqo34U4c>
Subject: Re: [Multiformats] Multiformats Considered Harmful

Dear Prof Bormann and Melvin Carvalho:

Thank you very much for your bibliographic erratum and expression of interest, respectively. They're incorporated below in the longer response.

Dear Mike Jones:

Thank you for taking the time to spell out your feedback. I believe that clarifying the spec in a few places can address most of your concerns.

> 1. Multiformats institutionalize the failure to make a choice, which is the opposite of what good standards do. Good standards make choices about representations of data structures resulting in interoperability, since every conforming implementation uses the same representation. In contrast, multiformats enable different implementations to use a multiplicity of different representations for the same data, harming interoperability. https://datatracker.ietf.org/doc/html/draft-multiformats-multibase-03#appendix-D.1 defines 23 equivalent and non-interoperable representations for the same data!

Multiformats are focused on evolvability. They were designed as a component of protocol suites intended to be actively maintained and to remain robust over an extended period of time, a setting in which protocol ossification was identified as a significant threat. You can think of them as reusable extensibility that promotes active use (in the [RFC9170](https://www.rfc-editor.org/rfc/rfc9170.html) sense), and as such they join a growing body of work on long-term protocol robustness.

I didn't want to make the draft too philosophical, but I agree that this intent could be clarified there more explicitly.

> 2. The stated purpose of "multibase<https://www.ietf.org/archive/id/draft-multiformats-multibase-08.html>" is "Unfortunately, it's not always clear what base encoding is used; that's where this specification comes in. It answers the question: Given data 'd' encoded into text 's', what base is it encoded with?", which is wholly unnecessary. Successful standards DEFINE what encoding is used where. For instance, https://www.rfc-editor.org/rfc/rfc7518.html#section-6.2.1.2 defines that "x" is base64url encoded. No guesswork or prefixing is necessary or useful.

Successful standards for self-describing encodings have also been adopted by communities that chose not to rely on fixed registries and protocols that define encodings in advance and universally. While it could be argued that interoperability with those kinds of protocols could be difficult, inefficient, or, in some cases, even zero-sum, I do not feel a universal claim to harmfulness is warranted, particularly without defining the criterion and context of interoperability.
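For readers less familiar with what "self-describing" means here, the following is a minimal sketch (not the reference implementation) of how a multibase string can be decoded: the first character names the base, and the rest is the payload. It covers only four prefixes that I am confident of from the multibase table (`f`, `b`, `m`, `u`); the function and table names are my own, chosen for illustration.

```python
import base64


def _pad(text: str, block: int) -> str:
    """Restore the '=' padding that the unpadded multibase variants omit."""
    return text + "=" * (-len(text) % block)


# Prefix character -> decoder for the text that follows it.
# Only a small subset of the multibase table is shown here.
DECODERS = {
    "f": lambda s: bytes.fromhex(s),                      # base16 (hex), lowercase
    "b": lambda s: base64.b32decode(_pad(s.upper(), 8)),  # base32, lowercase, no padding
    "m": lambda s: base64.b64decode(_pad(s, 4)),          # base64, no padding
    "u": lambda s: base64.urlsafe_b64decode(_pad(s, 4)),  # base64url, no padding
}


def multibase_decode(text: str) -> bytes:
    """Decode a multibase string by dispatching on its one-character prefix."""
    if not text:
        raise ValueError("empty multibase string")
    prefix, payload = text[0], text[1:]
    if prefix not in DECODERS:
        raise ValueError(f"unsupported multibase prefix: {prefix!r}")
    return DECODERS[prefix](payload)


# The same five bytes, carried in two different bases.
print(multibase_decode("f68656c6c6f"))
print(multibase_decode("maGVsbG8"))
```

A protocol that fixes one base in advance never needs this dispatch; the prefix earns its keep only where strings from several producers, or from several eras of one protocol, end up in the same field.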

> 3. Standardization of multiformats would result in unnecessary and unhelpful duplication of functionality - especially of key representations. The primary use of multiformats is for "publicKeyMultibase" - a representation of public keys that are byte arrays. For instance, the only use of multiformats by the W3C DID spec<https://www.w3.org/TR/did-core/> is for publicKeyMultibase. The IETF already has several perfectly good key representations, including X.509, JSON Web Key (JWK), and COSE_Key. There's not a compelling case for another one.

Representation of key material bytes is not the "primary use" at all; it is not even in scope of this working group or mentioned in any of the drafts. The VCWG usage of multibase you mention was built on an earlier (and less complete) version of the multibase specification, which I hope is backwards compatible with the more recent drafts. Since such usage is downstream and out of scope of this WG, however, I have not confirmed this, despite being a member of both WGs.

The primary goal of bringing a Multiformats WG to IETF is to balance the IPFS use cases that primarily drove these specifications and algorithms until now against such external use cases as the `publicKeyMultibase` one you mention, which emerged outside the field of vision of the multiformats community. These new use cases compose the multiformats building blocks in different configurations or cherry-pick individual specifications or registries for new purposes. Melvin's request that the CID specification be hardened at IETF is a great example of the non-IPFS use cases we would like to enable and de-risk by hardening each specific layer enough to be used independently. (Sidenote for Melvin: the main reason CID is not in scope for the proposed WG is that it depends squarely on both multibase and multihash, so hardening CID concurrently with its dependencies would pre-empt breaking changes.) The IETF seems to me the best place to reconcile what's already built with novel, independent, or external usages one layer at a time, in a deliberative and open way.
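To make the layering concrete, here is a rough sketch of how a CIDv1 string is composed from the lower layers. It assumes the commonly deployed values as I understand them (`0x01` for CID version 1, `0x55` for the "raw" content codec, `0x12` for sha2-256, and the `b` multibase prefix for lowercase unpadded base32); the function name is hypothetical, and the single-byte shortcuts only work because every value here fits in a one-byte varint.

```python
import base64
import hashlib


def cidv1_raw_sha256(content: bytes) -> str:
    """Compose a CIDv1 string for raw bytes, one layer at a time (illustrative)."""
    digest = hashlib.sha256(content).digest()

    # multihash layer: <hash function code><digest length><digest>
    # 0x12 = sha2-256; len(digest) is 32 (0x20). Both fit in one varint byte.
    multihash = bytes([0x12, len(digest)]) + digest

    # CID layer: <cid version><content codec><multihash>
    # 0x01 = CIDv1, 0x55 = raw binary. Again single-byte varints.
    cid_bytes = bytes([0x01, 0x55]) + multihash

    # multibase layer: 'b' prefix + lowercase base32 without padding
    text = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + text


print(cidv1_raw_sha256(b"hello world"))
```

A change to either the multihash framing or the multibase prefix table would change the resulting string, which is the dependency referred to in the sidenote.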

To make it more explicit, the "primary use case" to date of all the multiformats has been the annotation and compact representation of heterogeneous and emergent forms of data, balancing efficiency against open-endedness of transport, encoding, and structure. This has happened in the context of developing the IPFS and libp2p protocol suites and optimizing their robustness and evolvability. That only a subset of key representations are expressible natively in today's multiformats indicates that comprehensive key representation was not a goal, and that different tradeoffs were made. Indeed, allowing more conventional and standardized key expressions to be used in multiformats contexts is a goal of the working group, insofar as it would allow new and independent use cases for each layer and specification. Is there anything in the charter or drafts that states otherwise?

> 4. publicKeyMultibase can only represent a subset of the key types used in practice. Representing many kinds of keys requires multiple values - for instance, RSA keys require both an exponent and a modulus. By comparison, the X.509, JWK, and COSE_Key formats are flexible enough to represent all kinds of keys. It makes little to no sense to standardize a key format that limits implementations to only certain kinds of keys.

Standardizing key formats is out of scope of this charter and these specifications, so I'm not sure what this is referring to.

> 5. The "multihash<https://www.ietf.org/archive/id/draft-multiformats-multihash-07.html>" specification relies on a non-standard representation of integers called "Dwarf". Indeed, the referenced Dwarf document lists itself as being at http://dwarf.freestandards.org/ - a URL that no longer exists!

The DWARF debugging standards still exist and are still maintained and advanced:
https://dwarfstd.org/dwarf5std.html

Thanks very much to Professor Bormann for the RFC reference; that would have been a better normative anchor.
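For anyone who has not met the integer encoding in question, it is the unsigned LEB128 scheme described in the DWARF documents: seven payload bits per byte, least-significant group first, with the high bit set on every byte except the last. A minimal sketch follows; the function names are mine, and it does not enforce the extra restrictions (such as a maximum length or minimal encoding) that, as I understand it, the multiformats varint profile adds on top.

```python
def encode_uvarint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    if value < 0:
        raise ValueError("unsigned varints cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = value & 0x7F          # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(byte)         # last byte: continuation bit clear
            return bytes(out)


def decode_uvarint(data: bytes) -> tuple[int, int]:
    """Decode one varint from the start of data; return (value, bytes consumed)."""
    result = 0
    for index, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * index)
        if not byte & 0x80:
            return result, index + 1
    raise ValueError("truncated varint")


print(encode_uvarint(300).hex())
print(decode_uvarint(bytes.fromhex("ac02") + b"rest"))
```

Values below 128 encode to themselves as a single byte, which is why most deployed multiformats prefixes look like plain one-byte tags.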

> 6. The "Multihash Identifier Registry" at https://www.ietf.org/archive/id/draft-multiformats-multihash-07.html#mh-registry duplicates the functionality of the IANA "Named Information Hash Algorithm Registry" at https://www.iana.org/assignments/named-information/named-information.xhtml#hash-alg, in that both assign (different) numeric identifiers for hash functions. If multihash goes forward, it should use the existing registry.

Taken in isolation as registries, they certainly seem duplicative, but I hope that the IETF community will agree with me that the confusion of two overlapping solution spaces can be justified by how differently they are used. The numeric prefixes are used in different algorithms to prefix different shapes of data from differently-governed ecosystems in different schemata and optimizing for different variables. On a purely technical level, the padding and length/segmentation practices are quite different, so swapping multiformats prefixes for IANA named-info#hash-alg prefixes wouldn't make the resulting data any more interoperable. If anything, this 1:1 substitution would actually hinder interoperability by apparent but not substantial conformance to the equivalent IETF specifications that use the latter.
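A small side-by-side sketch may help show why the identifiers are not drop-in substitutes. The multihash side uses the deployed framing of hash code, digest length, then digest. The other side reflects my reading of the binary format in RFC 6920 Section 6 (one header byte carrying a 6-bit suite ID, no separate length field, truncation expressed by picking a different suite ID); please treat that half as an assumption to be checked against the RFC rather than as authoritative. Function names are hypothetical.

```python
import hashlib


def multihash_sha256(data: bytes) -> bytes:
    """Multihash framing: <hash code 0x12><digest length><digest>."""
    digest = hashlib.sha256(data).digest()
    # Both prefix values are varints; here each fits in a single byte.
    return bytes([0x12, len(digest)]) + digest


def ni_binary_sha256(data: bytes) -> bytes:
    """Binary 'ni' framing as read from RFC 6920 (assumption): <suite ID><digest>."""
    digest = hashlib.sha256(data).digest()
    # Suite ID 1 is untruncated sha-256 in the IANA registry; the two
    # reserved high bits of the header byte are left as zero.
    return bytes([0x01]) + digest


mh = multihash_sha256(b"example")
ni = ni_binary_sha256(b"example")
print(len(mh), mh[:2].hex())   # two prefix bytes, then the digest
print(len(ni), ni[:1].hex())   # one header byte, then the digest
```

Both carry the same digest, but a parser for one framing would misread the other even if the numeric identifiers were made to agree, because the explicit length field exists in only one of them.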

> 7. It's concerning that the draft charter<https://msporny.github.io/charter-ietf-multiformats/> states that "Changing current Multiformat header assignments in a way that breaks backward compatibility with production deployments" is out of scope. Normally IETF working groups are given free reign to make improvements during the standardization process.

The intent there is, of course, to empower the group to make improvements, as already agreed, but to be mindful of compatibility with deployed implementations. There are multiple implementations that have been deployed in production settings for several years, and multiformats are used in protocols that are deployed at scale and included in massive amounts of publicly deployed URLs and stored content. In fact, the wide deployment of multiformats and the growing number of implementations are strong motivating factors behind this group. Breaking compatibility in flight seems unwise if it can be avoided.

> 8. Finally, as a member of the W3C DID and W3C Verifiable Credentials working groups, I will state that it is misleading for the draft charter to say that "The outputs from this Working Group are currently being used by ... the W3C Verifiable Credentials Working Group, W3C Decentralized Identifiers Working Group...". The documents produced by these working groups intentionally contain no normative references to multiformats or any data structures derived from them. Where they are referenced, it is explicitly stated that the references are non-normative.

This feels a little more like W3C business than a correction to the proposal text, if I'm being honest. Would it be more precise to state explicitly that "prototypes and implementations built by members of the VCWG use earlier forms of the multibase specification"? The current charter does not claim that W3C *specifications* rely normatively on the multibase specification. They could not have done so even if they had wanted to, because the multibase specification is currently a community draft and would not meet W3C's editorial criteria for a normative reference.

---

In summary, I agree that we can be more explicit as to the benefits and existing use of multiformats. Barring further input, I will try getting these changes in before the IESG considers the proposal documents.

---
bumblefudge
janitor [@Chain Agnostic Standards Alliance](https://github.com/chainagnostic/CASA)
contractable via [learningProof UG](https://learningproof.xyz)
mostly berlin-based