Re: [Multiformats] Multiformats Considered Harmful

Orie Steele <orie@transmute.industries> Thu, 07 September 2023 18:49 UTC

From: Orie Steele <orie@transmute.industries>
Date: Thu, 07 Sep 2023 13:49:39 -0500
Message-ID: <CAN8C-_JF7GydtgeJVDFV3wwgMWoU6uXeACwg2prhVrjjZLmhtg@mail.gmail.com>
To: bumblefudge von CASA <bumblefudge@learningproof.xyz>
Cc: "multiformats@ietfa.amsl.com" <multiformats@ietfa.amsl.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/multiformats/MezL-n-q7O__lUrB04YccZFEm1I>
Subject: Re: [Multiformats] Multiformats Considered Harmful

Inline:

On Thu, Sep 7, 2023 at 7:10 AM bumblefudge von CASA <
bumblefudge@learningproof.xyz> wrote:

> Dear Prof Bormann and Melvin Carvalho:
>
> Thank you very much for your bibliographic erratum and expression of
> interest, respectively.  They're incorporated below in the longer response.
>
> Dear Mike Jones:
>
> Thank you for taking the time to spell out your feedback. I believe that
> clarifying the spec in a few places can address most of your concerns.
>
> 1. Multiformats institutionalize the failure to make a choice, which is
> the opposite of what good standards do. Good standards make choices about
> representations of data structures resulting in interoperability, since
> every conforming implementation uses the same representation. In contrast,
> multiformats enable different implementations to use a multiplicity of
> different representations for the same data, harming interoperability.
> https://datatracker.ietf.org/doc/html/draft-multiformats-multibase-03#appendix-D.1
> defines 23 equivalent and non-interoperable representations for the same
> data!
>
>
> Multiformats are focused on evolvability. They were designed as a
> component of protocol suites intended to be actively maintained and to
> remain robust over an extended period of time such that protocol
> ossification was identified as a significant threat.
>

Can you say more about this threat? Why is a stable base encoding a threat?


> You can think of them as reusable extensibility that promotes active use
> (in the RFC9170 <https://www.rfc-editor.org/rfc/rfc9170.html> sense), and
> as such they join a growing body of work on long-term protocol robustness.
>
> I didn't want to make the draft too philosophical, but I agree that this
> intent could be clarified there more explicitly.
>
> 2. The stated purpose of "multibase<
> https://www.ietf.org/archive/id/draft-multiformats-multibase-08.html>" is
> "Unfortunately, it's not always clear what base encoding is used; that's
> where this specification comes in. It answers the question: Given data 'd'
> encoded into text 's', what base is it encoded with?", which is wholly
> unnecessary. Successful standards DEFINE what encoding is used where. For
> instance, https://www.rfc-editor.org/rfc/rfc7518.html#section-6.2.1.2
> defines that "x" is base64url encoded. No guesswork or prefixing is
> necessary or useful.
>
>
> Successful standards for self-describing encodings have also been adopted
> by communities rather than relying on fixed registries and protocols that
> define encodings in advance and universally.
>

This is just a fancy way of saying the community made a registry that is
fixed in the sense that the community controls updates to it... that's how
IETF registries work too.

I've got some experience with community managed registries, and in my
experience, they can encourage commitment to lower quality.

For example, multicodec has DER encoded RSA keys, JCS encoded JWK keys
(including JCS encoded RSA and P-256), and compressed elliptic curve point
keys for P-256.... Ed25519 private keys contain public keys, but X25519
private keys don't.... they all have fixed code points, and if we change
what the code point means, it breaks interoperability.


> While it could be argued that interoperability with those kinds of
> protocol could be difficult, inefficient, or, in some cases, even zero-sum,
> I do not feel a universal claim to harmfulness is warranted, particularly
> without defining the criterion and context of interoperability.
>
>
I agree it's helpful to define interoperability; there are a few dimensions
that apply to multiformats:

1. base encoding negotiation
2. hash function negotiation
3. public key negotiation
4. merkle tree encoding negotiation / other crypto primitives that might
come in the future...

The registry part is not the issue; the challenge is non-overlapping
implementations of the registry:

https://github.com/multiformats/js-multiformats#implementations

After multibase and multihash are RFCs, you will see other RFCs that
profile on them similar to how RFCs profile on JOSE and COSE, for example:

1. base32 encoding
2. murmur3 hash
3. P-256 public keys

^ interop will be achieved through naming protocols that build on
multiformats, and restrict to a reasonable set of choices... not by adding
more registries.
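
As a rough illustration of what "restrict to a reasonable set of choices"
could look like in code, here is a minimal sketch of a profile that accepts
only base32 multibase strings. The prefix characters ('f' for lowercase
base16, 'b' for lowercase unpadded base32, 'u' for unpadded base64url) are
assumed from the multibase table; the names decode_multibase and
BASE32_ONLY_PROFILE are hypothetical and come from no draft or library.

```python
import base64

# Multibase prefix characters (assumed from the multibase table):
#   'f' = base16 lowercase, 'b' = base32 lowercase without padding,
#   'u' = base64url without padding.
DECODERS = {
    "f": bytes.fromhex,
    "b": lambda s: base64.b32decode(s.upper() + "=" * (-len(s) % 8)),
    "u": lambda s: base64.urlsafe_b64decode(s + "=" * (-len(s) % 4)),
}

# Hypothetical profile: a protocol that only permits base32.
BASE32_ONLY_PROFILE = frozenset({"b"})


def decode_multibase(text, allowed=frozenset(DECODERS)):
    """Decode a multibase string, rejecting bases outside the allowed set."""
    prefix, body = text[:1], text[1:]
    if prefix not in allowed:
        raise ValueError(f"base prefix {prefix!r} is not permitted by this profile")
    if prefix not in DECODERS:
        raise ValueError(f"base prefix {prefix!r} is not implemented in this sketch")
    return DECODERS[prefix](body)


# The same five bytes under three prefixes; the profile accepts only the first.
print(decode_multibase("bnbswy3dp", BASE32_ONLY_PROFILE))
print(decode_multibase("uaGVsbG8"))
print(decode_multibase("f68656c6c6f"))
```

Under the restricted profile, the 'f' and 'u' strings raise ValueError even
though they are valid multibase, which is the point: two implementations
that both follow the profile agree on a single representation.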

This is to say that multiformats and multibase are foundations for profiles
that reproduce functionality that JOSE and COSE already cover:

1. base64url
2. iana hash registries
3. cose / jose key format registries

IPLD is already in the CBOR registry, it's tag 42:
https://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml

> 3. Standardization of multiformats would result in unnecessary and
> unhelpful duplication of functionality - especially of key representations.
> The primary use of multiformats is for "publicKeyMultibase" - a
> representation of public keys that are byte arrays. For instance, the only
> use of multiformats by the W3C DID spec<https://www.w3.org/TR/did-core/>
> is for publicKeyMultibase. The IETF already has several perfectly good key
> representations, including X.509, JSON Web Key (JWK), and COSE_Key. There's
> not a compelling case for another one.
>
>
> Representation of key material bytes is not the "primary use" at all--
> it's not even in scope of this working group or mentioned in any of the
> drafts. The VCWG usage of multibase you mention was built on an earlier
> (and less complete) version of the multibase specification, which I hope is
> backwards compatible with the more recent drafts; since such usage is
> downstream and out of scope of this WG, however, I have not bothered
> confirming this despite being a member of both WGs.
>
>
It has come up a fair number of times, and in particular proponents of
multiformats suggest that publicKeyMultibase is somehow superior to JWK or
COSE Key... This makes very little sense to me as a developer, having
implemented all 3.
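
For readers who have not seen the formats side by side, here is a minimal
sketch that writes the same 32-byte Ed25519 public key both ways. It assumes
the multicodec code for ed25519-pub is 0xed (uvarint bytes 0xed 0x01), that
'z' is the multibase prefix for base58btc, and that the JWK follows the OKP
layout of RFC 8037. The function names are hypothetical, and the key bytes
are a placeholder rather than a real key.

```python
import base64

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

# uvarint encoding of the assumed multicodec code 0xed (ed25519-pub).
ED25519_PUB_PREFIX = bytes([0xED, 0x01])


def base58btc(data):
    """Encode bytes with the Bitcoin base58 alphabet."""
    n = int.from_bytes(data, "big")
    out = ""
    while n:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    # Each leading zero byte is represented by a leading '1'.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + out


def public_key_multibase(raw_key):
    """'z' (base58btc) + base58btc(multicodec prefix + raw key bytes)."""
    return "z" + base58btc(ED25519_PUB_PREFIX + raw_key)


def public_key_jwk(raw_key):
    """JWK for an Ed25519 public key: named members, base64url without padding."""
    x = base64.urlsafe_b64encode(raw_key).rstrip(b"=").decode("ascii")
    return {"kty": "OKP", "crv": "Ed25519", "x": x}


raw = bytes(range(32))  # placeholder bytes, not a real key
print(public_key_multibase(raw))
print(public_key_jwk(raw))
```

Both carry the same 32 bytes. The multibase form puts the key type in a
binary prefix that needs a registry lookup, while the JWK names the key type
and curve in the structure itself.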


> The primary goal of bringing a Multiformats WG to IETF is to balance the
> IPFS usecases that primarily drove these specifications and algorithms
> until now against such external use cases as the `publicKeyMultibase` one
> you mention, which emerged outside the field of vision of the multiformats
> community. These new use cases compose the multiformats building blocks in
> different configurations or cherry-pick individual specifications or
> registries for new purposes. Melvin's request that the CID specification be
> hardened at IETF is a great example of non-IPFS use cases we would like to
> enable and de-risk by hardening each specific layer enough to be used
> independently. (Sidenote for Melvin: the main reason CID is not in-scope
> for the proposed WG is that it depends squarely on both multibase and
> multihash, so hardening CID concurrently with its dependencies would
> pre-empt breaking changes). The IETF seems to me the best place to
> reconcile what's already built with novel, independent, or external usages
> one layer at a time, in a deliberative and open way.
>
>
I agree, IETF can help untangle these layers, but that could result in some
breaking changes... If it can't, it's just an incremental rubber stamping
process.


> To make it more explicit, the "primary use case" to date of all the
> multiformats has been the annotation and compact representation of
> heterogenous and emergent forms of data balancing efficiency against
> open-endedness of transport, encoding, and structure. This has happened in
> the context of developing the IPFS and libp2p protocol suites and
> optimizing their robustness and evolvability. That only a subset of key
> representations are expressible natively in today's multiformats is an
> index of comprehensive key representation not having been a goal, and of
> different tradeoffs having been made. Indeed, allowing more conventional
> and standardized key expressions to be used in multiformats contexts is a
> goal of the working group, insofar as it would allow new and independent
> use-cases for each layer and specification. Is there anything in the
> charter or drafts that states otherwise?
>
> 4. publicKeyMultibase can only represent a subset of the key types used in
> practice. Representing many kinds of keys requires multiple values - for
> instance, RSA keys require both an exponent and a modulus. By comparison,
> the X.509, JWK, and COSE_Key formats are flexible enough to represent all
> kinds of keys. It makes little to no sense to standardize a key format that
> limits implementations to only certain kinds of keys.
>
>
> Standardizing key formats is out of scope of this charter and these
> specifications so I'm not sure what this is referring to.
>
>
It's based on the assertion that it's "good" for representing public keys
(and private keys).... it's not.


> 5. The "multihash<
> https://www.ietf.org/archive/id/draft-multiformats-multihash-07.html>"
> specification relies on a non-standard representation of integers called
> "Dwarf". Indeed, the referenced Dwarf document lists itself as being at
> http://dwarf.freestandards.org/ - a URL that no longer exists!
>
>
> The DWARF debugging standards still exist and are still maintained and
> advanced:
> https://dwarfstd.org/dwarf5std.html
>
> Thanks very much to Professor Bormann for the RFC reference, that would
> have been a better normative anchor.
>
> 6. The "Multihash Identifier Registry" at
> https://www.ietf.org/archive/id/draft-multiformats-multihash-07.html#mh-registry
> duplicates the functionality of the IANA "Named Information Hash Algorithm
> Registry" at
> https://www.iana.org/assignments/named-information/named-information.xhtml#hash-alg,
> in that both assign (different) numeric identifiers for hash functions. If
> multihash goes forward, it should use the existing registry.
>
>
> Taken in isolation as registries, they certainly seem duplicative, but I
> hope that IETF community will agree with me that the confusion of two
> overlapping solution spaces can be justified by how differently they are
> used. The numeric prefixes are used in different algorithms to prefix
> different shapes of data from differently-governed ecosystems in different
> schemata and optimizing for different variables. On a purely technical
> level, the padding and length/segmentation practices are quite different,
> so swapping multiformats prefixes for IANA named-info#hash-alg prefixes
> wouldn't make the resulting data any more interoperable. If anything, this
> 1:1 substitution would actually hinder interoperability by apparent but not
> substantial conformance to the equivalent IETF specifications that use the
> latter.
>
>
IMO, multiformat prefixing is just a uvarint version of URN prefixing,
which IETF uses in several places... The commentary on CID vs multihash
highlighted this.
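
To make that comparison concrete, here is a minimal sketch that labels the
same SHA-256 digest in both styles. It assumes the multihash code for
sha2-256 is 0x12 and uses the RFC 6920 "ni" URI form as the textual
counterpart (strictly a URI scheme rather than a URN namespace). The helper
names are hypothetical.

```python
import base64
import hashlib

SHA2_256_CODE = 0x12  # assumed multihash code for sha2-256


def uvarint(n):
    """Unsigned varint: 7 bits per byte, low bits first, high bit = 'more follows'."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def multihash_sha256(data):
    """Binary prefixing: uvarint(code) + uvarint(digest length) + digest."""
    digest = hashlib.sha256(data).digest()
    return uvarint(SHA2_256_CODE) + uvarint(len(digest)) + digest


def ni_uri_sha256(data):
    """Textual prefixing: algorithm name, ';', base64url digest without padding."""
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return "ni:///sha-256;" + encoded


message = b"Hello World!"
print(multihash_sha256(message).hex())
print(ni_uri_sha256(message))
```

In both cases a registered identifier precedes the digest bytes. The
difference is a compact binary number versus a readable algorithm name, plus
the explicit length field in the multihash.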


> 7. It's concerning that the draft charter<
> https://msporny.github.io/charter-ietf-multiformats/> states that
> "Changing current Multiformat header assignments in a way that breaks
> backward compatibility with production deployments" is out of scope.
> Normally IETF working groups are given free rein to make improvements
> during the standardization process.
>
> The intent there is, of course, to empower the group to make improvements,
> as already agreed, but to be mindful of compatibility with deployed
> implementations. There are multiple implementations that have been deployed
> in production settings for several years and multiformats are used in
> protocols that are deployed at scale and included in massive amounts of
> publicly deployed URLs and stored content. In fact, the wide deployment of
> multiformats and the growing number of implementations are strong
> motivating factors behind this group. Breaking compatibility in flight seems
> unwise if it can be avoided.
>
>
If you can't give up control, and you won't make breaking changes, why
bring the work to IETF?


> 8. Finally, as a member of the W3C DID and W3C Verifiable Credentials
> working groups, I will state that it is misleading for the draft charter to
> say that "The outputs from this Working Group are currently being used by
> ... the W3C Verifiable Credentials Working Group, W3C Decentralized
> Identifiers Working Group...". The documents produced by these working
> groups intentionally contain no normative references to multiformats or any
> data structures derived from them. Where they are referenced, it is
> explicitly stated that the references are non-normative.
>
> This feels a little more like W3C business than corrections to the
> proposal text, if I'm being honest. Would it be more precise to verbosely
> make explicit that "prototypes and implementations built by members of the
> VCWG use earlier forms of the multibase specification"? The current charter
> does not claim that W3C *specifications* rely normatively on the multibase
> specification, which they could not have done if they wanted to because the
> multibase specification is currently in community draft and would not meet
> W3C's editorial criteria for a normative reference.
>
>
So the only thing blocking W3C and other specs from making multiformats
mandatory to implement is the fact that it's not an RFC or standard from a
reputable standards organization.

DID WG and W3C WG have gone as far as they can to recommend multiformats,
and this will unlock the next steps.



> ---
>
> In summary, I agree that we can be more explicit as to the benefits and
> existing use of multiformats. Barring further input, I will try getting
> these changes in before the IESG considers the proposal documents.
>
> ---
> bumblefudge
> janitor @Chain Agnostic Standards Alliance
> <https://github.com/chainagnostic/CASA>
> contractable via learningProof UG <https://learningproof.xyz>
> mostly berlin-based


-- 


ORIE STEELE
Chief Technology Officer
www.transmute.industries

<https://transmute.industries>