Re: [Multiformats] Multiformats Considered Harmful

Melvin Carvalho <melvincarvalho@gmail.com> Thu, 07 September 2023 14:05 UTC

To: bumblefudge von CASA <bumblefudge@learningproof.xyz>
Cc: "multiformats@ietfa.amsl.com" <multiformats@ietfa.amsl.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/multiformats/yksG22pBUel7piPy3vL-wqjv9OI>

On Thu, 7 Sep 2023 at 14:10, bumblefudge von CASA <
bumblefudge@learningproof.xyz> wrote:

> Dear Prof Bormann and Melvin Carvalho:
>
> Thank you very much for your bibliographic erratum and expression of
> interest, respectively.  They're incorporated below in the longer response.
>
> Dear Mike Jones:
>
> Thank you for taking the time to spell out your feedback. I believe that
> clarifying the spec in a few places can address most of your concerns.
>
> > 1. Multiformats institutionalize the failure to make a choice, which is
> > the opposite of what good standards do. Good standards make choices about
> > representations of data structures resulting in interoperability, since
> > every conforming implementation uses the same representation. In contrast,
> > multiformats enable different implementations to use a multiplicity of
> > different representations for the same data, harming interoperability.
> > https://datatracker.ietf.org/doc/html/draft-multiformats-multibase-03#appendix-D.1
> > defines 23 equivalent and non-interoperable representations for the same
> > data!
>
>
> Multiformats are focused on evolvability. They were designed as
> components of protocol suites that were intended to be actively maintained
> and to remain robust over a long period of time, a setting in which protocol
> ossification was identified as a significant threat. You can think of them
> as reusable extensibility that promotes active use (in the RFC 9170
> <https://www.rfc-editor.org/rfc/rfc9170.html> sense), and as such they
> join a growing body of work on long-term protocol robustness.
>
> I didn't want to make the draft too philosophical, but I agree that this
> intent could be clarified there more explicitly.
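To make point 1 concrete, here is a minimal Python sketch. It is not taken from the multibase draft. It writes the same five bytes under four multibase prefixes:

- "f": lowercase base16
- "b": lowercase, unpadded base32
- "m": unpadded base64
- "u": unpadded base64url

The helper name `multibase_encodings` is hypothetical. The prefix letters follow my reading of the draft's prefix table and should be checked against it. Every output string decodes to the same bytes. That is both the multiplicity Mike objects to and the self-description bumblefudge defends.

```python
import base64


def multibase_encodings(data: bytes) -> dict:
    """Encode `data` under a few multibase prefixes (hypothetical helper).

    Each value is <prefix character> + <encoded text>; "=" padding is
    stripped, assuming the unpadded variants from the multibase prefix table.
    """
    return {
        "base16": "f" + data.hex(),
        "base32": "b" + base64.b32encode(data).decode("ascii").lower().rstrip("="),
        "base64": "m" + base64.b64encode(data).decode("ascii").rstrip("="),
        "base64url": "u" + base64.urlsafe_b64encode(data).decode("ascii").rstrip("="),
    }


for name, text in multibase_encodings(b"hello").items():
    print(f"{name:10} {text}")
```

With the input b"hello", the four strings are `f68656c6c6f`, `bnbswy3dp`, `maGVsbG8`, and `uaGVsbG8`.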
>
> > 2. The stated purpose of "multibase"
> > <https://www.ietf.org/archive/id/draft-multiformats-multibase-08.html> is
> > "Unfortunately, it's not always clear what base encoding is used; that's
> > where this specification comes in. It answers the question: Given data 'd'
> > encoded into text 's', what base is it encoded with?", which is wholly
> > unnecessary. Successful standards DEFINE what encoding is used where. For
> > instance, https://www.rfc-editor.org/rfc/rfc7518.html#section-6.2.1.2
> > defines that "x" is base64url encoded. No guesswork or prefixing is
> > necessary or useful.
>
>
> Successful standards for self-describing encodings have also been adopted
> by communities, without relying on fixed registries or on protocols that
> define encodings universally and in advance. While it could be argued that
> interoperability with those kinds of protocols could be difficult,
> inefficient, or, in some cases, even zero-sum, I do not feel a universal
> claim of harmfulness is warranted, particularly without defining the
> criteria and context of interoperability.
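The disagreement in point 2 can be sketched in a few lines of Python. The two readers below make opposite assumptions:

- `multibase_decode` dispatches on the first character, as multibase does.
- `jose_base64url_decode` always assumes unpadded base64url, because the consuming spec fixes the encoding in advance (for example, the "x" member in RFC 7518 section 6.2.1.2).

Both function names are hypothetical. The decoder table covers only four prefixes for illustration and is not a conforming multibase implementation.

```python
import base64


def _unpadded_b64(text: str, urlsafe: bool) -> bytes:
    # Restore the "=" padding that the unpadded base64 variants omit.
    padded = text + "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(padded) if urlsafe else base64.b64decode(padded)


def _unpadded_b32(text: str) -> bytes:
    # Lowercase, unpadded RFC 4648 base32 -> bytes.
    return base64.b32decode(text.upper() + "=" * (-len(text) % 8))


# Illustrative subset of multibase prefixes (hypothetical table).
DECODERS = {
    "f": bytes.fromhex,
    "b": _unpadded_b32,
    "m": lambda s: _unpadded_b64(s, urlsafe=False),
    "u": lambda s: _unpadded_b64(s, urlsafe=True),
}


def multibase_decode(text: str) -> bytes:
    """Self-describing: the first character tells the reader which base follows."""
    if not text:
        raise ValueError("empty multibase string")
    prefix, payload = text[0], text[1:]
    try:
        decoder = DECODERS[prefix]
    except KeyError:
        raise ValueError(f"unsupported multibase prefix {prefix!r}") from None
    return decoder(payload)


def jose_base64url_decode(text: str) -> bytes:
    """Fixed by the consuming spec: always unpadded base64url, no prefix."""
    return _unpadded_b64(text, urlsafe=True)


for s in ["f68656c6c6f", "bnbswy3dp", "maGVsbG8", "uaGVsbG8"]:
    print(s, multibase_decode(s))
print("aGVsbG8", jose_base64url_decode("aGVsbG8"))
```

The tradeoff shows up directly in the code. The multibase reader needs a table of decoders and a failure path for unknown prefixes. The JOSE reader needs neither, because the choice was made once, in the spec.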
>
> > 3. Standardization of multiformats would result in unnecessary and
> > unhelpful duplication of functionality - especially of key representations.
> > The primary use of multiformats is for "publicKeyMultibase" - a
> > representation of public keys that are byte arrays. For instance, the only
> > use of multiformats by the W3C DID spec <https://www.w3.org/TR/did-core/>
> > is for publicKeyMultibase. The IETF already has several perfectly good key
> > representations, including X.509, JSON Web Key (JWK), and COSE_Key. There's
> > not a compelling case for another one.
>
>
> Representation of key material bytes is not the "primary use" at all; it
> is not even in scope for this working group or mentioned in any of the
> drafts. The VCWG usage of multibase you mention was built on an earlier
> (and less complete) version of the multibase specification, with which I
> hope the more recent drafts remain backward compatible. Since such usage is
> downstream of, and out of scope for, this WG, however, I have not bothered
> to confirm this, despite being a member of both WGs.
>
> The primary goal of bringing a Multiformats WG to the IETF is to balance
> the IPFS use cases that have primarily driven these specifications and
> algorithms until now against external use cases such as the
> `publicKeyMultibase` one you mention, which emerged outside the field of
> vision of the multiformats community. These new use cases compose the
> multiformats building blocks in different configurations or cherry-pick
> individual specifications or registries for new purposes. Melvin's request
> that the CID specification be hardened at the IETF is a great example of
> the non-IPFS use cases we would like to enable and de-risk by hardening
> each specific layer enough to be used independently. (Side note for Melvin:
> the main reason CID is not in scope for the proposed WG is that it depends
> squarely on both multibase and multihash, so hardening CID concurrently
> with its dependencies would pre-empt breaking changes.) The IETF seems to
> me the best place to reconcile what's already built with novel,
> independent, or external usages one layer at a time, in a deliberative and
> open way.
>

I appreciate the detailed explanation and the recognition of the questions
and concerns I raised earlier regarding CIDs and their integration with a
URI scheme. However, I feel that there might be some misunderstandings
about my intentions and the context in which they were presented.

When I initiated the discussion about associating a CID with a URI scheme,
my primary objective was to use a widely deployed hash within the context
of a URI. This was in line with W3C's approach and the principles of linked
data. The idea was not to introduce a new standard within the IETF but to
enable the use of an existing identifier in a URI context, independently of
transport or DHT considerations.

While I understand and respect the broader goals of the Multiformats WG
within the IETF, it's essential to clarify that my interest was not in
pushing for the CID specification to be hardened at the IETF. Instead, I
was more inclined towards ensuring that this widely recognized identifier
could be adapted as a URI, as this would align better with existing web
standards and practices.

It's somewhat disheartening to see the direction in which this
specification has evolved. I hope that my clarification provides a clearer
picture of my stance and the context in which my initial questions were
raised.

I look forward to a productive dialogue that aligns more closely with the
original intentions and the broader goals of web standardization.
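For readers less familiar with the identifier under discussion, here is a rough Python sketch of how a version-1 CID is layered on multihash and multibase, the two drafts the proposed WG would harden:

1. A multihash is built: here sha2-256, code 0x12, followed by the digest length and the digest.
2. The multihash is wrapped with a version varint (1) and a content-type code.
3. The result is multibase-encoded with the "b" (lowercase base32) prefix.

The choice of the "raw" codec (0x55) and the function name `cidv1_raw_sha256` are illustrative assumptions. The sketch stops at the CID string itself and does not propose any URI scheme.

```python
import base64
import hashlib

CID_VERSION_1 = 0x01  # CID version, encoded as a varint
RAW_CODEC = 0x55      # multicodec "raw" content type (illustrative choice)
SHA2_256 = 0x12       # multihash code for sha2-256


def cidv1_raw_sha256(data: bytes) -> str:
    """Build a base32 CIDv1 string for `data` (hypothetical helper)."""
    digest = hashlib.sha256(data).digest()
    # multihash = varint(code) || varint(digest length) || digest
    multihash = bytes([SHA2_256, len(digest)]) + digest
    # binary CIDv1 = varint(version) || varint(codec) || multihash;
    # every value here is < 0x80, so each varint is a single byte
    binary_cid = bytes([CID_VERSION_1, RAW_CODEC]) + multihash
    # multibase "b": lowercase RFC 4648 base32 without padding
    b32 = base64.b32encode(binary_cid).decode("ascii").lower().rstrip("=")
    return "b" + b32


print(cidv1_raw_sha256(b"hello world"))
```

Every string built this way for raw, sha2-256 content begins with "bafkrei". The fixed header bytes 0x01 0x55 0x12 0x20 always encode to the same leading base32 characters.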


>
> To make it more explicit, the "primary use case" to date of all the
> multiformats has been the annotation and compact representation of
> heterogeneous and emergent forms of data, balancing efficiency against the
> open-endedness of transport, encoding, and structure. This has happened in
> the context of developing the IPFS and libp2p protocol suites and
> optimizing their robustness and evolvability. That only a subset of key
> representations can be expressed natively in today's multiformats reflects
> the fact that comprehensive key representation was never a goal, and that
> different tradeoffs were made. Indeed, allowing more conventional
> and standardized key expressions to be used in multiformats contexts is a
> goal of the working group, insofar as it would allow new and independent
> use cases for each layer and specification. Is there anything in the
> charter or drafts that states otherwise?
>
> > 4. publicKeyMultibase can only represent a subset of the key types used in
> > practice. Representing many kinds of keys requires multiple values - for
> > instance, RSA keys require both an exponent and a modulus. By comparison,
> > the X.509, JWK, and COSE_Key formats are flexible enough to represent all
> > kinds of keys. It makes little to no sense to standardize a key format that
> > limits implementations to only certain kinds of keys.
>
>
> Standardizing key formats is out of scope of this charter and these
> specifications so I'm not sure what this is referring to.
>
> > 5. The "multihash"
> > <https://www.ietf.org/archive/id/draft-multiformats-multihash-07.html>
> > specification relies on a non-standard representation of integers called
> > "Dwarf". Indeed, the referenced Dwarf document lists itself as being at
> > http://dwarf.freestandards.org/ - a URL that no longer exists!
>
>
> The DWARF debugging standards still exist and are still maintained and
> advanced:
> https://dwarfstd.org/dwarf5std.html
>
> Thanks very much to Professor Bormann for the RFC reference; it would
> have been a better normative anchor.
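Point 5 concerns the unsigned variable-length integers that multihash uses for its code and length fields. As I understand it, the DWARF-referenced encoding is unsigned LEB128:

- Each byte carries seven value bits.
- The least-significant group comes first.
- The high bit is set on every byte except the last.

A minimal Python sketch follows. The function names are hypothetical. The sketch does not enforce any extra constraints the multiformats drafts may add, such as a maximum length or a ban on non-minimal encodings.

```python
def uvarint_encode(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if n < 0:
        raise ValueError("unsigned varints cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = n & 0x7F              # low seven bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow: set continuation bit
        else:
            out.append(byte)         # last byte: continuation bit clear
            return bytes(out)


def uvarint_decode(buf: bytes) -> tuple:
    """Decode an unsigned LEB128 prefix of `buf`; return (value, bytes_consumed)."""
    value = 0
    for i, byte in enumerate(buf):
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, i + 1
    raise ValueError("truncated varint")


print(uvarint_encode(0x12).hex(), uvarint_encode(300).hex())
print(uvarint_decode(bytes.fromhex("ac02ff")))
```

For example, 0x12 (the sha2-256 multihash code) fits in a single byte, while 300 encodes as the two bytes ac 02. The decoder stops after those two bytes and leaves any trailing data untouched.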
>
> > 6. The "Multihash Identifier Registry" at
> > https://www.ietf.org/archive/id/draft-multiformats-multihash-07.html#mh-registry
> > duplicates the functionality of the IANA "Named Information Hash Algorithm
> > Registry" at
> > https://www.iana.org/assignments/named-information/named-information.xhtml#hash-alg,
> > in that both assign (different) numeric identifiers for hash functions. If
> > multihash goes forward, it should use the existing registry.
>
>
> Taken in isolation as registries, they certainly seem duplicative, but I
> hope that the IETF community will agree with me that the confusion of
> having two overlapping solution spaces can be justified by how differently
> they are used. The numeric prefixes are used by different algorithms to
> prefix differently shaped data from differently governed ecosystems, in
> different schemata optimized for different variables. On a purely technical
> level, the padding and length/segmentation practices are quite different,
> so swapping multiformats prefixes for IANA named-info#hash-alg prefixes
> wouldn't make the resulting data any more interoperable. If anything, this
> 1:1 substitution would actually hinder interoperability by creating the
> appearance, but not the substance, of conformance to the IETF
> specifications that use the latter.
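The point about differing length and segmentation practices can be illustrated by framing the same SHA-256 digest three ways:

- Multihash: the code 0x12, then the digest length, then the digest.
- RFC 6920 `ni:///sha-256;...` URI: unpadded base64url, using the Named Information registry's "sha-256" entry.
- RFC 6920 binary form: as I read section 6, a single header byte carrying the suite ID (1 for "sha-256"), followed by the digest.

The function names are hypothetical, and only untruncated SHA-256 is handled.

```python
import base64
import hashlib

MULTIHASH_SHA2_256 = 0x12  # multihash code for sha2-256
NI_SHA256_SUITE = 1        # "sha-256" ID in the IANA Named Information Hash Algorithm Registry


def multihash_sha2_256(data: bytes) -> bytes:
    digest = hashlib.sha256(data).digest()
    # code and length (32) are both < 0x80, so each varint is a single byte
    return bytes([MULTIHASH_SHA2_256, len(digest)]) + digest


def ni_binary_sha256(data: bytes) -> bytes:
    digest = hashlib.sha256(data).digest()
    # one header byte: two reserved zero bits, then the 6-bit suite ID; no length field
    return bytes([NI_SHA256_SUITE & 0x3F]) + digest


def ni_uri_sha256(data: bytes) -> str:
    digest = hashlib.sha256(data).digest()
    b64 = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return "ni:///sha-256;" + b64


mh = multihash_sha2_256(b"hello")
ni = ni_binary_sha256(b"hello")
print(mh[:2].hex(), len(mh))   # 1220 34
print(ni[:1].hex(), len(ni))   # 01 33
print(mh[2:] == ni[1:])        # True: same digest, different framing
print(ni_uri_sha256(b"hello"))
```

The digest bytes are identical; only the framing differs. The multihash form carries an explicit length byte, while the ni suite implies a fixed length.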
>
> > 7. It's concerning that the draft charter
> > <https://msporny.github.io/charter-ietf-multiformats/> states that
> > "Changing current Multiformat header assignments in a way that breaks
> > backward compatibility with production deployments" is out of scope.
> > Normally IETF working groups are given free rein to make improvements
> > during the standardization process.
>
> The intent there is, of course, to empower the group to make improvements,
> as already agreed, while remaining mindful of compatibility with deployed
> implementations. There are multiple implementations that have been deployed
> in production settings for several years, and multiformats are used in
> protocols that are deployed at scale and embedded in a massive number of
> publicly deployed URLs and in a massive amount of stored content. In fact,
> the wide deployment of multiformats and the growing number of
> implementations are strong motivating factors behind this group. Breaking
> compatibility in flight seems unwise if it can be avoided.
>
> > 8. Finally, as a member of the W3C DID and W3C Verifiable Credentials
> > working groups, I will state that it is misleading for the draft charter to
> > say that "The outputs from this Working Group are currently being used by
> > ... the W3C Verifiable Credentials Working Group, W3C Decentralized
> > Identifiers Working Group...". The documents produced by these working
> > groups intentionally contain no normative references to multiformats or any
> > data structures derived from them. Where they are referenced, it is
> > explicitly stated that the references are non-normative.
>
> This feels a little more like W3C business than a correction to the
> proposal text, if I'm being honest. Would it be more precise to state
> explicitly that "prototypes and implementations built by members of the
> VCWG use earlier forms of the multibase specification"? The current charter
> does not claim that W3C *specifications* rely normatively on the multibase
> specification, which they could not have done even if they had wanted to,
> because the multibase specification is currently a community draft and
> would not meet W3C's editorial criteria for a normative reference.
>
> ---
>
> In summary, I agree that we can be more explicit about the benefits and
> existing use of multiformats. Barring further input, I will try to get
> these changes in before the IESG considers the proposal documents.
>
> ---
> bumblefudge
> janitor @Chain Agnostic Standards Alliance
> <https://github.com/chainagnostic/CASA>
> contractable via learningProof UG <https://learningproof.xyz>
> mostly berlin-based