Re: [Multiformats] Multiformats Considered Harmful
Aaron Goldman <goldmanaaron@gmail.com> Mon, 18 September 2023 18:49 UTC
Archived-At: <https://mailarchive.ietf.org/arch/msg/multiformats/w-etaODmYp82mfRa9FU7yo12ia0>
Sorry for the Markdown; this was composed with GitHub Markdown.
https://github.com/msporny/charter-ietf-multiformats/issues/2

### 1.

> Multiformats institutionalize the failure to make a choice, which is the opposite of what good standards do. Good
> standards make choices about representations of data structures resulting in interoperability, since every
> conforming implementation uses the same representation. In contrast, Multiformats enable different implementations
> to use a multiplicity of different representations for the same data, harming interoperability.
> [datatracker.ietf.org/doc/html/draft-Multiformats-Multibase-03#appendix-D.1](https://datatracker.ietf.org/doc/html/draft-Multiformats-Multibase-03#appendix-D.1)
> defines 23 equivalent and non-interoperable representations for the same data!

**Multibase** specifically and **Multiformats** more generally are standards for decoupling. A good example of a decoupling standard is [IPv4](https://en.wikipedia.org/wiki/Internet_Protocol_version_4)/[IPv6](https://en.wikipedia.org/wiki/IPv6) and the [IP protocol numbers](https://en.wikipedia.org/wiki/List_of_IP_protocol_numbers). IPv4 has the `Protocol` field and IPv6 has the `Next Header` field, but they share the same [IANA registry](https://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml).

We could call this a "failure to make a choice", as IP did not choose the format of the layers above and below IP, or we could view it as a deliberate decoupling of the layers of the network stack. Whether it was a good or bad design, it did enable innovation in what types of content IP is capable of encapsulating. There are 146 protocols in the registry and some routers don't implement them all, preferring just ICMP, UDP, and TCP, but IPv4/IPv6 have still proved useful.

The **Multibase** standard solves the problem of representing bytes in text strings with restricted character sets, without needing to know in advance what the restrictions will be.
This is independent and separate from all the other **Multiformat** standards. The **Multiformat** standard solves the problem of providing a "tag" to specify what the next "value" is, the same as IPv4's `Protocol` header or HTTP's `Content-Type` header.

### 2.

> The stated purpose of "[Multibase](https://www.ietf.org/archive/id/draft-Multiformats-Multibase-08.html)" is
> "Unfortunately, it's not always clear what base encoding is used; that's where this specification comes in. It
> answers the question: Given data 'd' encoded into text 's', what base is it encoded with?", which is wholly
> unnecessary. Successful standards DEFINE what encoding is used where. For instance,
> [rfc-editor.org/rfc/rfc7518.html#section-6.2.1.2](https://www.rfc-editor.org/rfc/rfc7518.html#section-6.2.1.2)
> defines that "x" is base64url encoded. No guesswork or prefixing is necessary or useful.

Some standards do specify a specific encoding. **Multibase** will not prevent any past or future standard from specifying that a text field is `Base64url`, for example. It does enable future standards to specify that bytes are encoded as a **Multibase** string. **Multibase** is a set of encodings that allow an array of bytes to be encoded as text under character-set restrictions that may not always be known in advance.
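
To make the idea concrete, here is a minimal Python sketch of a self-describing base prefix. It is an illustration, not the reference implementation: the function names and the `CODECS` table are made up for this example, it covers only three encodings, and the prefix letters (`f` for lowercase base16, `b` for unpadded lowercase base32, `u` for unpadded base64url) reflect my reading of the Multibase draft's table.

```python
import base64


def _b32_encode(data: bytes) -> str:
    # Unpadded, lowercase base32 (RFC 4648 alphabet).
    return base64.b32encode(data).decode("ascii").rstrip("=").lower()


def _b32_decode(text: str) -> bytes:
    # Restore the padding that the encoder stripped.
    return base64.b32decode(text.upper() + "=" * (-len(text) % 8))


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


# Hypothetical lookup table: prefix character -> (encoder, decoder).
CODECS = {
    "f": (lambda data: data.hex(), bytes.fromhex),
    "b": (_b32_encode, _b32_decode),
    "u": (_b64url_encode, _b64url_decode),
}


def multibase_encode(prefix: str, data: bytes) -> str:
    """Encode bytes and prepend the character that names the base."""
    encode, _ = CODECS[prefix]
    return prefix + encode(data)


def multibase_decode(text: str) -> bytes:
    """Read the first character to learn the base, then decode the rest."""
    if not text:
        raise ValueError("empty multibase string")
    prefix, body = text[0], text[1:]
    if prefix not in CODECS:
        raise ValueError(f"unsupported multibase prefix: {prefix!r}")
    _, decode = CODECS[prefix]
    return decode(body)
```

A caller that stores `multibase_encode("u", data)` today can switch to `"b"` later for a case-insensitive environment, and any reader using `multibase_decode` keeps working without being told which base was chosen.
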
If we had a protocol that had a 32-byte number, and we needed to represent those bytes as text, we could represent them as:

| Base | Literal |
|------|---------|
| b256(bytes) | (non-ascii bytes not representable here) |
| b85 | `<FLd+nEV_Rn)~#~nQyryC$2%{WSf&rq?MT)cv84k` |
| b64 | `47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=` |
| b32 | `4OYMIQUY7QOBJGX36TEJS35ZEQT24QPEMSNZGTFESWMRW6CSXBKQ====` |
| b16 | `E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855` |
| integer_literal | `0xe3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855` |
| integer_literal | `102987336249554097029535212322581322789799900648198034993379397001115665086549` |
| integer_literal | `0o16166061041230770160244657576462114557562220475344074431115623231222254621557024534125` |
| integer_literal | `0b1110001110110000110001000100001010011000111111000001110000010100100110101111101111110100110010001001100101101111101110010010010000100111101011100100000111100100011001001001101110010011010011001010010010010101100110010001101101111000010100101011100001010101` |

By using an integer literal, I can describe both the number and the base that the number is represented in. In this case, we represent hex in a text that only needs to be able to support `0123456789abcdefx`, binary with just `01b`, and so on. **Multibase** takes this further by requiring that the first byte (indicating the base) is one of the bytes from the alphabet of the encoding. This way the prefix does not add a character requirement for no value.

An example of this adding value is when **Multibase** was chosen for IPFS CIDs. The CIDs were traditionally in `base58btc`, which is case-sensitive.
This worked well for representing bytes in the restricted text environment of file paths and URI paths. This could have easily been specified as a `base58btc` string, but fortunately they chose **Multibase** to decouple the bytes of the CID from the string representation. When the time came that they wanted to put CIDs into subdomains, the case-insensitive subdomains were a _more_ restricted text environment that they had not anticipated. They switched to `base32`, which is not case-sensitive and is thus able to represent the same bytes in a more restricted environment.

**Multibase** is orthogonal to **Multiformats** and should be standardized as a way to represent bytes in a text environment that is restricted in ways that are irrelevant to the bytes being represented. If we don't know whether our data will need to be represented as compact arbitrary bytes, 7-bit-safe ASCII, JSON non-escaped ASCII, CSV non-escaped ASCII, TSV non-escaped ASCII, URL path-safe ASCII, domain-name-safe ASCII, decimal numbers only, or some not yet known but soon to be important environment, then encoding the bytes as **Multibase** has decoupling value.

### 3.

> Standardization of Multiformats would result in unnecessary and unhelpful duplication of functionality – especially
> of key representations. The primary use of Multiformats is for "publicKeyMultibase" – a representation of public
> keys that are byte arrays. For instance, the only use of Multiformats by the [W3C DID spec](https://www.w3.org/TR/did-core/)
> is for publicKeyMultibase. The IETF already has several perfectly good key representations, including X.509, JSON
> Web Key (JWK), and COSE_Key. There's not a compelling case for another one.

The standardization of **Multiformats** is independent of whether IETF chooses to standardize `publicKeyMultibase`. For example, the IPv4 `Protocol` header registers `70` `VISA` `VISA Protocol`.
This does not imply that IETF needs to specify [VISA Protocol](https://en.wikipedia.org/wiki/Virtual_instrument_software_architecture). In fact, as far as we can tell, it is the IVI Foundation that maintains that standard. In the exact same way, the only interaction between **Multiformats** standardization and `publicKeyMultibase` is that `publicKeyMultibase` could use the **Multiformats** registry to map numbers to key representations. Any flaws in `publicKeyMultibase` are no better an argument against standardization of **Multiformats** than the flaws in VISA Protocol are against standardization of IPv4 and the IANA protocol-numbers registry.

If X.509, JSON Web Key (JWK), or COSE_Key become the standard way to represent keys for the web, then `publicKeyMultibase` could just add a **Multiformats** registry entry for X.509 or JWK, and `publicKeyMultibase` would just be a wrapper around those representations. COSE is already present in the registry.

### 4.

> publicKeyMultibase can only represent a subset of the key types used in practice. Representing many kinds of keys
> requires multiple values – for instance, RSA keys require both an exponent and a modulus. By comparison, the X.509,
> JWK, and COSE_Key formats are flexible enough to represent all kinds of keys. It makes little to no sense to
> standardize a key format that limits implementations to only certain kinds of keys.

Please see above. `publicKeyMultibase` is outside the scope of this working group, which is [tasked](https://github.com/msporny/charter-ietf-multiformats) with producing the following artifacts:

> 1. An RFC specifying multibase usage
> 2. An RFC defining an independent multibase registry and populating it with today's already-implemented stable and final values
> 3. An RFC defining a registry-group for all the multicodecs, empty at inception, with registration process and group-wide constraints on registration values
> 4. An RFC specifying multihash usage
> 5. An RFC defining a multihash registry within the multicodecs registry group and populating it with today's already-implemented stable and final values

The **Multiformat-varint** spec is also pulled in, as it is needed to specify the length in **Multihash** and in **Multiformat** with sized payloads.

### 5.

> The "[multihash](https://www.ietf.org/archive/id/draft-Multiformats-multihash-07.html)" specification relies on a
> non-standard representation of integers called "Dwarf". Indeed, the referenced Dwarf document lists itself as being
> at [http://dwarf.freestandards.org](http://dwarf.freestandards.org/) – a URL that no longer exists!

We agree here: the **Multiformats-varint** is close to but not exactly Dwarf. This is because the **Multiformats-varint** is limited to 9 bytes. It is a 1-to-9 byte representation of an unsigned int63, from 0x00 (0) to 0x7FFFFFFF_FFFFFFFF (9223372036854775807). This means the decoded value will always fit in either a signed int64 or an unsigned int64.

If the most-significant bit of a byte is 0, this is the last byte of the **Multiformats-varint**. If it is 1, there is at least one more byte present in the **Multiformats-varint**. The 7 remaining bits are the payload bits. You can shift the payload bits left by `7 * (byte number)`, counting bytes from zero, and `|` (bitwise-OR) them in to get the decoded number.
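
The decoding rule just described is short enough to sketch directly. The following Python is only an illustration of the rule as stated in this message (low-order 7 bits first, at most 9 bytes, values up to 0x7FFFFFFF_FFFFFFFF); the names `encode_varint` and `decode_varint` are made up for this example, and it does not check for non-minimal encodings, which a real specification may want to forbid.

```python
from typing import Tuple

MAX_VARINT_BYTES = 9                 # limit stated above
MAX_VALUE = 0x7FFFFFFF_FFFFFFFF      # largest unsigned int63


def encode_varint(value: int) -> bytes:
    """Encode an unsigned int63 as 1 to 9 bytes, low-order 7 bits first."""
    if not 0 <= value <= MAX_VALUE:
        raise ValueError("value does not fit in an unsigned int63")
    out = bytearray()
    while True:
        payload = value & 0x7F
        value >>= 7
        if value:
            out.append(0x80 | payload)   # MSB = 1: more bytes follow
        else:
            out.append(payload)          # MSB = 0: this is the last byte
            return bytes(out)


def decode_varint(buf: bytes) -> Tuple[int, int]:
    """Return (decoded value, number of bytes consumed) from the start of buf."""
    value = 0
    for index, byte in enumerate(buf[:MAX_VARINT_BYTES]):
        # Shift the 7 payload bits left by 7 * (byte number) and OR them in.
        value |= (byte & 0x7F) << (7 * index)
        if byte & 0x80 == 0:
            return value, index + 1
    raise ValueError("varint is truncated or longer than 9 bytes")
```

For example, 300 encodes to the two bytes `0xAC 0x02`: the low 7 bits (`0x2C`) go first with the continuation bit set, followed by the remaining bits (`0x02`).
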
```
| length in bytes | Encoded bits | Bits                                                                             |
|-----------------|--------------|----------------------------------------------------------------------------------|
| 1               | 7            | 0xxxxxxx                                                                         |
| 2               | 14           | 1xxxxxxx 0xxxxxxx                                                                |
| 3               | 21           | 1xxxxxxx 1xxxxxxx 0xxxxxxx                                                       |
| 4               | 28           | 1xxxxxxx 1xxxxxxx 1xxxxxxx 0xxxxxxx                                              |
| 5               | 35           | 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 0xxxxxxx                                     |
| 6               | 42           | 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 0xxxxxxx                            |
| 7               | 49           | 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 0xxxxxxx                   |
| 8               | 56           | 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 0xxxxxxx          |
| 9               | 63           | 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 0xxxxxxx |
|                 |              | 7 0, 14 8, 21 15, 28 22, 35 23, 42 36, 49 43, 56 50, 63 57                       |
```

**Multiformats-varint** is such a simple varint that there is no reason to point anywhere else. The **Multiformats-varint** should be specified by this working group alongside **Multibase** and **Multihash**. Any reference to Dwarf is simply unnecessary, as it is clearer to specify **Multiformats-varint** directly than to describe it relative to a similar but non-identical varint.

### 6.

> The "Multihash Identifier Registry" at [ietf.org/archive/id/draft-Multiformats-multihash-07.html#mh-registry](https://www.ietf.org/archive/id/draft-Multiformats-multihash-07.html#mh-registry)
> duplicates the functionality of the IANA "Named Information Hash Algorithm Registry" at
> [iana.org/assignments/named-information/named-information.xhtml#hash-alg](https://www.iana.org/assignments/named-information/named-information.xhtml#hash-alg),
> in that both assign (different) numeric identifiers for hash functions. If multihash goes forward, it should use
> the existing registry.

"Not all uses of these names require use of the full hash output -- truncated hashes can be safely used in some environments.
For this reason, we define a new IANA registry for hash functions to be used with this specification so as not to mix strong and weak (truncated) hash algorithms in other protocol registries." -- [RFC 6920: Naming Things with Hashes](https://www.iana.org/go/rfc6920)

The purpose of the named-information registry is to identify a hash function and prefix length for the binary encoding of an `ni://` or `nih://` URI. This is limited to a 6-bit field, but the **Multiformats** registry intends to support more than 64 algorithm/size pairs.

| hash      | sizes |
|-----------|-------|
| identity  | 1     |
| sha1      | 1     |
| sha2      | 9     |
| sha2a     | 1     |
| sha3      | 4     |
| keccak    | 5     |
| blake3    | 1     |
| md4       | 1     |
| md5       | 1     |
| blake2b   | 64    |
| blake2s   | 32    |
| skein256  | 32    |
| skein512  | 64    |
| skein1024 | 128   |

We can't fit hundreds of hash-function/length pairs in a 64-entry registry. Reusing the existing registry would also break backwards compatibility, because it changes which numbers match which hash functions. It would pollute the registry for RFC 6920 implementers by including hash functions that are not cryptographically secure. Lastly, the **Multiformats** registry already contains more than 64 hash functions and would not fit in the Named Information Hash Algorithm Registry. It is better to have hash function and length as two different fields, as in **Multihash**.

### 7.

> It's concerning that [the draft charter](https://msporny.github.io/charter-ietf-Multiformats/) states that
> "Changing current Multiformat header assignments in a way that breaks backward compatibility with production
> deployments" is out of scope. Normally IETF working groups are given free rein to make improvements during the
> standardization process.

This may be a distinction without a difference. We certainly could empower the working group to make backwards-incompatible changes, but they will try not to make any unnecessary breaking changes.

### 8.
> Finally, as a member of the W3C DID and W3C Verifiable Credentials working groups, I will state that it is
> misleading for the draft charter to say that "The outputs from this Working Group are currently being used by … the
> W3C Verifiable Credentials Working Group, W3C Decentralized Identifiers Working Group…". The documents produced by
> these working groups intentionally contain no normative references to Multiformats or any data structures derived
> from them. Where they are referenced, it is explicitly stated that the references are non-normative.

This is a good note. The draft charter should probably be clear that **Multiformats** are being used in Verifiable Credentials and Decentralized Identifiers in production. There are multiple existing independent implementations of this technology enabling Verifiable Credentials and Decentralized Identifiers to be useful. While these specs contain no normative references, this registry provides the ability to make Verifiable Credentials and Decentralized Identifiers that are better decoupled from the data structures that they contain, and will therefore be flexible in the face of future evolution.