[media-types] Thoughts on suffixes, single and multiple

Mark Nottingham <mnot@mnot.net> Wed, 03 April 2024 06:30 UTC

From: Mark Nottingham <mnot@mnot.net>
Date: Wed, 03 Apr 2024 17:29:51 +1100
To: IETF Media Types <media-types@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/media-types/iWc8TLcWOyO0jyqeiuF9VCZClIs>

After the meeting in Brisbane, some of us went aside to continue the multiple suffixes discussion. There, we quickly came to the conclusion that we should deprecate the concept of suffixes in media subtypes -- i.e., they would still be syntactically allowed, but would have no meaning or registry. Martin Thomson and I took an action to write something down about this.

Once I was home, I started to think more carefully about this and do some research. One thing that I haven't yet seen is a summary of how suffixes are currently used (apologies if I missed someone else's effort there). These are the counts I came up with about a week ago for each suffix in the registry:

+xml = 439
+json = 145
+ber = 0
+cbor = 16
+der = 1
+fastinfoset = 1
+wbxml = 7
+zip = 24
+tlv = 1
+json-seq = 2
+sqlite = 1
+jwt = 6
+gzip = 2
+cbor-seq = 4
+zstd = 0
+yaml = 2
+cose = 0

As you can see, we have a few very widely used suffixes (in a registry of 1,588 entries as of that survey) and many that are very seldom used, with a few not used at all.
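
A quick tally makes the skew easier to see. This is only a sketch over the counts listed above; it assumes each registration is counted under exactly one suffix, which the list itself does not confirm.

```python
# Counts copied from the list above (suffix -> number of registered media types).
counts = {
    "xml": 439, "json": 145, "ber": 0, "cbor": 16, "der": 1,
    "fastinfoset": 1, "wbxml": 7, "zip": 24, "tlv": 1, "json-seq": 2,
    "sqlite": 1, "jwt": 6, "gzip": 2, "cbor-seq": 4, "zstd": 0,
    "yaml": 2, "cose": 0,
}

# Assumption: no registration appears under more than one suffix.
total = sum(counts.values())
top_two = counts["xml"] + counts["json"]

# Suffixes with no registrations, and suffixes with one or two.
unused = sorted(s for s, n in counts.items() if n == 0)
rare = sorted(s for s, n in counts.items() if 0 < n <= 2)

print(f"suffixed registrations: {total}")
print(f"+xml and +json share:   {top_two / total:.1%}")
print(f"unused suffixes:        {unused}")
print(f"used once or twice:     {rare}")
```

Under that assumption, +xml and +json together account for roughly nine in ten suffixed registrations, and three suffixes have no registrations at all.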

The widespread use of +xml and +json in particular made me more cautious about deprecating suffixes altogether -- especially since we still sort-of believe that they are indeed used by (or at least potentially useful to) things like editors to hint syntactic conventions.

So, that leaves a few different options, considering the constraints we have:

1) Disallow more than one "+" sign in media subtypes, as floated at the meeting. This would put a fair amount of pressure on the registry's ability to reflect reality, depending on how widely deployed some things get (although we could grandfather some types in to ease the pressure here).

2) Syntactically allow suffixes before the last one, but not assign them any meaning or register them; e.g., application/foo+bar+xml would be an XML format, but who knows what bar is; effectively, it's just part of "foo+bar". This would allow people to define suffix-like things, but wouldn't give them any recognition or coordination -- potentially leading to the need to formalise things more down the road, just as we did in the first round of suffixes.

3) Consider multiple suffixes, when they occur, to be unrelated hints as to the syntax of the format -- i.e., there is no processing model, there is no ordering (although a registrant would have to choose an order; registrations with different orderings should be refused). Effectively, suffixes would just be a 'bag of hints' about the format being used. 

I'd be interested in hearing people's reactions to these. 
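
To make the difference between the three options concrete, here is a rough Python sketch of how a consumer might read a subtype under each one. The function names and return shapes are invented for illustration, and the parsing is deliberately naive (it drops parameters and lowercases the name); none of this is proposed text.

```python
def split_subtype(media_type: str) -> tuple[str, list[str]]:
    """Split 'type/base+s1+s2; params' into ('base', ['s1', 's2'])."""
    essence = media_type.split(";", 1)[0].strip().lower()
    _, _, subtype = essence.partition("/")
    base, *suffixes = subtype.split("+")
    return base, suffixes


def option1(media_type: str) -> tuple[str, frozenset[str]]:
    """Option 1: at most one '+' is allowed; anything more is rejected."""
    base, suffixes = split_subtype(media_type)
    if len(suffixes) > 1:
        raise ValueError(f"more than one '+' in {media_type!r}")
    return base, frozenset(suffixes)


def option2(media_type: str) -> tuple[str, frozenset[str]]:
    """Option 2: only the last suffix has meaning; the rest is just the name."""
    base, suffixes = split_subtype(media_type)
    if not suffixes:
        return base, frozenset()
    return "+".join([base, *suffixes[:-1]]), frozenset([suffixes[-1]])


def option3(media_type: str) -> tuple[str, frozenset[str]]:
    """Option 3: every suffix is an unordered, unrelated syntax hint."""
    base, suffixes = split_subtype(media_type)
    return base, frozenset(suffixes)


def conflicts_under_option3(candidate: str, registered: str) -> bool:
    """True if the names differ as strings but share a base and suffix set."""
    return (candidate.lower() != registered.lower()
            and option3(candidate) == option3(registered))


print(option2("application/foo+bar+xml"))  # name 'foo+bar', hint 'xml'
print(option3("application/foo+bar+xml"))  # name 'foo', hints 'bar' and 'xml'
print(conflicts_under_option3("application/foo+xml+bar",
                              "application/foo+bar+xml"))
```

The last helper is one possible reading of "registrations with different orderings should be refused": a reordered name carries the same bag of hints, so a registry check could treat it as a duplicate.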

Separately, I think we need to settle a few other matters to make progress:


### Defining What Suffixes Are For (no matter how many there are)

After the discussion in Brisbane, I strongly believe that suffixes should ONLY be for hinting about the syntax or format convention in use, as an aid to, e.g., editors and syntax highlighters. This is the proven use case for media type suffixes. Suffixes should not be used to hint semantics; only syntax. We should have strong language about the dangers of using suffixes to hint particular kinds of processing; cf. the previous discussion on the 'polyglot problem' and the potential security issues around performing processing based upon suffixes.

The suffix registration process should be designed to ensure that only such suffixes are registered.
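
As an illustration of the "syntax hint only" use, a tool such as an editor might consult the suffix purely to choose a display mode and never to decide how to process the content. The sketch below is hypothetical: the `HIGHLIGHT_MODES` table and `highlight_mode` function are made up for this example, and looking only at the last suffix is an assumption rather than anything settled here.

```python
# Hypothetical table: suffix -> the editor's syntax highlighting mode.
HIGHLIGHT_MODES = {"xml": "xml", "json": "json", "yaml": "yaml"}


def highlight_mode(media_type: str, default: str = "plain") -> str:
    """Pick a display-only highlighting mode from the last suffix, if any.

    The result only affects presentation; it must not be used to choose
    a parser, decompressor, or any other processing step.
    """
    essence = media_type.split(";", 1)[0].strip().lower()
    subtype = essence.partition("/")[2]
    if "+" not in subtype:
        return default
    # Assumption: only the last suffix is consulted.
    suffix = subtype.rsplit("+", 1)[1]
    return HIGHLIGHT_MODES.get(suffix, default)


for name in ("application/atom+xml", "application/problem+json",
             "application/foo+zstd", "text/plain"):
    print(name, "->", highlight_mode(name))
```

An unknown suffix simply falls back to the default mode, so the hint can fail without any consequence beyond less helpful highlighting.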

Note that in this view, "+ld" is very likely unregistrable. 


### Cleaning Up Existing Suffixes

+gzip and +zstd are problematic; the former should be disallowed for new registrations, and the latter should be removed or obsoleted in the registry. Likewise, I am highly suspicious of +jwt and +cose. +zip _is_ a format convention, so I suppose it's OK?


Cheers,

--
Mark Nottingham   https://www.mnot.net/