Re: [media-types] Thoughts on suffixes, single and multiple

Harald Alvestrand <harald@alvestrand.no> Thu, 04 April 2024 07:36 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/media-types/8WT11hCIOK2keKyazN1RUqcGE-4>

On 4/3/24 08:29, Mark Nottingham wrote:
> After the meeting in Brisbane, some of us went aside to continue the multiple suffixes discussion. There, we quickly came to the conclusion that we should deprecate the concept of suffixes in media subtypes -- i.e., they would still be syntactically allowed, but would have no meaning or registry. Martin Thomson and I took an action to write something down about this.
> 
> Once I was home, I started to think more carefully about this and do research. One thing that I haven't yet seen is a summary of how suffixes are currently used (apologies if I missed someone else's effort there). These are the counts for each suffix in the registry that I came up with about a week ago:
> 
> +xml = 439
> +json = 145
> +ber = 0
> +cbor = 16
> +der = 1
> +fastinfoset = 1
> +wbxml = 7
> +zip = 24
> +tlv = 1
> +json-seq = 2
> +sqlite = 1
> +jwt = 6
> +gzip = 2
> +cbor-seq = 4
> +zstd = 0
> +yaml = 2
> +cose = 0
> 
> As you can see, we have a few very widely used suffixes (in a registry of 1,588 entries as of that survey), and many very seldom used ones - with a few not used at all.
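
To put rough numbers on "very widely used" versus "seldom used", here is a small sketch that turns the quoted counts into shares of the registry. The counts and the 1,588 total are copied from the message above; the names (SUFFIX_COUNTS, share) and the "fewer than 10 uses" cutoff are only illustrative assumptions, and the sum assumes no registration is counted under more than one suffix.

```python
# Counts per suffix, copied from the list quoted above.
SUFFIX_COUNTS = {
    "+xml": 439, "+json": 145, "+ber": 0, "+cbor": 16, "+der": 1,
    "+fastinfoset": 1, "+wbxml": 7, "+zip": 24, "+tlv": 1, "+json-seq": 2,
    "+sqlite": 1, "+jwt": 6, "+gzip": 2, "+cbor-seq": 4, "+zstd": 0,
    "+yaml": 2, "+cose": 0,
}
REGISTRY_SIZE = 1588  # registry entries at the time of the survey


def share(count, total=REGISTRY_SIZE):
    """Return count as a percentage of total, rounded to one decimal."""
    return round(100 * count / total, 1)


# Assumes each registration is counted under exactly one suffix.
total_suffixed = sum(SUFFIX_COUNTS.values())

# Suffixes that are registered but never used.
unused = sorted(s for s, n in SUFFIX_COUNTS.items() if n == 0)

# Hypothetical cutoff for "seldom used": at least 1 but fewer than 10 uses.
rare = sorted(s for s, n in SUFFIX_COUNTS.items() if 0 < n < 10)

print("suffixed entries:", total_suffixed, f"({share(total_suffixed)}%)")
print("+xml share:", share(SUFFIX_COUNTS["+xml"]))
print("+json share:", share(SUFFIX_COUNTS["+json"]))
print("unused:", unused)
print("rare:", rare)
```

On these figures, about 41% of entries carry some suffix, with +xml alone at roughly 28% and +json at roughly 9%.
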
> 
> The widespread use of +xml and +json in particular made me more cautious about deprecating suffixes altogether -- especially since we still sort-of believe that they are indeed used by (or at least potentially useful to) things like editors to hint syntactic conventions.
> 
> So, that leaves a few different options, considering the constraints we have:
> 
> 1) Disallow more than one "+" sign in media subtypes, as floated at the meeting. This would put a fair amount of pressure on the registry's ability to reflect reality, depending on how widely deployed some things get (although we could grandfather some types in to ease the pressure here).

There's also the option that was floated at the meeting: Allow 
registration of types with any number of + signs, in any position, 
without regard to the suffix registry (which would close); explain in 
informational (not normative) text what + signs have been used for and 
why it's not a good idea to police them.

This would allow, for instance, the registration of text/code-c++ as a 
legal media type....

> 2) Syntactically allow suffixes before the last one, but not assign them any meaning or register them; e.g., application/foo+bar+xml would be an XML format, but who knows what bar is; effectively, it's just part of "foo+bar". This would allow people to define suffix-like things, but wouldn't give them any recognition or coordination -- potentially leading to the need to formalise things more down the road, just as we did in the first round of suffixes.

That's the opposite ordering from the one in the current draft. I think we 
saw dislike expressed for this earlier.

> 3) Consider multiple suffixes, when they occur, to be unrelated hints as to the syntax of the format -- i.e., there is no processing model, there is no ordering (although a registrant would have to choose an order; registrations with different orderings should be refused). Effectively, suffixes would just be a 'bag of hints' about the format being used.

In the "we don't police this" model, there would be advice that 
registering both foo+bar+baz and foo+baz+bar is likely to be a Bad Idea, 
but we wouldn't want to actually forbid it - someone might come up with 
a use case.
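
For comparison, a rough sketch of how a consumer might read the same type name under the readings discussed in this thread. None of this comes from a draft; the function names are made up for illustration, and the handling of empty segments (as in "code-c++") is my own assumption.

```python
def subtype_of(media_type):
    """Return the part after the first '/', e.g. 'foo+bar+xml'."""
    _, _, subtype = media_type.partition("/")
    return subtype


def last_suffix_only(media_type):
    """Option 2: only the final suffix has meaning; earlier '+' parts
    are simply part of the name."""
    subtype = subtype_of(media_type)
    name, plus, suffix = subtype.rpartition("+")
    if not plus or not suffix:
        # No '+' at all, or a trailing '+': treat the subtype as opaque.
        return subtype, None
    return name, suffix


def bag_of_hints(media_type):
    """Option 3: every suffix is an unordered, unrelated syntax hint."""
    base, *suffixes = subtype_of(media_type).split("+")
    # Empty segments (from '++' or a trailing '+') are dropped here,
    # which is an assumption, not something the thread specifies.
    return base, frozenset(s for s in suffixes if s)


def unpoliced(media_type):
    """'We don't police this' model: '+' is an ordinary character and
    the subtype carries no machine-readable hints."""
    return subtype_of(media_type), frozenset()


for t in ("application/foo+bar+xml", "text/code-c++"):
    print(t, last_suffix_only(t), bag_of_hints(t), unpoliced(t))
```

Note that under the bag reading, foo+bar+baz and foo+baz+bar compare equal, which is why registering both looks like a Bad Idea; and a naive bag parser reads "code-c++" as base "code-c" with no hints, whereas the unpoliced reading just keeps the whole string.
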

> 
> I'd be interested in hearing people's reactions to these.
> 
> Separately, I think we need to settle a few other matters to make progress:
> 
> 
> ### Defining What Suffixes Are For (no matter how many there are)
> 
> After the discussion in Brisbane, I strongly believe that suffixes should ONLY be for hinting about the syntax or format convention in use, as an aid, e.g., to editors, syntax highlighters, etc. This is the proven use case for media type suffixes. Suffixes should not be used to hint semantics; only syntax. We should have strong language about the dangers of using suffixes to hint particular kinds of processing; cf. the previous discussion on the 'polyglot problem' and the potential security issues around performing processing based upon suffixes.
> 
> The suffix registration process should be designed to assure that only such suffixes are registered.
> 
> Note that in this view, "+ld" is very likely unregistrable.
> 
> 
> ### Cleaning Up Existing Suffixes
> 
> +gzip and +zstd are problematic; the former should be disallowed for new registrations, and the latter should be removed or obsoleted in the registry. Likewise, I am highly suspicious of +jwt and +cose. +zip _is_ a format convention, so I suppose it's OK?
> 
> 
> Cheers,
> 
> --
> Mark Nottingham   https://www.mnot.net/
> 