Re: [media-types] Media subtypes containing "+"

Manu Sporny <msporny@digitalbazaar.com> Sun, 27 December 2020 16:36 UTC

To: media-types@ietf.org
References: <e2ee2ce0-641f-de3e-b1b6-d375b24328ad@rhiaro.co.uk> <029ad5c8-b441-3a1e-997d-af1187bc8149@rhiaro.co.uk> <CAL0qLwYAnCSi6XQ2u8d-Xpt0SezpAiVbhGyDorrDm3vN-Sk9FA@mail.gmail.com>
From: Manu Sporny <msporny@digitalbazaar.com>
Message-ID: <1091500a-3b2a-3ff7-587e-4a55990cef5e@digitalbazaar.com>
Date: Sun, 27 Dec 2020 11:36:36 -0500
In-Reply-To: <CAL0qLwYAnCSi6XQ2u8d-Xpt0SezpAiVbhGyDorrDm3vN-Sk9FA@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/media-types/CdoZdUVL_RMTYICkiJ7dcis2D8k>

Hi Murray, Graham, and Martin,

I'm the lead editor of the Decentralized Identifier Core specification
in the W3C DID WG and have been working with Amy on the
media-types-with-multiple-suffixes document. The W3C DID WG specifically
needs it to be "done enough" for the JSON-LD serialization of DID
Documents. We plan to enter the Candidate Recommendation phase toward
the end of January and getting this document sorted is becoming a
priority. So any thoughts on whether we should stick with the direction
we're going in or have some sort of "backup plan" would be welcome.
Responses to statements below...

On 12/23/20 5:35 PM, Murray S. Kucherawy wrote:
> (1) In terms of handling, you might also propose this to art@ietf.org
> <mailto:art@ietf.org> (the general ART area mailing list) and to
> dispatch@ietf.org <mailto:dispatch@ietf.org> (the DISPATCH working
> group); the latter would be the place to get a discussion going about
> an appropriate venue for handling the document.

Hmm, by "handling" do you mean "further discussion"? We were under the
impression that the media-types group was the right forum for the
discussion. We're a bit lost as to how we'd publish this -- is that what
you're getting at?

> (2) I'm unclear on the nomenclature, in particular that the most 
> specific suffix is the first in the list of suffixes (when read left 
> to right) and the most generic is last.  My thought is that the list 
> needs to be ordered, and we just need to specify what that ordering 
> is.  For instance, I'm dreaming up a (possibly absurd) example of 
> "foo+zip+gzip".  Would I un-gzip it first, or unzip it first, with 
> the goal of getting a "foo" out of it?  Why would I consider one of 
> them more specific or generic than the other?

Our understanding was that the media-type string could contain multiple
subtypes, and that a generic processing application could decide whether
it is able to process the content by looking at each of those subtypes.

For example, let's take the media type we're most concerned about right now:

application/did+ld+json

There are classes of processors that would be interested in each
subtype, for example:

json -- JSON linting software, JSON storage software that could store
        the document directly in an unstructured database, etc.

ld+json -- JSON-LD linting software, JSON-LD document processing
           software (expansion/compaction), JSON-LD to other RDF syntax
           processing software, etc.

did+ld+json -- Decentralized Identifier Document serialized as JSON-LD
               processing software such as DID resolution software,
               DID dereferencing software, Linked Data Signature
               processing software, etc.

So, the subtype(s) you're interested in depends on where you're
operating in the stack... there is definitely an answer to the "more
specific" / "more generic" question and the spec extension attempts to
point that out. I'm guessing that what you're saying is that "it needs
to do a better job at making that distinction"?
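
As a rough sketch of that layering (not taken from the draft), a
processor could walk the subtype from most specific to most generic and
stop at the first form it knows how to handle. The function name
pick_handler and the handler labels are hypothetical, and the sketch
assumes each suffix maps to a type under the same top-level type, which
holds for "json" but is not guaranteed in general:

```python
from typing import Dict, Optional, Tuple


def pick_handler(media_type: str,
                 handlers: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (matched_type, handler) for the most specific known form.

    `handlers` maps media type strings to hypothetical handler labels.
    Returns None when no form of the media type is known.
    """
    top_level, _, subtype = media_type.partition("/")
    parts = subtype.split("+")
    # Try did+ld+json first, then ld+json, then json.
    for i in range(len(parts)):
        candidate = f"{top_level}/{'+'.join(parts[i:])}"
        if candidate in handlers:
            return candidate, handlers[candidate]
    return None


# A JSON-LD-aware stack that has never heard of DID Documents:
known = {
    "application/json": "json-linter",
    "application/ld+json": "jsonld-processor",
}
match = pick_handler("application/did+ld+json", known)
```

Here the stack falls back to its JSON-LD processor, while a plain JSON
store with only "application/json" registered would fall back one step
further.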

> If (based on the end of Section 1.1) what you're suggesting is 
> actually to in effect consider "+zip+gzip" a single unified suffix
> of its own, then I think what we're really saying here is that
> there's only ever one suffix, but that suffix can contain a "+".
> That is, when parsing the media subtype, the suffix is everything at
> and after the first "+" irrespective of how many "+" characters there
> are.  If that's the case, it might be simpler to just say it that way
> so that the reader doesn't need to think about a gradation of
> most-to-least specific.

Hmm, I don't think that's what we're trying to say. I think what we're
trying to say is the above, or to put it another way:

When there are multiple plus signs, there are multiple subtypes.
Each registered subtype has clear processing rules, because its
registration points to the specification that defines them.  So,
"application/did+ld+json" may be processed as "application/json",
"application/ld+json", and "application/did+ld+json"... and it's up to
the application to decide which subtype it would like to apply (all of
them are valid).
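
To make that reading concrete, here is a minimal sketch (an
illustration only, not text from the draft) that lists the forms a
media type could be processed as, by dropping one leading "+" segment
at a time. The name candidate_media_types is hypothetical, and the
sketch does not check whether each resulting type is actually
registered, which the rule above requires:

```python
from typing import List


def candidate_media_types(media_type: str) -> List[str]:
    """List the forms of a media type, most specific first."""
    top_level, sep, subtype = media_type.partition("/")
    if not sep or not top_level or not subtype:
        raise ValueError(f"not a type/subtype string: {media_type!r}")
    parts = subtype.split("+")
    # did+ld+json -> ld+json -> json
    return [f"{top_level}/{'+'.join(parts[i:])}" for i in range(len(parts))]


forms = candidate_media_types("application/did+ld+json")
# forms == ["application/did+ld+json",
#           "application/ld+json",
#           "application/json"]
```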

Does that make sense? Does anyone know of any media types where that
rationale breaks down? Or is that a nonsensical way to think about the
problem?

-- manu

-- 
Manu Sporny - https://www.linkedin.com/in/manusporny/
Founder/CEO - Digital Bazaar, Inc.
blog: Veres One Decentralized Identifier Blockchain Launches
https://tinyurl.com/veres-one-launches