[media-types] A proposal for a new top-level media type: archive

Sean Leonard <dev+ietf@seantek.com> Wed, 24 September 2014 23:24 UTC

Message-ID: <54235269.2060002@seantek.com>
Date: Wed, 24 Sep 2014 16:23:21 -0700
From: Sean Leonard <dev+ietf@seantek.com>
To: media-types@iana.org, apps-discuss@ietf.org
Archived-At: http://mailarchive.ietf.org/arch/msg/media-types/S1tGM-I7kta-27r3ooFLasGQZEY
Subject: [media-types] A proposal for a new top-level media type: archive

Colleagues on media-types and apps-discuss:

I would like to propose that the IETF create a new top-level media type: 
archive.

Basically, archive would be a top-level type for all kinds of archive 
formats.
https://en.wikipedia.org/wiki/Archive_file
https://en.wikipedia.org/wiki/List_of_archive_formats

I think it's important to register archive formats as a distinct type 
from application, because there are common semantics that apply. In 
fact, these semantics are very similar to multipart and message 
top-level types.

The archive data types are all storage formats for *files*, as opposed 
to *content*. Each file has its own security implications, along with 
metadata that also has security implications (user and group 
permissions, access bits, executable bits, ACLs). At the highest level, 
an Internet-connected application ought to be able to identify that a 
particular piece of content is of this type (as opposed to the opaque 
application type), so it can make decisions about the content that are 
unique to archives: namely, dealing with the security issues and 
presenting a uniform user interface for handling such archives. Content
bundling types like message (RFC 5322), multipart, and application/cms 
(CMS) are conceptually distinct. All those types can contain content 
that can get split off into files, but their purpose is not to replicate 
file system data.
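To make the per-file metadata concern concrete, here is a rough sketch in Python using the standard library's tarfile module (tar is chosen only as an example; other formats carry similar metadata). It walks each member and flags metadata a receiving application might want to look at before extracting anything: absolute or ".." paths, setuid/setgid bits, executable bits, links, and root ownership. The specific checks, warning strings, and the example_archive() helper are my own illustrative choices, not any standard, and the sketch does not examine ACLs or extended attributes (which some tar implementations store in pax extended headers).

```python
import io
import stat
import tarfile


def audit_tar(data: bytes) -> list[str]:
    """Return warnings about security-relevant per-file metadata in a tar
    archive (plain, gzip, bzip2 or xz compressed)."""
    warnings = []
    # "r:*" asks tarfile to detect the compression method automatically.
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tar:
        for member in tar.getmembers():
            name = member.name
            # Absolute paths or ".." components could escape the extraction directory.
            if name.startswith("/") or ".." in name.split("/"):
                warnings.append(f"{name}: path escapes extraction root")
            # setuid/setgid bits grant privileges if preserved on extraction.
            if member.mode & (stat.S_ISUID | stat.S_ISGID):
                warnings.append(f"{name}: setuid/setgid bit set")
            # Any of the user/group/other execute bits on a regular file.
            if member.isfile() and member.mode & 0o111:
                warnings.append(f"{name}: executable bit set")
            # Symbolic and hard links can point outside the archive's own tree.
            if member.issym() or member.islnk():
                warnings.append(f"{name}: link to {member.linkname}")
            # The archive records ownership; uid 0 claims root ownership.
            if member.uid == 0:
                warnings.append(f"{name}: owned by uid 0")
    return warnings


def example_archive() -> bytes:
    """Hypothetical helper: build a small in-memory tar with risky metadata."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        script = b"#!/bin/sh\necho hello\n"
        run = tarfile.TarInfo("bin/run.sh")  # TarInfo defaults to uid 0
        run.size = len(script)
        run.mode = 0o4755  # setuid plus rwxr-xr-x
        tar.addfile(run, io.BytesIO(script))

        job = tarfile.TarInfo("../etc/cron.d/job")  # tries to leave the target dir
        job.size = 0
        tar.addfile(job, io.BytesIO(b""))
    return buf.getvalue()


if __name__ == "__main__":
    for warning in audit_tar(example_archive()):
        print(warning)
```

For comparison, Python 3.12 and later can sanitize several of these cases at extraction time via extractall(..., filter="data"), which is one example of the archive-specific handling a distinct type could trigger in a receiver.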

Archives are ubiquitous on the Internet. Even if archives are used 
"infrequently" across the Internet architecture, they are obviously used 
at the endpoints. Improper transmission of archives has become a major 
source of labeling and security issues.

Remarkably, most archive formats have not been registered as media types 
(except for application/zip, which is an oldie). Therefore, it's pretty 
much a "clean field". Furthermore, there is a trend for widely available 
tools to support multiple formats, so the probability is good
that if you pass some archive/* labeled content to an archive 
application, it will be able to do something intelligent with it.
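A minimal Python sketch of what a receiving application might do, assuming the proposed archive top-level type existed (it is not registered). The handler names and the KNOWN_ARCHIVE_SUBTYPES table are hypothetical; the point is that with archive/* the top-level type alone is enough to select archive-specific handling, whereas with application/* the receiver must already know each individual subtype.

```python
def split_media_type(value: str) -> tuple[str, str]:
    """Return the lower-cased (type, subtype) of a Content-Type value,
    dropping any parameters after ';'."""
    essence = value.split(";", 1)[0].strip().lower()
    top, _, sub = essence.partition("/")
    if not top or not sub:
        raise ValueError(f"malformed media type: {value!r}")
    return top, sub


# Hypothetical handler names, for illustration only.
ARCHIVE_HANDLER = "archive-manager"
OPAQUE_HANDLER = "save-as-opaque-file"

# Today a receiver must enumerate the specific application/* subtypes it
# knows are archives; application/zip is the registered example.
KNOWN_ARCHIVE_SUBTYPES = {("application", "zip")}


def choose_handler(content_type: str) -> str:
    top, sub = split_media_type(content_type)
    if top == "archive":
        # Under the proposal, the top-level type alone identifies an archive,
        # so even an unfamiliar archive/* subtype gets archive-specific
        # handling (security checks, uniform UI).
        return ARCHIVE_HANDLER
    if (top, sub) in KNOWN_ARCHIVE_SUBTYPES:
        return ARCHIVE_HANDLER
    # Everything else, including unfamiliar application/* labels, is opaque.
    return OPAQUE_HANDLER
```

For example, choose_handler("archive/x-made-up-format") would still reach the archive handler, while an equally unfamiliar application/* label would not.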

The following major kinds of archives all belong in a common 
top-level media type [categories from Wikipedia]:
* archiving only (concatenate files): tar
* multi-function (concatenate, compress, encrypt, etc.): zip, rar, 7z, 
arc, arj, the list goes on and on...
* software packaging: cab, msi, pup, pet, apk, rpm...
* disk image: ISO 9660 (CD/DVD/Blu-ray), Apple Disk Image, virtual 
floppy disks, formerly-known-as-TrueCrypt, etc.
* backup: (a large number of proprietary formats)
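Because so much archive content today travels unlabeled or labeled as application/octet-stream, receivers often fall back on content sniffing. Below is a minimal Python sketch of signature ("magic number") detection for a few of the formats listed above. The archive/<format> labels it produces are hypothetical, since no such subtypes are registered. Old pre-POSIX tar files have no magic string, and msi (an OLE compound file) cannot be distinguished from other compound documents by signature alone, so neither is covered.

```python
from typing import Optional

# (offset, signature, format name) for a few formats from the list above.
SIGNATURES = [
    (0, b"PK\x03\x04", "zip"),         # local file header
    (0, b"PK\x05\x06", "zip"),         # empty archive (end of central directory only)
    (0, b"Rar!\x1a\x07", "rar"),       # RAR 1.5-4.x and 5.x share this prefix
    (0, b"7z\xbc\xaf\x27\x1c", "7z"),
    (0, b"MSCF", "cab"),
    (0, b"\xed\xab\xee\xdb", "rpm"),   # RPM lead magic
    (257, b"ustar", "tar"),            # POSIX ustar and GNU tar headers
    (32769, b"CD001", "iso9660"),      # volume descriptor in sector 16 (16 * 2048 + 1)
]


def sniff_archive(data: bytes) -> Optional[str]:
    """Return the first format whose signature matches, or None."""
    for offset, magic, name in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return name
    return None


def hypothetical_label(data: bytes) -> str:
    """Label sniffed content; archive/<name> is a HYPOTHETICAL subtype."""
    name = sniff_archive(data)
    return f"archive/{name}" if name else "application/octet-stream"
```

Sniffing like this is heuristic, which is part of the argument for labeling archives explicitly at the sender.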

I know that the top-level media type (TLMT) question has been brought up 
before with fonts.
<http://www6.ietf.org/mail-archive/web/apps-discuss/current/msg03447.html>

Where do we start? Maybe we should talk about it? I don't think it's as 
simple as drafting an Internet-Draft. Maybe there should be a BOF or 
working group. Experts with file system and archival experience should 
get involved.

Sean