[apps-discuss] MIME sniffing and the media type registry magic numbers

Larry Masinter <masinter@adobe.com> Tue, 15 November 2011 03:34 UTC

From: Larry Masinter <masinter@adobe.com>
To: "apps-discuss@ietf.org" <apps-discuss@ietf.org>
Date: Mon, 14 Nov 2011 19:34:24 -0800

I admit to a strong case of cognitive dissonance arguing two sides of these issues, but I think there's a resolution:

Context: http://tools.ietf.org/html/draft-ietf-websec-mime-sniff proposes a standards-track, normative algorithm for "sniffing" content to determine its media type.
The document will be discussed at the websec working group meeting tomorrow (Wednesday).

In http://trac.tools.ietf.org/wg/websec/trac/ticket/17 I proposed using an (updated, cleaned-up, reviewed?) registry for magic numbers rather than an explicit table.

I am quite conflicted about arguing for loose controls on media type registration in general while also arguing for standards-track rigor for one of its fields (the magic number).

An alternative is to add a new registry of "Validated magic numbers" and be clearer in the media type registry itself that the "magic number" field is only a hint, and point people at the parallel registry.
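
A minimal sketch of how a sniffer might consult such a parallel registry, assuming each entry carries a byte pattern, an offset, and a "validated" flag. The entry layout, the flag, and the names MagicEntry, REGISTRY and sniff are hypothetical; neither the draft nor the existing registry defines them, and which entries count as validated here is only illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MagicEntry:
    media_type: str
    offset: int       # byte offset at which the pattern must appear
    pattern: bytes    # the magic number itself
    validated: bool   # hypothetical flag: reviewed entry vs. unreviewed hint


# Illustrative contents only. The byte signatures are the well-known ones
# for PNG, GIF and PDF; the last entry is a made-up unreviewed hint.
REGISTRY = [
    MagicEntry("image/png", 0, b"\x89PNG\r\n\x1a\n", True),
    MagicEntry("image/gif", 0, b"GIF87a", True),
    MagicEntry("image/gif", 0, b"GIF89a", True),
    MagicEntry("application/pdf", 0, b"%PDF-", True),
    MagicEntry("application/x-example", 0, b"EX", False),
]


def sniff(data: bytes, validated_only: bool = True) -> Optional[str]:
    """Return the first media type whose magic number matches, else None."""
    for entry in REGISTRY:
        if validated_only and not entry.validated:
            continue  # treat unvalidated magic numbers as hints and skip them
        end = entry.offset + len(entry.pattern)
        if data[entry.offset:end] == entry.pattern:
            return entry.media_type
    return None
```

With validated_only left at its default, a sniffer would rely only on reviewed entries, and the "hint" values in the media type registry would have no effect on its result.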

In http://trac.tools.ietf.org/wg/websec/trac/ticket/18 I noted that sniffing also covers retrieving documents from ftp: and file: URIs into browsers, in which case the file extensions *are* used. Again, I think this turns the "file extension" field, which is currently optional, into something that might need a "validated" value.
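
As a rough illustration of the file-extension case, the sketch below maps an extension to a media type only for ftp: and file: URIs. The table VALIDATED_EXTENSIONS, its contents, and the function name are assumptions made for the example; no such validated list exists today.

```python
import posixpath
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical "validated" extension values, for illustration only.
VALIDATED_EXTENSIONS = {
    ".png": "image/png",
    ".pdf": "application/pdf",
    ".txt": "text/plain",
}

# Schemes that carry no Content-Type, so the extension is consulted.
EXTENSION_SCHEMES = {"ftp", "file"}


def type_from_extension(uri: str) -> Optional[str]:
    """Guess a media type from the extension, for ftp: and file: URIs only."""
    parts = urlsplit(uri)
    if parts.scheme not in EXTENSION_SCHEMES:
        return None  # other schemes are out of scope for this sketch
    ext = posixpath.splitext(parts.path)[1].lower()
    return VALIDATED_EXTENSIONS.get(ext)
```

For example, type_from_extension("file:///tmp/report.PDF") yields "application/pdf", while an http: URI yields None because the extension is not consulted there.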

So rather than considering a "registration" as a whole to have a particular status (FCFS, standards track, etc.), perhaps the individual fields of the registration need status metadata.
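
One way to picture per-field status metadata is sketched below. The status values, class names, and field names are all hypothetical and are not part of any registration template; the point is only that each field, rather than the registration, carries its own status.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Status(Enum):
    HINT = "hint"            # unreviewed value supplied by the registrant
    VALIDATED = "validated"  # value that has been reviewed and confirmed


@dataclass
class RegField:
    value: str
    status: Status = Status.HINT


@dataclass
class Registration:
    media_type: str
    fields: Dict[str, RegField] = field(default_factory=dict)

    def validated(self) -> Dict[str, str]:
        """Return only the field values a sniffer could safely rely on."""
        return {
            name: f.value
            for name, f in self.fields.items()
            if f.status is Status.VALIDATED
        }


# Example: the magic number has been validated, the extension is a hint.
pdf = Registration(
    "application/pdf",
    {
        "magic_number": RegField("%PDF-", Status.VALIDATED),
        "file_extension": RegField(".pdf"),
    },
)
```

Here pdf.validated() returns only the magic number, since the file extension is still marked as a hint.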

Larry


-----Original Message-----
From: "Martin J. Dürst" [mailto:duerst@it.aoyama.ac.jp] 
Sent: Tuesday, November 15, 2011 9:15 AM
To: David Singer
Cc: Larry Masinter; t.petch; apps-discuss@ietf.org; gadams@xfsi.com
Subject: Re: [apps-discuss] font/* (and draft-freed-media-type-regs)

On 2011/11/15 3:35, David Singer wrote:
>
> On Nov 12, 2011, at 12:25 , Larry Masinter wrote:
>
>> I see no use case for why having font/opentype is any better than 
>> application/opentype
>>
>> The only use case I can imagine from looking at
>> http://tools.ietf.org/html/draft-singer-font-mime-00
>> is the possibility of defining common parameters across font data types (in the same way that text/.. has a common charset parameter).
>
> How serious is the first concern, "First, the "application" sub-tree is treated (correctly) with great caution with respect to viruses and other active code."?

I very much think that having a font/ top-level type is actually a good idea. But, as I hinted before, a type shouldn't be treated as "safer" just because it says font/ rather than application/. Many font formats contain active code that is executed by the font engine, and several security holes have been found in this area. So I'd actually de-emphasize or remove this point. draft-singer-font-mime-00 also doesn't have a security section, and it of course needs one.


> (The reason I abandoned the draft was not the difficulty of getting it through, by the way, but because the W3C Timed Text group decided it didn't need it).

Can you be more specific? E.g., does Timed Text only use one font format? Or does it not contain any field that indicates the format, which makes this "somebody else's problem"? Or some other reason?

Regards, Martin.