Re: [websec] Issue 17: Registry for magic numbers

Larry Masinter <masinter@adobe.com> Wed, 26 October 2011 08:38 UTC

From: Larry Masinter <masinter@adobe.com>
To: Adam Barth <ietf@adambarth.com>
Date: Wed, 26 Oct 2011 01:38:00 -0700
Thread-Topic: [websec] Issue 17: Registry for magic numbers
Thread-Index: AcyTquHWBNIMkzUkSQu3HYvOq8c2/QADiNJA
Message-ID: <C68CB012D9182D408CED7B884F441D4D0605EFA741@nambxv01a.corp.adobe.com>
References: <CAJE5ia8n+B10TbjpVYbVieTWEHo3AY_pRm1EToNX_iB1+3UTCw@mail.gmail.com> <4EA6360C.7070700@it.aoyama.ac.jp> <CAJE5ia8rnkeET5GQhoj7CWbOLha=hp-Ucq6Psw8M1LGvPTMC-w@mail.gmail.com> <4EA783F7.90609@gondrom.org> <CAJE5ia9DPv4aOFsDZuu3YBzBQ4H95UUO_K3ooY1SD+UyfhJiWg@mail.gmail.com> <C68CB012D9182D408CED7B884F441D4D0605EFA73E@nambxv01a.corp.adobe.com> <CAJE5ia9GMTGT_8UhsW79q-Y9v55vs_+2tmDbFRKWXc_WOUEW1Q@mail.gmail.com>
In-Reply-To: <CAJE5ia9GMTGT_8UhsW79q-Y9v55vs_+2tmDbFRKWXc_WOUEW1Q@mail.gmail.com>
Cc: "websec@ietf.org" <websec@ietf.org>
Subject: Re: [websec] Issue 17: Registry for magic numbers

A standards specification should meet the requirements of the use cases that are in scope for the specification. 

If you only evaluate adequacy against a narrow set of requirements, then the scope should be limited to those situations where those requirements are adequate.

If you're evaluating against what "a user agent needs to perform in order to be competitive in the browser market", then the only use cases you're validating against are "popular web browsers in 2012", which is a very narrow scope.

If, on the other hand, you expect the standard to have value over the long term, you need a longer-term and broader set of requirements and use cases, and meeting them will add complexity.

Larry


-----Original Message-----
From: Adam Barth [mailto:ietf@adambarth.com] 
Sent: Tuesday, October 25, 2011 11:45 PM
To: Larry Masinter
Cc: Tobias Gondrom; websec@ietf.org
Subject: Re: [websec] Issue 17: Registry for magic numbers

You've posed a large number of questions.  I'll do my best to answer them.

On Tue, Oct 25, 2011 at 11:31 PM, Larry Masinter <masinter@adobe.com> wrote:
> This gets back to the question of the scope of the document. Does it, 
> or does it not, handle sniffing of arbitrary blobs of data that come 
> in without any content-type,

User agents that implement sniffing are expected to sniff HTTP responses that lack a Content-Type header, yes.

> blobs of data labeled application/octet-stream,

User agents that implement sniffing are not expected to sniff HTTP responses that contain a Content-Type header with the value application/octet-stream.

> and data coming via ftp (through ftp URIs)

Yes.

> or thumb drives

Yes.

> or mounted NFS file systems or whatever?

Yes.

You can answer these questions by reading the document.  For example, the document explicitly states the set of Content-Type header values that trigger sniffing.  The document also explicitly calls out FTP as an example.
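The two HTTP answers above can be sketched as a small decision function. This is only an illustration of what the message states, not the draft's algorithm: the name `should_sniff` is hypothetical, and any Content-Type value other than the two cases discussed here is reported as undecided, because the message defers to the document for the full list of triggering values.

```python
from typing import Optional


def should_sniff(content_type: Optional[str]) -> Optional[bool]:
    """Hypothetical helper covering only the two cases answered above.

    Returns True when the response has no Content-Type header,
    False when the header value is application/octet-stream,
    and None for every other value (not decided in this message).
    """
    if content_type is None:
        # No Content-Type header at all: sniffing is expected.
        return True

    # Drop parameters such as "; charset=..." and compare case-insensitively,
    # since media type names are not case-sensitive.
    media_type = content_type.split(";", 1)[0].strip().lower()

    if media_type == "application/octet-stream":
        # Explicitly labeled as opaque bytes: sniffing is not expected.
        return False

    # Anything else is governed by the list in the document itself.
    return None
```

A caller would treat the None result as "consult the document's list of triggering Content-Type values" rather than as a yes or a no.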

> Does it, or does it not, handle sniffing rules inside ZIP packaged web applications?

Not as described by this document.  However, I've been told that another document has re-used the algorithm for that purpose.

> If it does, then sniffing should cover everything that is sniffable, 
> including almost all MIME types

Why is that?  The document describes what is essentially the minimal amount of sniffing a user agent needs to perform in order to be competitive in the browser market.  I don't think we should be encouraging sniffing beyond that.

> -- you say "most MIME types that get registered don't need sniffing 
> rules", I don't know what the percentage is,

With the possible exception of fonts, I believe the document describes all the sniffing rules that are necessary today.  You can compare the list of MIME types in the document with the list of registered MIME types if you wish to get a sense of what I mean when I say that "most" don't need sniffing rules.
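For readers unfamiliar with what a sniffing rule looks like, here is a minimal sketch of magic-number matching. The table below is an assumption for illustration: it lists a few widely published file signatures and is not the table from the document, and the names `SIGNATURES` and `match_signature` are hypothetical.

```python
from typing import Optional

# Illustrative table only (not the draft's table): each entry pairs a
# leading byte sequence with the media type it commonly indicates.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
]


def match_signature(data: bytes) -> Optional[str]:
    """Return the media type of the first signature that prefixes data."""
    for prefix, media_type in SIGNATURES:
        if data.startswith(prefix):
            return media_type
    # No known signature: the caller decides what to do next.
    return None
```

The short table reflects the point made above: only a handful of types need rules at all, so a small dedicated list is enough for a sketch like this.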

> but after all, don't you want to be able to discover file types??

I'm not sure what you mean by "discover file types".  There's no discovery going on here.

> Of course, maybe that broad applicability of sniffing isn't appropriate, but then ... where are the boundaries?

The boundaries are exactly what's described in the document.  There's been a great deal of research and implementation experience poured into the document to determine precisely where to draw the boundaries.  As far as I can tell, the document describes the optimal point.  If you have data that shows otherwise, I'd like to see it.

> Which situations are in scope vs. not?

The criterion I would use is the following:

"Given a diverse market of browser vendors, is this a sniffing algorithm that all browser vendors are mutually interested in converging upon?"

If the answer is "yes", then you've identified the correct scope and rules.  If "no", then the spec needs to be improved.  If there is no such set of rules, then this endeavor is a waste of time and any spec we create will be dead letter.

> And don't some of the "in-scope" situations need almost all MIME types to be sniffable?

No.

Adam


> -----Original Message-----
> From: websec-bounces@ietf.org [mailto:websec-bounces@ietf.org] On 
> Behalf Of Adam Barth
> Sent: Tuesday, October 25, 2011 9:00 PM
> To: Tobias Gondrom
> Cc: websec@ietf.org
> Subject: Re: [websec] Issue 17: Registry for magic numbers
>
> Yeah, I think we're much better off creating a new registry rather than using the MIME registry.  The truth is that most MIME types that get registered don't need sniffing rules.  The only ones that need them are the legacy ones and the ones that browser vendors cause to need them because of the prisoner's dilemma in the browser market.
>
> Adam
>
>
> On Tue, Oct 25, 2011 at 8:52 PM, Tobias Gondrom <tobias.gondrom@gondrom.org> wrote:
>> <hat="individual">
>> For me the point is, currently we have a table in the document, which 
>> inside an RFC is rather static and hard to extend.
>> So it looks like a good case for a registry to allow for 
>> extensibility for new MIME types (e.g., we keep the table in the 
>> document, create an IANA registry, copy the values to the registry, 
>> and allow for future entries by expert review). That can either be 
>> added to the current MIME type registry, or we create a new one (e.g., 
>> within the websec namespace) with only these elements.
>>
>> Just my 5 cents.
>>
>> Tobias
>>
>>
>>
>> On 25/10/11 05:23, Adam Barth wrote:
>>>
>>> On Mon, Oct 24, 2011 at 9:07 PM, "Martin J. Dürst"
>>> <duerst@it.aoyama.ac.jp>  wrote:
>>>>
>>>> On 2011/10/25 11:21, Adam Barth wrote:
>>>>>
>>>>> http://trac.tools.ietf.org/wg/websec/trac/ticket/17 refers to an 
>>>>> IANA registry with magic numbers for various media types.  I 
>>>>> wanted to compare them to what's in the draft, but I couldn't find 
>>>>> it.  I found the media type registry, e.g., for images:
>>>>>
>>>>> http://www.iana.org/assignments/media-types/image/index.html
>>>>>
>>>>> but I don't see any magic numbers.  Would someone be willing to 
>>>>> point me in the right direction?
>>>>
>>>> They are in the templates. To get the template for a registration, 
>>>> start at the overview page 
>>>> (http://www.iana.org/assignments/media-types/index.html).
>>>>
>>>> Then go to the page that lists all the registrations for a given top 
>>>> level, e.g.
>>>> http://www.iana.org/assignments/media-types/image/index.html for images.
>>>>
>>>> Then look at each registration template (click on the link in the 
>>>> left column, or in the right column if the left one doesn't have a 
>>>> link and the right one is to an RFC). You may then find a magic 
>>>> number in the registration template. As an example, for image/jp2, 
>>>> the template is at 
>>>> http://www.iana.org/assignments/media-types/image/jp2.
>>>>
>>>> But it looks like earlier templates didn't have a field for a magic 
>>>> number, and this and the reasons Anne gave make this information 
>>>> helpful for cross-checking, but not much more.
>>>
>>> == Images ==
>>>
>>> PNG has a registration template
>>> <http://www.iana.org/assignments/media-types/image/png>, but lacks a 
>>> signature.
>>> JPEG doesn't have a template.
>>> GIF doesn't have a template.
>>> BMP isn't even registered.
>>> WEBP isn't even registered.
>>> ICO has a registration template
>>> <http://www.iana.org/assignments/media-types/image/vnd.microsoft.icon>
>>> and has the correct signature.  Yay!
>>>
>>> == Text ==
>>>
>>> HTML lacks a registration template.
>>>
>>> == Application ==
>>>
>>> PDF doesn't have a template.
>>> PostScript doesn't have a template.
>>> OGG doesn't have a template.
>>> RAR isn't even registered.
>>> ZIP has a registration template
>>> <http://www.iana.org/assignments/media-types/application/zip>, but 
>>> lacks a signature.
>>> GZIP isn't even registered.
>>> RSS isn't even registered.
>>> Atom lacks a registration template.
>>>
>>> == Audio ==
>>>
>>> WAV isn't even registered.
>>>
>>> == Video ==
>>>
>>> MP4 lacks a registration template.
>>> WebM isn't even registered.
>>>
>>> This does not look like a promising approach.  Note: I haven't even 
>>> looked through all the registrations to see how many have signatures 
>>> that we shouldn't be using.
>>>
>>> Adam
>>> _______________________________________________
>>> websec mailing list
>>> websec@ietf.org
>>> https://www.ietf.org/mailman/listinfo/websec
>>