Re: [websec] Issue 17: Registry for magic numbers

Adam Barth <ietf@adambarth.com> Wed, 26 October 2011 06:45 UTC

From: Adam Barth <ietf@adambarth.com>
Date: Tue, 25 Oct 2011 23:45:07 -0700
To: Larry Masinter <masinter@adobe.com>
Cc: "websec@ietf.org" <websec@ietf.org>
Subject: Re: [websec] Issue 17: Registry for magic numbers

You've posed a large number of questions.  I'll do my best to answer them.

On Tue, Oct 25, 2011 at 11:31 PM, Larry Masinter <masinter@adobe.com> wrote:
> This gets back to the question of the scope of the document. Does it, or does it not, handle sniffing of arbitrary blobs of data that come in without any content-type,

User agents that implement sniffing are expected to sniff HTTP
responses that lack a Content-Type header, yes.

> blobs of data labeled application/octet-stream,

User agents that implement sniffing are not expected to sniff HTTP
responses that contain a Content-Type header with the value
application/octet-stream.

> and data coming via ftp (through ftp URIs)

Yes.

> or thumb drives

Yes.

> or mounted NFS file systems or whatever?

Yes.

You can answer these questions by reading the document.  For example,
the document explicitly states the set of Content-Type header values
that trigger sniffing.  The document also explicitly calls out FTP as
an example.
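
The two HTTP cases answered above can be written down as a small
decision function.  This is only a sketch in Python: the name
`should_sniff` and the three-way return value are assumptions made for
illustration, and it encodes only what this message states (a missing
Content-Type header is sniffed, application/octet-stream is not).
Every other header value returns None, because the full set of values
that trigger sniffing is listed in the document, not here.

```python
from typing import Optional


def should_sniff(content_type: Optional[str]) -> Optional[bool]:
    """Hypothetical helper covering only the cases stated in this thread.

    True  -> the response is expected to be sniffed
    False -> the response is not expected to be sniffed
    None  -> not answered here; consult the document's list of values
    """
    if content_type is None:
        # No Content-Type header at all: sniffing is expected.
        return True
    # Drop any parameters (e.g. "; charset=...") and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/octet-stream":
        # Explicitly labeled as opaque bytes: not expected to be sniffed.
        return False
    return None
```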

> Does it, or does it not, handle sniffing rules inside ZIP packaged web applications?

Not as described by this document.  However, I've been told that
another document has re-used the algorithm for that purpose.

> If it does, then sniffing should cover everything that is sniffable, including almost all MIME types

Why is that?  The document describes what is essentially the minimal
amount of sniffing a user agent needs to perform in order to be
competitive in the browser market.  I don't think we should be
encouraging sniffing beyond that.

> -- you say "most MIME types that get registered don't need sniffing rules", I don't know what the percentage is,

With the possible exception of fonts, I believe the document describes
all the sniffing rules that are necessary today.  You can compare the
list of MIME types in the document with the list of registered MIME
types if you wish to get a sense of what I mean when I say that "most"
don't need sniffing rules.
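
For readers unfamiliar with the term, a sniffing rule of the kind
discussed here amounts to comparing the first bytes of a resource
against a known signature.  A minimal Python sketch follows.  The
table holds a few widely known file signatures chosen for illustration
and is not the document's table, and `sniff_signature` is a
hypothetical name.

```python
from typing import Optional

# Illustrative subset of well-known leading byte signatures.
# This is NOT the table from the document under discussion.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
]


def sniff_signature(data: bytes) -> Optional[str]:
    """Return the media type whose signature prefixes data, else None."""
    for magic, media_type in SIGNATURES:
        if data.startswith(magic):
            return media_type
    # No signature matched: sniffing yields no type for this resource.
    return None
```

A type with no entry in such a table is never produced by sniffing,
which is one way to read the claim that most registered types need no
rule.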

> but after all, don't you want to be able to discover file types??

I'm not sure what you mean by "discover file types".  There's no
discovery going on here.

> Of course, maybe that broad applicability of sniffing isn't appropriate, but then ... where are the boundaries?

The boundaries are exactly what's described in the document.  There's
been a great deal of research and implementation experience poured
into the document to determine precisely where to draw the boundaries.
As far as I can tell, the document describes the optimal point.  If
you have data that shows otherwise, I'd like to see it.

> Which situations are in scope vs. not?

The criterion I would use is the following:

"Given a diverse market of browser vendors, is this a sniffing
algorithm that all browser vendors are mutually interested in
converging upon?"

If the answer is "yes", then you've identified the correct scope and
rules.  If "no", then the spec needs to be improved.  If there is no
such set of rules, then this endeavor is a waste of time and any spec
we create will be a dead letter.

> And don't some of the "in-scope" situations need almost all MIME types to be sniffable?

No.

Adam


> -----Original Message-----
> From: websec-bounces@ietf.org [mailto:websec-bounces@ietf.org] On Behalf Of Adam Barth
> Sent: Tuesday, October 25, 2011 9:00 PM
> To: Tobias Gondrom
> Cc: websec@ietf.org
> Subject: Re: [websec] Issue 17: Registry for magic numbers
>
> Yeah, I think we're much better off creating a new registry rather than using the MIME registry.  The truth is that most MIME types that get registered don't need sniffing rules.  The only ones that need them are the legacy ones and the ones that browser vendors cause to need them because of the prisoner's dilemma in the browser market.
>
> Adam
>
>
> On Tue, Oct 25, 2011 at 8:52 PM, Tobias Gondrom <tobias.gondrom@gondrom.org> wrote:
>> <hat="individual">
>> For me the point is, currently we have a table in the document, which
>> inside an RFC is rather static and hard to extend.
>> So it looks like a good case for a registry to allow for extendibility
>> for new mime-types. (e.g. we keep the table in the document, create an
>> IANA registry, copy the values to the registry and allow for future
>> entries by expert review) That can either be added to the current
>> Mime-type registry, or we create a new one (e.g. within the websec
>> namespace) with only these elements.
>>
>> Just my 5 cents.
>>
>> Tobias
>>
>>
>>
>> On 25/10/11 05:23, Adam Barth wrote:
>>>
>>> On Mon, Oct 24, 2011 at 9:07 PM, "Martin J. Dürst"
>>> <duerst@it.aoyama.ac.jp>  wrote:
>>>>
>>>> On 2011/10/25 11:21, Adam Barth wrote:
>>>>>
>>>>> http://trac.tools.ietf.org/wg/websec/trac/ticket/17 refers to an
>>>>> IANA registry with magic numbers for various media types.  I wanted
>>>>> to compare them to what's in the draft, but I couldn't find it.  I
>>>>> found the media type registry, e.g., for images:
>>>>>
>>>>> http://www.iana.org/assignments/media-types/image/index.html
>>>>>
>>>>> but I don't see any magic numbers.  Would someone be willing to
>>>>> point me in the right direction?
>>>>
>>>> They are in the templates. To get the template for a registration,
>>>> start at the overview page
>>>> (http://www.iana.org/assignments/media-types/index.html).
>>>>
>>>> Then go to the page that lists all the registrations for a given top
>>>> level, e.g.
>>>> http://www.iana.org/assignments/media-types/image/index.html for images.
>>>>
>>>> Then look at each registration template (click on the link in the
>>>> left column, or in the right column if the left one doesn't have a
>>>> link and the right one is to an RFC). You may then find a magic
>>>> number in the registration template. As an example, for image/jp2,
>>>> the template is at
>>>> http://www.iana.org/assignments/media-types/image/jp2.
>>>>
>>>> But it looks like earlier templates didn't have a field for a magic
>>>> number, and this and the reasons Anne gave make this information
>>>> helpful for cross-checking, but not much more.
>>>
>>> == Images ==
>>>
>>> PNG has a registration template
>>> <http://www.iana.org/assignments/media-types/image/png>, but lacks a
>>> signature.
>>> JPEG doesn't have a template.
>>> GIF doesn't have a template.
>>> BMP isn't even registered.
>>> WEBP isn't even registered.
>>> ICO has a registration template
>>> <http://www.iana.org/assignments/media-types/image/vnd.microsoft.icon>
>>> and has the correct signature.  Yay!
>>>
>>> == Text ==
>>>
>>> HTML lacks a registration template.
>>>
>>> == Application ==
>>>
>>> PDF doesn't have a template.
>>> Postscript doesn't have a template.
>>> OGG doesn't have a template.
>>> RAR isn't even registered.
>>> ZIP has a registration template
>>> <http://www.iana.org/assignments/media-types/application/zip>, but
>>> lacks a signature.
>>> GZIP isn't even registered.
>>> RSS isn't even registered.
>>> Atom lacks a registration template.
>>>
>>> == Audio ==
>>>
>>> WAV isn't even registered.
>>>
>>> == Video ==
>>>
>>> MP4 lacks a registration template.
>>> WebM isn't even registered.
>>>
>>> This does not look like a promising approach.  Note: I haven't even
>>> looked through all the registrations to see how many have signatures
>>> that we shouldn't be using.
>>>
>>> Adam
>>> _______________________________________________
>>> websec mailing list
>>> websec@ietf.org
>>> https://www.ietf.org/mailman/listinfo/websec