Re: comments on draft-abarth-mime-sniff-03

Adam Barth <w3c@adambarth.com> Tue, 26 January 2010 20:42 UTC

In-Reply-To: <C68CB012D9182D408CED7B884F441D4D5FDE79@nambxv01a.corp.adobe.com>
References: <C68CB012D9182D408CED7B884F441D4D5FDE79@nambxv01a.corp.adobe.com>
From: Adam Barth <w3c@adambarth.com>
Date: Tue, 26 Jan 2010 20:42:18 +0000
Message-ID: <7789133a1001261242m3fd2eb72o9e7ffb12f522cf7f@mail.gmail.com>
Subject: Re: comments on draft-abarth-mime-sniff-03
To: Larry Masinter <masinter@adobe.com>
Cc: "apps-discuss@ietf.org" <apps-discuss@ietf.org>

On Wed, Jan 20, 2010 at 10:41 PM, Larry Masinter <masinter@adobe.com> wrote:
> All or nothing:
>
>  Whenever possible, user agents should avoid
>  employing a content sniffing algorithm.
>
> The normative advice seems to apply to roles
> other than "user agents"; whether or not
> there is a user isn't really relevant.

What other roles did you have in mind?  I think the primary consumer
of this specification will be web browsers, which commonly consider
themselves to be user agents.

> Secondly, the scope of the "should avoid"
> is not clear, nor is it clear whether this
> is really intended to be normative.  I'll
> assume that it is.

I've made this clearer by using SHOULD in all caps.

>   However, if a user agent does employ a
>   content sniffing algorithm, the user agent
>   should use the algorithm in this document exactly
>
> The algorithm isn't exact (more below).

I've removed the word "exactly".

> And I think this is the wrong normative advice.
> Implementations SHOULD NOT sniff labeled content,
> and MUST NOT sniff any way beyond the ways
> explicitly allowed here, and when they do sniff,
> should opt into sniffing on a situation by situation,
> type by type basis.

I disagree.  Life would be more predictable if there were fewer
sniffing options, not more.  The more predictable life is, the more
secure it is.

> The application of sniffing should be reserved
> for situations where the implementor has sufficient
> reason to believe that there is sufficient existing
> deployed content that would not function as expected
> if sniffing were not implemented, and these situations
> should be encouraged to be as fine a granularity
> as the implementor needs.

I agree that sniffing should be reserved for those cases where the
implementor believes it is necessary.  However, once the implementor
has decided it's necessary, they should go "whole hog" and implement
the algorithm defined in the spec.  If everyone got to pick and choose
what heuristics to use, then we'd be in much the same mess we're in
now.

> This normative advice should not require
> updating as the deployed content evolves.

I don't think it does.

> =============================
> TERMINOLOGY "resource"

I've removed this word from the draft.

> ==========================================================
> Malformed content-type information vs. missing content-type
>
> I think the cases where content-type information supplied is
> malformed should be treated differently than the cases where
> content-type information is missing. In particular, it
> should be possible and allowed to raise an error
> if content-type information is malformed, even when
> sniffing is performed on well-formed, but incorrect,
> content-type. That is, the "opt-in" behavior could allow
> an implementation to sniff when no content-type, but
> not sniff when there is a malformed content-type.

This would make the algorithm less predictable.  The goal is to make
life more predictable.

> ==============================================================
> Contextual application
>
> There are many different contexts of use of MIME labeled
> material in the web, and the opt-in behavior for sniffing
> should be allowed to be contextualized. For example, one
> might opt in to sniffing for <img src=...> but opt out of
> sniffing a script or rel link and opt in when doing a GET
> on the main body of a web page.

This would make the algorithm less predictable.  The goal is to make
life more predictable.  Either you want to sniff or you don't.  Cherry
picking what to sniff leads to even more complexity.

> ================================================================
> multiple content-type headers are malformed:
>
>> For HTTP resources, only the last Content-Type HTTP header,
>> if any, contributes any type information
>
> =============================================================
> A message with more than one content-type header
> should be treated as malformed.
>
>   If the Content-Type HTTP header is present but
>   the value of the last such header cannot be
>   interpreted as described by the HTTP
>   specifications (e.g. because its value doesn't
>   contain a U+002F SOLIDUS ('/') character), then
>   the resource has no type information (even
>   if there are multiple Content-Type HTTP headers and
>   one of the other ones is syntactically correct).

That's not the way sniffing algorithms work.  We might wish they
worked some other way, but that's life.
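The header rule quoted above can be sketched in a few lines of Python. This is a minimal, non-normative illustration, assuming the Content-Type header values arrive as a list of raw strings in the order received; the function name `supplied_type` is hypothetical, and "cannot be interpreted" is approximated here as "lacks a non-empty type and subtype around a '/'", which is far looser than the full HTTP media-type grammar.

```python
from typing import Optional, Sequence


def supplied_type(content_type_headers: Sequence[str]) -> Optional[str]:
    """Return "type/subtype" from the LAST Content-Type header, or None.

    Hypothetical sketch of the draft's rule: earlier headers are ignored,
    and a malformed last header yields no type information, even if an
    earlier header was syntactically correct.
    """
    if not content_type_headers:
        return None  # no Content-Type header at all: no type information

    last = content_type_headers[-1]
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = last.split(";", 1)[0].strip().lower()

    # Simplified validity check (assumption): need "type/subtype".
    type_, sep, subtype = media_type.partition("/")
    if not sep or not type_ or not subtype:
        return None  # malformed last header is treated like a missing one

    return media_type
```

Under this sketch, `supplied_type(["text/html", "bogus"])` returns `None` even though the first header is valid, which is exactly the behavior Larry objects to, while `supplied_type(["bogus", "image/png"])` returns `"image/png"`.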

> =============================================
> The "algorithm for extracting an encoding ...."

I've removed this section.  I suspect it should have remained in HTML5
instead of being moved into this document.

> ========================
> file extensions:
>
>  Note: It is essential that file extensions
>  are not used for determining the media type
>  for resources fetched over HTTP because
>  file extensions can often be supplied by
>  malicious parties.
>
> "Often" is dubious. How can file extensions be
> supplied more often than content-type headers?

I've removed the word "often" in favor of "in some cases" which is
factually indisputable.

> What is the security threat?

I've explained this in a previous reply.

>  For resources fetched over most other protocols, e.g.
>  FTP, there is no type information.
>
> "most other protocols" is imprecise.
> Does this apply to imap? data:? There seems
> to be some attempt to define this for RSS?

I've replaced the word "most" with "some" which is factually indisputable.

> Is it the protocol or the scheme that
> determines the method, or both?

That's a question for folks more pedantic than me.  I have no idea,
but I suspect it's the protocol.

> I also don't think this matches implementations
> for FTP; aren't file extensions used frequently
> to determine content-type?

The FTP protocol does not transmit the type information to the user agent.

> =========================
> Waiting (section 5)

I've already replied to these comments.

> ============
> Adding new types:
>
> " User agents may support additional types if desired,
>   by implicitly adding to the above table."
>
> This makes no sense and is a disaster for
> interoperability.

Sadly, it admits the reality of what will happen.  For example, no one
knows what will happen with audio and video content on the web now
that HTML supports these media types natively.  To pretend that we can
control user agents here is fantasy.  I've made the requirements here
tighter.

> The only reason why we have badly deployed content
> is that user agents "sniffed" new types. If the
> deployed infrastructure doesn't deploy new
> kinds of sniffing, then people won't distribute
> content.  This is a harmful escalating path and
> encouraging implementations to add to the table
> is disruptive and harmful. Don't do it.

I've changed "if desired" to "if necessary".  If user agents feel that
sniffing video formats is necessary, it doesn't matter what we write
in the spec.  They'll still do it.

Adam