Re: comments on draft-abarth-mime-sniff-03

Adam Barth <ietf@adambarth.com> Wed, 20 January 2010 23:15 UTC

Thanks Larry.  These are great comments.  I'll incorporate them when I
next update the draft.  To answer a couple of your questions:

On Wed, Jan 20, 2010 at 2:41 PM, Larry Masinter <masinter@adobe.com> wrote:
> A message with more than one content-type header
> should be treated as malformed.

What does it mean to treat the response as malformed?  I've seen
examples of servers that blissfully send more than one Content-Type
header.  This document just describes how to process the responses
without making a judgment about whether the server is acting properly
or not.

> The "algorithm for extracting an encoding ...."
[...]
> The nature of the "willful violation"
> (I.e., how it is different) and the
> justification for the "willful violation"
> should be included. I can't fathom any
> justification for it.

Charset sniffing is required to keep ugly replacement characters from
being shown to the user.  Sad, but true.

> file extensions:
>
>  Note: It is essential that file extensions
>  are not used for determining the media type
>   for resources fetched over HTTP because
>  file extensions can often by supplied by
>   malicious parties.
>
>  "Often" is dubious. How can file extensions be
> supplied more often than  content-type headers?

For example, the attacker can choose the file extension in most PHP
installations because foo.php happily processes:

http://example.com/foo.php/bar.qux
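
To make that concrete, here is a rough Python sketch of what a
naive extension-based check would see for such a URL.  The helper names
(url_extension, script_segment) are made up for illustration, and
script_segment is only a simplified model of how trailing path info
after a script name might be handled, not a description of PHP itself:

```python
import posixpath
from urllib.parse import urlsplit


def url_extension(url: str) -> str:
    """Return the extension of the last path segment (what a naive sniffer sees)."""
    path = urlsplit(url).path
    return posixpath.splitext(path)[1]


def script_segment(url: str, script_suffix: str = ".php"):
    """Hypothetical model: the path up to the first segment ending in script_suffix."""
    segments = urlsplit(url).path.split("/")
    for i, segment in enumerate(segments):
        if segment.endswith(script_suffix):
            return "/".join(segments[: i + 1])
    return None


# The same script handles both requests, but whoever writes the link picks
# the apparent extension.
for url in ("http://example.com/foo.php/bar.qux",
            "http://example.com/foo.php/evil.html"):
    print(script_segment(url), url_extension(url))
```

In both cases the script in this model is /foo.php, while the apparent
extension is whatever the author of the link chose to append.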

> What is the security threat?

The security threat is that if you treat an HTML file extension as
evidence the server wants the response to be treated as text/html you
will introduce XSS vulnerabilities into some large number of sites
running PHP (among others).

> I'd think that the behavior of "how to sniff"
> should start out with what the inputs are
> (the first N bytes of some data from a response).

This business about waiting for 512 bytes has to do with a poor
interaction between buffering for sniffing and Comet
<http://en.wikipedia.org/wiki/Comet_(programming)>.  Basically, if you
wait forever for the 512 bytes you need to sniff completely, then you
break things like chat in Gmail.  For example, Gmail chat used to not
work in Safari for this reason.  However, always using the first chunk
of data off the network to sniff means you'll get unpredictable
results based on how exactly the response was chunked.  Hence the
advice to wait for 512 bytes but not a requirement to wait forever.
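
As a rough illustration of that trade-off (not text from the draft),
here is a small Python sketch.  The function name gather_sniff_bytes and
the use of None to stand for "the read stalled" are assumptions made
for the example; a real client would use whatever timeout mechanism its
network stack provides:

```python
from typing import Iterable, Optional

SNIFF_LIMIT = 512  # number of bytes the sniffer would like to see


def gather_sniff_bytes(chunks: Iterable[Optional[bytes]],
                       limit: int = SNIFF_LIMIT) -> bytes:
    """Accumulate network chunks until `limit` bytes, end of stream, or a stall.

    A None item is this sketch's stand-in for "no more data arrived in time".
    """
    buf = bytearray()
    for chunk in chunks:
        if chunk is None:
            # Stall: stop waiting and sniff whatever has arrived so far,
            # so that streaming (Comet-style) responses are not blocked.
            break
        buf.extend(chunk)
        if len(buf) >= limit:
            break
    return bytes(buf[:limit])


data = bytes(range(256)) * 3  # 768 bytes of sample payload

# Different chunkings of the same response give the same sniffing input.
by_hundreds = [data[i:i + 100] for i in range(0, len(data), 100)]
two_pieces = [data[:700], data[700:]]
assert gather_sniff_bytes(by_hundreds) == gather_sniff_bytes(two_pieces)

# A stalled stream yields only what arrived before the stall.
assert gather_sniff_bytes([b"hi", None, b"more"]) == b"hi"
```

Buffering across chunks keeps the result independent of how the
response happened to be split, and bailing out on a stall keeps
long-lived responses working.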

Adam