RE: comments on draft-abarth-mime-sniff-03

Larry Masinter <masinter@adobe.com> Sat, 23 January 2010 01:35 UTC

From: Larry Masinter <masinter@adobe.com>
To: Adam Barth <ietf@adambarth.com>
Date: Fri, 22 Jan 2010 17:34:46 -0800
Subject: RE: comments on draft-abarth-mime-sniff-03
Cc: Ian Hickson <ian@hixie.ch>, "apps-discuss@ietf.org" <apps-discuss@ietf.org>

>> A message with more than one content-type header
>> should be treated as malformed.

> What does it mean to treat the response as malformed?  I've seen
> examples of servers that blissfully send more than one Content-Type
> header.  This document just describes how to process the responses
> without making a judgment about whether the server is acting properly or
> not.

Implementations should be allowed to skip sniffing when
the Content-Type is malformed, even if they sniff when
the Content-Type is missing, or do specific kinds of
sniffing when the Content-Type is supplied but simply wrong.
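
A minimal sketch of the policy described above, in Python. The function
name, the three policy labels, and the rule that more than one
Content-Type header counts as "malformed" are assumptions made for
illustration; none of them come from the draft.

```python
def sniffing_policy(headers):
    """Pick a sniffing policy from a list of (name, value) header pairs.

    Hypothetical rule set, for illustration only:
      - no Content-Type header       -> "sniff"
      - more than one Content-Type   -> "no-sniff" (treated as malformed)
      - exactly one Content-Type     -> "limited-sniff"
    """
    # Header names are case-insensitive, so compare in lower case.
    values = [value for name, value in headers
              if name.lower() == "content-type"]
    if not values:
        return "sniff"
    if len(values) > 1:
        return "no-sniff"
    return "limited-sniff"


if __name__ == "__main__":
    print(sniffing_policy([("Content-Type", "text/plain"),
                           ("content-type", "text/html")]))
```

The point of separating the three cases is that an implementation can
decline to guess for a malformed response without giving up sniffing
for the missing or mislabeled cases.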

Malformed content is an indication of a more serious site
misconfiguration than the typical excuse that the default
Apache installation used to label any file extension it
didn't understand as "text/plain" rather than
"application/octet-stream".

>> The "algorithm for extracting an encoding ...."
> [...]
>> The nature of the "willful violation"
>> (I.e., how it is different) and the
>> justification for the "willful violation"
>> should be included. I can't fathom any
>> justification for it.

> Charset sniffing is required to avoid ugly replacement characters from
> being shown to the user.  Sad, but true.

I'm sorry, but the paragraphs preceding the discussion of
"willful violation" seem to be talking about how to determine
the value of the charset= MIME parameter, not about the
interpretation of the results. As I pointed out, this
document doesn't include the algorithm for charset
sniffing, which still seems to be in the HTML4 document.
If you mean by your disclaimer of a "willful violation" that
you were doing charset sniffing, then moving that text from
the HTML specification into this one (or, as I had also
thought reasonable, into a parallel one) would seem to be more
reasonable. Including an algorithm for parsing the MIME
parameters that differs from the ordinary one requires
more justification than you've supplied.



>> file extensions:
>>
>>  Note: It is essential that file extensions
>>  are not used for determining the media type
>>   for resources fetched over HTTP because
>>  file extensions can often be supplied by
>>   malicious parties.
>
>>  "Often" is dubious. How can file extensions be
>> supplied more often than  content-type headers?

> For example, the attacker can choose the file extension in most PHP
> installations because foo.php happily processes:

> http://example.com/foo.php/bar.qux

I suggest including this example in the security considerations section
of the document.

>> What is the security threat?

> The security threat is that if you treat an HTML file extension as
> evidence the server wants the response to be treated as text/html you
> will introduce XSS vulnerabilities into some large number of sites
> running PHP (among others).

Yes, I think this belongs in the "security considerations" section
of the document.
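
The PHP example above can be sketched in a few lines of Python. This
assumes a client that naively takes the extension from the URL path;
the helper name url_extension is hypothetical, and the second URL is a
made-up variant of the one quoted above.

```python
import posixpath
from urllib.parse import urlsplit


def url_extension(url):
    """Return the extension of the last path segment of a URL.

    This is the naive approach a client might take, shown only to
    illustrate why the result cannot be trusted.
    """
    path = urlsplit(url).path
    return posixpath.splitext(path)[1]


if __name__ == "__main__":
    # The server runs foo.php in both cases; the trailing segment, and
    # therefore the apparent extension, is chosen by whoever wrote the link.
    print(url_extension("http://example.com/foo.php/bar.qux"))
    print(url_extension("http://example.com/foo.php/bar.html"))
```

Both requests are handled by the same script, yet the extension the
client sees is whatever the link author appended.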

>> I'd think that the behavior of "how to sniff"
>> should start out with what the inputs are
>> (the first N bytes of some data from a response).

> This business about waiting for 512 bytes has to do with a poor
> interaction between buffering for sniffing and Comet
> <http://en.wikipedia.org/wiki/Comet_(programming)>.  Basically, if you
> wait forever for the 512 bytes you need to sniff completely, then you
> break things like chat in Gmail.  For example, Gmail chat used to not
> work in Safari for this reason.  However, always using the first chunk
> of data off the network to sniff means you'll get unpredictable
> results based on how exactly the response was chunked.  Hence the
> advice to wait for 512 bytes but not a requirement to wait forever.

I'm just pointing out that the consequence is that sniffing is indeterminate:
even an implementation that claims to sniff might not do so if the server
responds slowly and the wait times out.
This might even be an attack vector (though I'm not sure how).
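
The indeterminacy can be illustrated with a toy sniffer in Python. This
is not the algorithm from the draft: the single "<html" check, the
names sniff and SNIFF_LIMIT, and the sample body are all assumptions
made for illustration. Only the 512-byte figure comes from the thread.

```python
SNIFF_LIMIT = 512  # number of leading bytes the sniffer would like to see


def sniff(prefix):
    """Toy sniffer: report text/html if "<html" appears in the window.

    `prefix` is whatever part of the body has arrived when the client
    stops waiting, so it may be shorter than SNIFF_LIMIT.
    """
    window = prefix[:SNIFF_LIMIT].lower()
    return "text/html" if b"<html" in window else "unknown"


if __name__ == "__main__":
    # Same response body; only the amount received before the timeout differs.
    body = b" " * 100 + b"<html><body>hi</body></html>"
    print(sniff(body[:64]))  # wait timed out after 64 bytes
    print(sniff(body))       # whole (short) body arrived in time
```

The same response yields different answers depending on how much of it
has arrived when the wait ends.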

I don't think you've made clear the advantage of sniffing before you've
received the entire message body, let alone before the first 512 bytes
have arrived.

Your example of Gmail chat isn't convincing. I suppose the browser
doesn't know that it's talking to a site (Google) where sniffing is
presumably unnecessary, but is the point that the first 512 bytes of
the response might not arrive in a timely fashion?

Maybe you could explain this use case in more detail.

Larry