RE: comments on draft-abarth-mime-sniff-03

Larry Masinter <masinter@adobe.com> Sat, 23 January 2010 01:35 UTC

From: Larry Masinter <masinter@adobe.com>
To: Adam Barth <ietf@adambarth.com>
Date: Fri, 22 Jan 2010 17:34:46 -0800
Subject: RE: comments on draft-abarth-mime-sniff-03
Message-ID: <C68CB012D9182D408CED7B884F441D4D5FE353@nambxv01a.corp.adobe.com>
References: <C68CB012D9182D408CED7B884F441D4D5FDE79@nambxv01a.corp.adobe.com> <7789133a1001201514l47b43b8bw958e42794707dbc9@mail.gmail.com>
In-Reply-To: <7789133a1001201514l47b43b8bw958e42794707dbc9@mail.gmail.com>
Accept-Language: en-US
Content-Language: en-US
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Cc: Ian Hickson <ian@hixie.ch>, "apps-discuss@ietf.org" <apps-discuss@ietf.org>
Precedence: list

>> A message with more than one content-type header
>> should be treated as malformed.

> What does it mean to treat the response as malformed?  I've seen
> examples of servers that blissfully send more than one Content-Type
> header.  This document just describes how to process the responses
> without making a judgment about whether the server is acting properly or
> not.

Implementations should be allowed to skip sniffing when
the Content-Type is malformed, even if they sniff when the
Content-Type is missing, or apply specific kinds of
sniffing when the Content-Type is supplied but simply wrong.
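
As a minimal sketch of the three-way distinction proposed
above, assuming response headers arrive as a list of
(name, value) pairs, a client could classify the
Content-Type state before deciding whether to sniff. The
names classify_content_type and may_sniff, the
sniff_when_present flag, and the choice to treat "more than
one header" or "not a type/subtype pair" as malformed are
illustrative assumptions, not text from the draft.

```python
from enum import Enum
from typing import List, Tuple


class CTState(Enum):
    MISSING = "missing"
    MALFORMED = "malformed"
    PRESENT = "present"


def _looks_like_media_type(value: str) -> bool:
    # Hypothetical well-formedness check: a non-empty "type/subtype"
    # before any ";" parameters, with no whitespace inside it.
    essence = value.split(";", 1)[0].strip()
    parts = essence.split("/")
    return (
        len(parts) == 2
        and all(parts)
        and not any(ch.isspace() for ch in essence)
    )


def classify_content_type(headers: List[Tuple[str, str]]) -> CTState:
    values = [v for (k, v) in headers if k.lower() == "content-type"]
    if not values:
        return CTState.MISSING
    # Assumption: duplicate Content-Type headers count as malformed,
    # as does a single value that is not a type/subtype pair.
    if len(values) > 1 or not _looks_like_media_type(values[0]):
        return CTState.MALFORMED
    return CTState.PRESENT


def may_sniff(state: CTState, sniff_when_present: bool = True) -> bool:
    # Policy sketch: sniff when the header is missing, optionally apply
    # narrower sniffing when it is present, never sniff when malformed.
    if state is CTState.MALFORMED:
        return False
    if state is CTState.MISSING:
        return True
    return sniff_when_present
```

What a client does when may_sniff returns False (for
example, treating the body as opaque data) is left open in
this sketch.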

A malformed Content-Type is an indication of more serious site
misconfiguration than the one usually offered as the excuse
for sniffing: that the default Apache installation used to
label any file extension it didn't understand as "text/plain"
rather than "application/octet-stream".

>> The "algorithm for extracting an encoding ...."
> [...]
>> The nature of the "willful violation"
>> (I.e., how it is different) and the
>> justification for the "willful violation"
>> should be included. I can't fathom any
>> justification for it.

> Charset sniffing is required to prevent ugly replacement characters from
> being shown to the user.  Sad, but true.

I'm sorry, but the paragraphs preceding the discussion of
"willful violation" seem to be about how to determine the
value of the charset= MIME parameter, not about how to
interpret the result. As I pointed out, this document
doesn't include the algorithm for charset sniffing, which
still seems to be in the HTML4 document. If your "willful
violation" disclaimer means that you are doing charset
sniffing, then it would be more reasonable to move that
text from the HTML specification into this one (or, as I
had also thought sensible, into a parallel one).
Including an algorithm for parsing the MIME parameters
that differs from the ordinary one requires more
justification than you've supplied.
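
For comparison, the "ordinary" way to read the charset=
parameter is to parse the Content-Type value with the
standard MIME parameter grammar, where a value is a token or
a quoted-string. A minimal sketch, assuming Python's
standard email package stands in as the reference parser;
ordinary_charset is a hypothetical helper name.

```python
from email.message import Message
from typing import Optional


def ordinary_charset(content_type: str) -> Optional[str]:
    # Let the standard library's MIME parameter parser handle the header:
    # quoted-string values are unquoted and the result is lower-cased.
    msg = Message()
    msg["Content-Type"] = content_type
    # Returns None when there is no charset parameter.
    return msg.get_content_charset()
```

For example, ordinary_charset('text/html; charset="UTF-8"')
returns 'utf-8', and ordinary_charset('text/plain') returns
None.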

>> file extensions:
>>
>>  Note: It is essential that file extensions
>>  are not used for determining the media type
>>  for resources fetched over HTTP because
>>  file extensions can often be supplied by
>>  malicious parties.
>
>>  "Often" is dubious. How can file extensions be
>>  supplied more often than content-type headers?

> For example, the attacker can choose the file extension in most PHP
> installations because foo.php happily processes:

> http://example.com/foo.php/bar.qux
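
To make the PHP example concrete: the server runs foo.php
and treats /bar.qux as extra path information, but a client
that derived the media type from the URL would see whatever
extension the link's author appended. A minimal sketch,
assuming Python's urllib.parse and mimetypes modules stand
in for a client's extension lookup; url_extension and
type_from_extension are hypothetical helpers.

```python
import mimetypes
import posixpath
from typing import Optional
from urllib.parse import urlsplit


def url_extension(url: str) -> str:
    # The "file extension" visible in the URL: whatever follows the last
    # dot of the last path segment. For foo.php/bar.qux the server runs
    # foo.php, but the visible extension is the attacker-chosen ".qux".
    path = urlsplit(url).path
    return posixpath.splitext(path)[1]


def type_from_extension(url: str) -> Optional[str]:
    # What an extension-based lookup would conclude; this is the guess
    # the draft says must not be used for HTTP resources.
    guessed, _encoding = mimetypes.guess_type(urlsplit(url).path)
    return guessed
```

Here type_from_extension("http://example.com/foo.php/bar.html")
yields "text/html" even though the response body is whatever
foo.php emits.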

I suggest including this example in the security considerations section
of the document.

>> What is the security threat?

> The security threat is that if you treat an HTML file extension as
> evidence the server wants the response to be treated as text/html you
> will introduce XSS vulnerabilities into some large number of sites
> running PHP (among others).

Yes, I think this belongs in the "security considerations" section
of the document.

>> I'd think that the behavior of "how to sniff"
>> should start out with what the inputs are
>> (the first N bytes of some data from a response).

> This business about waiting for 512 bytes has to do with a poor
> interaction between buffering for sniffing and Comet
> <http://en.wikipedia.org/wiki/Comet_(programming)>.  Basically, if you
> wait forever for the 512 bytes you need to sniff completely, then you
> break things like chat in Gmail.  For example, Gmail chat used to not
> work in Safari for this reason.  However, always using the first chunk
> of data off the network to sniff means you'll get unpredictable
> results based on how exactly the response was chunked.  Hence the
> advice to wait for 512 bytes but not a requirement to wait forever.
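
A minimal sketch of the "wait for 512 bytes, but not
forever" advice, modelling the network as a list of
(arrival time, bytes) chunks so the timing is explicit.
read_sniff_buffer, the deadline_s parameter, and toy_sniff
are hypothetical; the draft does not specify a timeout
value, and toy_sniff is a stand-in, not the draft's
sniffing table.

```python
from typing import Iterable, Tuple

SNIFF_LEN = 512  # bytes the draft advises waiting for


def read_sniff_buffer(
    chunks: Iterable[Tuple[float, bytes]], deadline_s: float
) -> bytes:
    # `chunks` yields (arrival_time_in_seconds, data) in arrival order.
    # Collect data until SNIFF_LEN bytes are available or the next chunk
    # arrives after the deadline, then sniff whatever has arrived.
    # (A real client would still deliver later data; this sketch only
    # computes the prefix used for the sniffing decision.)
    buf = bytearray()
    for arrival, data in chunks:
        if arrival > deadline_s:
            break  # stop waiting; a Comet stream may never fill the buffer
        buf.extend(data)
        if len(buf) >= SNIFF_LEN:
            break
    return bytes(buf[:SNIFF_LEN])


def toy_sniff(prefix: bytes) -> str:
    # Toy stand-in for a real sniffing table: look for an HTML tag
    # anywhere in the available prefix.
    return "text/html" if b"<html" in prefix.lower() else "text/plain"


if __name__ == "__main__":
    body = b"x" * 300 + b"<html><body>hi</body></html>"
    fast = [(0.0, body[:300]), (0.1, body[300:])]
    slow = [(0.0, body[:300]), (5.0, body[300:])]
    print(toy_sniff(read_sniff_buffer(fast, deadline_s=1.0)))  # text/html
    print(toy_sniff(read_sniff_buffer(slow, deadline_s=1.0)))  # text/plain
```

The fast and slow deliveries carry identical bytes yet
produce different verdicts, so under this policy the outcome
depends on network timing as well as on content.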

I'm just pointing out that the consequence is that sniffing is
indeterminate: even an implementation that claims to sniff might
not do so if the server responds slowly and the wait times out.
This might even be an attack vector, though I'm not sure how.

I don't think you've made clear the advantage of sniffing at all until
you've received the entire message body, much less the first 512 bytes.

Your Gmail chat example isn't convincing. I suppose the browser
doesn't know that it's talking to a site (Google) that presumably
doesn't need sniffing, and the problem is that the first 512 bytes
of the response might not arrive in a timely fashion?

Maybe you could explain this use case in more detail.

Larry
