Re: comments on draft-abarth-mime-sniff-03

Adam Barth <ietf@adambarth.com> Sat, 23 January 2010 03:26 UTC

MIME-Version: 1.0
In-Reply-To: <C68CB012D9182D408CED7B884F441D4D5FE353@nambxv01a.corp.adobe.com>
References: <C68CB012D9182D408CED7B884F441D4D5FDE79@nambxv01a.corp.adobe.com> <7789133a1001201514l47b43b8bw958e42794707dbc9@mail.gmail.com> <C68CB012D9182D408CED7B884F441D4D5FE353@nambxv01a.corp.adobe.com>
From: Adam Barth <ietf@adambarth.com>
Date: Fri, 22 Jan 2010 19:25:47 -0800
Message-ID: <7789133a1001221925sf1f55b8k31953828848f2787@mail.gmail.com>
Subject: Re: comments on draft-abarth-mime-sniff-03
To: Larry Masinter <masinter@adobe.com>
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: quoted-printable
Cc: "apps-discuss@ietf.org" <apps-discuss@ietf.org>
Precedence: list

At a higher level, what do folks think about re-writing the draft in a
more informative style instead of a normative style?  A bunch of
Larry's points boil down to the strengths of the normative
requirements and the scope of the affected user agents.  I certainly
have no wish to ram sniffing down anyone's throats.  I'd rather
provide this document as a reference for folks who feel compelled to
do content sniffing but who don't want to invest the year and a half
of research that my colleagues and I invested to arrive at this
algorithm.

Some comments below.

On Fri, Jan 22, 2010 at 5:34 PM, Larry Masinter <masinter@adobe.com> wrote:
> Malformed content is an indication of more serious site
> misconfiguration than the typical excuse that the default
> Apache installation used to label any file extension it
> didn't understand as "text/plain" rather than
> "application/octet string"

To ground this discussion, here's a real-world example from Google
News.  Google News syndicates images from a bunch of news sources.
For a while, instead of providing the content type of the images, the
servers were providing the "magic numbers" that are found at the
beginning of the images.  Once we discovered the issue, we convinced
the team to fix the issue.

>> Charset sniffing is required to avoid ugly replacement characters from
>> being shown to the user.  Sad, but true.
>
> I'm sorry, but the paragraphs preceding the discussion of
> "willful violation" seem to be talking about how to determine
> the value of the charset= MIME parameter, not about the
> interpretation of the results. As I pointed out, this
> document doesn't include the algorithm for charset
> sniffing, which still seems to be in the HTML4 document.

I presume you mean HTML5.

> If you mean by your disclaimer of a "willful violation" that
> you were doing charset sniffing, then moving that text from
> the HTML specification into this one (or, as I had also
> thought reasonable) a parallel one, would seem to be more
> reasonable. The reason for including an algorithm for parsing
> the MIME parameters that differs from the ordinary one requires
> more justification than you've supplied.

This text originated from the HTML5 specification.  It's entirely
possible it doesn't make sense without the surrounding context.  I'll
investigate.

>>> file extensions:
>>>
>>>  Note: It is essential that file extensions
>>>  are not used for determining the media type
>>>   for resources fetched over HTTP because
>>>  file extensions can often be supplied by
>>>   malicious parties.
>>
>>>  "Often" is dubious. How can file extensions be
>>> supplied more often than  content-type headers?
>
>> For example, the attacker can choose the file extension in most PHP
>> installations because foo.php happily processes:
>
>> http://example.com/foo.php/bar.qux
>
> I suggest including this example in the security considerations section
> of the document.

Will do.
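
As a rough illustration of the path-info example quoted above (a sketch
only; extension_from_url is a hypothetical helper, not something from
the draft or from any user agent), the "extension" a client sees comes
from the last path segment, which the requester picks freely:

```python
import posixpath
from urllib.parse import urlsplit


def extension_from_url(url: str) -> str:
    """Return the extension of the last path segment (hypothetical helper)."""
    path = urlsplit(url).path
    # splitext only inspects the final segment, so whatever follows
    # "foo.php/" decides the result, not the script that actually runs.
    return posixpath.splitext(path)[1]


# On a server that accepts trailing path info, both URLs can reach the
# same script; the apparent extension follows the attacker's choice.
print(extension_from_url("http://example.com/foo.php/bar.qux"))   # .qux
print(extension_from_url("http://example.com/foo.php/bar.html"))  # .html
```

A client that mapped the second URL's ".html" to text/html would be
trusting a value the server never chose.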

>>> What is the security threat?
>
>> The security threat is that if you treat an HTML file extension as
>> evidence the server wants the response to be treated as text/html you
>> will introduce XSS vulnerabilities into some large number of sites
>> running PHP (among others).
>
> Yes, I think this belongs in the "security considerations" section
> of the document.

Will do.

>>> I'd think that the behavior of "how to sniff"
>>> should start out with what the inputs are
>>> (the first N bytes of some data from a response).
>
>> This business about waiting for 512 bytes has to do with a poor
>> interaction between buffering for sniffing and Comet
>> <http://en.wikipedia.org/wiki/Comet_(programming)>.  Basically, if you
>> wait forever for the 512 bytes you need to sniff completely, then you
>> break things like chat in Gmail.  For example, Gmail chat used to not
>> work in Safari for this reason.  However, always using the first chunk
>> of data off the network to sniff means you'll get unpredictable
>> results based on how exactly the response was chunked.  Hence the
>> advice to wait for 512 bytes but not a requirement to wait forever.
>
> I'm just pointing out that the consequence is that sniffing is indeterminate,
> and that even if you have an implementation that claims that it sniffs,
> that it might not if the server responds slowly and your wait times out.
> This might even be an attack vector? (I'm not sure how, though).

In reality, the sniffing algorithm is highly predictable.  Although
it's possible for the HTML heuristic to use all 512 bytes, it's
extremely rare.  In fact, in 97% of cases when the heuristic fires,
the HTML tag begins at the first byte of the entity-body.  It's also
possible for the text-or-binary algorithm to consume all 512 bytes,
but in most binary files, the first "non-text" character occurs well
before the 512th byte.

As far as I'm aware, this is not an attack vector, because the default
types you could force by truncating the sniffing buffer are "safe"
(e.g., text/plain or application/octet-stream).
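
To make the truncation point concrete, here is a simplified sketch of a
text-or-binary check.  It is not the draft's algorithm verbatim: the
BINARY_BYTES set and the function name are assumptions for illustration,
and BOM handling is omitted.  The point is only that a shorter buffer
can move the answer between the two "safe" types, never to text/html.

```python
SNIFF_LIMIT = 512  # the algorithm considers at most this many bytes

# Assumed set of "non-text" control bytes, for illustration only;
# the draft's exact table may differ.
BINARY_BYTES = frozenset(
    list(range(0x00, 0x09)) + [0x0B] + list(range(0x0E, 0x1B)) + list(range(0x1C, 0x20))
)


def sniff_text_or_binary(data: bytes) -> str:
    """Classify a buffer as text/plain or application/octet-stream."""
    window = data[:SNIFF_LIMIT]
    if any(byte in BINARY_BYTES for byte in window):
        return "application/octet-stream"
    return "text/plain"


body = b"plain text" + b"\x00" + b"more"
print(sniff_text_or_binary(body))      # full buffer: application/octet-stream
print(sniff_text_or_binary(body[:5]))  # truncated buffer: text/plain
```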

> I don't think you've made clear the advantage of sniffing at all until
> you've received the entire message body, much less the first 512 bytes.

There's no reason to wait for the entire message body.  The algorithm
considers only the first 512 bytes.  Also, waiting for the whole
entity body would be disastrous to performance.

In practice, waiting for the first 512 bytes is very cheap, so you
might as well do it to make the algorithm 100% predictable.  In some
rare cases, it's expensive to wait, so the epsilon extra
predictability isn't worth the cost, e.g., breaking Gmail chat.

> Your example of gmail chat isn't convincing...

Isn't convincing to whom?  To you?  It was convincing enough to the
Safari team that they changed their sniffing algorithm in this regard
to what's described in the document.

> I suppose the browser
> doesn't know that it's talking to a presumably sniffing-not-necessary
> site (Google), but the first 512 bytes of the response might not arrive
> in a timely fashion?

In this case, the server was responding with a sniffable Content-Type
and the first 512 bytes would never arrive.  It sounds like you don't
understand how Comet works.  I'd encourage you to read the citation I
provided before giving your opinion about things you don't understand:

http://en.wikipedia.org/wiki/Comet_(programming)

> Maybe you could explain this use case in more detail.

The use case is as follows:

1) The server responds with a sniffable Content-Type and a chunked encoding.
2) The server sends the first chunk (say of five bytes) and then blocks.
3) The client blocks waiting for the server to send more content.
4) ... time passes ...
5) The server sends the second chunk (say another five bytes) and then blocks.

Now, what the web site wants is that the first five bytes are
delivered to the XMLHttpRequest API soon after they arrive from the
server.  Then, the web site wants the second five bytes to be
delivered to the XMLHttpRequest API soon after they arrive from the
server.  This technique, known as Comet, allows the web site to
simulate a server data "push" and is why you're able to receive a
Gmail chat message soon after your contact sends it.  However, if the
user agent blocks the response waiting for 512 bytes, those bytes will
never arrive and you'll never get your chat messages.

Of course, there are ways for the server to work around this issue by
spamming the Comet channel with 512 bytes of junk at the beginning or
specifying a non-sniffable media type.  However, the reality is that
most user agents won't wait forever for the 512 bytes needed for
sniffing.  Instead, they'll get bored of waiting and deliver the bytes
to the XMLHttpRequest API.  Hence, server operators don't employ these
workarounds, and if you're the lone user agent that doesn't behave
this way (e.g., Safari), then Gmail chat doesn't work in your browser
and your users are sad.  Sad users stop using your browser and use
another browser.
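
The difference between the two behaviours can be sketched as a small
simulation.  This is a toy model under stated assumptions, not any
browser's implementation: deliver, max_wait, and the (time, bytes)
chunk format are all hypothetical, and no real network I/O or sniffing
takes place.

```python
import math
from typing import Iterable, List, Tuple

SNIFF_LIMIT = 512  # bytes the sniffing algorithm would like to see


def deliver(chunks: Iterable[Tuple[float, bytes]], max_wait: float) -> List[Tuple[float, bytes]]:
    """Return (delivery_time, data) pairs as seen by the page.

    chunks: (arrival_time_seconds, data) pairs in arrival order.
    max_wait: hypothetical cap on how long bytes are held back for
              sniffing; math.inf models a user agent that waits forever.
    """
    buffered = b""
    first_arrival = None
    released = False
    out: List[Tuple[float, bytes]] = []
    for arrival, data in chunks:
        if released:
            # Sniffing is over; later chunks pass straight through.
            out.append((arrival, data))
            continue
        if first_arrival is None:
            first_arrival = arrival
        if buffered and arrival - first_arrival > max_wait:
            # The timer fired before this chunk arrived: flush what was held.
            out.append((first_arrival + max_wait, buffered))
            out.append((arrival, data))
            buffered = b""
            released = True
            continue
        buffered += data
        if len(buffered) >= SNIFF_LIMIT:
            out.append((arrival, buffered))
            buffered = b""
            released = True
    if not released and buffered and not math.isinf(max_wait):
        # The stream is still open, but the timer eventually fires.
        out.append((first_arrival + max_wait, buffered))
    return out


comet = [(0.0, b"hello"), (60.0, b"world")]
print(deliver(comet, math.inf))  # [] : the page never sees a chat message
print(deliver(comet, 1.0))       # [(1.0, b'hello'), (60.0, b'world')]
```

With an unbounded wait, the two five-byte chunks stay stuck in the
sniffing buffer; with a bounded wait, each one reaches the page shortly
after it arrives.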

All that is a long-winded way of saying that these things are in the
draft for a reason.  The sniffing algorithm is highly constrained by
reality.  Looking for philosophical purity in content sniffing is like
trying to find a Newtonian explanation of Brownian motion.

Adam
