Re: [websec] more on sniffing

Adam Barth <ietf@adambarth.com> Sun, 08 January 2012 20:55 UTC

MIME-Version: 1.0
In-Reply-To: <C68CB012D9182D408CED7B884F441D4D06123B4E47@nambxv01a.corp.adobe.com>
References: <C68CB012D9182D408CED7B884F441D4D06123B4E47@nambxv01a.corp.adobe.com>
From: Adam Barth <ietf@adambarth.com>
Date: Sun, 08 Jan 2012 12:55:12 -0800
Message-ID: <CAJE5ia9+MqfbcpCyr+weeRRThygGCRtquDR9Vr2QOCdzU8Emtg@mail.gmail.com>
To: Larry Masinter <masinter@adobe.com>
Content-Type: text/plain; charset="windows-1252"
Content-Transfer-Encoding: quoted-printable
Cc: "IETF WebSec WG (websec@ietf.org)" <websec@ietf.org>
Subject: Re: [websec] more on sniffing
Precedence: list

On Sun, Jan 8, 2012 at 9:12 AM, Larry Masinter <masinter@adobe.com> wrote:
>     <section anchor="intro" title="Introduction">
>       <t>HTTP provides a way of labeling content with its
>       Content-Type, as an indication of the file format / language by
>       which the content is to be interpreted.  Unfortunately, many web
>       servers, as deployed, supply incorrect Content-Type header
>       fields with their HTTP responses.  In order to be compatible
>       with these servers, web clients would consider the content of
>       HTTP responses as well as the Content-Type header fields when
>       determining how the content was interpreted (the "effective
>       media type").  Looking at content to determine its type (aka
>       "sniffing") is also used when no Content-Type header is
>       supplied.</t>
>
> Seemed important to define “sniffing”.
>
>       <list style="symbols">
>      <t> Q: Why doesn't file upload sniff? </t>

Because it hasn't historically.

>       <t>Q: where is the concept
>       of 'privilege' defined?</t>

RFC 6454, but we might want to update the terminology to "authority"
to better align with that document.

>       <t> Why not treat sniffed content as a
>       different origin to prevent XSS? </t>

I answered this question in my previous mail.

>       </list>
>
> I’m not sure, but at least some of the bigger unaddressed issues could be in
> the document? Probably the “status of this document” should just point to
> the tracker and I should enter in things as issues, not sure how the group
> wants to track these.

Referencing the tracker seems fine, but I would assume that's true of
every working document in every IETF working group.

>       <t>However, overly ambitious sniffing has resulted in a number
>       of security issues in the past. For example, consider a simple
>       server which allows users to upload content, which is then
>       served as simple content such as plain text or an image.
>       However, the content might subsequently be 'sniffed' as active
>       content; for example, a malicious user might be able to leverage
>       content sniffing to mount a cross-site script attack by
>       including JavaScript code in the uploaded file that a user agent
>       treats as text/html.</t>
>
> As I noted before, I wish there were more examples of sniffing security
> issues since that’s the main justification for this document, at least as a
> ‘websec’ document.

Feel free to add a reference to
<http://www.adambarth.com/papers/2009/barth-caballero-song.pdf>, which
contains a number of concrete attacks.
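
The upload scenario quoted above can be made concrete with a minimal
sketch in Python. Everything in it is hypothetical: the function names,
the signature list, and the 512-byte window are assumptions chosen for
illustration, not the algorithm in the draft or in any particular user
agent.

```python
# Hypothetical byte signatures an over-eager sniffer might treat as HTML.
HTML_SIGNATURES = (b"<!doctype html", b"<html", b"<script", b"<iframe", b"<body")


def naive_effective_type(declared_type: str, body: bytes) -> str:
    """Override the declared type whenever the leading bytes look like HTML."""
    prefix = body[:512].lstrip().lower()
    if any(prefix.startswith(sig) for sig in HTML_SIGNATURES):
        return "text/html"
    return declared_type


def conservative_effective_type(declared_type: str, body: bytes) -> str:
    """Trust any declared type; sniff only when no Content-Type was supplied."""
    if declared_type:
        return declared_type
    return naive_effective_type("application/octet-stream", body)


# A user uploads a "text file" that the server labels as text/plain.
upload = b"<script>alert(document.cookie)</script>"
print(naive_effective_type("text/plain", upload))
print(conservative_effective_type("text/plain", upload))
```

With the naive function, the upload's effective media type becomes
text/html even though the server said text/plain, which is the
cross-site scripting case described in the quoted text. The conservative
variant is only one possible policy, shown here for contrast.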

>       <t>This document describes a method for sniffing that carefully
>       balances the compatibility needs of user agent implementors with the
>       security constraints.</t>
>
> I only changed “algorithm” to “method” because of the many unspecified
> options (e.g., how long to wait for additional data).
>
>       <t>Often, sniffing is done in a context where the use
>        of the data retrieved is not merely for independent presentation,
>         but for embedding (as an image, as video) or other uses
>         (as a style sheet, a script). </t>
>
> I think this is the crux of some additional material, where you know that
> you’re sniffing  a font or a script or a style sheet, and that knowledge
> influences the sniffing decision.
>
>       <t>One can consider 'sniffing' in several categories:
>
>        <list style="symbols">
>                 <t>Content delivered via a channel which does not allow
>           supplying Content-Type </t>
>                 <t>Content delivered via HTTP, but no Content-Type
>           supplied</t>
>                 <t>Content-Type is malformed</t>
>          <t>Content-Type is duplicated with different values</t>
>                 <t>Content-Type is syntactically legal, but content clearly
>            does not match constraints of specified content-type. </t>
>          <t>Content-Type is syntactically legal, content may actually match
>            constraints of specified content-type, but the content
>            is intended for use in a limited context, in which the
>            content could also be interpreted as another type.</t>
>          <t>Content matches the specified content-type constraints, and that
>            type is appropriate for the context of use, but there is some
>            other belief that content has been mislabeled.</t>
>        </list></t>

I'm not sure what the point of this taxonomy is.

>        <t>The supplied content-type usually comes from HTTP, but in
>        some situations, the link to the content contains a
>        content-type.  (For example, in a style sheet or script.)
>       </t>
>
> This is trying to address the question of when sniffing might result in
> “false positives”.   The main issue is that sniffing needs to come up with a
> definitive answer (“what is this”) even in situations where the signature of
> the data is consistent with multiple results (data could be interpreted as
> application/octet-stream, text/plain, application/xml,
> application/something1+xml, application/something2+xml, and all of those
> match the signature data; the same issue happens with zip-based packaging
> formats…).

Why not just say that, then?
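
As a rough illustration of that ambiguity, here is a minimal Python
sketch. The function name and the matching rules are assumptions made
up for this example; the "something1" and "something2" types are the
placeholders from the quoted text, not registered media types.

```python
from typing import List


def candidate_types(body: bytes) -> List[str]:
    """List every media type whose (hypothetical) signature the body matches."""
    candidates = ["application/octet-stream"]  # any byte sequence qualifies
    try:
        text = body.decode("utf-8")
    except UnicodeDecodeError:
        return candidates  # not decodable as text, so nothing else applies
    candidates.append("text/plain")
    if text.lstrip().startswith("<?xml"):
        # An XML prolog is consistent with generic XML and with every
        # XML-based format alike.
        candidates += [
            "application/xml",
            "application/something1+xml",
            "application/something2+xml",
        ]
    return candidates


print(candidate_types(b'<?xml version="1.0"?><root/>'))
```

The XML input matches all five candidates, yet a sniffer has to return
exactly one of them. The signature alone does not say which.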

>       <t>ftp: and file: resources also examine the file extension.</t>
>
> The widget packaging recommendation, which normatively references some
> version of sniffing, also uses file extensions for some content and not
> others, but I haven’t figured out yet where that belongs.

The widget spec is very confused.  I would pay more attention to code
that's been widely deployed.

>       <t> The methods described here have been constructed with
>       reference to content sniffing algorithms present in popular user
>       agents, an extensive database of existing web content, and
>       metrics collected from implementations deployed to a sizable
>       number of users <xref target="BarthCaballeroSong2009" />.</t>
>
>       <t>For reasons discussed in
>      http://www.w3.org/2001/tag/doc/mime-respect,
>      sniffing should be avoided when the content could likely be reasonably
>      interpreted as the content-type supplied.  If it is necessary to sniff
>      in such situations, it is preferable to do so only with care, e.g.,
>      by offering the user an alternative or explicit choice, or by noting
>      and remembering origins which have content that requires sniffing.</t>

I strongly disagree with this last paragraph.  If you have your heart
set on adding it, let's discuss it in a separate thread first.

> This should turn into a reference.   I know current implementors don’t  want
> to bother warning users that their favorite sites actually are sending out
> incorrect MIME labels, but we should still recommend it.

We shouldn't recommend behavior that implementations aren't going to implement.

>      <t>Sniffing is by its nature a heuristic process, because there are
>      many situations where content matches the signatures and capabilities
>      of many different possible content-type values.

I disagree with this statement as well.  The sniffing we're talking
about here is not a heuristic.  It's a historical anomaly that needs
to be corrected for in order for user agents to be compatible with
some web sites.

> False positives result
>      in security problems, while inconsistent sniffing results in
>      interoperability problems. For these reasons, implementations of
>      any receiver of content, attempting to follow the guidelines in this
>      document, MUST NOT result in any value other than those permitted
>      in this specification.</t>
>
> I’m still not sure what the scope of this document is, insofar as whether it
> is normative for every browser.

It is.

> Perhaps the best thing is to try to explicitly address “scope” by moving
> those parts of the introduction which address scope into a separate section.

Adam
