[websec] more on sniffing

Larry Masinter <masinter@adobe.com> Sun, 08 January 2012 17:12 UTC

From: Larry Masinter <masinter@adobe.com>
To: "IETF WebSec WG (websec@ietf.org)" <websec@ietf.org>
Date: Sun, 8 Jan 2012 09:12:05 -0800
Message-ID: <C68CB012D9182D408CED7B884F441D4D06123B4E47@nambxv01a.corp.adobe.com>
Subject: [websec] more on sniffing

    <section anchor="intro" title="Introduction">

      <t>HTTP provides a way of labeling content with its
      Content-Type, as an indication of the file format / language by
      which the content is to be interpreted.  Unfortunately, many web
      servers, as deployed, supply incorrect Content-Type header
      fields with their HTTP responses.  In order to be compatible
      with these servers, web clients consider the content of
      HTTP responses as well as the Content-Type header fields when
      determining how the content is interpreted (the "effective
      media type").  Looking at content to determine its type (aka
      "sniffing") is also used when no Content-Type header is

Seemed important to define "sniffing".

      <list style="symbols">
      <t>Q: Why doesn't file upload sniff?</t>
      <t>Q: Where is the concept
      of 'privilege' defined?</t>
      <t>Q: Why not treat sniffed content as a
      different origin to prevent XSS?</t>

I'm not sure, but perhaps at least some of the bigger unaddressed issues could be listed in the document. Probably the "status of this document" should just point to the tracker, and I should enter these as issues; I'm not sure how the group wants to track them.

      <t>However, overly ambitious sniffing has resulted in a number
      of security issues in the past. For example, consider a simple
      server which allows users to upload content, which is then
      served as simple content such as plain text or an image.
      If that content is subsequently 'sniffed' to be active
      content, a malicious user might be able to leverage
      content sniffing to mount a cross-site scripting attack by
      including JavaScript code in the uploaded file that a user agent
      treats as text/html.</t>

As I noted before, I wish there were more examples of sniffing security issues since that's the main justification for this document, at least as a 'websec' document.
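
To make the upload scenario concrete, here is a minimal Python sketch. The function name `naive_sniff` and its list of hints are hypothetical, invented for illustration; this is neither the method in the draft nor any browser's actual behavior. It only shows how a sniffer that overrides a declared text/plain label whenever the body looks like HTML turns a harmless-looking upload into active content.

```python
# Hypothetical, deliberately over-eager sniffer (illustration only).
HTML_HINTS = (b"<html", b"<script", b"<body")


def naive_sniff(declared_type: str, body: bytes) -> str:
    """Return the media type a careless client might act on."""
    head = body[:512].lower()  # look only at the start of the body
    if any(hint in head for hint in HTML_HINTS):
        # The declared label is ignored as soon as the content "looks like" HTML.
        return "text/html"
    return declared_type


# A user uploads a "text file" that the server labels text/plain.
upload = b"Hello! <script>alert(document.cookie)</script>"
effective = naive_sniff("text/plain", upload)
print(effective)
```

A client that then renders the response according to `effective` would run the script in the origin of the hosting site, which is the attack described in the paragraph above.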

      <t>This document describes a method for sniffing that carefully
      balances the compatibility needs of user agent implementors with the
      security constraints.</t>

I only changed "algorithm" to "method" because of the many unspecified options (e.g., how long to wait for additional data).

      <t>Often, sniffing is done in a context where the use
       of the data retrieved is not merely for independent presentation,
        but for embedding (as an image, as video) or other uses
        (as a style sheet, a script). </t>

I think this is the crux of some additional material, where you know that you're sniffing a font or a script or a style sheet, and that knowledge influences the sniffing decision.
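
One possible way to picture this, as a rough sketch only: the embedding context restricts which results the sniffer may return. The names `ALLOWED_BY_CONTEXT` and `sniff_in_context`, and the particular sets of types, are assumptions made up for this example and are not taken from the draft. The byte signatures are the well-known PNG, GIF and JPEG magic numbers.

```python
from typing import Optional

# Hypothetical mapping from embedding context to acceptable result types.
ALLOWED_BY_CONTEXT = {
    "image": {"image/png", "image/gif", "image/jpeg"},
    "style": {"text/css"},
    "script": {"text/javascript"},
}

# Well-known leading byte signatures for a few image formats.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
]


def sniff_in_context(context: str, declared_type: str, body: bytes) -> Optional[str]:
    """Return a type acceptable in this context, or None to refuse the content."""
    allowed = ALLOWED_BY_CONTEXT.get(context, set())
    if declared_type in allowed:
        return declared_type  # the label already fits the context; no sniffing
    for magic, media_type in SIGNATURES:
        if body.startswith(magic) and media_type in allowed:
            return media_type
    return None  # never "upgrade" to a type the context did not ask for
```

In this sketch a mislabeled GIF embedded as an image is still usable, while the same bytes referenced as a style sheet are refused rather than reinterpreted.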

      <t>One can consider 'sniffing' in several categories:

       <list style="symbols">
                <t>Content delivered via a channel which does not allow
          supplying Content-Type </t>
                <t>Content delivered via HTTP, but no Content-Type supplied</t>
                <t>Content-Type is malformed</t>
         <t>Content-Type is duplicated with different values</t>
                <t>Content-Type is syntactically legal, but content clearly does not
           match constraints of specified content-type. </t>
         <t>Content-Type is syntactically legal, content may actually match
           constraints of specified content-type, but the content
           is intended for use in a limited context, in which the
           content could also be interpreted as another type.</t>
         <t>Content matches the specified content-type constraints, and that
           type is appropriate for the context of use, but there is some
           other belief that content has been mislabeled.</t>

       <t>The supplied content-type usually comes from HTTP, but in
       some situations, the link to the content contains a
       content-type.  (For example, in a style sheet or script.)

This is trying to address the question of when sniffing might result in "false positives". The main issue is that sniffing needs to come up with a definitive answer ("what is this") even in situations where the signature of the data is consistent with multiple results: data could be interpreted as application/octet-stream, text/plain, application/xml, application/something1+xml, or application/something2+xml, and all of those match the signature data. The same issue happens with zip-based packaging formats.
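
A small Python sketch of that ambiguity, under stated assumptions: `candidate_types` is a hypothetical helper, the checks are far cruder than any real sniffer, and the `something1`/`something2` types are the same placeholders used above. The point is only that signature matching yields a set of plausible types, while a sniffer has to pick exactly one.

```python
from typing import List


def candidate_types(body: bytes) -> List[str]:
    """List every type in this toy model whose signature the body matches."""
    candidates = ["application/octet-stream"]  # any byte sequence qualifies
    if body.startswith(b"PK\x03\x04"):
        # ZIP local file header, shared by ZIP-based packaging formats.
        candidates += ["application/zip", "application/epub+zip"]
    try:
        text = body.decode("utf-8")
    except UnicodeDecodeError:
        return candidates  # not text, so no text-based candidates
    candidates.append("text/plain")
    if text.lstrip().startswith("<?xml"):
        # The XML declaration alone cannot distinguish XML vocabularies.
        candidates += [
            "application/xml",
            "application/something1+xml",
            "application/something2+xml",
        ]
    return candidates


print(candidate_types(b'<?xml version="1.0"?><a/>'))
```

For the XML input the sketch reports five matching types, and nothing in the leading bytes says which of them the sender meant.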

      <t>For ftp: and file: resources, the file extension is also examined.</t>

The widget packaging recommendation, which normatively references some version of sniffing, also uses file extensions for some content and not others, but I haven't figured out yet where that belongs.
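
As a rough illustration of extension-based typing for these schemes, here is a sketch that uses Python's standard `mimetypes` module. The helper name `type_from_extension` is hypothetical, and the extension table is whatever the local Python installation provides, not a table from the draft or from the widget packaging recommendation.

```python
import mimetypes
from typing import Optional
from urllib.parse import urlparse


def type_from_extension(url: str) -> Optional[str]:
    """Guess a media type from the file extension, for ftp: and file: URLs only."""
    if urlparse(url).scheme not in ("ftp", "file"):
        return None  # other schemes are expected to carry their own label
    guessed, _encoding = mimetypes.guess_type(url)
    return guessed  # None when the extension is unknown


print(type_from_extension("file:///tmp/page.html"))
print(type_from_extension("ftp://example.com/pub/logo.png"))
print(type_from_extension("http://example.com/page.html"))
```

The extension is only a hint supplied by whoever named the file, so it raises the same trust questions as any other label.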

      <t> The methods described here have been constructed with
      reference to content sniffing algorithms present in popular user
      agents, an extensive database of existing web content, and
      metrics collected from implementations deployed to a sizable
      number of users <xref target="BarthCaballeroSong2009" />.</t>

      <t>For reasons discussed in http://www.w3.org/2001/tag/doc/mime-respect,
     sniffing should be avoided when the content could reasonably be
     interpreted as the content-type supplied.  If it is necessary to sniff
     in such situations, it is preferable to do so only with care, e.g.,
     by offering the user an alternative or explicit choice, or by noting
     and remembering origins which have content that requires sniffing.</t>

This should turn into a reference. I know current implementors don't want to bother warning users that their favorite sites actually are sending out incorrect MIME labels, but we should still recommend it.

     <t>Sniffing is by its nature a heuristic process, because there are
     many situations where content matches the signatures and capabilities
     of many different possible content-type values. False positives result
     in security problems, while inconsistent sniffing results in
     interoperability problems. For these reasons, any receiver of
     content that attempts to follow the guidelines in this document
     MUST NOT produce a sniffed type other than those permitted
     in this specification.</t>

I'm still not sure what the scope of this document is, insofar as whether it is normative for every browser.

Perhaps the best thing is to try to explicitly address "scope" by moving those parts of the introduction which address scope into a separate section.