[websec] more on sniffing
Larry Masinter <masinter@adobe.com> Sun, 08 January 2012 17:12 UTC
From: Larry Masinter <masinter@adobe.com>
To: "IETF WebSec WG (websec@ietf.org)" <websec@ietf.org>
Date: Sun, 08 Jan 2012 09:12:05 -0800
<section anchor="intro" title="Introduction">
<t>HTTP provides a way of labeling content with its Content-Type, as an indication of the file format / language by which the content is to be interpreted. Unfortunately, many web servers, as deployed, supply incorrect Content-Type header fields with their HTTP responses. In order to be compatible with these servers, web clients consider the content of HTTP responses as well as the Content-Type header fields when determining how the content is to be interpreted (the "effective media type"). Looking at content to determine its type (aka "sniffing") is also used when no Content-Type header is supplied.</t>

Seemed important to define "sniffing".

<list style="symbols">
<t>Q: Why doesn't file upload sniff?</t>
<t>Q: Where is the concept of 'privilege' defined?</t>
<t>Q: Why not treat sniffed content as a different origin to prevent XSS?</t>
</list>

I'm not sure, but at least some of the bigger unaddressed issues could be in the document? Probably the "status of this document" should just point to the tracker and I should enter things in as issues; not sure how the group wants to track these.

<t>However, overly ambitious sniffing has resulted in a number of security issues in the past. For example, consider a simple server which allows users to upload content, which is then served as simple content such as plain text or an image. However, if the content is subsequently 'sniffed' to be active content, a malicious user might be able to leverage content sniffing to mount a cross-site scripting attack, for example by including JavaScript code in the uploaded file that a user agent treats as text/html.</t>

As I noted before, I wish there were more examples of sniffing security issues, since that's the main justification for this document, at least as a 'websec' document.
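
To make the upload scenario above concrete, here is a minimal sketch in Python. It is not the algorithm from the draft or from any browser: the function names, the marker list, the 512-byte window, and the set of "passive" types are all assumptions made for illustration. It only shows how an over-eager sniffer turns a text/plain upload into active content, and how refusing to upgrade a declared passive type avoids that.

```python
def naive_sniff(body: bytes, declared_type: str) -> str:
    """Hypothetical over-eager sniffer: overrides the label if the body looks like HTML."""
    # Assumption: only the first 512 bytes are examined.
    head = body[:512].lstrip().lower()
    html_markers = (b"<!doctype html", b"<html", b"<script", b"<body")
    if head.startswith(html_markers):
        return "text/html"
    return declared_type


def cautious_sniff(body: bytes, declared_type: str) -> str:
    """Hypothetical safer variant: never upgrades a declared passive type."""
    # Assumption: an illustrative, incomplete set of passive types.
    passive = {"text/plain", "image/png", "image/gif", "image/jpeg"}
    if declared_type in passive:
        return declared_type
    return naive_sniff(body, declared_type)


# An attacker uploads script content that the server labels as plain text.
upload = b"<script>alert(document.cookie)</script>"
print(naive_sniff(upload, "text/plain"))     # text/html
print(cautious_sniff(upload, "text/plain"))  # text/plain
```

The second function trades compatibility for safety: a mislabeled but harmless HTML page served as text/plain would be shown as text.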
<t>This document describes a method for sniffing that carefully balances the compatibility needs of user agent implementors with the security constraints.</t>

I only changed "algorithm" to "method" because of the many unspecified options (e.g., how long to wait for additional data).

<t>Often, sniffing is done in a context where the use of the data retrieved is not merely for independent presentation, but for embedding (as an image, as video) or other uses (as a style sheet, a script).</t>

I think this is the crux of some additional material, where you know that you're sniffing a font or a script or a style sheet, and that knowledge influences the sniffing decision.

<t>One can consider 'sniffing' in several categories:
<list style="symbols">
<t>Content delivered via a channel which does not allow supplying Content-Type</t>
<t>Content delivered via HTTP, but no Content-Type supplied</t>
<t>Content-Type is malformed</t>
<t>Content-Type is duplicated with different values</t>
<t>Content-Type is syntactically legal, but content clearly does not match constraints of specified content-type.</t>
<t>Content-Type is syntactically legal, content may actually match constraints of specified content-type, but the content is intended for use in a limited context, in which the content could also be interpreted as another type.</t>
<t>Content matches the specified content-type constraints, and that type is appropriate for the context of use, but there is some other belief that content has been mislabeled.</t>
</list></t>

<t>The supplied content-type usually comes from HTTP, but in some situations, the link to the content contains a content-type. (For example, in a style sheet or script.)</t>

This is trying to address the question of when sniffing might result in "false positives".
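
As a rough illustration of the first four categories in the list above, the following Python sketch classifies a response purely from the label it arrived with. The function name, the returned category strings, and the simplified type/subtype pattern are assumptions made here, not anything the draft defines. The last three categories cannot be decided from the label alone, since they need the content and the context of use, so the sketch lumps them together as "well-formed".

```python
import re
from typing import Optional, Sequence

# Assumption: a simplified type/subtype token check; parameters are ignored.
_MEDIA_TYPE = re.compile(r"^[A-Za-z0-9!#$&^_.+-]+/[A-Za-z0-9!#$&^_.+-]+$")


def classify_label(content_types: Optional[Sequence[str]]) -> str:
    """Classify a label. None means the channel cannot carry a Content-Type at all."""
    if content_types is None:
        return "channel-cannot-label"
    if len(content_types) == 0:
        return "no-content-type"
    # Compare only the type/subtype part, case-insensitively.
    essences = {v.split(";", 1)[0].strip().lower() for v in content_types}
    if any(not _MEDIA_TYPE.match(e) for e in essences):
        return "malformed"
    if len(essences) > 1:
        return "duplicated-conflicting"
    # Remaining categories require looking at the content and its context of use.
    return "well-formed"


print(classify_label(None))                          # channel-cannot-label
print(classify_label([]))                            # no-content-type
print(classify_label(["text"]))                      # malformed
print(classify_label(["text/html", "text/plain"]))   # duplicated-conflicting
print(classify_label(["text/html; charset=utf-8"]))  # well-formed
```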
The main issue is that sniffing needs to come up with a definitive answer ("what is this") even in situations where the signature of the data is consistent with multiple results (data could be interpreted as application/octet-stream, text/plain, application/xml, application/something1+xml, application/something2+xml, and all of those match the signature data; the same issue happens with zip-based packaging formats).

<t>For ftp: and file: resources, the file extension is also examined.</t>

The widget packaging recommendation, which normatively references some version of sniffing, also uses file extensions for some content and not others, but I haven't figured out yet where that belongs.

<t>The methods described here have been constructed with reference to content sniffing algorithms present in popular user agents, an extensive database of existing web content, and metrics collected from implementations deployed to a sizable number of users <xref target="BarthCaballeroSong2009" />.</t>

<t>For reasons discussed in http://www.w3.org/2001/tag/doc/mime-respect, sniffing should be avoided when the content could reasonably be interpreted as the content-type supplied. If it is necessary to sniff in such situations, it is preferable to do so only with care, e.g., by offering the user an alternative or explicit choice, or by noting and remembering origins which have content that requires sniffing.</t>

This should turn into a reference. I know current implementors don't want to bother warning users that their favorite sites actually are sending out incorrect MIME labels, but we should still recommend it.

<t>Sniffing is by its nature a heuristic process, because there are many situations where content matches the signatures and capabilities of many different possible content-type values. False positives result in security problems, while inconsistent sniffing results in interoperability problems.
For these reasons, any receiver of content that attempts to follow the guidelines in this document MUST NOT produce any value other than those permitted in this specification.</t>

I'm still not sure what the scope of this document is, in particular whether it is normative for every browser. Perhaps the best thing is to try to explicitly address "scope" by moving those parts of the introduction which address scope into a separate section.

Larry
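
The signature ambiguity discussed earlier in this message (one byte prefix being consistent with several media types) can be sketched as follows. This is only an illustration: the signature table is a small hand-picked sample, not the table from the draft, and the function name is hypothetical. The point is that signature matching yields a set of candidates, while a sniffer still has to return exactly one answer.

```python
from typing import List

# Assumption: an illustrative, incomplete table of byte prefixes and the
# media types whose content can legitimately begin with that prefix.
SIGNATURES = [
    (b"PK\x03\x04", ["application/zip", "application/epub+zip", "application/widget"]),
    (b"<?xml", ["application/xml", "application/xhtml+xml", "image/svg+xml", "text/plain"]),
    (b"%PDF-", ["application/pdf"]),
]


def candidate_types(data: bytes) -> List[str]:
    """Return every media type in the sample table whose signature matches the data."""
    candidates: List[str] = []
    for prefix, types in SIGNATURES:
        if data.startswith(prefix):
            candidates.extend(types)
    # Any byte sequence at all is a valid application/octet-stream.
    candidates.append("application/octet-stream")
    return candidates


print(candidate_types(b"PK\x03\x04..."))
print(candidate_types(b"<?xml version='1.0'?><svg/>"))
print(candidate_types(b"%PDF-1.4"))
```

Choosing among the candidates needs information beyond the signature, such as the supplied label, the context of use, or (for ftp: and file: resources) the file extension.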