Re: [websec] more on sniffing
Adam Barth <ietf@adambarth.com> Sun, 08 January 2012 20:55 UTC
In-Reply-To: <C68CB012D9182D408CED7B884F441D4D06123B4E47@nambxv01a.corp.adobe.com>
References: <C68CB012D9182D408CED7B884F441D4D06123B4E47@nambxv01a.corp.adobe.com>
From: Adam Barth <ietf@adambarth.com>
Date: Sun, 08 Jan 2012 12:55:12 -0800
Message-ID: <CAJE5ia9+MqfbcpCyr+weeRRThygGCRtquDR9Vr2QOCdzU8Emtg@mail.gmail.com>
To: Larry Masinter <masinter@adobe.com>
Cc: "IETF WebSec WG (websec@ietf.org)" <websec@ietf.org>
Subject: Re: [websec] more on sniffing
On Sun, Jan 8, 2012 at 9:12 AM, Larry Masinter <masinter@adobe.com> wrote:
> <section anchor="intro" title="Introduction">
> <t>HTTP provides a way of labeling content with its
> Content-Type, as an indication of the file format / language by
> which the content is to be interpreted. Unfortunately, many web
> servers, as deployed, supply incorrect Content-Type header
> fields with their HTTP responses. In order to be compatible
> with these servers, web clients would consider the content of
> HTTP responses as well as the Content-Type header fields when
> determining how the content was interpreted (the "effective
> media type"). Looking at content to determine its type (aka
> "sniffing") is also used when no Content-Type header is
> supplied.</t>
>
> Seemed important to define “sniffing”.
>
> <list style="symbols">
> <t> Q: Why doesn't file upload sniff? </t>

Because it hasn't historically.

> <t>Q: where is the concept
> of 'privilege' defined?</t>

RFC 6454, but we might want to update the terminology to "authority"
to better align with that document.

> <t> Why not treat sniffed content as a
> different origin to prevent XSS? </t>

I answered this question in my previous mail.

> </list>
>
> I’m not sure, but at least some of the bigger unaddressed issues could be in
> the document? Probably the “status of this document” should just point to
> the tracker and I should enter in things as issues, not sure how the group
> wants to track these.

Referencing the tracker seems fine, but I would assume that's true of
every working document in every IETF working group.

> <t>However, overly ambitious sniffing has resulted in a number
> of security issues in the past. For example, consider a simple
> server which allows users to upload content, which is then
> served as simple content such as plain text or an images.
> However, if the content is subsequently 'sniffed' to be active
> content; for example, a malicious user might be able to leverage
> content sniffing to mount a cross-site script attack by
> including JavaScript code in the uploaded file that a user agent
> treats as text/html.</t>
>
> As I noted before, I wish there were more examples of sniffing security
> issues since that’s the main justification for this document, at least as a
> ‘websec’ document.

Feel free to add a reference to
<http://www.adambarth.com/papers/2009/barth-caballero-song.pdf>, which
contains a number of concrete attacks.

> <t>This document describes a method for sniffing that carefully
> balances the compatibility needs of user agent implementors with the
> security constraints.</t>
>
> I only changed “algorithm” to “method” because of the many unspecified
> options (e.g., how long to wait for additional data).
>
> <t>Often, sniffing is done in a context where the use
> of the data retrieved is not merely for independent presentation,
> but for embedding (as an image, as video) or other uses
> (as a style sheet, a script). </t>
>
> I think this is the crux of some additional material, where you know that
> you’re sniffing a font or a script or a style sheet, and that knowledge
> influences the sniffing decision.
>
> <t>One can consider 'sniffing' in several categories:
>
> <list style="symbols">
> <t>Content delivered via a channel which does not allow
> supplying Content-Type </t>
> <t>Content delivered via HTTP, but No Content-Type
> supplied</t>
> <t>Content-Type is malformed</t>
> <t>Content-Type is duplicated with different values</t>
> <t>Content-Type is syntactically legal, but content clearly
> does not
> match constraints of specified content-type.
> </t>
> <t>Content-Type is syntactically legal, content may actually match
> constraints of specified content-type, but the content
> is intended for use in a limited context, in which the
> content could also be interpreted as another type.</t>
> <t>Content matches the specified content-type constraints, and that
> type is appropriate for the context of use, but there is some
> other belief that content has been mislabeled.</t>
> </list></t>

I'm not sure what the point of this taxonomy is.

> <t>The supplied content-type usually comes from HTTP, but in
> some situations, the link to the content contains a
> content-type. (For example, in a style sheet or script.)
> </t>
>
> This is trying to address the question of when sniffing might result in
> “false positives”. The main issue is that sniffing needs to come up with a
> definitive answer (“what is this”) even in situations where the signature of
> the data is consistent with multiple results (data could be interpreted as
> application/octet-stream, text/plain, application/xml,
> application/something1+xml, application/something2+xml, and all of those
> match the signature data; same issue happens with zip-based packaging
> formats…

Why not just say that then.

> <t>ftp: and file: resources also examine the file extension.</t>
>
> The widget packaging recommendation, which normatively references some
> version of sniffing, also uses file extensions for some content and not
> others, but I haven’t figured out yet where that belongs.

The widget spec is very confused. I would pay more attention to code
that's been widely deployed.
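
[Illustration of the "multiple results" point quoted above, where one
byte sequence is consistent with several media types yet the sniffer
must return a single answer. This is only a rough sketch under stated
assumptions: the checks, the priority order, and the names
consistent_types, is_probably_text, and sniff are all hypothetical.
They are not the signature table or the method from the draft, and not
what any deployed user agent does.]

```python
from typing import List


def is_probably_text(data: bytes) -> bool:
    """Toy check (an assumption): no NUL bytes and valid UTF-8."""
    if b"\x00" in data:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def consistent_types(data: bytes) -> List[str]:
    """Return every type in this toy table that the bytes could be.

    The list is ordered by a hypothetical priority, most specific first.
    """
    matches = []
    # An XML declaration, allowing leading whitespace.
    if data.lstrip(b" \t\r\n").startswith(b"<?xml"):
        matches.append("application/xml")
    # Local file header magic shared by zip-based packaging formats.
    if data.startswith(b"PK\x03\x04"):
        matches.append("application/zip")
    if is_probably_text(data):
        matches.append("text/plain")
    # Any byte sequence at all is a valid octet stream.
    matches.append("application/octet-stream")
    return matches


def sniff(data: bytes) -> str:
    """Pick one definitive answer: the highest-priority consistent type."""
    return consistent_types(data)[0]


if __name__ == "__main__":
    sample = b'<?xml version="1.0"?><svg/>'
    print(consistent_types(sample))  # several candidates match
    print(sniff(sample))             # but only one is reported
```

[In this sketch the XML sample matches three entries, and nothing in
the bytes alone says whether it is generic XML or some more specific
+xml type, which is the false-positive risk raised above. The fixed
priority order is what keeps the answer deterministic.]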
> <t> The methods described here have been constructed with
> reference to content sniffing algorithms present in popular user
> agents, an extensive database of existing web content, and
> metrics collected from implementations deployed to a sizable
> number of users <xref target="BarthCaballeroSong2009" />.</t>
>
> <t>For reasons discussed in
> http://www.w3.org/2001/tag/doc/mime-respect,
> sniffing should be avoided when the content could likely be reasonably
> interpreted as the content-type supplied. If it is necessary to sniff
> in such situations, it is preferable to do so only with care, e.g.,
> by offering the user an alternative or explicit choice, or by noting
> and remembering origins which have content that requires sniffing.</t>

I strongly disagree with this last paragraph. If you have your heart
set on adding it, let's discuss it in a separate thread first.

> This should turn into a reference. I know current implementors don’t want
> to bother warning users that their favorite sites actually are sending out
> incorrect MIME labels, but we should still recommend it.

We shouldn't recommend behavior that implementations aren't going to
implement.

> <t>Sniffing is by its nature a heuristic process, because there are
> many situations where content matches the signatures and capabilities
> of many different possible content-type values.

I disagree with this statement as well. The sniffing we're talking
about here is not a heuristic. It's a historical anomaly that needs to
be corrected for in order for user agents to be compatible with some
web sites.

> False positives result
> in security problems, while inconsistent sniffing results in
> interoperability problems.
> For these reasons, implementations of
> any receiver of content, attempting to follow the guidelines in this
> document, MUST NOT result in any value other than those permitted
> in this specification.</t>
>
> I’m still not sure what the scope of this document is, insofar as whether it
> is normative for every browser.

It does.

> Perhaps the best thing is to try to explicitly address “scope” by moving
> those parts of the introduction which address scope into a separate section.

Adam