Re: [websec] Content sniffing

Adam Barth <ietf@adambarth.com> Tue, 10 July 2012 01:53 UTC

On Mon, Jul 9, 2012 at 6:42 PM, Richard L. Barnes <rbarnes@bbn.com> wrote:
> On Jul 9, 2012, at 7:24 PM, Adam Barth wrote:
>> On Mon, Jul 9, 2012 at 4:19 PM, Richard L. Barnes <rbarnes@bbn.com> wrote:
>>> I haven't thought much about this, but a couple of thoughts:
>>>
>>> The binary prologue means that the document is not valid HTML, so in principle, it shouldn't be accepted as HTML.  It makes you wonder what other stuff you could put in there that the browser would stuff into the DOM without it being obvious on the wire, say, to a proxy.  I'm imagining things like encrypted / compressed JavaScript code that could be unpacked by the more obviously HTML part of the page.
>>
>> You don't have to imagine.  It's specified in HTML5.
>
> Could you clarify?  What is "it"?  Reference would be helpful.

You mentioned that you were wondering what "other stuff" you could put
there that the browser would stuff into the DOM.  The HTML
specification [1] defines precisely what DOM you'll get for every
possible input, so you don't need to wonder.
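
For a rough feel of what "every possible input" means here, the sketch
below builds a JPEG-shaped byte string with an HTML snippet in a comment
segment and feeds it to a lenient parser.  Assumptions: Python's
html.parser stands in for a browser (it is NOT the HTML5 parsing
algorithm and builds no DOM), and the helper names and the payload are
made up for illustration.

```python
import struct
from html.parser import HTMLParser


def jpeg_comment_segment(payload):
    """Wrap payload in a JPEG COM (0xFFFE) segment.

    The 2-byte big-endian length counts itself plus the payload.
    """
    return b"\xff\xfe" + struct.pack(">H", len(payload) + 2) + payload


class EventLog(HTMLParser):
    """Records start tags and text runs in the order the parser reports them."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


def parse_events(raw):
    parser = EventLog()
    # latin-1 maps every byte to a code point, so decoding never fails.
    parser.feed(raw.decode("latin-1"))
    parser.close()
    return parser.events


# Hypothetical payload. SOI + COM + EOI is only a skeleton, not a viewable image.
HTML = b"<html><body><p>hello</p></body></html>"
BLOB = b"\xff\xd8" + jpeg_comment_segment(HTML) + b"\xff\xd9"

if __name__ == "__main__":
    for kind, value in parse_events(BLOB):
        print(kind, repr(value))
```

The binary prologue is reported as an ordinary run of text ahead of the
first tag rather than as an error.  That is only an analogue of the
browser behaviour discussed in this thread; the HTML specification's
tree builder is what decides where such text lands in the real DOM.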

> Is there really a use case for inserting into the DOM arbitrary octets that are not syntactically part of the HTML page?

This topic has been discussed at length in the HTML working group.
It's probably not worth re-hashing it on this list.  The short answer
is that it's what web sites expect browsers to do.

Adam

[1] http://whatwg.org/specs/web-apps/current-work/ (or
http://www.w3.org/TR/html5/ if you want to see the more official but
less up-to-date version).


>>> In a related vein, the "Text or Binary" section of draft-ietf-websec-mime-sniff says that nothing scriptable must come out of sniffing a binary blob.  Yet in this case, it produced "text/html", which is obviously scriptable.
>>
>> The browser isn't sniffing HTML in this case.  The server sent a
>> Content-Type header with text/html.
>>
>> Adam
>>
>>
>>> On Jul 9, 2012, at 5:05 PM, Adam Barth wrote:
>>>
>>>> Why is this sniffing gone awry?  Nothing bad seems to have happened in
>>>> this example.
>>>>
>>>> Adam
>>>>
>>>>
>>>> On Mon, Jul 9, 2012 at 1:55 PM, Richard L. Barnes <rbarnes@bbn.com> wrote:
>>>>> Related to draft-ietf-websec-mime-sniff, an example of sniffing gone awry:
>>>>> <http://lcamtuf.coredump.cx/squirrel/>
>>>>>
>>>>> It's a valid JPEG image that contains an HTML snippet in a comment segment.  As a result, when a browser loads the URL expecting an image, it renders the image content, and when it expects HTML, it skips the binary junk at the top and renders the HTML [*]. (In both cases, the server reports Content-Type text/html.)   What's even more startling is that Chrome helpfully adds the binary junk at the top as the first child of the <body> element in the parsed DOM!
>>>>>
>>>>> --Richard
>>>>>
>>>>>
>>>>> [*] At least in Chrome 20.0.1132.47
>>>
>