Re: [websec] #22: content-type sniffing should include charset sniffing

"Anne van Kesteren" <annevk@opera.com> Mon, 24 October 2011 10:28 UTC

To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, Larry Masinter <masinter@adobe.com>
Date: Mon, 24 Oct 2011 19:28:01 +0900
From: Anne van Kesteren <annevk@opera.com>
Organization: Opera Software
Cc: "websec@ietf.org" <websec@ietf.org>

On Mon, 24 Oct 2011 15:47:46 +0900, Larry Masinter <masinter@adobe.com>  
wrote:
> The charset sniffing documentation in the HTML5 document isn't all that  
> complicated, anyway.

You have to run the HTML parser for it. It is orders of magnitude more  
complicated than MIME type sniffing.

Sniffing for an encoding always happens after you determine what the MIME
type is, though. So what kind of encoding sniffing you might want to do
depends on the outcome of the MIME type sniffing algorithm. E.g. you are
not going to run XML encoding sniffing on a text/html resource.
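
To illustrate the ordering, here is a rough Python sketch. It assumes the
MIME type has already been determined and only then picks an encoding
sniffer. All function names are hypothetical, and both sniffers are crude
regex stand-ins: the real HTML algorithm is far more involved than a
pattern match, so this only shows the dispatch, not the sniffing itself.

```python
import re

# Simplified stand-ins (assumptions for illustration, not the real algorithms).
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']'
)
_META_CHARSET = re.compile(
    rb'<meta[^>]*?charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE
)


def sniff_xml_encoding(body: bytes):
    """Read the encoding label from an XML declaration, if there is one."""
    m = _XML_DECL.match(body)
    return m.group(1).decode("ascii").lower() if m else None


def sniff_html_encoding(body: bytes):
    """Look for a <meta ... charset=...> in the first 1024 bytes."""
    m = _META_CHARSET.search(body[:1024])
    return m.group(1).decode("ascii").lower() if m else None


def is_xml_type(mime_type: str) -> bool:
    """Treat text/xml, application/xml and any +xml type as XML."""
    return mime_type in ("text/xml", "application/xml") or mime_type.endswith("+xml")


def sniff_encoding(mime_type: str, body: bytes):
    """Choose the encoding sniffer based on the already-determined MIME type."""
    if mime_type == "text/html":
        return sniff_html_encoding(body)
    if is_xml_type(mime_type):
        return sniff_xml_encoding(body)
    return None  # no encoding sniffing for other types in this sketch
```

With this sketch, the same bytes starting with an XML declaration yield an
encoding when the type is application/xhtml+xml, but nothing when the type
is text/html, because the XML sniffer is never run in that case.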

It is probably confusing to refer to MIME type sniffing as Content-Type  
sniffing, because that is not really what it does.


-- 
Anne van Kesteren
http://annevankesteren.nl/