comments on draft-abarth-mime-sniff-03
Larry Masinter <masinter@adobe.com> Wed, 20 January 2010 22:42 UTC
From: Larry Masinter <masinter@adobe.com>
To: Adam Barth <w3c@adambarth.com>, Ian Hickson <ian@hixie.ch>
Date: Wed, 20 Jan 2010 14:41:43 -0800
Message-ID: <C68CB012D9182D408CED7B884F441D4D5FDE79@nambxv01a.corp.adobe.com>
Cc: "apps-discuss@ietf.org" <apps-discuss@ietf.org>

Although this document was discussed a while back in httpbis, it seems to address non-HTTP sniffing as well. I'll send notes to http-wg, html-wg, and www-tag to move discussion here (as advised).

Several W3C specs are advancing using this document as a normative reference (including HTML), so getting this reviewed should be timely. The document should note the venue for discussion.

I admit I haven't chased down the archives on the previous discussion of this, and maybe I've gotten some things wrong, but the current draft is already 4 months old. (That is old compared to other drafts the authors are updating regularly.)

=============================
All or nothing:

    Whenever possible, user agents should avoid employing a content sniffing algorithm.

The normative advice seems to apply to roles other than "user agents"; whether or not there is a user isn't really relevant. Secondly, the scope of the "should avoid" isn't clear, nor is it clear whether this is really intended to be normative. I'll assume that it is.

    However, if a user agent does employ a content sniffing algorithm, the user agent should use the algorithm in this document exactly

The algorithm isn't exact (more below). And I think this is the wrong normative advice. Implementations SHOULD NOT sniff labeled content, and MUST NOT sniff in any way beyond the ways explicitly allowed here, and when they do sniff, should opt into sniffing on a situation-by-situation, type-by-type basis.

The application of sniffing should be reserved for situations where the implementor has sufficient reason to believe that there is enough existing deployed content that would not function as expected if sniffing were not implemented, and these situations should be allowed to be as fine-grained as the implementor needs. This normative advice should not require updating as the deployed content evolves.
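
As a rough illustration of the "situation by situation, type by type" opt-in suggested above, a minimal sketch in Python follows. The context names, the table entries, and the function name may_sniff are all hypothetical and not taken from the draft; the handling of unlabeled content is likewise an assumption, since the advice above only addresses labeled content directly.

```python
from typing import Optional

# Hypothetical opt-in table: (context, declared type) pairs for which an
# implementor has decided that deployed content requires sniffing.
# The entries are illustrative only and do not come from the draft.
SNIFF_OPT_IN = {
    ("img", "text/plain"),
    ("navigation", "text/plain"),
}

# Hypothetical contexts that opt in to sniffing when no type is supplied.
SNIFF_UNLABELED = {"img", "navigation"}


def may_sniff(context: str, declared_type: Optional[str]) -> bool:
    """Return True only if sniffing was explicitly opted into for this case."""
    if declared_type is None:
        # Unlabeled content: sniff only in contexts that opted in.
        return context in SNIFF_UNLABELED
    # Labeled content is never sniffed unless this exact pair was opted in.
    return (context, declared_type.lower()) in SNIFF_OPT_IN
```

With a table like this, the default answer is "do not sniff", and widening the behavior requires adding an explicit entry rather than changing the algorithm.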

=============================
TERMINOLOGY "resource"

This document seems to use "resource" to mean what is fetched and not just the source from which it is fetched, the same usage discussed at length in HTML-WG:

http://www.w3.org/html/wg/tracker/issues/81

For example:

    For HTTP resources, only the last Content-Type HTTP header, if any, contributes any type information; the official type of the resource is then the value of that header, interpreted as described by the HTTP specifications.

But what the cited HTTP specification says is:

    The "Content-Type" entity-header field indicates the media type of the entity-body.

That does not apply "media type" to the resource; it treats the media type as metadata of the response.

For example, instead of using "resource" inconsistently, this sentence might say:

    For HTTP resources ... the official media type of the representation is obtained from the value ...

since of course, the same HTTP resource might return different representations depending on accept headers.

==========================================================
Malformed content-type information vs. missing content-type

I think the cases where the content-type information supplied is malformed should be treated differently from the cases where content-type information is missing. In particular, it should be possible and allowed to raise an error if content-type information is malformed, even when sniffing is performed on well-formed, but incorrect, content-type. That is, the "opt-in" behavior could allow an implementation to sniff when there is no content-type, but not sniff when there is a malformed content-type.

==============================================================
Contextual application

There are many different contexts of use of MIME labeled material in the web, and the opt-in behavior for sniffing should be allowed to be contextualized.
For example, one might opt in to sniffing for <img src=...> but opt out of sniffing a script or rel link, and opt in when doing a GET on the main body of a web page.

================================================================
multiple content-type headers are malformed:

> For HTTP resources, only the last Content-Type HTTP header,
> if any, contributes any type information

=============================================================
A message with more than one content-type header should be treated as malformed.

    If the Content-Type HTTP header is present but the value of the last such header cannot be interpreted as described by the HTTP specifications (e.g. because its value doesn't contain a U+002F SOLIDUS ('/') character), then the resource has no type information (even if there are multiple Content-Type HTTP headers and one of the other ones is syntactically correct).

=============================================
The "algorithm for extracting an encoding ...."

    Note: The above algorithm is a willful violation of the HTTP specification. [RFC2616]

The word "encoding" is confusing, since HTTP uses content-encoding and this is actually talking about what HTTP calls "charset".

The HTML algorithm for 'sniffing' a charset when one is explicitly supplied but is changed by HTML -- that isn't part of this draft? It is HTML only? It seems to be part of the same family of "sniffing" and should be reviewed with the same scrutiny, moving it either into this document or to a parallel one. (Perhaps someone can raise this as a bug in the HTML spec.)

The nature of the "willful violation" (i.e., how it is different) and the justification for the "willful violation" should be included. I can't fathom any justification for it.

========================
file extensions:

    Note: It is essential that file extensions are not used for determining the media type for resources fetched over HTTP because file extensions can often by supplied by malicious parties.

"Often" is dubious.
How can file extensions be supplied more often than content-type headers? What is the security threat?

    For resources fetched over most other protocols, e.g. FTP, there is no type information.

"most other protocols" is imprecise. Does this apply to IMAP? data:? There seems to be some attempt to define this for RSS? Is it the protocol or the scheme that determines the method, or both?

I also don't think this matches implementations for FTP; aren't file extensions used frequently to determine content-type?

=========================
Waiting (section 5)

    1. The user agent MAY wait for 512 or more bytes of the resource to be available.

    2. Let /stream length/ be the smaller of either 512 or the number of bytes already available.

*This* document doesn't define "available" or what it means to "wait", nor does this step really have much to do with what most of this document is about.

I think the choice here is that type sniffing apparently can work on whatever prefix the agent chooses to give it, anywhere from 0 to 512 bytes. If it chooses to send 0 bytes, this is equivalent to turning off MIME sniffing completely (since there is nothing to sniff).

I'd think that the behavior of "how to sniff" should start out with what the inputs are (the first N bytes of some data from a response). Note that the N bytes are not the bytes of the actual traffic over the wire, but the bytes resulting from undoing any content-encoding.

============
Adding new types:

    "User agents may support additional types if desired, by implicitly adding to the above table."

This makes no sense and is a disaster for interoperability. The only reason why we have badly deployed content is that user agents "sniffed" new types. If the deployed infrastructure doesn't deploy new kinds of sniffing, then people won't distribute content that depends on it. This is a harmful escalating path, and encouraging implementations to add to the table is disruptive and harmful. Don't do it.
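
Returning to the point in the "Waiting" comments above about defining the sniffer's input: the sketch below, in Python, shows one way the input could be stated, as at most 512 bytes taken after undoing the content-encoding rather than from the bytes on the wire. The function name sniff_input and the choice to handle only gzip and identity are assumptions made for illustration; the draft does not define any of this.

```python
import zlib

MAX_SNIFF_BYTES = 512  # upper bound from the draft text quoted above


def sniff_input(available: bytes, content_encoding: str = "identity") -> bytes:
    """Return the prefix a sniffer would see: decoded bytes, at most 512."""
    encoding = content_encoding.strip().lower()
    if encoding in ("gzip", "x-gzip"):
        # wbits=31 selects gzip framing. A decompression object accepts a
        # truncated stream, so a partial response still yields a prefix.
        decoder = zlib.decompressobj(31)
        decoded = decoder.decompress(available, MAX_SNIFF_BYTES)
    elif encoding in ("", "identity"):
        decoded = available
    else:
        # Other codings are out of scope for this sketch.
        raise ValueError(f"unsupported content-encoding: {content_encoding!r}")
    # "stream length": the smaller of 512 and the bytes already available.
    stream_length = min(MAX_SNIFF_BYTES, len(decoded))
    return decoded[:stream_length]
```

Passing an empty byte string returns an empty prefix, which corresponds to the observation above that giving the sniffer 0 bytes is the same as turning sniffing off.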