Comments on content sniffing algorithm draft-abarth-mime-sniff-03
David Booth <david@dbooth.org> Thu, 21 January 2010 00:54 UTC
Subject: Comments on content sniffing algorithm draft-abarth-mime-sniff-03
From: David Booth <david@dbooth.org>
To: apps-discuss <apps-discuss@ietf.org>
Date: Wed, 20 Jan 2010 19:53:56 -0500

Some comments on http://tools.ietf.org/html/draft-abarth-mime-sniff-03

1. This point is not a criticism of the sniffing algorithm proposed, but rather a comment on the way that the problem is described. I don't have a specific suggestion for rewording, so perhaps you should just take this first comment as food for thought.

It bothers me to see HTML being called a "high-privilege media type" . . . "(and thus privileged to execute any scripts contained therein)". It isn't the basic HTML that is dangerous; it is the JavaScript embedded in HTML that is dangerous, just as Flash, ActiveX or any other scripting language may be embedded. Basic HTML is relatively safe. HTML is really just embedded in text, just as JavaScript is embedded in HTML, yet we don't think of plain text as a high-privilege media type, because our content types distinguish plain text from text that "embeds" HTML. But they do not distinguish plain HTML from HTML that embeds JavaScript or other scripting languages. This forces us to paint plain HTML with the same security brush as we paint JavaScript, and this seems wrong.

2. Section 2 says "The algorithm for extracting an encoding from a Content-Type, given a string s, is as follows." But what exactly is string s? Where is s bound? Is s the Content-Type?

3. Section 3.1 says "the last step in this set of steps". I think it would be slightly clearer to say "step 9", though this is perhaps a minor stylistic issue.

4. There are several uses of the word "resource" that should be "entity body", as this is the term used in RFC2616 section 14.17: http://tools.ietf.org/html/rfc2616#section-14.17

5. Section 3.3 says "the last such header has bytes that exactly match". I suggest changing the word "has" to be more specific, as "has" often means "contains", and I do not think "contains" is what you meant. (For example, one might say "File x has the word 'Foo' in it".)

6. Section 3 defines the term /sniffed type/, which is either the type that was actually sniffed or the /official type/. This is a little misleading. I suggest distinguishing three terms: /sniffed type/, which really is the sniffed type; /official type/, as already defined; and /effective type/, which is determined by your algorithm based on either /sniffed type/ or /official type/.

7. Section 3.4 says "jump to the unknown type step below", but it is not clear what step you mean. Since the steps are numbered, it would be better to give the target step number instead of "the unknown type step below". [After reading further] Oh, it looks like you may have meant to refer to Section 5.

8. Section 4.2 says "already available". After how much time or after what event?

9. Section 4.4 mentions "binary data bytes". Where is this term defined? I.e., how exactly are binary data bytes identified?

-- 
David Booth, Ph.D.
Cleveland Clinic (contractor)

Opinions expressed herein are those of the author and do not necessarily reflect those of Cleveland Clinic.