[websec] Minor feedback on draft-ietf-websec-mime-sniff-03

Willy Tarreau <w@1wt.eu> Sun, 15 January 2012 19:51 UTC

Date: Sun, 15 Jan 2012 20:51:20 +0100
From: Willy Tarreau <w@1wt.eu>
To: ietf@adambarth.com, ian@hixie.ch
Message-ID: <20120115195120.GG32205@1wt.eu>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
User-Agent: Mutt/1.4.2.3i
X-Mailman-Approved-At: Sun, 15 Jan 2012 19:21:26 -0800
Cc: websec@ietf.org
Subject: [websec] Minor feedback on draft-ietf-websec-mime-sniff-03

Hello Adam, Ian,

Today I came across your draft "draft-ietf-websec-mime-sniff-03" and
noticed the point below:

   2.  If the octets were fetched via HTTP and there is an HTTP Content-
       Type header field and the value of the last such header field has
       octets that *exactly* match the octets contained in one of the
       following lines:

      +-------------------------------+--------------------------------+
      | Bytes in Hexadecimal          | Textual Representation         |
      +-------------------------------+--------------------------------+
      | 74 65 78 74 2f 70 6c 61 69 6e | text/plain                     |
      +-------------------------------+--------------------------------+
      | 74 65 78 74 2f 70 6c 61 69 6e | text/plain; charset=ISO-8859-1 |
      | 3b 20 63 68 61 72 73 65 74 3d |                                |
      | 49 53 4f 2d 38 38 35 39 2d 31 |                                |
     .../...
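
As an illustration of what this exact-octet step amounts to, here is a
minimal Python sketch. The names EXACT_MATCHES and is_exact_match are
hypothetical, and only the two rows quoted above are included, since the
rest of the table is elided here.

```python
# Hypothetical sketch of the draft's exact-octet comparison.
# Only the two table rows quoted above are listed; the draft's table is longer.
EXACT_MATCHES = {
    bytes.fromhex("74 65 78 74 2f 70 6c 61 69 6e"),
    bytes.fromhex(
        "74 65 78 74 2f 70 6c 61 69 6e"
        "3b 20 63 68 61 72 73 65 74 3d"
        "49 53 4f 2d 38 38 35 39 2d 31"
    ),
}


def is_exact_match(content_type: bytes) -> bool:
    """True only if the header value is octet-for-octet one of the rows."""
    return content_type in EXACT_MATCHES


# A single space after the semicolon matches; omitting it does not.
print(is_exact_match(b"text/plain; charset=ISO-8859-1"))
print(is_exact_match(b"text/plain;charset=ISO-8859-1"))
```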

I was not sure whether spaces are optional around the semicolon, so I
checked, and indeed OWS is allowed both before and after it:

   http://www.ietf.org/id/draft-ietf-httpbis-p3-payload-18.txt

   2.3.  Media Types

   HTTP uses Internet Media Types [RFC2046] in the Content-Type
   (Section 6.8) and Accept (Section 6.1) header fields in order to
   provide open and extensible data typing and type negotiation.

     media-type = type "/" subtype *( OWS ";" OWS parameter )
     type       = token
     subtype    = token

   The type/subtype MAY be followed by parameters in the form of
   attribute/value pairs.

     parameter      = attribute "=" value
     attribute      = token
     value          = word

The same section also says that quotes are allowed around the parameter
value:

   A parameter value that matches the token production can be
   transmitted as either a token or within a quoted-string.  The quoted
   and unquoted values are equivalent.

So the examples below are all valid:

   Content-type: text/plain;charset="ISO-8859-1"

   Content-type: text/plain   ;  charset=ISO-8859-1

   Content-type: text/plain ;
         charset="ISO-8859-1"
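
To illustrate, here is a minimal Python sketch of a parser for the
media-type grammar quoted above. It is only a sketch under stated
assumptions: the name parse_media_type and the regular expressions are
hypothetical, the token character set is the one from RFC 2616, and a
folded line is assumed to be a line break followed by spaces or tabs.
It shows that the three header values reduce to the same type, subtype
and parameter even though their octets differ.

```python
import re

# Token characters per RFC 2616: any CHAR except CTLs and separators.
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
QUOTED = r'"(?:[^"\\]|\\.)*"'
OWS = r"[ \t]*"
MEDIA = re.compile(rf"({TOKEN})/({TOKEN})")
PARAM = re.compile(rf"{OWS};{OWS}({TOKEN})=({TOKEN}|{QUOTED})")


def parse_media_type(value: str):
    """Return (type, subtype, [(attribute, value), ...]) or None if malformed."""
    # Unfold continuation lines: a line break followed by SP/HTAB.
    value = re.sub(r"\r?\n[ \t]+", " ", value).strip(" \t")
    m = MEDIA.match(value)
    if not m:
        return None
    pos = m.end()
    params = []
    while pos < len(value):
        p = PARAM.match(value, pos)
        if not p:
            return None
        attr, val = p.group(1), p.group(2)
        if val.startswith('"'):
            # Strip the quotes and undo backslash escapes.
            val = re.sub(r"\\(.)", r"\1", val[1:-1])
        params.append((attr, val))
        pos = p.end()
    return m.group(1), m.group(2), params


examples = [
    'text/plain;charset="ISO-8859-1"',
    "text/plain   ;  charset=ISO-8859-1",
    'text/plain ;\r\n         charset="ISO-8859-1"',
]
for example in examples:
    print(parse_media_type(example))
```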

Thus the byte matching can only apply to the tokens and values. I think the
safest approach would be to refer to the HTTP spec for the header format and
then specify byte matches for each field, for instance:

       If the octets were fetched via HTTP and there is an HTTP Content-
       Type header field and the value of the last such header *exactly*
       matches one of the media-types below, then the sniffed-type is
       defined as the concatenation of the unquoted matching parts:

       media-type = type "/" subtype *( OWS ";" OWS parameter )
       sniffed-type = type "/" subtype *( "; " attribute "=" value )

       All accepted media-types must *exactly* match:
          - type    = "text" (hex 74 65 78 74)
          - subtype = "plain" (hex 70 6c 61 69 6e)

       If a parameter is present, its attribute must be "charset"
       (hex 63 68 61 72 73 65 74) and the value must be one of:
          - "ISO-8859-1" (hex 49 53 4f 2d 38 38 35 39 2d 31)
          - "iso-8859-1" (hex 69 73 6f 2d 38 38 35 39 2d 31)
          - "UTF-8"      (hex 55 54 46 2d 38)
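
The rule suggested above could be sketched in Python as follows. This is
a minimal illustration, not a definitive implementation: the names
sniffed_type, ACCEPTED_CHARSETS and PATTERN are hypothetical, at most one
parameter is assumed, trailing whitespace is tolerated, and folded lines
are assumed to use CRLF. Tokens and values are compared exactly
(case-sensitively), while whitespace and quoting are ignored.

```python
import re

# Values accepted for the "charset" attribute, compared octet for octet.
ACCEPTED_CHARSETS = ("ISO-8859-1", "iso-8859-1", "UTF-8")

# OWS = *( [ obs-fold ] WSP ): optional whitespace, possibly folded.
OWS = r"(?:(?:\r\n)?[ \t])*"
_values = "|".join(re.escape(c) for c in ACCEPTED_CHARSETS)

# Assumption: at most one parameter, optionally wrapped in double quotes.
PATTERN = re.compile(
    rf'text/plain(?:{OWS};{OWS}charset=(?P<q>"?)(?P<value>{_values})(?P=q))?{OWS}'
)


def sniffed_type(header_value: str):
    """Return the normalized sniffed-type, or None if the value is not accepted."""
    m = PATTERN.fullmatch(header_value)
    if m is None:
        return None
    value = m.group("value")
    if value is None:
        return "text/plain"
    # Concatenate the unquoted matching parts with a fixed "; " separator.
    return f"text/plain; charset={value}"


for header_value in (
    'text/plain;charset="ISO-8859-1"',
    "text/plain   ;  charset=ISO-8859-1",
    'text/plain ;\r\n         charset="ISO-8859-1"',
):
    print(sniffed_type(header_value))
```

With this approach, all three header values shown earlier yield the same
sniffed-type, whereas a value such as "text/html" or an unlisted charset
spelling is rejected.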

Please also note that HTTP says that some attributes accept a
case-insensitive value. I have not yet found in the spec whether "charset"
is one of them, but given that you listed two possible cases for
"iso-8859-1", it likely is.

Best regards,
Willy