Re: [websec] #19: Do not sniff PDF

Tobias Gondrom <tobias.gondrom@gondrom.org> Mon, 24 October 2011 08:18 UTC

Message-ID: <4EA51F11.7090504@gondrom.org>
Date: Mon, 24 Oct 2011 09:17:21 +0100
From: Tobias Gondrom <tobias.gondrom@gondrom.org>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:7.0) Gecko/20110923 Thunderbird/7.0
MIME-Version: 1.0
To: masinter@adobe.com
References: <059.38de41cc08d30327b007c754bc555885@trac.tools.ietf.org> <4EA4D547.4030805@gondrom.org> <C68CB012D9182D408CED7B884F441D4D0605EFA3C1@nambxv01a.corp.adobe.com>
In-Reply-To: <C68CB012D9182D408CED7B884F441D4D0605EFA3C1@nambxv01a.corp.adobe.com>
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 7bit
Cc: websec@ietf.org
Subject: Re: [websec] #19: Do not sniff PDF

On 24/10/11 04:21, Larry Masinter wrote:
>> - in which way is it more certain that there is no mislabeled PDF than a mislabeled jpg or mislabeled rtf?
> I don't think this is relevant. There is likely mislabeled PDF. But I had specific feedback from implementors of PDF readers that sniffing from other content-type resulted in a worse situation than not sniffing. I don't have any information on jpg or rtf.
>
> Sniffing should only be done when it is justified by an improved user experience over not sniffing.
<hat="individual">
Fine by me. The browsers and OS started sniffing for exactly that reason 
in the first place, to improve user experience.

The reason why I am asking so specifically about the reasons for not 
doing PDF sniffing is the following:
In general I can imagine a number of scenarios where sniffing is 
disadvantageous (i.e. leads to security risks) for certain file types. 
The main threat with sniffing is that it leads to false positives being 
handed to the application. Yet it seems the browser vendors sniff 
anyway, which led us to do this draft in the first place.

If we exclude one specific file type from sniffing, there are two 
interesting points:
1. We should have a compelling explanation for why the browsers/OS 
should not sniff it, so they will follow the RFC.
2. These reasons are likely to hold for other file types (and thus 
other content-types) as well, which again would be very useful 
information.

>
> I think the obligation of evidence is "opt in": we should only sniff content when there is evidence of mislabeled content for which sniffing actually improves something, and the improvement outweighs other considerations.
>
>> - what about scenarios in which there is no content-type (e.g. ftp, filesystem), should in this case sniffing not be done?
> I didn't get any feedback on that. I don't know any workflows where valid PDF doesn't carry a file type label somehow (if only the file extension .pdf), so maybe sniffing based on file content itself doesn't matter.
>
> (Maybe this is another issue? I just wonder if the algorithm for "no content-type" is the same, needs to be the same, as the algorithm for "content-type via HTTP".)

I can imagine that the cases "no content-type given" and "wrong 
content-type given" could be treated differently, but I am not sure 
about it.
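
To make the distinction concrete, here is a rough sketch of one way the 
two cases could be separated. It is purely illustrative and is not the 
algorithm from the draft: the function name effective_type, the 
fallback to application/octet-stream, and the rule "never sniff PDF 
when a content-type is declared" are all assumptions made for the 
example.

```python
from typing import Optional

# PDF files begin with this signature.
PDF_MAGIC = b"%PDF-"


def effective_type(declared: Optional[str], body: bytes) -> str:
    """Hypothetical policy sketch, NOT the draft's algorithm.

    Case 1: no content-type given (e.g. ftp, filesystem).
    Case 2: a content-type is given, possibly a wrong one.
    """
    if declared is None:
        # Case 1: there is no label at all, so the content itself is
        # the only signal available (assumption: sniffing is allowed).
        if body.startswith(PDF_MAGIC):
            return "application/pdf"
        return "application/octet-stream"

    # Case 2: a label exists. Under the assumed "do not sniff PDF"
    # rule, the body is never re-interpreted as PDF; the declared type
    # is kept, with parameters such as charset stripped.
    return declared.split(";")[0].strip().lower()
```

With this sketch, an unlabeled body starting with "%PDF-" is treated as 
PDF, while the same body declared as text/plain stays text/plain.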

>
>
>
>
> Larry
>