Re: [websec] #19: Do not sniff PDF

Tobias Gondrom <tobias.gondrom@gondrom.org> Mon, 24 October 2011 08:18 UTC

Message-ID: <4EA51F11.7090504@gondrom.org>
Date: Mon, 24 Oct 2011 09:17:21 +0100
From: Tobias Gondrom <tobias.gondrom@gondrom.org>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:7.0) Gecko/20110923 Thunderbird/7.0
MIME-Version: 1.0
To: masinter@adobe.com
References: <059.38de41cc08d30327b007c754bc555885@trac.tools.ietf.org> <4EA4D547.4030805@gondrom.org> <C68CB012D9182D408CED7B884F441D4D0605EFA3C1@nambxv01a.corp.adobe.com>
In-Reply-To: <C68CB012D9182D408CED7B884F441D4D0605EFA3C1@nambxv01a.corp.adobe.com>
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 7bit
Cc: websec@ietf.org
Subject: Re: [websec] #19: Do not sniff PDF
Precedence: list

On 24/10/11 04:21, Larry Masinter wrote:
>> - in which way is it more certain that there is no mislabeled PDF than a mislabeled jpg or mislabeled rtf?
> I don't think this is relevant. There is likely mislabeled PDF. But I had specific feedback from implementors of PDF readers that sniffing from other content-type resulted in a worse situation than not sniffing. I don't have any information on jpg or rtf.
>
> Sniffing should only be done when it is justified by an improved user experience over not sniffing.
<hat="individual">
Fine by me. The browsers and OS started sniffing for exactly that reason 
in the first place, to improve user experience.

I am asking so specifically about the reasons for not sniffing PDF 
because of the following:
In general I can imagine a number of scenarios where sniffing is 
disadvantageous (i.e. leads to security risks) for certain file types. 
The main threat with sniffing is that it produces false positives, 
which then get handed to the application. Yet it seems the browser 
vendors sniff anyway, which led us to write this draft in the first 
place.

If we exclude one specific file-type from sniffing, there are two 
interesting points:
1. We should have a compelling explanation for why the browsers/OS 
should not sniff it, so that they will follow the RFC.
2. These reasons may well apply to other file types too. Looking at 
them, we might find that they hold for other content-types as well, 
which again would be very useful information.

>
> I think the obligation of evidence is "opt in": we should only sniff content when there is evidence of mislabeled content for which sniffing actually improves something, and the improvement outweighs other considerations.
>
>> - what about scenarios in which there is no content-type (e.g. ftp, filesystem), should in this case sniffing not be done?
> I didn't get any feedback on that. I don't know any workflows where valid PDF doesn't carry a file type label somehow (if only the file extension .pdf), so maybe sniffing based on file content itself doesn't matter.
>
> (Maybe this is another issue? I just wonder if the algorithm for "no content-type" is the same, needs to be the same, as the algorithm for "content-type via HTTP".)

I can imagine that the cases "no content-type given" and "wrong 
content-type given" could be treated differently, but I am not sure 
about it.
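
To make the distinction concrete, here is a rough sketch of what 
treating the two cases differently could look like. It is only an 
illustration and not the algorithm from the draft: the signature table, 
the NO_OVERRIDE set and the function names are all hypothetical, and 
the rule that a sniffed type overrides any declared type is a 
deliberately simplified assumption.

```python
from typing import Optional

# Hypothetical opt-in table of magic-byte prefixes we are willing to sniff.
SIGNATURES = {
    b"%PDF-": "application/pdf",
    b"\xff\xd8\xff": "image/jpeg",
}

# Assumption for this sketch: types that are never sniffed *into* when the
# server declared some other Content-Type (the "do not sniff PDF" proposal).
NO_OVERRIDE = {"application/pdf"}


def match_signature(body: bytes) -> Optional[str]:
    """Return the media type whose magic prefix matches body, if any."""
    for magic, mime in SIGNATURES.items():
        if body.startswith(magic):
            return mime
    return None


def effective_type(declared: Optional[str], body: bytes) -> str:
    """Pick a media type, handling 'no label' and 'wrong label' separately."""
    sniffed = match_signature(body)
    if declared is None:
        # No Content-Type at all (e.g. ftp, filesystem): the content is the
        # only evidence available, so use the sniffed type if there is one.
        return sniffed or "application/octet-stream"
    if sniffed is None or sniffed in NO_OVERRIDE:
        # A label exists and either nothing matched or the match is a type
        # excluded from overriding: trust the declared type.
        return declared
    # A label exists but the content matches a sniffable type: override.
    return sniffed
```

With this policy, a body starting with "%PDF-" is treated as PDF when 
no content-type is given, but stays text/plain when the server labeled 
it text/plain, while a mislabeled JPEG would still be sniffed.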

>
>
>
>
> Larry
>
