Re: [Asrg] A MIME-safe DKIM canonicalization idea

Alessandro Vesely <vesely@tana.it> Sun, 25 July 2010 19:09 UTC

Return-Path: <vesely@tana.it>
X-Original-To: asrg@core3.amsl.com
Delivered-To: asrg@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id 9DEE03A6827 for <asrg@core3.amsl.com>; Sun, 25 Jul 2010 12:09:46 -0700 (PDT)
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id HL2uDvTGKd+4 for <asrg@core3.amsl.com>; Sun, 25 Jul 2010 12:09:45 -0700 (PDT)
Received: from wmail.tana.it (www.tana.it [62.94.243.226]) by core3.amsl.com (Postfix) with ESMTP id 307283A6814 for <asrg@irtf.org>; Sun, 25 Jul 2010 12:09:44 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tana.it; s=test; t=1280085001; bh=/OpNVvA54z5UKasccZcwcIZ1Rfv3MeQAFYz+V/Tnuqw=; l=3703; h=Message-ID:Date:From:MIME-Version:To:CC:References:In-Reply-To: Content-Transfer-Encoding; b=H+1Z4IX9Fpy8w19Nxs5DnUCREVcBIyCbKbMwlt2QeA2uaSjJZnqNFRixR3nL+GEs2 JbrkKLINT8tL/V3uO4IwtQ3/Q4CpMyLAWVrDZDG/1ludNwhBsOr4llzFyIolnk1apZ IWYnxS6wpRZhTJTabd/9kPuzQCU/W/AT0MqYNypo=
Received: from 1-38-169-90.live.vodafone.in.38.1.in-addr.arpa (93-32-129-143.ip33.fastwebnet.it [93.32.129.143]) (AUTH: CRAM-MD5 515, TLS: TLS1.0,256bits,RSA_AES_256_CBC_SHA1) by wmail.tana.it with ESMTPSA; Sun, 25 Jul 2010 21:10:01 +0200 id 00000000005DC036.000000004C4C8C09.0000613C
Message-ID: <4C4C8C0C.4060807@tana.it>
Date: Sun, 25 Jul 2010 21:10:04 +0200
From: Alessandro Vesely <vesely@tana.it>
User-Agent: Thunderbird 2.0.0.24 (Macintosh/20100228)
MIME-Version: 1.0
To: "Murray S. Kucherawy" <msk@cloudmark.com>
References: <BB012BD379D7B046ABE1472D8093C61C01F2842439@EXCH-C2.corp.cloudmark.com> <4BE96BB9.4030005@cru.fr> <4BE9987F.80003@mail-abuse.org> <4BF1109F.1090301@cru.fr> <4BF12EB2.4040706@tana.it> <BB012BD379D7B046ABE1472D8093C61C01F688154B@EXCH-C2.corp.cloudmark.com>
In-Reply-To: <BB012BD379D7B046ABE1472D8093C61C01F688154B@EXCH-C2.corp.cloudmark.com>
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 7bit
Cc: Anti-Spam Research Group - IRTF <asrg@irtf.org>
Subject: Re: [Asrg] A MIME-safe DKIM canonicalization idea

Hi Murray,
> Just brainstorming here...

Yeah, we need some...  However, I guess the list address has been 
stormed away from the recipients as a side effect :-)  I'm cc'ing 
ASRG --you probably meant that one.

> What about a new body canonicalization called "simple-mime" (and maybe "relaxed-mime", with the obvious difference) that does the following:
> 
> - adds to the signature a list of MIME types that defines the list of parts that got signed and in which order

This resembles using "parallel arrays" vs arrays of structures.  A 
couple of pros that I can see are

* it has a better chance of being left alone if the MIME
   structure is rewritten, and

* it is easier to insert a single header field, as the signing
   application doesn't have to coordinate with the library for each
   entity.

On the cons side, there are the generic disadvantages mentioned in 
http://en.wikipedia.org/wiki/Parallel_array (obscurity being the 
most relevant, IMHO.)

> - all parts are decoded prior to signing/verifying

This is necessary, as there are multiple ways to encode the same 
data, e.g. varying the line length in base64.
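
To make the point concrete, here is a small Python sketch (standard 
library only; the sample payload and the two line widths are arbitrary 
choices of mine, not part of any proposal).  It folds the same data at 
two different widths and compares hashes before and after decoding:

```python
import base64
import hashlib

payload = bytes(range(256)) * 4  # arbitrary sample data

def b64_wrapped(data: bytes, width: int) -> bytes:
    """Base64-encode data and fold the result every `width` characters."""
    flat = base64.b64encode(data)
    lines = [flat[i:i + width] for i in range(0, len(flat), width)]
    return b"\r\n".join(lines) + b"\r\n"

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

enc76 = b64_wrapped(payload, 76)  # the usual MIME line length
enc64 = b64_wrapped(payload, 64)  # same data, folded differently

# The transfer-encoded forms hash differently...
encoded_differs = digest(enc76) != digest(enc64)
# ...while the decoded content is identical (b64decode skips the CRLFs).
decoded_same = digest(base64.b64decode(enc76)) == digest(base64.b64decode(enc64))
```

A hash taken over the encoded form breaks as soon as a relay re-folds 
the base64; a hash over the decoded bytes does not.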

> For example, given a message like so:
> 
> 	From: whatever
> 	To: whatever
> 	Subject: whatever
> 	Date: whatever
> 	MIME-Version: 1.0
> 	Content-Type: multipart/mixed; boundary="foo"
> 
> 	Preamble text
> 
> 	--foo
> 
> 	La la la
> 
> 	--foo
> 	Content-Type: application/octet-stream
> 	Content-Transfer-Encoding: base64
> 
> 	Base64-stuff-here
> 
> 	--foo
> 
> 	This text should not be signed
> 
> 	--foo--
> 
> 	Postamble text
> 
> In this example you would generate a DKIM signature in the usual way, except:
> 
> - the signature will contain "c=relaxed/simple-mime"
> - the binary part will be decoded to 8-bit before being passed to the hash
> - preamble and postamble text is ignored
> 
> Are there cases that need to be addressed that this wouldn't cover?

If an attachment is dropped --or replaced with some boilerplate 
about the types of allowed attachments-- it may still be useful to 
verify the integrity of the rest of the message.
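
As a rough illustration of that case, the following Python sketch 
hashes each leaf part separately after decoding it, skipping preamble 
and postamble.  Per-part digests are my own assumption here, not 
something the quoted proposal specifies; build(), fold_b64() and 
leaf_digests() are hypothetical helpers, and the message is a 
two-part simplification of the quoted example.

```python
import base64
import hashlib
from email import message_from_bytes

CRLF = b"\r\n"

def fold_b64(data: bytes, width: int) -> bytes:
    """Base64-encode data, folding the output every `width` characters."""
    flat = base64.b64encode(data)
    return CRLF.join(flat[i:i + width] for i in range(0, len(flat), width))

def build(preamble: bytes, second_part: bytes) -> bytes:
    """Assemble a two-part multipart/mixed message (hypothetical fixture)."""
    return CRLF.join([
        b"From: whatever",
        b"MIME-Version: 1.0",
        b'Content-Type: multipart/mixed; boundary="foo"',
        b"",
        preamble,
        b"--foo",
        b"",
        b"La la la",
        b"--foo",
        second_part,
        b"--foo--",
        b"Postamble text",
        b"",
    ])

def leaf_digests(raw: bytes):
    """Return [(content type, SHA-256 of the decoded payload)] per leaf part."""
    digests = []
    for part in message_from_bytes(raw).walk():
        if part.is_multipart():
            continue  # preamble and epilogue live on the container; skip them
        data = part.get_payload(decode=True) or b""
        digests.append((part.get_content_type(), hashlib.sha256(data).hexdigest()))
    return digests

blob = bytes(range(60))
hdr = (b"Content-Type: application/octet-stream" + CRLF +
       b"Content-Transfer-Encoding: base64" + CRLF + CRLF)

original = build(b"Preamble text", hdr + fold_b64(blob, 76))
reencoded = build(b"Rewritten preamble", hdr + fold_b64(blob, 16))
stripped = build(b"Preamble text",
                 b"Content-Type: text/plain" + CRLF + CRLF + b"[attachment removed]")

d_orig, d_re, d_strip = (leaf_digests(m) for m in (original, reencoded, stripped))
```

Here d_orig and d_re agree even though the preamble and the base64 
folding changed, while d_strip still matches d_orig for the first part 
only.  A single hash over all parts would simply fail in the last case.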

I'm aware that some software processes text/html parts --or 
alternatives thereof-- in peculiar ways.  I've spent some time 
googling for message transformation practices, e.g. for gatewaying 
mail to netnews, but failed to find a clean list of "acceptable" 
changes that a message may undergo in transit.  IMHO, we'd need such 
a list in order to decide what to cover.  (Will the DKIM base 
interoperability report include anything like that, possibly?)

> If there's a need to allow addition of appended text/plain parts, you could add an "m=" tag that somehow encodes the list of MIME parts that are part of the signature, such as this based on the above:
> 
> m=1/multipart/mixed:2/text/plain:2/application/octet-stream:2/text/plain
>
> This indicates which parts got signed and what the nesting looks like, preventing some amount of reordering (though in the above you could swap the text/plain parts).

I've seen tags like "1.2" used to refer to nested entities; e.g. 
http://www.courier-mta.org/reformime.html
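
For what it's worth, the depth/type list in the quoted "m=" value can 
be derived mechanically from the MIME tree.  A minimal Python sketch, 
assuming the "depth/type" syntax joined by ":" exactly as in the 
quoted example (mime_map() and m_tag() are names I made up; nothing 
here is specified anywhere):

```python
from email import message_from_string

EXAMPLE = """\
From: whatever
To: whatever
Subject: whatever
Date: whatever
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="foo"

Preamble text

--foo

La la la

--foo
Content-Type: application/octet-stream
Content-Transfer-Encoding: base64

Base64-stuff-here

--foo

This text should not be signed

--foo--

Postamble text
"""

def mime_map(part, depth=1):
    """Yield a 'depth/type' entry for part and each descendant, in order."""
    yield f"{depth}/{part.get_content_type()}"
    if part.is_multipart():
        for child in part.get_payload():
            yield from mime_map(child, depth + 1)

def m_tag(msg) -> str:
    """Build a hypothetical m= tag value from a parsed message."""
    return "m=" + ":".join(mime_map(msg))

tag = m_tag(message_from_string(EXAMPLE))
```

Parts without a Content-Type field are reported with the MIME default, 
text/plain.  As noted in the quoted text, the list alone cannot tell 
the two text/plain parts apart; a dotted path per entity would.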

I note that you said "add a tag".  In fact, like many specs, RFC 
4871 says that "Unrecognized tags MUST be ignored."  That way, tags 
can be added without breaking compatibility with existing software. 
However, the same is not true for canonicalizations: 
"c=relaxed/simple-mime" cannot pass existing verifiers.  Instead, with 
something like "c=relaxed; l=0; m=...", we might be able to get a 
somewhat more compatible MIME-safe scheme: an existing verifier would 
ignore "m=" and, because of "l=0", check the header fields only. 
Would that kind of (possibly temporary) hack be worth its nuisance 
value?
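
The compatibility difference can be sketched in Python.  This models 
only the syntactic front end of a verifier written against RFC 4871 
(no hashing, no key lookup); the function names, example.com and the 
selector "sel" are placeholders of mine, and "m=" is the hypothetical 
tag discussed above.

```python
# Tags and canonicalization names defined by RFC 4871.
KNOWN_TAGS = {"v", "a", "b", "bh", "c", "d", "h", "i", "l", "q", "s", "t", "x", "z"}
KNOWN_CANON = {"simple", "relaxed"}

def parse_tags(value: str) -> dict:
    """Split a tag=value list on ';' into a dict, trimming whitespace."""
    tags = {}
    for item in value.split(";"):
        if not item.strip():
            continue
        name, _, val = item.partition("=")
        tags[name.strip()] = val.strip()
    return tags

def legacy_verifier_accepts(sig_value: str) -> bool:
    """True if an RFC 4871-era verifier could go on to verify the signature."""
    # Unrecognized tags MUST be ignored, so drop them instead of failing.
    tags = {k: v for k, v in parse_tags(sig_value).items() if k in KNOWN_TAGS}
    # c= defaults to simple/simple; a lone name applies to the header only.
    header_c, _, body_c = tags.get("c", "simple/simple").partition("/")
    body_c = body_c or "simple"
    # An unknown canonicalization cannot be skipped: the hash can't be computed.
    return header_c in KNOWN_CANON and body_c in KNOWN_CANON

new_canon = "v=1; a=rsa-sha256; c=relaxed/simple-mime; d=example.com; s=sel"
new_tag = ("v=1; a=rsa-sha256; c=relaxed; l=0; "
           "m=1/multipart/mixed:2/text/plain; d=example.com; s=sel")

accepts_new_canon = legacy_verifier_accepts(new_canon)
accepts_new_tag = legacy_verifier_accepts(new_tag)
```

The first signature is unusable for such a verifier, while the second 
one degrades to a header-only check.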