[Asrg] New DKIM canonicalization to avoid broken signatures

Alessandro Vesely <vesely@tana.it> Fri, 30 April 2010 17:32 UTC

Message-ID: <4BDB140D.2030804@tana.it>
Date: Fri, 30 Apr 2010 19:31:57 +0200
From: Alessandro Vesely <vesely@tana.it>
To: Anti-Spam Research Group - IRTF <asrg@irtf.org>
Cc: opendkim-users@lists.opendkim.org
Subject: [Asrg] New DKIM canonicalization to avoid broken signatures
Precedence: list
Reply-To: Anti-Spam Research Group - IRTF <asrg@irtf.org>

It is well known that DKIM validation is too strict to be of practical
use for everyday messages: signatures break too often. One solution,
in Dave's words, is to

   Merely use l=0 and hash only the From: field or perhaps From: and
   Date: or perhaps...
   http://mipassoc.org/pipermail/ietf-dkim/2010q2/013231.html

A better solution might be to use a new "mellowed" canonicalization
for the body, and never sign MIME headers. The rough idea is to
produce a body hash that is invariant under 99.99% of retransmissions,
yet still characterizes the body content somewhat better than l=0,
which hashes no body content at all.
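
To make the rough idea concrete, here is a minimal sketch of one
possible "mellowed" body hash. It assumes the scheme simply discards
white space and ASCII punctuation before hashing; the function names
and the choice of SHA-256 with base64 output are illustrative
assumptions, not part of any DKIM specification.

```python
import base64
import hashlib
import string

# Characters ignored by this hypothetical canonicalization (an assumption).
_DROP = set(string.whitespace) | set(string.punctuation)


def mellowed_body_hash(body: str) -> str:
    """Hash only characters that are neither white space nor ASCII punctuation."""
    kept = "".join(ch for ch in body if ch not in _DROP)
    digest = hashlib.sha256(kept.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


def l0_body_hash() -> str:
    """With l=0 no body octets are hashed, so the value is the same for any body."""
    return base64.b64encode(hashlib.sha256(b"").digest()).decode("ascii")


original = "Hello, world!\r\nThis is a test.\r\n"
rewrapped = "Hello,  world!\r\nThis is\r\na test.\r\n\r\n"   # re-wrapped in transit
altered = "Hello, world!\r\nThis is a jest.\r\n"              # content changed

print(mellowed_body_hash(original) == mellowed_body_hash(rewrapped))
print(mellowed_body_hash(original) == mellowed_body_hash(altered))
```

Under these assumptions the re-wrapped copy hashes the same as the
original, while a changed word does not; the l=0 value cannot tell
any two bodies apart.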

I think Bayesian filtering has given us good experience with mail
tokenization, a side product that can be leveraged for this task
without reinventing the wheel. Poor HTML coding may require extra
tweaks, though.
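
As a starting point for discussion, a tokenizer-based hash could look
roughly like the sketch below. Everything in it is an assumption made
for illustration: the regular expressions, the lowercasing, the naive
tag stripping for HTML, and the names tokenize and token_hash. Real
Bayesian filters tokenize in more elaborate ways, and badly formed
HTML would certainly need more care than this.

```python
import hashlib
import html
import re

TAG_RE = re.compile(r"<[^>]*>")   # naive HTML tag matcher (assumption)
TOKEN_RE = re.compile(r"\w+")     # a token is a run of word characters


def tokenize(body: str, is_html: bool = False) -> list:
    """Return lowercased word tokens, dropping tags and entities for HTML."""
    if is_html:
        # Replace tags with spaces, then decode entities such as &nbsp;
        body = html.unescape(TAG_RE.sub(" ", body))
    return [tok.lower() for tok in TOKEN_RE.findall(body)]


def token_hash(body: str, is_html: bool = False) -> str:
    """Hash the token sequence, one token per line, with SHA-256."""
    joined = "\n".join(tokenize(body, is_html))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


plain = "Anyone interested?\r\nLet us test."
as_html = "<p>Anyone <b>interested</b>?</p><p>Let us&nbsp;test.</p>"

print(tokenize(plain))
print(token_hash(plain) == token_hash(as_html, is_html=True))
```

In this sketch a plain-text body and a simple HTML rendering of the
same words yield the same token sequence, and hence the same hash.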

We'd need to discuss the details, implement them, and test.

Anyone interested?

-------- Original Message --------
Date: 30 Apr 2010 12:33:00 -0000
From: John Levine <johnl@iecc.com>
To: ietf-dkim@mipassoc.org
Subject: Re: [ietf-dkim] Broken signatures,
   was Why mailing lists should strip them

In article <4BDA70B5.4090708@tana.it> you write:
>On 29/Apr/10 01:12, SM wrote:
>> The diversity
>> of the email environment is such that you cannot come up with a
>> "mellowed" canonicalization to cope with every possible change.
>
>Yet, it would seem that by, say, hashing just invariants of binary
>representations of the first entity, e.g. discarding its white space
>and punctuation, one may reach very high percentages of unbroken
>retransmission.

It sounds like you want to experiment with different canon schemes for 
DKIM, rather than the two that exist now.  Wouldn't that be more 
appropriate for ASRG?

R's,
John
