[Asrg] New DKIM canonicalization to avoid broken signatures

Alessandro Vesely <vesely@tana.it> Fri, 30 April 2010 17:32 UTC

DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tana.it; s=test; t=1272648717; bh=UVeDA4gaaG+jVkjGLCAK5KXpLF2fn4HQrEXzK4LzUfc=; l=1931; h=Message-ID:Date:From:MIME-Version:To:CC: Content-Transfer-Encoding; b=I8vDK45HxYqI96n2LwT3V7pps3bdQUhEnAFiyM3i0MUA2yonBsnH2dlZ5giBETrgK bwLmcAYhLIVR2CxZVsgE/Z4hIaxV8zGimDjX43kMIfP09aqZrDVy8A9KG7HDoVlLlS pS3iMeRk4ymMMJtibnrWKFWU+nq2zjyiRQWwgh00=
Message-ID: <4BDB140D.2030804@tana.it>
Date: Fri, 30 Apr 2010 19:31:57 +0200
From: Alessandro Vesely <vesely@tana.it>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.9) Gecko/20100317 Thunderbird/3.0.4
MIME-Version: 1.0
To: Anti-Spam Research Group - IRTF <asrg@irtf.org>
Content-Type: text/plain; charset="us-ascii"; format="flowed"
Content-Transfer-Encoding: 7bit
Cc: opendkim-users@lists.opendkim.org
Subject: [Asrg] New DKIM canonicalization to avoid broken signatures

It is well known that DKIM validation is too strict to be of practical 
use for everyday messages: signatures break too often. One solution, 
in Dave's words, is to

   Merely use l=0 and hash only the From: field or perhaps From: and
   Date: or perhaps...
   http://mipassoc.org/pipermail/ietf-dkim/2010q2/013231.html

A better solution might be to use a new "mellowed" canonicalization 
for the body, and never sign MIME headers. The rough idea is to 
produce a body hash that stays invariant across 99.99% of 
retransmissions, yet still characterizes the body content better than 
l=0, which hashes no body content at all.
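To make the idea concrete, here is a minimal Python sketch under loose assumptions. The function names, the token rule (runs of letters and digits, case-folded, everything else discarded) and the use of the standard `email` package to decode the text/plain body before hashing are hypothetical illustrations, not a proposed specification. Decoding first is what would allow the signer to leave MIME headers such as Content-Transfer-Encoding unsigned; `body_hash(..., length=0)` mimics l=0 for comparison.

```python
import base64
import hashlib
import re
from email import message_from_string
from email.policy import default

# Hypothetical token rule: runs of letters and digits (Unicode-aware);
# white space, punctuation and underscores only separate tokens.
TOKEN_RE = re.compile(r"[^\W_]+")


def first_text_entity(raw: str) -> str:
    """Decoded text of the text/plain part picked by get_body(), or ""."""
    msg = message_from_string(raw, policy=default)
    part = msg.get_body(preferencelist=("plain",))
    # get_content() undoes the Content-Transfer-Encoding and the charset,
    # so the MIME headers themselves need not be signed.
    return part.get_content() if part is not None else ""


def mellowed_canon(text: str) -> bytes:
    """Hypothetical 'mellowed' body canonicalization."""
    return " ".join(TOKEN_RE.findall(text.casefold())).encode("utf-8")


def body_hash(canon_body: bytes, length=None) -> str:
    """Base64 SHA-256 as in the DKIM bh= tag; `length` mimics the l= tag."""
    if length is not None:
        canon_body = canon_body[:length]
    return base64.b64encode(hashlib.sha256(canon_body).digest()).decode("ascii")


def mellowed_bh(raw: str) -> str:
    return body_hash(mellowed_canon(first_text_entity(raw)))


HEAD = "Content-Type: text/plain; charset=us-ascii\nContent-Transfer-Encoding: "
TEXT = "Price = 10 EUR,\nsee you at the meeting.\n"
seven_bit = HEAD + "7bit\n\n" + TEXT
# Same text re-encoded by a relay as quoted-printable, with a soft line break.
quoted_printable = (HEAD + "quoted-printable\n\n"
                    "Price =3D 10 EUR, see you at the=\n meeting.\n")
# Re-wrapped, re-punctuated and base64-encoded.
rewrapped = base64.b64encode(b"Price = 10  EUR; See you at\nthe meeting!\n")
b64 = HEAD + "base64\n\n" + rewrapped.decode("ascii") + "\n"
# A real content change.
changed = HEAD + "7bit\n\nPrice = 12 EUR,\nsee you at the meeting.\n"

print(mellowed_bh(quoted_printable) == mellowed_bh(seven_bit))  # True
print(mellowed_bh(b64) == mellowed_bh(seven_bit))  # True
print(mellowed_bh(changed) == mellowed_bh(seven_bit))  # False
# With l=0 no body octets are hashed, so any two bodies match.
print(body_hash(TEXT.encode(), 0) == body_hash(b"anything else", 0))  # True
```

The point of the comparison: transfer re-encoding, re-wrapping and punctuation changes leave the mellowed hash unchanged, while altering a single figure changes it; with l=0 even that alteration would go unnoticed.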

I think Bayesian filtering has yielded, as a by-product, a good deal of 
experience with mail tokenization, which can be leveraged for this 
task without reinventing the wheel. Poorly coded HTML may require extra 
tweaks, though.
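As a rough illustration of such HTML tweaks, the following hypothetical tokenizer is written in the spirit of Bayesian-filter tokenizers but not copied from any particular filter. It extracts visible text with Python's standard `html.parser`, drops script/style content, treats block-level tags as word separators and inline tags such as `<b>` as transparent, so that a word split by markup stays whole. The tag sets are assumptions chosen for the example.

```python
import re
from html.parser import HTMLParser

TOKEN_RE = re.compile(r"[^\W_]+")
# Assumed, not standardized: tags that separate words, and content to ignore.
BREAK_TAGS = {"br", "p", "div", "li", "tr", "td", "th", "h1", "h2", "h3", "hr"}
SKIP_TAGS = {"script", "style"}


class TextExtractor(HTMLParser):
    """Collects visible text; inline tags such as <b> do not split words."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # &eacute; -> é, &amp; -> &
        self.parts: list[str] = []
        self.skip = 0

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so <P> and <p> are the same.
        if tag in SKIP_TAGS:
            self.skip += 1
        elif tag in BREAK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip = max(0, self.skip - 1)
        elif tag in BREAK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        if not self.skip:
            self.parts.append(data)


def html_tokens(html: str) -> list[str]:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return TOKEN_RE.findall("".join(parser.parts).casefold())


neat = "<div>Café opens at 9am.<br>See   you</div>"
sloppy = ("<style>p {color: red}</style>"
          "<p><b>Ca</b>f&eacute; opens at 9am.</P><P>See you</p>")
print(html_tokens(neat))  # ['café', 'opens', 'at', '9am', 'see', 'you']
print(html_tokens(neat) == html_tokens(sloppy))  # True
```

The resulting token list could then feed a canonicalization like the one sketched for the body hash; deciding which tags separate words is exactly the kind of detail that would need discussion and testing.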

We'd need to discuss the details, implement them, and test.

Anyone interested?

-------- Original Message --------
Date: 30 Apr 2010 12:33:00 -0000
From: John Levine <johnl@iecc.com>
To: ietf-dkim@mipassoc.org
Subject: Re: [ietf-dkim] Broken signatures,
   was Why mailing lists should strip them

In article <4BDA70B5.4090708@tana.it> you write:
>On 29/Apr/10 01:12, SM wrote:
>> The diversity
>> of the email environment is such that you cannot come up with a
>> "mellowed" canonicalization to cope with every possible change.
>
>Yet, it would seem that by, say, hashing just invariants of binary
>representations of the first entity, e.g. discarding its white space
>and punctuation, one may reach very high percentages of unbroken
>retransmission.

It sounds like you want to experiment with different canon schemes for 
DKIM, rather than the two that exist now.  Wouldn't that be more 
appropriate for ASRG?

R's,
John