Re: [Asrg] "Uncaught spam" research project

John Leslie <john@jlc.net> Fri, 30 April 2010 16:07 UTC

Return-Path: <john@jlc.net>
X-Original-To: asrg@core3.amsl.com
Delivered-To: asrg@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id A8F3228C105 for <asrg@core3.amsl.com>; Fri, 30 Apr 2010 09:07:24 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -3.415
X-Spam-Level:
X-Spam-Status: No, score=-3.415 tagged_above=-999 required=5 tests=[AWL=0.584, BAYES_50=0.001, RCVD_IN_DNSWL_MED=-4]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id yXEUO0b7EAJu for <asrg@core3.amsl.com>; Fri, 30 Apr 2010 09:07:19 -0700 (PDT)
Received: from mailhost.jlc.net (mailhost.jlc.net [199.201.159.9]) by core3.amsl.com (Postfix) with ESMTP id 76EA028C132 for <asrg@irtf.org>; Fri, 30 Apr 2010 09:07:12 -0700 (PDT)
Received: by mailhost.jlc.net (Postfix, from userid 104) id C94AC33C2C; Fri, 30 Apr 2010 12:06:58 -0400 (EDT)
Date: Fri, 30 Apr 2010 12:06:58 -0400
From: John Leslie <john@jlc.net>
To: Anti-Spam Research Group - IRTF <asrg@irtf.org>
Message-ID: <20100430160658.GR14169@verdi>
References: <18B53BA2A483AD45962AAD1397BE1325379ED80C30@UK-EXCHMBX1.green.sophos>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <18B53BA2A483AD45962AAD1397BE1325379ED80C30@UK-EXCHMBX1.green.sophos>
User-Agent: Mutt/1.4.1i
Subject: Re: [Asrg] "Uncaught spam" research project
X-BeenThere: asrg@irtf.org
X-Mailman-Version: 2.1.9
Precedence: list
Reply-To: Anti-Spam Research Group - IRTF <asrg@irtf.org>
List-Id: Anti-Spam Research Group - IRTF <asrg.irtf.org>
List-Unsubscribe: <http://www.irtf.org/mailman/listinfo/asrg>, <mailto:asrg-request@irtf.org?subject=unsubscribe>
List-Archive: <http://www.irtf.org/mail-archive/web/asrg>
List-Post: <mailto:asrg@irtf.org>
List-Help: <mailto:asrg-request@irtf.org?subject=help>
List-Subscribe: <http://www.irtf.org/mailman/listinfo/asrg>, <mailto:asrg-request@irtf.org?subject=subscribe>
X-List-Received-Date: Fri, 30 Apr 2010 16:07:24 -0000

Martijn Grooten <martijn.grooten@virusbtn.com> wrote:
> 
> I intend to do a little project where I send a lot of spam[1] through
> a large number of mostly commercial[2] spam-filters (which I'm doing
> anyway) and then look at differences between spam that's caught by
> all filters, spam that is misidentified by one filter and spam that
> is misidentified by more than, say, 25% of the filters. All with the
> purpose of finding where spam filters can be improved.
> 
> Things I want to look at include
> - the location of sender's IP,
> - the character set,
> - the size of the body,
> - the presence of an inline image (or attachment in general),
> - SPF[3]
> - and whether the message is caught when it is resent after an
>   hour/day/week. (The latter to see if it's just a matter of
>   signatures/blacklists not updating fast enough.)
>
> Feel free to suggest more things to look at,

   I'd definitely record the AS of the sender's IP.
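As a rough illustration of what recording the AS involves: the sender's IP (assumed here to be the address that connected to the spam trap's own MX, as recorded in that MX's Received header) is mapped to an origin AS by a longest-prefix match against a routing table. The table below is hypothetical, using documentation prefixes and private-use AS numbers; a real study would build it from a BGP table dump or an IP-to-ASN service.

```python
import ipaddress
from typing import Optional

# Hypothetical prefix -> origin-AS table (documentation prefixes, private ASNs).
# In practice this would come from BGP data; these entries are placeholders.
PREFIX_TO_AS = {
    "192.0.2.0/24": 64500,
    "198.51.100.0/24": 64501,
    "198.51.100.128/25": 64502,  # more specific prefix originated by another AS
    "2001:db8::/32": 64503,
}

# Parse the prefixes once up front.
_TABLE = [(ipaddress.ip_network(p), asn) for p, asn in PREFIX_TO_AS.items()]


def origin_as(ip: str) -> Optional[int]:
    """Return the origin AS of the longest matching prefix, or None if unrouted."""
    addr = ipaddress.ip_address(ip)
    best_len, best_as = -1, None
    for net, asn in _TABLE:
        # Membership is False when IP versions differ, so IPv4 and IPv6
        # prefixes can safely share one table.
        if addr in net and net.prefixlen > best_len:
            best_len, best_as = net.prefixlen, asn
    return best_as
```

With this table, `origin_as("198.51.100.200")` matches the more specific /25 and yields 64502, while `origin_as("198.51.100.5")` falls back to the /24. Grouping results by AS rather than by country may separate, say, a single hosting provider from the rest of its country's address space.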

> or make general suggestions for the project. I'm also happy to hear
> the suggestion not to run (or publish) the research at all.

   Oh, definitely run it... The question is how much to obscure when
you publish it.

> I am aware that this could also give spammers some insight into which
> techniques are more likely to evade filters.

   Filters, hopefully, are a moving target; so whatever you publish
will be of limited use a week later.
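The proposed resend after an hour, a day, and a week gives a way to measure how fast that target moves. A minimal sketch, assuming a hypothetical layout where `verdicts[delay][message_id]` holds one boolean per filter (True meaning the filter flagged the message as spam) and the delay labels are placeholders: it computes the overall catch rate per delay and, per message, how many filters missed it on first send but caught it on the resend.

```python
from typing import Dict, List

# Hypothetical layout: verdicts[delay][message_id] -> per-filter results,
# True = caught as spam. Delay labels such as "0h", "1h", "1d", "1w" are
# placeholders for the resend schedule.
Verdicts = Dict[str, Dict[str, List[bool]]]


def catch_rate(results: Dict[str, List[bool]]) -> float:
    """Fraction of (message, filter) pairs in which the spam was caught."""
    total = sum(len(v) for v in results.values())
    caught = sum(sum(v) for v in results.values())
    return caught / total if total else 0.0


def newly_caught(verdicts: Verdicts, first: str, later: str) -> Dict[str, int]:
    """Per message, count filters that missed it at `first` but caught it at `later`."""
    out: Dict[str, int] = {}
    for msg, before in verdicts[first].items():
        after = verdicts[later].get(msg)
        if after is None:
            continue  # message was not resent at this delay
        out[msg] = sum(1 for b, a in zip(before, after) if not b and a)
    return out
```

If the catch rate climbs sharply between the first send and the one-hour or one-day resend, that would be consistent with signatures or blacklists lagging rather than with the message genuinely evading the filter.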

> [1] Spam in the context of this email is spam sent to spam traps.
> So the real, proper spam, not the perhaps-not-100%-CAN-SPAM-compliant
> spam.

   It will be necessary to at least sample the "interesting" cases,
since spamtraps do get some non-spam...
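One way to combine the proposed grouping with that manual check: assign each message to a group by how many filters missed it, then draw a reproducible random sample from each group that is not "caught by all" for a human to confirm it really is spam. This is a sketch under assumptions: the group names are hypothetical, the 25% threshold is the "say, 25%" from the proposal, and "missed by a few" labels the gap the proposal leaves between one filter and the threshold.

```python
import random
from typing import Dict, List


def bucket(missed_by: int, n_filters: int, threshold: float = 0.25) -> str:
    """Assign a message to a group by how many filters missed it."""
    if missed_by == 0:
        return "caught_by_all"
    if missed_by == 1:
        return "missed_by_one"  # checked before the threshold on purpose
    if missed_by / n_filters > threshold:
        return "missed_by_many"
    return "missed_by_few"  # more than one filter, but not above the threshold


def sample_for_review(verdicts: Dict[str, List[bool]], k: int,
                      seed: int = 0) -> Dict[str, List[str]]:
    """Draw up to k message IDs per 'interesting' group for manual review."""
    groups: Dict[str, List[str]] = {}
    for msg, results in verdicts.items():
        missed = sum(1 for caught in results if not caught)
        groups.setdefault(bucket(missed, len(results)), []).append(msg)
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    return {
        name: rng.sample(msgs, min(k, len(msgs)))
        for name, msgs in groups.items()
        if name != "caught_by_all"  # spam every filter caught needs no review
    }
```

Sampling within each group, rather than across all messages, keeps the small but important "missed by many" group from being drowned out by the much larger ones.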

> [2] Several of these make use of open source filters (e.g.
> SpamAssassin), so it's fair to say that most filters are covered.
> The setup does exclude techniques such as TCP fingerprinting or
> greylisting though.

   That's OK, though it might be interesting to compare those
techniques. BTW, are you saying that if a (commercial?) spam-filter
uses those techniques, your setup will exclude them?

> [3] I would love to include DKIM, but I can only distinguish between
> messages that do and do not have a DKIM signature; redacting the
> emails to hide the original recipient means I cannot tell whether a
> signature that is present was actually valid.

   I would assume that the interesting datum is whether the DKIM
signature was valid when received, and that the DKIM signature
itself needs to be excised.
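A minimal sketch of that order of operations, assuming the DKIM verdict has already been computed by a verifier run on the unmodified message at delivery time (verification itself is not shown): record the result in a hypothetical `X-Research-DKIM` header, then remove every DKIM-Signature header before the recipient is redacted, since redaction would break any signature covering the To header anyway.

```python
from email import message_from_string
from email.message import Message


def excise_dkim(raw: str, valid_on_receipt: bool) -> Message:
    """Record the receipt-time DKIM result, then strip the signature itself.

    `valid_on_receipt` is assumed to come from a DKIM verifier run on the
    original, unredacted message; this function does not verify anything.
    """
    msg = message_from_string(raw)
    had_signature = msg["DKIM-Signature"] is not None
    # Deleting a header name removes every occurrence, so messages carrying
    # several signatures lose all of them.
    del msg["DKIM-Signature"]
    # Hypothetical header name for the recorded datum.
    if had_signature:
        msg["X-Research-DKIM"] = "pass" if valid_on_receipt else "fail"
    else:
        msg["X-Research-DKIM"] = "none"
    return msg
```

Recipient redaction can then run on the returned message without affecting the recorded DKIM datum.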

--
John Leslie <john@jlc.net>