Re: [Asrg] "Uncaught spam" research project
John Leslie <john@jlc.net> Fri, 30 April 2010 16:07 UTC
Return-Path: <john@jlc.net>
X-Original-To: asrg@core3.amsl.com
Delivered-To: asrg@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id A8F3228C105 for <asrg@core3.amsl.com>; Fri, 30 Apr 2010 09:07:24 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -3.415
X-Spam-Level:
X-Spam-Status: No, score=-3.415 tagged_above=-999 required=5 tests=[AWL=0.584, BAYES_50=0.001, RCVD_IN_DNSWL_MED=-4]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id yXEUO0b7EAJu for <asrg@core3.amsl.com>; Fri, 30 Apr 2010 09:07:19 -0700 (PDT)
Received: from mailhost.jlc.net (mailhost.jlc.net [199.201.159.9]) by core3.amsl.com (Postfix) with ESMTP id 76EA028C132 for <asrg@irtf.org>; Fri, 30 Apr 2010 09:07:12 -0700 (PDT)
Received: by mailhost.jlc.net (Postfix, from userid 104) id C94AC33C2C; Fri, 30 Apr 2010 12:06:58 -0400 (EDT)
Date: Fri, 30 Apr 2010 12:06:58 -0400
From: John Leslie <john@jlc.net>
To: Anti-Spam Research Group - IRTF <asrg@irtf.org>
Message-ID: <20100430160658.GR14169@verdi>
References: <18B53BA2A483AD45962AAD1397BE1325379ED80C30@UK-EXCHMBX1.green.sophos>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <18B53BA2A483AD45962AAD1397BE1325379ED80C30@UK-EXCHMBX1.green.sophos>
User-Agent: Mutt/1.4.1i
Subject: Re: [Asrg] "Uncaught spam" research project
X-BeenThere: asrg@irtf.org
X-Mailman-Version: 2.1.9
Precedence: list
Reply-To: Anti-Spam Research Group - IRTF <asrg@irtf.org>
List-Id: Anti-Spam Research Group - IRTF <asrg.irtf.org>
List-Unsubscribe: <http://www.irtf.org/mailman/listinfo/asrg>, <mailto:asrg-request@irtf.org?subject=unsubscribe>
List-Archive: <http://www.irtf.org/mail-archive/web/asrg>
List-Post: <mailto:asrg@irtf.org>
List-Help: <mailto:asrg-request@irtf.org?subject=help>
List-Subscribe: <http://www.irtf.org/mailman/listinfo/asrg>, <mailto:asrg-request@irtf.org?subject=subscribe>
X-List-Received-Date: Fri, 30 Apr 2010 16:07:24 -0000
Martijn Grooten <martijn.grooten@virusbtn.com> wrote:
>
> I intend to do a little project where I send a lot of spam[1] through
> a large number of mostly commercial[2] spam-filters (which I'm doing
> anyway) and then look at differences between spam that's caught by
> all filters, spam that is misidentified by one filter and spam that
> is misidentified by more than, say, 25% of the filters. All with the
> purpose of finding where spam filters can be improved.
>
> Things I want to look at include
> - the location of sender's IP,
> - the character set,
> - the size of the body,
> - the presence of an inline image (or attachment in general),
> - SPF[3]
> - and whether the message is caught when it is resent after an
>   hour/day/week. (The latter to see if it's just a matter of
>   signatures/blacklists not updating fast enough.)
>
> Feel free to suggest more things to look at,

   I'd definitely record the AS of the sender's IP.

> or make general suggestions for the project. I'm also happy to hear
> the suggestion not to run (or publish) the research at all.

   Oh, definitely run it... The question is how much to obscure when
you publish it.

> I am aware that this could also give spammers some insight in which
> techniques are more likely to evade filters.

   Filters, hopefully, are a moving target; so whatever you publish
will be of limited use a week later.

> [1] Spam in the context of this email is spam sent to spam traps.
> So the real, proper spam, not the perhaps-not-100%-CAN-SPAM-compliant
> spam.

   It will be necessary to at least sample the "interesting" cases,
since spamtraps do get some non-spam...

> [2] Several of these make use of open source filters (e.g.
> SpamAssassin), so it's fair to say that most filters are covered.
> The setup does exclude techniques such as TCP fingerprinting or
> greylisting though.

   That's OK, though it might be interesting to compare those
techniques. BTW are you saying that if a (commercial?) spam-filter
uses those techniques, your setup will exclude them?

> [3] I would love to include DKIM, but I can only distinguish between
> does have and does not have a DKIM-signature; the redacting of
> emails to hide the original recipient makes me unable to decide
> whether a present signature was actually valid.

   I would assume that the interesting datum is whether the DKIM
signature was valid when received, and that the DKIM signature itself
needs to be excised.

--
John Leslie <john@jlc.net>
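
For illustration only, the three groups described in the quoted proposal (caught by all filters, misidentified by one filter, misidentified by more than 25% of the filters) could be sketched roughly as below. This is not anyone's actual tooling: the function name, the bucket labels, the extra "missed_by_few" bucket, and the rule that a single miss always counts as "missed_by_one" even with very few filters are all assumptions made for the sketch.

```python
def bucket(caught, threshold=0.25):
    """Classify one spam message by how many filters missed it.

    `caught` holds one boolean per filter: True if that filter flagged
    the message as spam. Labels and tie-breaking are hypothetical.
    """
    if not caught:
        raise ValueError("need at least one filter verdict")
    misses = sum(1 for c in caught if not c)
    if misses == 0:
        return "caught_by_all"
    if misses == 1:
        # Assumption: a single miss takes priority over the percentage rule.
        return "missed_by_one"
    if misses / len(caught) > threshold:
        return "missed_by_many"
    # Assumption: the proposal does not name this middle group.
    return "missed_by_few"


if __name__ == "__main__":
    # Hypothetical verdicts from 20 filters for three messages.
    print(bucket([True] * 20))
    print(bucket([False] + [True] * 19))
    print(bucket([False] * 6 + [True] * 14))
```

Per-message features such as the sender's AS, the character set, or whether a DKIM signature was valid on receipt could then be tallied per bucket and compared.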