[Asrg] "Uncaught spam" research project

Martijn Grooten <martijn.grooten@virusbtn.com> Fri, 30 April 2010 14:37 UTC

I intend to run a small project in which I send a large volume of spam[1] through a large number of mostly commercial[2] spam filters (which I'm doing anyway) and then compare three groups: spam that is caught by all filters, spam that is missed by exactly one filter, and spam that is missed by more than, say, 25% of the filters. The purpose is to find where spam filters can be improved.
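A minimal sketch of the grouping in Python, assuming each filter's result for a message is recorded as a boolean (True = caught as spam). The function name `bucket_by_misses`, the filter names and the extra `missed_by_few` bucket are hypothetical. That bucket exists only because the three groups above do not cover messages missed by two or more filters but by no more than 25% of them.

```python
from typing import Mapping


def bucket_by_misses(verdicts: Mapping[str, bool], threshold: float = 0.25) -> str:
    """Bucket one spam sample by how many filters let it through.

    verdicts maps a (hypothetical) filter name to True if that filter
    caught the message as spam, False if it missed it.
    """
    if not verdicts:
        raise ValueError("need at least one filter verdict")
    missed = sum(1 for caught in verdicts.values() if not caught)
    if missed == 0:
        return "caught_by_all"
    if missed == 1:
        return "missed_by_one"
    if missed / len(verdicts) > threshold:
        return "missed_by_many"
    # Gap between the groups in the text: 2+ misses but <= threshold.
    return "missed_by_few"
```

Counting these buckets over the whole corpus gives the groups whose message properties are then compared.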

Things I want to look at include the location of the sender's IP address, the character set, the size of the body, the presence of an inline image (or of attachments in general), SPF[3] and whether the message is caught when it is resent after an hour, a day or a week. (The latter is to see whether it's just a matter of signatures/blacklists not updating fast enough.) Feel free to suggest more things to look at, or to make general suggestions for the project. I'm also happy to hear the suggestion not to run (or publish) the research at all: I am aware that it could also give spammers some insight into which techniques are more likely to evade filters.
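Several of these properties can be read from the message itself. The Python sketch below uses only the standard `email` package to collect the declared character sets, the body size, inline images and attachments, and whether a DKIM-Signature header is present (presence only, for the reason given in footnote [3]). It assumes the receiving MTA recorded its SPF verdict in a Received-SPF header whose first word is the result. The function and field names are hypothetical, and locating the sender's IP is left out because it needs an external GeoIP database.

```python
from email import message_from_bytes


def body_size(raw: bytes) -> int:
    """Bytes after the first blank line separating headers from body."""
    found = [(raw.find(sep), sep) for sep in (b"\r\n\r\n", b"\n\n")]
    found = [(i, sep) for i, sep in found if i != -1]
    if not found:
        return 0
    i, sep = min(found)  # whichever separator occurs first
    return len(raw) - i - len(sep)


def extract_features(raw: bytes) -> dict:
    """Collect per-message properties to compare caught vs. missed spam."""
    msg = message_from_bytes(raw)

    # Declared charsets of all parts; parts without one yield None.
    charsets = sorted({cs for cs in msg.get_charsets() if cs})

    has_inline_image = False
    has_attachment = False
    for part in msg.walk():
        if part.is_multipart():
            continue
        disposition = part.get_content_disposition()  # "inline", "attachment" or None
        if disposition == "attachment":
            has_attachment = True
        elif part.get_content_maintype() == "image":
            # Image parts not marked as attachments are treated as inline.
            has_inline_image = True

    # Assumption: the receiving MTA added Received-SPF; first word = result.
    spf_words = str(msg.get("Received-SPF") or "").split()
    spf_result = spf_words[0].lower() if spf_words else None

    return {
        "charsets": charsets,
        "body_size": body_size(raw),
        "has_inline_image": has_inline_image,
        "has_attachment": has_attachment,
        # Presence only: validity cannot be checked on redacted copies.
        "has_dkim_signature": "DKIM-Signature" in msg,
        "spf_result": spf_result,
    }
```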

Thanks.

Martijn.

[1] In the context of this email, "spam" means spam sent to spam traps: the real, proper spam, not the perhaps-not-100%-CAN-SPAM-compliant kind.

[2] Several of these use open source filters (e.g. SpamAssassin), so it's fair to say that most filtering techniques are covered. The setup does, however, exclude techniques such as TCP fingerprinting and greylisting.

[3] I would love to include DKIM, but I can only distinguish between messages that do and do not carry a DKIM signature: because the emails are redacted to hide the original recipient, I cannot tell whether a signature that is present was actually valid.


Virus Bulletin Ltd, The Pentagon, Abingdon, OX14 3YP, England.
Company Reg No: 2388295. VAT Reg No: GB 532 5598 33.