Re: [Asrg] "Uncaught spam" research project

John Leslie <john@jlc.net> Fri, 30 April 2010 16:07 UTC

Date: Fri, 30 Apr 2010 12:06:58 -0400
From: John Leslie <john@jlc.net>
To: Anti-Spam Research Group - IRTF <asrg@irtf.org>
Message-ID: <20100430160658.GR14169@verdi>
References: <18B53BA2A483AD45962AAD1397BE1325379ED80C30@UK-EXCHMBX1.green.sophos>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <18B53BA2A483AD45962AAD1397BE1325379ED80C30@UK-EXCHMBX1.green.sophos>
User-Agent: Mutt/1.4.1i
Subject: Re: [Asrg] "Uncaught spam" research project
Precedence: list
Reply-To: Anti-Spam Research Group - IRTF <asrg@irtf.org>

Martijn Grooten <martijn.grooten@virusbtn.com> wrote:
> 
> I intend to do a little project where I send a lot of spam[1] through
> a large number of mostly commercial[2] spam-filters (which I'm doing
> anyway) and then look at differences between spam that's caught by
> all filters, spam that is misidentified by one filter and spam that
> is misidentified by more than, say, 25% of the filters. All with the
> purpose of finding where spam filters can be improved.
> 
> Things I want to look at include
> - the location of sender's IP,
> - the character set,
> - the size of the body,
> - the presence of an inline image (or attachment in general),
> - SPF[3]
> - and whether the message is caught when it is resent after an
>   hour/day/week. (The latter to see if it's just a matter of
>   signatures/blacklists not updating fast enough.)
>
> Feel free to suggest more things to look at,

   I'd definitely record the AS of the sender's IP.

> or make general suggestions for the project. I'm also happy to hear
> the suggestion not to run (or publish) the research at all.

   Oh, definitely run it... The question is how much to obscure when
you publish it.

> I am aware that this could also give spammers some insight into which
> techniques are more likely to evade filters.

   Filters, hopefully, are a moving target; so whatever you publish
will be of limited use a week later.

> [1] Spam in the context of this email is spam sent to spam traps.
> So the real, proper spam, not the perhaps-not-100%-CAN-SPAM-compliant
> spam.

   It will be necessary to at least sample the "interesting" cases,
since spamtraps do get some non-spam...

> [2] Several of these make use of open source filters (e.g.
> SpamAssassin), so it's fair to say that most filters are covered.
> The setup does exclude techniques such as TCP fingerprinting or
> greylisting though.

   That's OK, though it might be interesting to compare those
techniques. BTW are you saying that if a (commercial?) spam-filter
uses those techniques, your setup will exclude them?

> [3] I would love to include DKIM, but I can only distinguish between
> does have and does not have a DKIM-signature; the redacting of
> emails to hide the original recipient makes me unable to decide
> whether a present signature was actually valid.

   I would assume that the interesting datum is whether the DKIM
signature was valid when received, and that the DKIM signature
itself needs to be excised.
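
   For what it's worth, the three-way split in the original proposal
(caught by all filters, missed by exactly one, missed by more than,
say, 25%) could be sketched roughly as below. This is only an
illustration under assumptions: the function name, the bucket labels,
the "other" bucket for anything in between, and the representation of
verdicts as a list of booleans are all hypothetical and not part of
the described setup.

```python
from collections import Counter

def bucket(verdicts, threshold=0.25):
    """Classify one spam message by how many filters missed it.

    verdicts: list of booleans, one per filter; True means the filter
    caught the message. (Hypothetical representation.)
    threshold: fraction of filters above which a message counts as
    widely missed; 0.25 follows the "say, 25%" in the proposal.
    """
    if not verdicts:
        raise ValueError("need at least one filter verdict")
    missed = sum(1 for caught in verdicts if not caught)
    if missed == 0:
        return "caught_by_all"
    if missed == 1:
        return "missed_by_one"
    if missed / len(verdicts) > threshold:
        return "missed_by_many"
    # More than one miss but not above the threshold; this bucket is
    # an assumption, since the proposal does not name it.
    return "other"

def summarize(messages):
    """Count how many messages fall in each bucket."""
    return Counter(bucket(v) for v in messages)
```

   Per-message features (AS of the sender's IP, character set, body
size and so on) could then be compared across the buckets.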

--
John Leslie <john@jlc.net>
