Re: [Asrg] Two ways to look at spam
Andrew Akehurst <A.D.Akehurst-99@student.lboro.ac.uk> Wed, 02 July 2003 22:13 UTC
To: asrg@ietf.org
Subject: Re: [Asrg] Two ways to look at spam
Message-ID: <1057183953.3f0358d15c92d@student-webmail.lboro.ac.uk>
From: Andrew Akehurst <A.D.Akehurst-99@student.lboro.ac.uk>
References: <20030702160005.7369.17123.Mailman@www1.ietf.org>
In-Reply-To: <20030702160005.7369.17123.Mailman@www1.ietf.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 8bit
User-Agent: IMP/PHP IMAP webmail program 2.2.8
X-Originating-IP: 172.185.57.212
X-Spam-Score: -19.2 (-------------------)
X-Scanner: exiscan for exim4 (http://duncanthrax.net/exiscan/) *19Xpq5-0007AT-Ev*PHgncWXEsHk*
X-Lboro-Filtered: bill.lut.ac.uk, Wed, 02 Jul 2003 23:12:33 +0100
Sender: asrg-admin@ietf.org
Errors-To: asrg-admin@ietf.org
X-BeenThere: asrg@ietf.org
X-Mailman-Version: 2.0.12
Precedence: bulk
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/asrg>, <mailto:asrg-request@ietf.org?subject=unsubscribe>
List-Id: Anti-Spam Research Group - IRTF <asrg.ietf.org>
List-Post: <mailto:asrg@ietf.org>
List-Help: <mailto:asrg-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/asrg>, <mailto:asrg-request@ietf.org?subject=subscribe>
List-Archive: <https://www1.ietf.org/pipermail/asrg/>
Date: Wed, 02 Jul 2003 22:12:33 +0000

> "Jon Kyme" <jrk@merseymail.com> writes:
> >> The former would be useful, but I'm doubtful that it would have much
> >> of an impact on spam. The latter seems to me to rely on the sender
> >> accurately tagging their messages according to content---possibly
> >> would happen often enough that it would be worthwhile, but I'm not
> >> sure that it would.
> I'm not sure about this, there seems to me (at the most general) to
> be only one class of things that need be asserted in a consent
> expression: How this message is classified by some engine. Your
> second class seems to me to be the sort of thing that's routinely
> handled by content-filters (imperfectly, I grant you).
>
> So rather than saying:
> 1. message has html => noconsent
> 2. message mentions 'septic tank enhancement' => consent
> 3. message is from grandma => consent
> 4. message has valid consent token => consent
> 5. message has blacklisted source IP => noconsent
> etc ...
>
> You might say something more like
> positive_test(name_of_engine_1, engineargs, message) => noconsent
> positive_test(name_of_engine_2, engineargs, message) => consent
> etc...

> I guess someone could standardise this (using whatever language they
> wanted), and there are some kinds of content filter (probably quite
> simple things---the sort of thing that SIEVE can do, say) that we
> could standardise on. That might be useful.

Here's a little idea I had which was inspired by the above. I'd like to
apologise for pre-empting the forthcoming consent framework document
(was it Yakov who was working on that?) but I wanted to write all of
this down before I forget. Feel free to tear it to shreds, although
constructive suggestions for improvement would also be nice. :-)

One approach that seems to work well for packet filtering is the
iptables format of rules used by the Linux Netfilter module
(http://www.netfilter.org). Perhaps a similar structure could be
applied here to each e-mail message?
You could have some kind of list of rules against which an e-mail
message is compared in sequence until it matches a rule which specifies
some policy decision. The Netfilter architecture allows each rule to
have an associated external module to evaluate a match (e.g. the "mac"
module to match packets based on the interface's MAC address is
specified using "--match mac") so that it is fully extensible. For each
rule there is also either a destination decision which specifies the
fate of any packet matching that rule or else the name of another table
of rules to be applied in the same way. Netfilter's ability to combine
tables of tables using jumps and RETURNs allows one to construct very
powerful combinations of rules.

Message matching modules could be supplied by a range of different
companies/programmers and the local user (if this is done in their MUA)
or else the site admin (for a MTA) could utilise whichever modules they
prefer at their level. Thus there might be a module to implement DNS
blacklisting, one for some kind of C/R, a module for digital signature
checking, another for content-based filtering and so on.

Typical destination outcomes for an e-mail message might be:

- silently discard the message (analogous to Netfilter's DROP)
- bounce the message back with an error (like Netfilter's REJECT)
- accept the message for delivery (like Netfilter's ACCEPT)
- log part or all of the message for use in spam statistics and abuser
  tracing (like Netfilter's LOG, processing need not terminate after
  doing this)

There might be other policy options too, this isn't intended to be an
exhaustive list.

Just as with Netfilter, the table will need some kind of default policy
for messages that don't match any of the rules listed. Users could
choose a fail-open (ACCEPT) or fail-closed (DROP) approach depending on
their preferences.

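To make the evaluation order concrete, here is a minimal sketch in Python of first-match rule evaluation with jumps, RETURN, a non-terminating LOG and a default policy. Nothing here is a real API: the names Rule, evaluate, decide and TABLES, the "html_checks" table and the dictionary layout of a message are all made up for illustration, and the lambdas merely stand in for pluggable match modules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

TERMINAL = {"ACCEPT", "DROP", "REJECT"}  # decisions that end processing


@dataclass
class Rule:
    match: Callable[[dict], bool]  # stands in for a pluggable match module
    target: str  # ACCEPT, DROP, REJECT, LOG, RETURN or the name of a table


def evaluate(tables: Dict[str, List[Rule]], name: str, msg: dict,
             log: List[str]) -> Optional[str]:
    """Walk one table in order. Return a terminal decision, or None if the
    table RETURNs or runs out of rules."""
    for rule in tables[name]:
        if not rule.match(msg):
            continue
        if rule.target in TERMINAL:
            return rule.target
        if rule.target == "LOG":
            log.append(msg["from"])  # record and keep going
            continue
        if rule.target == "RETURN":
            return None
        # Anything else is treated as a jump to another table.
        verdict = evaluate(tables, rule.target, msg, log)
        if verdict is not None:
            return verdict
    return None


def decide(tables: Dict[str, List[Rule]], msg: dict, log: List[str],
           default: str = "DROP") -> str:
    """Apply the INPUT table, falling back to the default policy."""
    verdict = evaluate(tables, "INPUT", msg, log)
    return verdict if verdict is not None else default


# Hypothetical rule set, assumed for this sketch only.
TABLES = {
    "INPUT": [
        Rule(lambda m: True, "LOG"),
        Rule(lambda m: m["from"] == "my_mum@aol.com", "ACCEPT"),
        Rule(lambda m: m["content_type"] == "text/html", "html_checks"),
        Rule(lambda m: m["content_type"] == "text/plain", "ACCEPT"),
    ],
    "html_checks": [
        Rule(lambda m: "<script" in m["body"].lower(), "DROP"),
        Rule(lambda m: True, "RETURN"),
    ],
}
```

Calling decide(TABLES, msg, log) with a message dictionary holding "from", "content_type" and "body" keys gives the decision for that message. Passing default="ACCEPT" instead of the fail-closed default turns the same table into a fail-open one.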
So using a pseudo-Netfilter syntax, my spam filtering INPUT table might
look something like this:

  --source my_mum@aol.com -j ACCEPT
  --match content --content-type text/html --contains JavaScript -j DROP
  --match content --content-type text/html --contains InvalidHTMLTags -j DROP
  --source friend@somewhere.net --match attachment --type EXE,SCR,PIF,BAT,VBS -j REJECT
  --match attachment --file-type EXE,SCR,PIF,BAT,COM -j DROP
  --source trusted_colleague@work.com --match attachment --file-type JPEG,GIF -j ACCEPT
  --match content --content-type text/plain -j ACCEPT

... with a default policy of DROP for anything else.

Notice that I'm willing to send a rejection message to one of my
friends to bounce a message back, as it seems only polite to warn them.
But for most senders I would silently discard suspected spam in order
to avoid giving away the fact that my address is valid and thus
incurring more.

This is of course merely an example of how I might specify my personal
preferences. I'm not suggesting that anyone else should set theirs this
way, nor would I presume to tell other people what their default should
be. I think it should be entirely at the recipient's discretion as to
what they choose to receive. As somebody who knows more about e-mail
than the average user, I'm prepared to accept the risk of some genuine
messages being dropped provided I can trust the rules and filter
modules I'm using. But this is just my own preference.

Incidentally I know the above syntax is ugly and unfriendly to
end-users. It's just an example, of course. However it can be made much
easier by providing simple forms or graphical tools for the user in
order to generate the rules on their behalf. If this were integrated
into the MUA and tied in with their address book it would be extremely
simple to use.

To simplify usability further, each site (organisation or ISP) might
provide a series of default policies, ranging from "high spam
protection" to "no spam protection".
Users could choose a level based on how strongly they feel about the
issue.

One suggestion I've not seen so far is rather like the Internet
Explorer classification of web sites into "zones" depending on their
level of trust. Users might classify senders into zones in the MUA
address book, or else define some rules which can map an individual
message into a "zone". Then a default policy is provided for every zone
(which advanced users are free to tweak and define their own "custom"
settings). Of course it's hard to classify e-mail messages into such
zones because it's difficult to determine their true origin, so perhaps
that idea would be unworkable. Anyway this is just an aside, not
essential to the idea I'm describing.

One other thing occurred to me based on the Netfilter comparison.
Netfilter has several tables of rules based on what stage of routing
the packet has arrived at. So there is a PREROUTING table for filtering
packets before they've undergone any NAT translation/mangling, then
either the INPUT or FORWARD table (depending on the routing decision)
gets a chance to process them again after translation.

Under an analogous e-mail filtering system, one could define tables of
rules as follows:

- ARRIVAL
- INPUT
- FORWARD

ARRIVAL - for rules to process a message at the time at which the SMTP
client connection is open, before the message has been accepted and
enqueued. Thus at some suitable point before issuing a "250 Mail queued
for delivery" we check the message against the rules and decide what to
do with it. The ARRIVAL rule table will probably only apply to MTAs.

A decision to ACCEPT would be like a "250 Mail queued" kind of message.
A decision to DROP might map onto a "250 Mail queued" reply code but
where the message is then dropped without placing it into the mail
queue.
A decision to REJECT might map onto a "5xy" permanent refusal reply
code.
... and so on.

Other types of policy might provide for a DELAY, perhaps a transient
"4xy" refusal code.
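
The mapping from ARRIVAL decisions to SMTP replies described above could be sketched roughly as follows. This is only an illustration: the function name smtp_reply and the reply texts are invented, and 550 and 451 are simply one possible choice of "5xy" and "4xy" code, not something the scheme prescribes.

```python
from typing import Dict, Tuple

# decision -> (reply code, reply text, whether the message is enqueued)
# The specific 5xy/4xy codes and the texts are assumptions for this sketch.
REPLIES: Dict[str, Tuple[int, str, bool]] = {
    "ACCEPT": (250, "Mail queued for delivery", True),
    # DROP looks identical to ACCEPT on the wire, but nothing is queued.
    "DROP": (250, "Mail queued for delivery", False),
    "REJECT": (550, "Message refused by local policy", False),
    "DELAY": (451, "Try again later", False),
}


def smtp_reply(decision: str) -> Tuple[int, str, bool]:
    """Translate an ARRIVAL table decision into an SMTP reply and a flag
    saying whether the message should be placed in the mail queue."""
    try:
        return REPLIES[decision]
    except KeyError:
        raise ValueError(f"no SMTP mapping for decision {decision!r}") from None
```

The point of returning the enqueue flag separately is that ACCEPT and DROP are indistinguishable to the sending client; only the receiving MTA knows which one happened.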
The delay policy might be used to limit the rate at which suspected
spam spreads around the net, by forcing it to remain in the previous
MTA's queue. Of course this would likely require some stateful tracking
of messages so it might be difficult in practice. I'll leave that one
for the experts to decide. There might be other kinds of policy too...
I don't want to be too prescriptive at this stage.

Rules in the ARRIVAL chain might also add headers to the messages. For
example, the addition of "X-SpamScore" headers by spam filter modules
might be done here.

INPUT - for a message which has been accepted by the ARRIVAL table and
which is destined for local delivery. The ISP's MTA might have a
special set of rules which it only applies to mailboxes on its own
servers; these rules could be entered here. Also, a user's MUA could
have its own additional INPUT table for messages which the ISP has not
blocked. A user might thus implement their own spam policy on their
local machine in the event that their ISP's generic policy isn't good
enough for them. My personal example above would fit in here, filtering
messages as they are downloaded from the server (via POP3, IMAP or
whatever protocol they like. I think IMAP has some interesting
possibilities in its own right but that's an aside).

FORWARD - for a message which has been accepted by the ARRIVAL table of
an MTA but which is to be passed on elsewhere to some server which is
outside the control of the organisation running this MTA. Rules
included here might be based on reciprocal spam-blocking agreements
between the organisations which operate MTAs (ISPs, companies,
whoever). There might be scope for some kind of social co-operation
here by favouring messages to/from well-behaved ISPs. Protocols for
sharing/co-ordinating policy might be used to keep FORWARD rules
up-to-date.
As a simple example of an extreme, a honeypot server which traps spam
but never delivers might have a FORWARD table consisting of a single
LOG rule to keep a copy of the message for later analysis and a default
DROP policy so nothing gets externally delivered.

Of course I've not said anything about how the modules would
communicate with such a system, nor about how (if at all) individual
modules might communicate with each other. Suggestions or comments
about that aspect of the system would be helpful to flesh out some
detail.

> It's not a solution to spam, though, because some things really are
> things that can't be checked automatically, so the content filtering
> will be imperfect. And (if it were to be standardised) we can expect
> it to become more and more imperfect.

Maybe filtering will be good enough for long enough to allow time to
deploy a better long-term solution. At any rate it can give everyone
some breathing space.

Sorry this turned into such a huge message, I probably got carried
away.

Things I like about this idea are:

- it could express consent policy at different stages in the network,
  as discussed previously on the list. There is scope for some
  mechanism (protocol?) to synchronise or distribute policies across
  the internet. I will leave the possible design of this to other
  people.
- it is independent of the SMTP transport protocol and therefore does
  not require changes to the SMTP standard (of course, policy decisions
  will need to be mapped onto SMTP response codes somehow).
- it doesn't need to be deployed by the whole world at once in order to
  reap benefits, just as Gordon has pointed out about his personal
  consent system.
  I've tried to show how some of his suggested categories of things
  might be matched and blocked in my INPUT example above.

Things that worry me about it:

- it assumes that there are suitable filtering techniques (modules)
  available to fit into this system
- it needs the writers of MUAs to get on-board and for users to upgrade
  to newer, more capable MUAs. I suspect that the users who hate spam
  enough would be quite willing to download better software if their
  ISP held them by the hand. Those who don't upgrade will still be able
  to communicate with the rest of the world, subject to the spam
  detection policies of their recipients
- it would work best if MTAs were also redesigned to employ the scheme.
  Of course the interoperability means it can be deployed on a small
  scale first and gradually increased

Thanks for reading...

Andrew

_______________________________________________
Asrg mailing list
Asrg@ietf.org
https://www1.ietf.org/mailman/listinfo/asrg