Re: [Asrg] [ASRG] SMTP pull anyone?

Bill Cole <asrg3@billmail.scconsult.com> Tue, 18 August 2009 16:49 UTC


Ravi shankar wrote, On 8/17/09 5:53 AM:
> > DNS is used *as a medium* for various applications that are used to
> > identify mail as legitimate or illegitimate by various standards of
> > legitimacy, and a major reason for its use in those applications is to
> > make it feasible for mail systems to do the validation synchronously
> > during the SMTP session. By using a lightweight, distributed, cached
> > database, mail systems are spared from deferring a message, queuing its
> > validation, remembering the results, and waiting for the sender to offer
> > it in an identical way again. You are suggesting that receivers should
> > take on all the heavyweight management but retain using DNS for something
> > unspecified. It makes no sense.
>
> Bill,
>
> Today's model is no different from what i have suggested in that they
> deploy costly anti-spam solutions, which utilise probably 10 fold resource
> than what this solution will use. By allowing the system to cut most of
> the spam through a simple pull mechanism, compares very well against
> today's anti-spam software model, which not all can afford.

I don't see how this reduces the effort required on the receiving side in 
comparison to currently common practices. I do see how it increases 
receiving system effort compared to currently common practices. I suspect 
that you don't understand those practices, so I'll explain at length...

It is very common for mail servers to apply multiple threshold criteria 
(often utilizing DNS) before the DATA command in an SMTP session to decide 
how to respond to the earlier commands, often making rejection decisions 
very early. SPF and the most common type of DNSBL can be checked that way 
and often are, along with rules like requiring the sender domain to have a 
valid MX or A record, shunning clients that use idiosyncratically invalid 
HELO names, etc. This does not require message data analysis, as it is done 
before the message data is offered. After receiving a RCPT command, the 
receiver knows the IP address of the sending client, the name it used for 
itself in the HELO or EHLO command, the envelope sender address, one or more 
recipient addresses, and the reject/accept results for any previously named 
recipients. In some cases where extensions to SMTP are used, it may also 
know some message and authentication metadata. It is quite normal for a mail 
server to use those facts and derivative facts (like the existence and 
content of DNS records related to them) to decide how to respond to that 
RCPT command.
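
As a rough illustration of the kind of decision described above, here is a 
minimal Python sketch of a RCPT-time policy check. It is not taken from any 
particular MTA. The Envelope fields, the name_exists callable (a stand-in for 
a real DNS lookup), the zone name and the reply texts are all assumptions 
made for the example. The one fixed convention it relies on is how an IPv4 
DNSBL query name is built: the octets of the client address are reversed and 
the list's zone is appended.

```python
import ipaddress
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Envelope:
    """Facts known to the receiver by the time a RCPT command arrives."""
    client_ip: str   # address of the connecting client
    helo_name: str   # name the client gave in HELO/EHLO
    mail_from: str   # envelope sender address ("" for the null sender)
    rcpt_to: str     # recipient named in the current RCPT command


def dnsbl_query_name(client_ip: str, zone: str) -> str:
    """Build the name looked up for an IPv4 client in a DNSBL zone."""
    octets = str(ipaddress.IPv4Address(client_ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def helo_looks_invalid(helo_name: str) -> bool:
    """Hypothetical rule: reject bare single-label names that are not literals."""
    return "." not in helo_name and not helo_name.startswith("[")


def rcpt_decision(env: Envelope,
                  name_exists: Callable[[str], bool],
                  dnsbl_zones: Iterable[str]) -> str:
    """Return an SMTP reply line for RCPT without seeing any message data.

    name_exists is an assumed helper that answers "does this DNS name
    resolve?"; a real system would query A and MX records instead.
    """
    if helo_looks_invalid(env.helo_name):
        return "550 5.7.1 invalid HELO name"
    for zone in dnsbl_zones:
        if name_exists(dnsbl_query_name(env.client_ip, zone)):
            return f"550 5.7.1 client listed in {zone}"
    _, at, sender_domain = env.mail_from.rpartition("@")
    if at and not name_exists(sender_domain):
        return "450 4.1.8 sender domain does not resolve"
    return "250 2.1.5 OK"
```

Everything the function consults is available before DATA, which is the 
point: the reply can be computed during the session with a handful of cached 
DNS answers and no per-message state.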

For many mail systems, anti-spam measures done before the DATA command using 
metadata safely reject a large majority of spam (often a large majority of 
all email) and whitelist a smaller stream of messages. This sidesteps 
high-cost approaches that parse message data. For example, from the last 
10,000 connections to my own very small mail server, only 873 messages were 
passed to the part of my spam control system that examines the message data 
and 35 messages were whitelisted past that filtering. Obviously I can't get a 
perfect measurement for accuracy since I can't be sure that every error will 
be noticed and brought to my attention, but it has been many months and 
millions of messages since the last time I know of that system rejecting 
a legitimate message ahead of the data filters, and it hasn't whitelisted any 
spam past data filtering in the 5 years that I've been doing it. That 
performance is similar to what I've seen in the larger mail systems that 
I've managed for others.
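
For scale, the figures above work out as follows. This is only arithmetic on 
the numbers already given. It assumes that the 35 whitelisted messages are 
counted separately from the 873 that reached the data filters, and it makes 
no claim about how the remaining connections ended, since that breakdown is 
not given here.

```python
def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to two decimals."""
    return round(100 * part / whole, 2)


connections = 10_000
sent_to_data_filter = 873
whitelisted_past_filter = 35
# Connections that produced neither a filtered nor a whitelisted message.
neither = connections - sent_to_data_filter - whitelisted_past_filter

print("to data filter:", share(sent_to_data_filter, connections), "%")
print("whitelisted:   ", share(whitelisted_past_filter, connections), "%")
print("neither:       ", share(neither, connections), "%")
```

That is 8.73% of connections reaching the expensive filter, 0.35% bypassing 
it, and 90.92% (9,092 connections) never reaching message data handling at 
all.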

The use of metadata rules (i.e. using envelope and session parameters and 
their derivatives) to reduce the flow of mail into message data filters is 
not a new or rare strategy, but rather is an evolutionary remnant of the 
earliest spam control tactics. For many years, spam exclusion was almost 
exclusively done before the DATA phase of SMTP because it worked well enough 
and because filtering based on message data was more resource-intensive than 
it could justify with results. To this day, well-run mail systems whose 
operators are concerned about the resource demands of spam control use the 
information available early in the SMTP transaction to decide whether to 
allow the sender to 'push' the message itself.

The 'pull' model you have described does not specify any way in which it can 
improve on the pre-data filtering that is already being done, but it does 
add a burden to both sides of legitimate transactions: keeping track of 
message offers that are pending a decision to pull and an actual pull 
attempt. In order to justify that added burden (in addition to the huge 
development and deployment costs) you would need to explain how your pull 
model facilitates better filtering than what sites do now. Sparing systems 
from message data filtering isn't enough, unless you have some case for your 
model doing that consistently and sustainably better than current tactics 
that operate during the SMTP session.
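
To make the added bookkeeping concrete, here is a hedged Python sketch of the 
state a receiver would have to keep under a pull model. No protocol has been 
specified, so everything here is hypothetical: the class name, the idea that 
an offer is keyed by sending host and an offer identifier, and the one-hour 
lifetime are all assumptions made purely for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class PendingOffers:
    """Hypothetical receiver-side table of offers awaiting a pull decision."""
    ttl_seconds: float = 3600.0
    _offers: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def record_offer(self, sender_host: str, offer_id: str,
                     now: Optional[float] = None) -> None:
        """Remember an offer until it is pulled or expires."""
        now = time.time() if now is None else now
        self._offers[(sender_host, offer_id)] = now + self.ttl_seconds

    def take_for_pull(self, sender_host: str, offer_id: str,
                      now: Optional[float] = None) -> bool:
        """Remove an offer and report whether it was still valid to pull."""
        now = time.time() if now is None else now
        expires = self._offers.pop((sender_host, offer_id), None)
        return expires is not None and expires > now

    def expire(self, now: Optional[float] = None) -> int:
        """Drop stale offers and return how many were discarded."""
        now = time.time() if now is None else now
        stale = [key for key, exp in self._offers.items() if exp <= now]
        for key in stale:
            del self._offers[key]
        return len(stale)

    def __len__(self) -> int:
        return len(self._offers)
```

The sender needs a mirror image of this table for messages it has offered 
but not yet handed over. None of that state exists when the accept or reject 
decision is made synchronously in reply to RCPT.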


> > The *most* that SPF can provide towards showing "legitimacy" is to
> > confirm that the envelope sender address of a message is not forged. It
> > is very rare for large senders of any sort to deploy records that can do
> > that strongly. There is nothing about SPF that directly attacks spamming.
> > It could in theory be used to attack sender forgery, but the collateral
> > damage has proven to be too great for either sending or receiving systems
> > to actually apply it strongly to that end. Meanwhile, a lot of spammers
> > are sending a lot of spam with senders that are validated to the degree
> > that SPF can validate anything.
>
> Actually SPF only validate the legitimacy of the sender IP and domain
> relation and i mentioned SPF as just a example.

SPF is specified as applying to the whole envelope sender. Explicit records 
using the %{l} macro are rare, but many domains ensure that the hosts they 
affirm in SPF are using correct local parts in sender addresses. That is 
what would be expected with normal MTA software and configurations that 
could be affirmed in SPF.
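
For readers who have not seen the local-part macro in use, here is a small 
Python sketch of how such a record could be expanded. It is a deliberately 
partial illustration, not an SPF implementation: it handles only the s, l, o 
and d macro letters and the three literal escapes, ignores transformers and 
delimiters, and treats d as equal to the sender's domain, which is only true 
at the start of an evaluation. The function name and the example record are 
assumptions.

```python
import re


def expand_spf_macros(spec: str, sender: str) -> str:
    """Expand a small subset of SPF macros against an envelope sender.

    Supported: %{s} sender, %{l} local part, %{o} sender domain,
    %{d} current domain (assumed here to equal the sender domain),
    and the literals %%, %_ and %-.
    """
    local, _, domain = sender.rpartition("@")
    # SPF substitutes "postmaster" when the sender has no local part.
    local = local or "postmaster"
    values = {"s": f"{local}@{domain}", "l": local, "o": domain, "d": domain}
    literals = {"%%": "%", "%_": " ", "%-": "%20"}

    def replace(match: "re.Match[str]") -> str:
        token = match.group(0)
        if token in literals:
            return literals[token]
        body = match.group(1)
        if body not in values:
            raise ValueError(f"macro not supported in this sketch: {token}")
        return values[body]

    return re.sub(r"%%|%_|%-|%\{([^}]*)\}", replace, spec)


# A per-user mechanism of the rare kind mentioned above (example record only).
print(expand_spf_macros("exists:%{l}._spf.%{d}", "alice@example.com"))
```

With the example sender, the mechanism expands to 
exists:alice._spf.example.com, so the domain owner can publish or withhold a 
DNS name per local part.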

> And if the large senders
> cannot implement something as simple as a TXT record for SPF (leave
> alone DKIM), then probably they do no care about spam.

I understand that it is easy and tempting to be dismissive about the lack of 
care among large senders, but it is self-defeating when trying to devise and 
evangelize a new spam control mechanism.

It is worth noting that Microsoft (as Hotmail) has been the most important 
actor in getting SPF records deployed by others, even though Hotmail systems 
are chronic spam sources and their inbound mail systems do not use SPF 
records in anything like a normal way.

> SPF or DKIM are
> only effective when deployed by all the domains that send mails.

That is a ridiculously false statement. I have to assume that we are having 
a problem of differing idioms of English, or else I would think you a fool.


> > 4. The sending server then hands over the message.
> > 5. To overcome DDoS attacks, the receiving server can be made to request
> > the next 10 or so Message IDs that it will assign to messages, so that if
> > a attacker tries to give those details, it will know from the next list
> > of message IDs that it's fake connection.
>
> >>> That sentence makes no sense. What did you mean to say?
>
> What i mean is in order to prevent a system from getting overwhelmed, by
> anonymous submission, if for say domain1.com server knows the next 10
> message ID that will be sent by domain2.com, then it can confidently
> reject those message submission attempts that does not have any mails in
> this range (of course this logic holds only if domain2.com is going to
> send those 10 message IDs domain1.com only)

Okay, so you are redefining "Message ID" as a new identifier defined by each 
MTA for each message that it handles, rather than as something related to 
the Message-ID mail header.

That concept is interesting, but it is not consistent with how mail systems 
work today. It brings into question whether you have a useful understanding 
of the range of ways that people use email and the range of ways that mail 
servers handle mail. The practices that would have to end in order to enable 
this facet of your idea include those which forced SPF into its arcane 
complexity and those which constrain its strength and deployability today.
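
To show where that breaks down, here is a hedged Python sketch of the 
announced-identifier check as I have restated it. The proposal gives no 
format for these identifiers, so modelling them as consecutive integers, and 
every name in the code, is an assumption made only for this example.

```python
from typing import Dict, Set


class AnnouncedIdWindow:
    """Hypothetical receiver-side record of a sender's next identifiers."""

    def __init__(self) -> None:
        self._expected: Dict[str, Set[int]] = {}

    def announce(self, sending_domain: str, next_id: int, count: int = 10) -> None:
        """Store the identifiers the sender says it will assign next."""
        self._expected[sending_domain] = set(range(next_id, next_id + count))

    def accept_offer(self, sending_domain: str, offered_id: int) -> bool:
        """Accept an offer only if its identifier was announced and unused."""
        expected = self._expected.get(sending_domain, set())
        if offered_id in expected:
            expected.discard(offered_id)
            return True
        return False


window = AnnouncedIdWindow()
window.announce("domain2.example", next_id=100)              # covers 100..109
print(window.accept_offer("domain2.example", 100))           # announced
# domain2.example now sends 101..111 to other destinations, so its next
# legitimate message for this receiver carries identifier 112.
print(window.accept_offer("domain2.example", 112))           # not announced
```

The second offer is legitimate but falls outside the announced window, so it 
is refused. The check only holds if a sender's identifiers are consumed by 
one receiver alone, which is not how a general-purpose mail server assigns 
anything.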


> > Nothing you have described would add to spam control as it is currently
> > being done, as far as I can see. The 'model' is too vague to critique in
> > detail because you aren't really providing any meaningful details.
> >
> > In order to bring anything truly new and useful to controlling email
> > spam, a new idea has to either attack spam in a way that existing tactics
> > don't, do a demonstrably better job than existing tactics, or overcome
> > the negative aspects of existing tactics. You have identified none of
> > those in your new idea.
>
> I guess we are expecting a magic solution that will stop all the spam in
> a single go and would not require us from changing our system
> continuously.

Not at all, and that is part of why I am skeptical about your suggestion. It 
would be a radically new way of handling email, to a degree that it would 
not really make sense to define it as an extension to SMTP.

> But unfortunately, every system has flaws and has to be
> corrected one step at a time, this i believe is the evolution.

Gradual evolutionary steps have to provide a real hope of some incremental 
benefit to early adopters without doing them immediate harm. Even if you had 
a fully detailed model for how this would work and had a deployable way to 
integrate it today into existing mail systems, you would need to ensure that 
it would be harmless to offer now (i.e. no rejection of legitimate mail from 
non-users of the new system) and that it could provide some benefit for both 
senders and receivers who adopt it before it becomes widely deployed. As 
described, it increases the difficulty of handling mail for both sides and 
offers neither side any concrete benefits.


> I have done my best to detail how this system applies in various steps
> of a mail communication, may be i can work on a pictorial
> representation, if someone else requires it as well.

If this is what you consider "detail" then you have a major obstacle to 
being taken seriously. Drawing pictures wouldn't be a step forward. Defining 
a transaction protocol would be, but I wouldn't suggest you do that until 
you identify concrete ways that your model offers benefits that existing 
common practices cannot offer.