[Asrg] Some data on the validity of MAIL FROM addresses

Kee Hinckley <nazgul@somewhere.com> Sun, 18 May 2003 07:37 UTC

To: Asrg@ietf.org
From: Kee Hinckley <nazgul@somewhere.com>
Subject: [Asrg] Some data on the validity of MAIL FROM addresses
List-Id: Anti-Spam Research Group - IRTF <asrg.ietf.org>
Date: Sun, 18 May 2003 03:34:14 -0400

Vernon has regularly claimed that a significant proportion of spam 
messages have valid MAIL FROM addresses.  That means that bounces 
will go to the spammer.  This has significant ramifications for C/R 
(challenge/response) systems, especially auto-respond ones, since it 
means that spammers could respond to challenges if they had to.

To test this theory, I took a day's worth of bounce logs from 
somewhere.com (2003-05-15).  Apart from a slight upswing caused by a 
recent virus attack, these are fairly typical bounce logs for 
somewhere.com.  They cover addresses that do not exist and never 
have.  Because these addresses got onto senders' lists mainly when 
someone entered them on a web site, they receive a mix of "true" spam 
and ordinary bulk mail.  However, if the bulk mailers are doing their 
job, those addresses should be removed fairly quickly.  If they 
aren't removing addresses that bounce, they look and smell a lot like 
spammers.

Known oddities in the data:

862 messages to wormalert@somewhere.com and variations.  These tend 
to run about 1/3 viruses, 1/3 real messages and 1/3 spam.  That set 
has 533 distinct MAIL FROM addresses.

12340 messages from olga@somewhere.com to mail@somewhere.com. 
(Misconfigured Axis video cameras.)

Since all I'm counting here are unique MAIL FROM addresses, neither 
of these should have a huge impact.

I ran a program that took each MAIL FROM address, parsed out the 
domain portion, looked up its MX records, and connected to the SMTP 
port of the most preferred (lowest-numbered) MX server.  It then sent:
	HELO somewhere.com
	MAIL FROM:<postmaster+AntiSpamAddressVerification@somewhere.com>
	RCPT TO:<address-being-verified>
	QUIT
Note that a few sites rejected me at the HELO (they didn't like that 
I was on DSL, or that my name was somewhere.com).  A few rejected the 
MAIL FROM (they didn't like somewhere.com, and one claimed that + 
wasn't a legal email character).  But each of those cases was rare 
(fewer than half a dozen).  I'll do a better job of recording them 
separately in the future.
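The probe described above can be sketched with Python's standard `smtplib` module. This is a minimal illustration under stated assumptions, not the program that produced these numbers: the MX lookup is left to a hypothetical caller-supplied `lookup_mx(domain)` function returning (preference, host) pairs, and the 1001/1003/1007 pseudo-codes mirror the ones in the results table.

```python
import smtplib
from typing import Callable, List, Tuple

# Pseudo-codes for failures that never produce an SMTP reply
# (same numbers as the results table).
NO_MX_RECORD = 1001
NO_SMTP_CONNECTION = 1003
BAD_ADDRESS_FORMAT = 1007

PROBE_HELO = "somewhere.com"
PROBE_SENDER = "postmaster+AntiSpamAddressVerification@somewhere.com"


def probe_address(
    address: str,
    lookup_mx: Callable[[str], List[Tuple[int, str]]],
    smtp_factory: Callable[..., smtplib.SMTP] = smtplib.SMTP,
    timeout: float = 30.0,
) -> Tuple[int, str]:
    """Check one MAIL FROM address with HELO / MAIL FROM / RCPT TO / QUIT.

    lookup_mx(domain) is a hypothetical helper returning (preference, host)
    pairs, or an empty list when the domain has no MX record.
    """
    local, at, domain = address.rpartition("@")
    if not at or not local or not domain:
        return BAD_ADDRESS_FORMAT, "Bad Address Format"

    mx_records = lookup_mx(domain)
    if not mx_records:
        return NO_MX_RECORD, "No MX Record"
    _, mx_host = min(mx_records)  # lowest preference number = most preferred

    try:
        smtp = smtp_factory(mx_host, 25, timeout=timeout)
    except (OSError, smtplib.SMTPException):
        # Refused, timed out, or bad greeting: counted as "no SMTP server".
        return NO_SMTP_CONNECTION, "No SMTP Connection"

    try:
        code, msg = smtp.helo(PROBE_HELO)
        if code == 250:                      # some sites reject at HELO
            code, msg = smtp.mail(PROBE_SENDER)
            if code == 250:                  # some sites reject the sender
                code, msg = smtp.rcpt(address)
        return code, msg.decode("ascii", "replace")
    except (OSError, smtplib.SMTPException):
        # Assumption: a session dropped mid-dialogue counts as no connection.
        return NO_SMTP_CONNECTION, "No SMTP Connection"
    finally:
        try:
            smtp.quit()
        except (OSError, smtplib.SMTPException):
            pass
```

Passing the SMTP class in as `smtp_factory` keeps the dialogue testable without a network, and the `timeout` parameter is where the "very short timeout" mentioned below would be tuned.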

There were 39595 entries in the log, with 34404 distinct SMTP sessions.
There were 11559 unique MAIL FROM addresses.

+---------+-------+------------+----------------------+
| errcode | total | percentage | meaning              |
+---------+-------+------------+----------------------+
|       0 |    99 |       0.86 | unexplained          |
|     250 |  5796 |      50.14 |                      |
|     450 |     6 |       0.05 |                      |
|     451 |    12 |       0.10 |                      |
|     452 |     8 |       0.07 |                      |
|     473 |     4 |       0.03 |                      |
|     500 |     1 |       0.01 |                      |
|     501 |     1 |       0.01 |                      |
|     521 |     3 |       0.03 |                      |
|     530 |     1 |       0.01 |                      |
|     550 |  2341 |      20.25 |                      |
|     551 |     3 |       0.03 |                      |
|     552 |     2 |       0.02 |                      |
|     553 |   288 |       2.49 |                      |
|     554 |    48 |       0.42 |                      |
|     555 |     1 |       0.01 |                      |
|     556 |     1 |       0.01 |                      |
|     571 |     1 |       0.01 |                      |
|    1001 |  1880 |      16.26 | No MX Record         |
|    1003 |  1055 |       9.13 | No SMTP Server       |
|    1007 |     8 |       0.07 | Invalid Email Format |
+---------+-------+------------+----------------------+

In aggregate, 51% of the addresses were valid and 49% were not.
Of the ones that were not valid, 52% didn't have a reachable mail server.
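The aggregate figures can be checked against the table with a few lines of Python. The counts are copied from the table; the grouping is an inference from the arithmetic: the quoted 51%/49%/52% are reproduced if the 99 unexplained code-0 results are counted along with the 250s, and "no reachable mail server" means codes 1001 plus 1003.

```python
# Result counts per error code, copied from the aggregate table.
COUNTS = {
    0: 99, 250: 5796, 450: 6, 451: 12, 452: 8, 473: 4, 500: 1, 501: 1,
    521: 3, 530: 1, 550: 2341, 551: 3, 552: 2, 553: 288, 554: 48,
    555: 1, 556: 1, 571: 1, 1001: 1880, 1003: 1055, 1007: 8,
}


def aggregate(counts):
    """Return (valid %, invalid %, unreachable share of invalid %), rounded."""
    total = sum(counts.values())
    # Assumption: the unexplained code 0 is grouped with the 250s.
    valid = counts[250] + counts[0]
    invalid = total - valid
    unreachable = counts[1001] + counts[1003]  # no MX record / no SMTP server
    return (
        round(100 * valid / total),
        round(100 * invalid / total),
        round(100 * unreachable / invalid),
    )


print(sum(COUNTS.values()))  # 11559, the number of unique MAIL FROM addresses
print(aggregate(COUNTS))     # (51, 49, 52)
```

Counting only the 250s as valid would give about 50% valid instead.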

Now let's see how it breaks down by domain.

Here are the top 5 domains in the MAIL FROM addresses.
+-------------------------+-------+
| host                    | count |
+-------------------------+-------+
| yahoo.com               |   819 |
| hotmail.com             |   714 |
| aol.com                 |   632 |
| earthlink.net           |   209 |
| msn.com                 |   161 |
+-------------------------+-------+

Let's do the same stats for each of these.  Note that I get a small 
"No SMTP Server" (1003) rate, up to about 5% for AOL.  This could 
mean that they were rate-limiting my queries, but more likely it's 
due to the very short timeout I put on the query.  I'll have to 
adjust that in the future.

+-----------+---------+-------+------------+
| host      | errcode | total | percentage |
+-----------+---------+-------+------------+
| yahoo.com |    NULL |     1 |       0.12 |
| yahoo.com |     250 |   669 |      81.68 |
| yahoo.com |     553 |   129 |      15.75 |
| yahoo.com |    1003 |    20 |       2.44 |
+-----------+---------+-------+------------+
+-------------+---------+-------+------------+
| host        | errcode | total | percentage |
+-------------+---------+-------+------------+
| hotmail.com |    NULL |     1 |       0.14 |
| hotmail.com |     250 |   111 |      15.55 |
| hotmail.com |     550 |   602 |      84.31 |
+-------------+---------+-------+------------+
+---------+---------+-------+------------+
| host    | errcode | total | percentage |
+---------+---------+-------+------------+
| aol.com |       0 |    10 |       1.58 |
| aol.com |     250 |   581 |      91.93 |
| aol.com |     550 |    10 |       1.58 |
| aol.com |    1003 |    31 |       4.91 |
+---------+---------+-------+------------+
+---------------+---------+-------+------------+
| host          | errcode | total | percentage |
+---------------+---------+-------+------------+
| earthlink.net |     250 |    43 |      20.57 |
| earthlink.net |     550 |   149 |      71.29 |
| earthlink.net |     554 |    14 |       6.70 |
| earthlink.net |    1003 |     3 |       1.44 |
+---------------+---------+-------+------------+
+---------+---------+-------+------------+
| host    | errcode | total | percentage |
+---------+---------+-------+------------+
| msn.com |    NULL |     1 |       0.62 |
| msn.com |     250 |    62 |      38.51 |
| msn.com |     550 |    97 |      60.25 |
| msn.com |    1003 |     1 |       0.62 |
+---------+---------+-------+------------+
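The top-5 list and the per-domain tables amount to a count and a group-by over (domain, result code). A minimal Python sketch with `collections.Counter`, assuming the probe results are available as (MAIL FROM address, code) pairs with `None` standing in for the NULL rows; that input format is an assumption, not the actual database schema used here.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

Result = Tuple[str, Optional[int]]  # (MAIL FROM address, result code or None)


def domain_of(address: str) -> str:
    return address.rpartition("@")[2].lower()


def top_domains(results: Iterable[Result], n: int = 5) -> List[Tuple[str, int]]:
    """Most common MAIL FROM domains, as in the top-5 table."""
    return Counter(domain_of(addr) for addr, _ in results).most_common(n)


def breakdown(results: Iterable[Result], domain: str) -> Dict[Optional[int], Tuple[int, float]]:
    """Map each result code for one domain to (count, percentage of that domain)."""
    codes = Counter(code for addr, code in results if domain_of(addr) == domain)
    total = sum(codes.values())
    return {code: (count, round(100 * count / total, 2)) for code, count in codes.items()}


# Rebuild the yahoo.com rows from the table (the addresses are placeholders).
yahoo = ([("a@yahoo.com", None)] * 1 + [("b@yahoo.com", 250)] * 669
         + [("c@yahoo.com", 553)] * 129 + [("d@yahoo.com", 1003)] * 20)
print(breakdown(yahoo, "yahoo.com"))
# {None: (1, 0.12), 250: (669, 81.68), 553: (129, 15.75), 1003: (20, 2.44)}
```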

Interesting that the results vary so much by ISP.  Yahoo addresses 
are mostly valid (82%), and AOL's are even better (92%).  Hotmail 
addresses are mostly bad (16% valid).  Earthlink has a problem (21% 
valid).  MSN is slightly better (39% valid), but still mostly invalid.

In general, though, it appears that Vernon is correct.  If my sample 
is representative, a large percentage of spam is coming from valid 
email addresses.

I'll be making this data (and, I hope, live updates to it) available 
on the web in the next few days.


As an additional anecdotal piece of information: in the past month 
I've seen five separate email accounts (including two of mine) get 
Joe-jobbed in a new way.  Instead of a flood of bounces, they get 
just one or two.  It smells like new spam software that uses the same 
database of addresses for From that it uses for To.  The goal might 
be to get through verification filters like the one above.  But it's 
also interesting to consider what havoc that might wreak on C/R 
systems.  How is someone going to react when they get a challenge for 
a message they didn't send?  I predict that if people get used to C/R 
systems, they'll just click send, and the spammer's message will get 
through.

Finally, as an addendum of sorts, here are the unique messages 
associated with the above error codes.  For 250 and 550 I've left out 
the individual messages and shown a single summary row, since I'm 
just tracking the less common ones.  The messages have been 
normalized to remove email addresses and domain names.

+---------+-------+----------------------------------------------------+
| errcode | count | message (first 50 characters)                      |
+---------+-------+----------------------------------------------------+
|       0 |    99 |                                                    |
|     250 |  5796 | recipient ok                                       |
|     450 |     3 | <EMAILADDRESS>: User unknown in local recipient ta |
|     450 |     2 | <localhost.localdomain>: Helo command rejected: Ho |
|     450 |     1 | Mailbox unavailable.                               |
|     451 |     1 | 4.0.0 Can't create transcript file ./xfh4GNNYv0581 |
|     451 |     1 | 4.3.0 error creating message, status = StatusSpool |
|     451 |     1 | 4.3.5 Error getting LDAP results in map sbcldap:   |
|     451 |     4 | <EMAILADDRESS>: Temporary lookup failure           |
|     451 |     1 | <LOCALPART> ... Recipient mailbox is full          |
|     451 |     1 | Can't connect to bisman.com - psmtp                |
|     451 |     2 | Requested action aborted: local error in processin |
|     451 |     1 | Server Error                                       |
|     452 |     2 | 4.2.1 Mailbox temporarily disabled: EMAILADDRESS   |
|     452 |     2 | 4.2.2 Mailbox full                                 |
|     452 |     2 | 4.4.5 Insufficient disk space; try again later     |
|     452 |     2 | Message for <EMAILADDRESS> would exceed mailbox qu |
|     473 |     4 | EMAILADDRESS relaying prohibited. You should authe |
|     500 |     1 | <EMAILADDRESS>: Recipient address rejected: Recipi |
|     501 |     1 | Syntax error in sender: <postmaster+AntiSpamAddres |
|     521 |     1 | This User has too many concurrents, please try aga |
|     521 |     2 | this mailbox is disabled or invalid (#5.2.1)       |
|     530 |     1 | Delivery not allowed to non-local recipient, try a |
|     550 |  2341 | unknown user                                       |
|     551 |     1 | 5.0.0 Mailbox disabled,storage space exceeded      |
|     551 |     1 | EMAILADDRESS illegal name for an account           |
|     551 |     1 | not our customer                                   |
|     552 |     1 | <EMAILADDRESS>: Recipient address rejected: Sorry, |
|     552 |     1 | Requested action aborted: exceeded storage allocat |
|     553 |     1 | 5.0.0 <EMAILADDRESS>... No such user               |
|     553 |     1 | 5.1.3 <EMAILADDRESS>... Invalid route address      |
|     553 |    17 | 5.3.0 <EMAILADDRESS>... Addressee unknown, relay=[ |
|     553 |     1 | 5.3.0 <EMAILADDRESS>... Delivery ERROR!!!User does |
|     553 |     4 | 5.3.0 <EMAILADDRESS>... No such user               |
|     553 |     6 | 5.3.0 <EMAILADDRESS>... No such user here          |
|     553 |     2 | 5.3.0 <EMAILADDRESS>... That address is not curren |
|     553 |     1 | 5.3.0 <EMAILADDRESS>... Try LOCALPART@symantec.com |
|     553 |     1 | 5.3.0 <EMAILADDRESS>... User LOCALPART mailbox ful |
|     553 |     3 | 5.3.0 <EMAILADDRESS>... User unknown               |
|     553 |     8 | 5.5.3 <EMAILADDRESS>... Invalid                    |
|     553 |     1 | <EMAILADDRESS>... User unknown                     |
|     553 |     2 | No mailbox here by that name, sorry (#5.7.1)       |
|     553 |     1 | RCPT TO:<EMAILADDRESS> refused                     |
|     553 |     7 | Requested action not taken: mailbox name not allow |
|     553 |   143 | VS10-RT Possible forgery or deactivated due to abu |
|     553 |    88 | sorry, that domain isn't in my list of allowed rcp |
|     553 |     1 | sorry, your envelope sender is in my badmailfrom l |
|     554 |     1 | 5.0.0 ADMIN.COM ISN'T THE DOMAIN YOU'RE LOOKING FO |
|     554 |     3 | <EMAILADDRESS>: Recipient address rejected: Access |
|     554 |     1 | <EMAILADDRESS>: Recipient address rejected: Domain |
|     554 |     9 | <EMAILADDRESS>: Recipient address rejected: Not ac |
|     554 |     1 | <EMAILADDRESS>: Recipient address rejected: Relay  |
|     554 |     5 | <EMAILADDRESS>: Relay access denied                |
|     554 |     1 | <localhost.localdomain>: Helo command rejected: Ho |
|     554 |     1 | EMAILADDRESS Mail quota exceeded                   |
|     554 |     1 | Mail for EMAILADDRESS rejected for policy reasons. |
|     554 |    21 | Quota violation for EMAILADDRESS                   |
|     554 |     1 | Relay rejected for policy reasons.                 |
|     554 |     2 | SPAM-Relay detected                                |
|     554 |     1 | recipient <EMAILADDRESS>, Transaction failed       |
|     555 |     1 | sorry, your envelope recipient is in my badrcptto  |
|     556 |     1 | invalid email address EMAILADDRESS (5.5.6)         |
|     571 |     1 | <www.somewhere.com[66.92.72.194]>: Client host rej |
|    1001 |  1880 | No MX Record                                       |
|    1003 |  1055 | No SMTP Connection                                 |
|    1007 |     8 | Bad Address Format                                 |
+---------+-------+----------------------------------------------------+
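The normalization and tally behind this table could be sketched as follows. Judging from the rows, the probed address becomes EMAILADDRESS, its local part becomes LOCALPART (other domains such as symantec.com survive), and each message is cut to 50 characters. This Python version is an assumption about how that was done; it handles only the probed address and its local part, not domain names in general.

```python
import re
from collections import Counter
from typing import Iterable, Tuple


def normalize(message: str, address: str, width: int = 50) -> str:
    """Replace the probed address and its local part with placeholders."""
    local = address.rpartition("@")[0]
    text = re.sub(re.escape(address), "EMAILADDRESS", message, flags=re.IGNORECASE)
    if local:
        # Replace the local part only when it stands alone, not inside a word.
        # (Very short local parts can still collide with ordinary words.)
        pattern = r"(?<![\w.+-])" + re.escape(local) + r"(?![\w.+-])"
        text = re.sub(pattern, "LOCALPART", text, flags=re.IGNORECASE)
    return text[:width]  # like substring(message,1,50)


def tally(results: Iterable[Tuple[str, int, str]]) -> Counter:
    """Count distinct (code, normalized message) pairs from (address, code, message)."""
    return Counter((code, normalize(msg, addr)) for addr, code, msg in results)
```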

-- 
Kee Hinckley
http://www.messagefire.com/          Junk-Free Email Filtering
http://commons.somewhere.com/buzz/   Writings on Technology and Society

I'm not sure which upsets me more: that people are so unwilling to accept
responsibility for their own actions, or that they are so eager to regulate
everyone else's.
_______________________________________________
Asrg mailing list
Asrg@ietf.org
https://www1.ietf.org/mailman/listinfo/asrg