Re: [Asrg] What are the IPs that sends mail for a domain?

Douglas Otis <dotis@mail-abuse.org> Thu, 02 July 2009 19:43 UTC

On Jul 2, 2009, at 8:03 AM, Alessandro Vesely wrote:

> Douglas Otis wrote:
>> On Jul 1, 2009, at 7:20 AM, Alessandro Vesely wrote:
>>> John Leslie wrote:
>>>> The CSV paradigm is that the operator of an MTA should exercise
>>>> some responsibility for what it sends. The HELO string identifies
>>>> the MTA (though not necessarily one string exclusively by one  
>>>> MTA), and the DNS management for that domain-name string states  
>>>> whether that domain exercises responsibility (and by automatic  
>>>> return of A)ddress RRs on SRV queries, what IP address(es) that  
>>>> MTA uses).
>>>
>>> The link from the MTA to its operator is still missing.
>> Disagree. Based on our results, when only a few domains publish the
>> IP address of an Outbound MTA, it is rather safe to assume that the
>> domains named in verified EHLO information identify who administers
>> the MTA.
>
> In general an MTA provides a FQDN like mta.example.com, and we are  
> unable to say whether it "is a member of" example.com. In addition,  
> the MTA's administrator may be unrelated to example.com's registrant.

When the EHLO host name resolves to IP addresses that include the
Outbound MTA's connecting address, this verifies that the FQDN's DNS
and the MTA are under common administration. CSV ensures this
relationship can be established, and also asserts that the system is
an Outbound MTA.
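To make that check concrete, here is a minimal Python sketch of forward
confirmation: does the connecting address appear among the A/AAAA
addresses of the EHLO name? The `resolve` callable is a hypothetical
stand-in for a real DNS lookup (one could be built on dnspython's
`dns.resolver.resolve`); a dictionary plays the part of DNS below.

```python
import ipaddress
from typing import Callable, Iterable

# Hypothetical resolver: takes a host name and returns the address strings
# from its A and AAAA records (an empty iterable if there are none).
Resolver = Callable[[str], Iterable[str]]


def ehlo_forward_confirmed(ehlo_name: str, client_ip: str, resolve: Resolver) -> bool:
    """Return True if client_ip is one of the addresses ehlo_name resolves to."""
    client = ipaddress.ip_address(client_ip)
    for addr in resolve(ehlo_name.rstrip(".").lower()):
        try:
            # Compare parsed addresses so different spellings of the same
            # IPv6 address still match; an IPv4 never equals an IPv6.
            if ipaddress.ip_address(addr) == client:
                return True
        except ValueError:
            continue  # skip malformed data rather than fail the check
    return False


# Usage with a dictionary standing in for DNS:
zone = {"mta.example.com": ["192.0.2.25", "2001:db8::25"]}
print(ehlo_forward_confirmed("MTA.example.com.", "2001:0db8:0:0::25",
                             lambda name: zone.get(name, [])))  # True
```

Comparing `ipaddress` objects instead of strings keeps the test from
failing on equivalent IPv6 spellings.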

>> When there are many domains, this appears to represent either MTAs  
>> operating behind a NAT, or compromised systems; sometimes both.
>
> I don't understand that kind of setup. Multiple MTAs operating  
> behind a NAT have no way to receive mail, so they are only sending  
> SMTP clients. Is inertia the only reason why they don't use an MSA?

Unfortunately, Outbound MTAs behind NATs are more common than they
should be, especially in Poland and Brazil. Outbound MTAs can operate
behind NATs, and they can be unrelated to where their mail is
received. Even so, port 25 of the NAT can be statically forwarded to a
host within the NAT's private address space. Outbound MTAs behind a
NAT are fairly common in office environments. However, multiple EHLO
host names within the same NATed office environment appear most often
to be due to compromised systems behind that NAT.
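A receiver could watch for that pattern by counting the distinct EHLO
names presented from each connecting address. This sketch assumes the
observations are already available as (client IP, EHLO name) pairs;
MAX_NAMES_PER_IP is an arbitrary illustrative threshold, not a figure
from any measurement.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Illustrative threshold only; not a measured value.
MAX_NAMES_PER_IP = 3


def flag_shared_ips(observations: Iterable[Tuple[str, str]],
                    max_names: int = MAX_NAMES_PER_IP) -> Dict[str, Set[str]]:
    """Return {client_ip: names} for every IP that presented more than
    max_names distinct EHLO names (ignoring case and a trailing dot)."""
    names_by_ip: Dict[str, Set[str]] = defaultdict(set)
    for client_ip, ehlo_name in observations:
        names_by_ip[client_ip].add(ehlo_name.rstrip(".").lower())
    return {ip: names for ip, names in names_by_ip.items() if len(names) > max_names}


log = [("192.0.2.10", "mail.example.com")] * 5 + [
    ("198.51.100.4", name)
    for name in ("pc1", "DESKTOP-7", "host.local", "x.example.org")
]
print(sorted(flag_shared_ips(log)))  # ['198.51.100.4']
```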

>> It appears to be rare for legitimate Outbound MTAs to change domain  
>> affiliations.  From a reputation standpoint, verified EHLO  
>> information offers stable identifiers in which to effectively and  
>> efficiently manage email abuse.
>
> It seems that vanity domains don't mind if the MX record points to  
> an MTA in another domain. The intricacies of having CNAMEs near MX  
> settings presumably refrained also from assigning multiple names,  
> one for each vanity domain, to a given MTA. However, some do it.

Don't confuse Outbound MTAs with Inbound MTAs. This has nothing to do
with MX records.

> On an SMTP connection, a client says: "Hello mta.example.com", and  
> after I accept that, it goes on with "Mail from:<xyz@example.ORG>".  
> What does that mean, in terms of accountability?

With EHLO mta.example.com, a history of that host name being used by a
small range of IP addresses, and conversely, those IP addresses
consistently reporting the same host name, accepting messages from
this Outbound MTA is likely safe. When mta.example.com also publishes
a CSA record, there is further assurance that the exchange is not
emitted by a compromised system; compromised systems currently
represent the greatest source of spam.
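The history test described above could be sketched as follows. The
policy is an assumption for illustration: a name counts as stable when,
per address family, all the addresses it was seen from fall inside one
/24 (IPv4) or /64 (IPv6), and none of those addresses ever presented a
different name. `EhloHistory` is a hypothetical class, not part of any
existing tool.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, Set


class EhloHistory:
    """Record (EHLO name, client IP) pairs and judge whether a name is stable.

    Assumed policy: per address family, all IPs seen for the name fall in
    one /24 (IPv4) or /64 (IPv6), and none of them presented another name.
    """

    def __init__(self, v4_prefix: int = 24, v6_prefix: int = 64) -> None:
        self.ips_by_name: Dict[str, Set[str]] = defaultdict(set)
        self.names_by_ip: Dict[str, Set[str]] = defaultdict(set)
        self.prefix = {4: v4_prefix, 6: v6_prefix}

    @staticmethod
    def _norm(ehlo_name: str) -> str:
        return ehlo_name.rstrip(".").lower()

    def record(self, ehlo_name: str, client_ip: str) -> None:
        name = self._norm(ehlo_name)
        ip = str(ipaddress.ip_address(client_ip))  # canonical spelling
        self.ips_by_name[name].add(ip)
        self.names_by_ip[ip].add(name)

    def is_stable(self, ehlo_name: str) -> bool:
        name = self._norm(ehlo_name)
        ips = self.ips_by_name.get(name)
        if not ips:
            return False  # no history, no opinion
        networks = set()
        for ip in ips:
            version = ipaddress.ip_address(ip).version
            networks.add(ipaddress.ip_network(f"{ip}/{self.prefix[version]}", strict=False))
        versions = [net.version for net in networks]
        one_net_per_family = len(versions) == len(set(versions))
        # Each IP behind this name must never have used any other name.
        exclusive = all(self.names_by_ip[ip] == {name} for ip in ips)
        return one_net_per_family and exclusive
```

Keying on per-family prefixes lets a dual-stack MTA stay "stable" while
a name that wanders across unrelated networks does not.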

From a reputation perspective, the name and IP address relationships
that need to be tracked for Outbound MTAs represent data sets orders
of magnitude smaller and dramatically more stable than those
represented by MAIL commands or PRAs. Once an EHLO relationship can be
verified, it should permit safe inclusion of IPv6 addresses. MAIL
commands and PRAs offer neither the needed stability nor relationships
that can be readily verified without potentially burdensome overhead,
overhead that might be directed at innocent third parties.

>>> To this end, I'd prefer the use of a domain name.
>
> This guy is relaying on behalf of someone else. If I had verified  
> the "example.com" possibly registered domain, I would spot it more  
> easily. If example.ORG is a vanity name, i.e. it has the same MX as  
> example.com, I accept it. If not, I wouldn't know who is accountable  
> for the message, so I reject it.

Do not confuse Inbound MTAs with Outbound MTAs. Even when the host
name has not been verified, inconsistencies between the host name and
the IP address information can justify safe rejections.
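Some of those inconsistencies can be spotted without any DNS lookup.
This sketch checks the EHLO argument against RFC 5321 syntax: an
address literal such as [192.0.2.1] or [IPv6:2001:db8::1] must match
the connecting address, a bare IP is not a valid argument, and a name
without a dot is not fully qualified. Whether each finding warrants
rejection is a local policy choice; the hypothetical
`ehlo_inconsistency` function only reports a reason.

```python
import ipaddress
from typing import Optional, Union

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def parse_address_literal(arg: str) -> Optional[IPAddress]:
    """Return the address inside an RFC 5321 address literal, or None if
    arg is not bracketed. Raises ValueError for a malformed literal."""
    if not (arg.startswith("[") and arg.endswith("]")):
        return None
    inner = arg[1:-1]
    if inner[:5].lower() == "ipv6:":
        return ipaddress.IPv6Address(inner[5:])
    return ipaddress.IPv4Address(inner)


def ehlo_inconsistency(arg: str, client_ip: str) -> Optional[str]:
    """Return a reason the EHLO argument contradicts the connection, or None."""
    client = ipaddress.ip_address(client_ip)
    try:
        literal = parse_address_literal(arg)
    except ValueError:
        return "malformed address literal"
    if literal is not None:
        return None if literal == client else "address literal does not match client IP"
    try:
        ipaddress.ip_address(arg)
        return "bare IP address instead of an address literal"
    except ValueError:
        pass  # not an IP, so treat it as a host name
    if "." not in arg.strip("."):
        return "EHLO name is not a fully qualified domain name"
    return None
```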

> In case mta.example.com runs outbound MTA services for third  
> parties, I would hold the originating third party accountable for  
> the message.

Why not hold accountable the entity that offers access to those
abusing email?

You have no assurance that third-party domains originated the message,
even when the domain offers authorization. Such authorization assumes
that Outbound MTAs restrict both the PRA and MAIL commands to specific,
verified users. While this might be the case, it is not the norm. For
the same reason, it would be foolish to annotate email as
"authenticated" on the basis of Outbound MTA authorization. Large
companies place a higher priority on getting their messages received
than on ensuring their authorizations do not expose exploits to bad
actors. In addition, a growing number of exploits now leverage social
networks that reveal whom recipients trust. Spoofed sources of phish
are not necessarily large banks or financial institutions.

> However, there is no way I can tell that is indeed the case, neither  
> from the IP address nor from the name. I'd need, say, a CSA SRV  
> record from example.ORG saying: "we authorize mta.example.com",  
> _and_ on starting a new session the client shall say: "Hello on  
> behalf of example.ORG: check their CSA settings". No guessing  
> required, then.

Yes, the CSA record would help, but the EHLO information still offers
value, especially with IPv6, where block lists are unlikely to prove
effective.
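For reference, a CSA check could be sketched as below. It assumes the
SRV records for _client._smtp.<EHLO name> have already been fetched as
(priority, weight, port, target) tuples, and it assumes these
conventions, which should be confirmed against the CSA draft: priority
1 carries the CSA version, weight 1 means not authorized, 2 authorized
and 3 unknown, and an "authorized" answer only counts when the client
address appears among the target's addresses. `resolve` is again a
hypothetical address lookup.

```python
import ipaddress
from typing import Callable, Iterable, List, Tuple

# (priority, weight, port, target) as returned for an SRV query.
SrvRecord = Tuple[int, int, int, str]

# Assumed CSA weight meanings; verify against the CSA draft.
CSA_WEIGHTS = {1: "not-authorized", 2: "authorized", 3: "unknown"}


def csa_name(ehlo_name: str) -> str:
    """Owner name to query for the CSA SRV record of an EHLO name."""
    return "_client._smtp." + ehlo_name.rstrip(".").lower()


def csa_status(srv: List[SrvRecord], client_ip: str,
               resolve: Callable[[str], Iterable[str]]) -> str:
    """Evaluate the CSA SRV records published for an EHLO name."""
    if not srv:
        return "none"  # nothing published, so CSA offers no opinion
    client = ipaddress.ip_address(client_ip)
    for priority, weight, _port, target in srv:
        if priority != 1:
            continue  # assumed: only version 1 records are understood
        status = CSA_WEIGHTS.get(weight, "unknown")
        if status != "authorized":
            return status
        addrs = {ipaddress.ip_address(a) for a in resolve(target.rstrip(".").lower())}
        # Assumed rule: authorization applies only to the target's own addresses.
        return "authorized" if client in addrs else "not-authorized"
    return "unknown"
```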

>> While larger ISPs are likely to have a few hundred outbound MTAs,  
>> they represent a very small percentage of overall legitimate  
>> Outbound MTAs.
>
> However, they transmit lots of messages.

It is still easier to correlate EHLO host names with a small set of IP
addresses than to do the same for MAIL commands or PRAs.

>> Being able to identify legitimate Outbound MTAs reduces the vetting  
>> of hundreds of millions of domains associated with Mail From or  
>> PRAs, where each domain likely covers massive address lists.
>
> Exactly. Those legitimate Outbound MTAs are probably connected to an  
> MSA. When we identify them, we enjoy the advantages deriving from  
> users going through their relevant MSAs, as we recommended, rather  
> than relaying through whatever MTA they have at hand, or directly.

The issue is not whether there is some mapping of users to specific
MSAs. In fact, it is not practical to track this type of many-to-many
relationship. The issue is simpler than that: hold the Outbound MTA
accountable for the message sources to which it grants access. DKIM
offers a reliable means to verify where a message originated without
impractical and unscalable efforts to register all authorized paths
for MAIL commands or PRAs. DKIM is not about managing spam, however.
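The identifier DKIM contributes is the signing domain in the d= tag of
the DKIM-Signature header field, with the key selector in s=. The
sketch below only splits the tag=value list to pull those out; it does
no cryptographic verification, which needs a real DKIM verifier (for
example the dkimpy package's `dkim.verify`). The `dkim_tags` helper is
hypothetical.

```python
from typing import Dict


def dkim_tags(header_value: str) -> Dict[str, str]:
    """Split a DKIM-Signature header field value into tag=value pairs.

    Folding whitespace is stripped from values, which suits the base64
    (b=, bh=) and domain (d=, s=) tags used here.
    """
    tags: Dict[str, str] = {}
    for part in header_value.split(";"):
        if "=" not in part:
            continue  # e.g. the empty element after a final ';'
        name, _, value = part.partition("=")
        tags[name.strip()] = "".join(value.split())
    return tags


sig = ("v=1; a=rsa-sha256; c=relaxed/relaxed; d=example.com;\r\n"
       "\ts=sel2009; h=from:to:subject; bh=abc=; b=dGVz\r\n\tdA==")
tags = dkim_tags(sig)
print(tags["d"], tags["s"])  # example.com sel2009
```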

>> Efforts to combine the addresses used by a domain are
>> counterproductive when it comes to resolving problems, or when
>> dealing with initial SMTP connections. When it comes to SMTP, direct
>> relationships involve less overhead, which improves efficacy and
>> efficiency to the point of perhaps permitting use of IPv6.
>
> In some cases, names can be handled better than IP numbers. After  
> all, that's why they invented the DNS...

That holds for EHLO host names, but the IP addresses associated with
MAIL commands or PRAs will normally exceed DNS response limits, where
truncation will induce failure. The resulting chaining of DNS
transactions, along with the macros found in SPF, failed to properly
consider the potential system impact or the larger responses DNSSEC
could produce. Verifying a DNS relationship between a single host and
its limited IP address list remains much less problematic and far
safer.
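The response-size limit is easy to quantify. Assuming a classic
512-byte UDP response (RFC 1035), no EDNS0, no DNSSEC signatures, and
a compressed owner name on every answer, this sketch computes how many
A or AAAA records fit after the header and question.

```python
def qname_wire_length(name: str) -> int:
    """Bytes for an uncompressed domain name: a length byte plus the
    label for each label, then one byte for the root label."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    return sum(len(label) + 1 for label in labels) + 1


def max_address_records(qname: str, rdata_len: int, udp_limit: int = 512) -> int:
    """Answer RRs carrying rdata_len bytes that fit in one UDP response."""
    header = 12
    question = qname_wire_length(qname) + 4  # plus QTYPE and QCLASS
    # Owner name as a 2-byte compression pointer, then TYPE and CLASS
    # (2 bytes each), TTL (4) and RDLENGTH (2), followed by the RDATA.
    per_rr = 2 + 2 + 2 + 4 + 2 + rdata_len
    return (udp_limit - header - question) // per_rr


print(max_address_records("example.com", 4))   # A records: 30
print(max_address_records("example.com", 16))  # AAAA records: 17
```

Every RRSIG that DNSSEC adds to the answer shrinks those numbers
further.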

The APL resource record, which carries lists of address prefixes in
CIDR notation, is specified by RFC 3123. It can still be used to
establish email white-list strategies instead of SPF records, and APL
records also avoid SPF's potentially problematic macros. Since email
CIDR records should not be fetched in response to SMTP connections,
APL might be chained through a convention of _n._smtp APL records as
an alternative publishing location for the _smtp APL. When the _smtp
response is truncated, either TCP fallback could be used, or
_[0-9]._smtp APL queries might be used to recover from the truncation.
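The APL approach could be sketched as follows. `parse_apl` reads the
RFC 3123 presentation format ([!]afi:address/prefix, with afi 1 for
IPv4 and 2 for IPv6). `apl_permits` assumes a longest-prefix-match rule
for combining plain and negated items; check RFC 3123 and the consuming
application for the intended semantics. `collect_apl` models the
_[0-9]._smtp fallback proposed above as a hypothetical convention: if
the _smtp answer was truncated, it reads _0._smtp through _9._smtp in
order until a name has no record. The `lookup` callable, returning
(APL text, truncated flag) or None, is also an assumption.

```python
import ipaddress
from typing import Callable, List, Optional, Tuple, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]
AplItem = Tuple[bool, Network]  # (negated, prefix)
AFI = {"1": 4, "2": 6}  # RFC 3123 address families: 1 = IPv4, 2 = IPv6

# Hypothetical lookup: APL text and a truncation flag, or None if no record.
Lookup = Callable[[str], Optional[Tuple[str, bool]]]


def parse_apl(text: str) -> List[AplItem]:
    """Parse APL presentation format, e.g. '1:192.0.2.0/24 !1:192.0.2.128/25'."""
    items: List[AplItem] = []
    for token in text.split():
        negated = token.startswith("!")
        afi, _, prefix = token.lstrip("!").partition(":")
        if afi not in AFI:
            raise ValueError(f"unsupported address family in {token!r}")
        net = ipaddress.ip_network(prefix, strict=False)
        if net.version != AFI[afi]:
            raise ValueError(f"address family mismatch in {token!r}")
        items.append((negated, net))
    return items


def apl_permits(items: List[AplItem], client_ip: str) -> bool:
    """Assumed longest-prefix match: the most specific matching item decides."""
    ip = ipaddress.ip_address(client_ip)
    best: Optional[Tuple[int, bool]] = None  # (prefix length, negated)
    for negated, net in items:
        if ip.version == net.version and ip in net:
            if best is None or net.prefixlen > best[0]:
                best = (net.prefixlen, negated)
    return best is not None and not best[1]


def collect_apl(domain: str, lookup: Lookup) -> List[AplItem]:
    """Read _smtp.<domain>; if that answer was truncated, fall back to the
    hypothetical _0._smtp.<domain> .. _9._smtp.<domain> chunks."""
    first = lookup(f"_smtp.{domain}")
    if first is None:
        return []
    text, truncated = first
    if not truncated:
        return parse_apl(text)
    items: List[AplItem] = []
    for n in range(10):
        chunk = lookup(f"_{n}._smtp.{domain}")
        if chunk is None:
            break  # the first missing chunk ends the list
        items.extend(parse_apl(chunk[0]))
    return items
```

Because the fallback names are only read after a truncated _smtp
answer, a small list costs a single query.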

-Doug