Re: [Asrg] DNSBL and IPv6

Matthias Leisi <matthias@leisi.net> Fri, 26 October 2012 13:28 UTC

In-Reply-To: <20121026003459.5415A800037@ip-64-139-1-69.sjc.megapath.net>
References: <20121026003459.5415A800037@ip-64-139-1-69.sjc.megapath.net>
From: Matthias Leisi <matthias@leisi.net>
Date: Fri, 26 Oct 2012 15:28:35 +0200
Message-ID: <CALgnk9ooaypY8iBoqmu7TWZheUUGj3iNR1s6S7cmqubx4bm1jw@mail.gmail.com>
To: Anti-Spam Research Group - IRTF <asrg@irtf.org>
Content-Type: text/plain; charset=ISO-8859-1
Subject: Re: [Asrg] DNSBL and IPv6

On Fri, Oct 26, 2012 at 2:34 AM, Hal Murray <hmurray@megapathdsl.net> wrote:

>> I'm obviously biased since I run dnswl.org, but an IPv6-based whitelist may
>> work better than an IPv6-based blacklist. Enumerating the goodness is
>> generally easier than enumerating the badness.
>
> What fraction of email comes from hosts you have listed?  How hard would it
> be to scale your list up to cover the whole world?

We do store IPv6 addresses, but we don't publish them yet, because
there is no standard (the outcome of the discussion here may lead to
the emergence of a de facto standard, or at least a first trial). So
I cannot really tell what fraction we have listed in the IPv6 world.
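
For illustration only: one existing reference point for lookup names is
the nibble-reversed layout that RFC 5782 describes for IPv6 DNSxLs (the
same layout as ip6.arpa reverse lookups). The Python sketch below builds
such names; the zone `dnswl.example` and the helper `dnsxl_query_name`
are hypothetical, and this says nothing about how dnswl.org will
actually publish IPv6 data.

```python
import ipaddress

# Hypothetical placeholder zone; not a real dnswl.org zone name.
ZONE = "dnswl.example"


def dnsxl_query_name(ip: str, zone: str = ZONE) -> str:
    """Return the name a client would look up for `ip` in a DNSxL zone.

    IPv4: the four octets in reverse order.
    IPv6: all 32 hex nibbles in reverse order, the same layout that
    reverse (PTR) lookups use under ip6.arpa.
    """
    addr = ipaddress.ip_address(ip)
    suffix = ".in-addr.arpa" if addr.version == 4 else ".ip6.arpa"
    # reverse_pointer gives e.g. "1.2.0.192.in-addr.arpa"; drop the suffix
    reversed_labels = addr.reverse_pointer[: -len(suffix)]
    return f"{reversed_labels}.{zone}"


print(dnsxl_query_name("192.0.2.1"))    # 1.2.0.192.dnswl.example
print(dnsxl_query_name("2001:db8::1"))  # 32 nibble labels, then the zone
```

Reusing `ipaddress`'s `reverse_pointer` avoids hand-rolling the nibble
expansion, including the expansion of `::`-compressed addresses.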

I can give you our estimate of what we cover in terms of IPv4, based
on the dnswl.org stats. These stats are based not on SMTP traffic but
on the DNS traffic that we log (sampled) on some of the public
mirrors. Larger senders with better cache utilization are likely to be
somewhat underrepresented in our data. Also, we do not use the
absolute numbers but logarithmic (base-10) magnitudes:

Magnitude	Percent
10.0	100%
9.0	10%
8.0	1%
7.0	0.1%
6.0	0.01%
5.0	0.001%
4.0	0.0001%
3.0	0.00001%
2.0	0.000001%
1.0	0.0000001%
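
Reading the table: magnitude 10.0 is the 100% reference and each step
of 1.0 is a factor of ten, so percent = 100 * 10^(m - 10). A minimal
Python sketch of that mapping and its inverse, assuming the reference
point is exactly 10.0 as the table shows (the function names are
illustrative only):

```python
import math

# Magnitude 10.0 corresponds to 100% in the table above.
REFERENCE_MAGNITUDE = 10.0


def magnitude_to_percent(m: float) -> float:
    """Each step of 1.0 in magnitude is a factor of 10 in share."""
    return 100.0 * 10 ** (m - REFERENCE_MAGNITUDE)


def percent_to_magnitude(p: float) -> float:
    """Inverse mapping: a percent share back to a log10 magnitude."""
    return REFERENCE_MAGNITUDE + math.log10(p / 100.0)


for m in (10.0, 9.0, 5.0, 1.0):
    print(f"{m:4.1f}  {magnitude_to_percent(m):.7f}%")
```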

Extract from our magnitudes report (I'll happily share more data on request):
#   Magnitude  Category
1   9.63       IPs where we have no record
2   9.24       IPs which are in our DB*, not published; contains a lot of
               "bad apples" ("DNSWL Id 0")
3   8.86       IPs which are in our DB*, not published, with fewer
               "bad apples" ("DNSWL Id 1")
4   8.44       Yahoo
5   8.38       Google
6   8.36       Internally blacklisted (snowshoe ranges etc.)
7   8.12       Hotmail
8   8.03       Facebook
9   7.91       Exacttarget
10  7.89       cheetahmail.com

* Through imports from third parties or "learned" through the DNS logs

Clearly, the key figure is the magnitude of 9.63 for IPs where we
have no record. This category contains all the residential/dynamic/botnet
IPs; we do not count how many distinct IPs it contains.
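
To make the gap concrete, the magnitudes in the extract can be turned
back into approximate relative shares with the same base-10 mapping
(assuming 10.0 = 100%, as in the magnitude table). This Python sketch
only re-expresses the reported numbers; the category labels are
shortened and the helper `share_percent` is illustrative:

```python
# Magnitudes copied from the report extract; labels shortened.
REPORT = {
    "No record": 9.63,
    "In DB, unpublished (DNSWL Id 0)": 9.24,
    "In DB, unpublished (DNSWL Id 1)": 8.86,
    "Yahoo": 8.44,
    "Google": 8.38,
    "Internally blacklisted": 8.36,
    "Hotmail": 8.12,
    "Facebook": 8.03,
    "Exacttarget": 7.91,
    "cheetahmail.com": 7.89,
}


def share_percent(m: float, ref: float = 10.0) -> float:
    """Approximate share for magnitude m, with ref mapping to 100%."""
    return 100.0 * 10 ** (m - ref)


top = REPORT["No record"]
for name, m in REPORT.items():
    # How many times smaller this bucket is than the "no record" bucket.
    ratio = 10 ** (top - m)
    print(f"{name:33s} ~{share_percent(m):5.1f}%   1/{ratio:.1f} of top")
```

With these inputs the "no record" bucket comes out at roughly 43%,
about 15 times the size of the Yahoo bucket.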

DNSWL Id 0 contains about 300k IPs, DNSWL Id 1 about 100k IPs, and
all the other (published) DNSWL records about 200k IPs. Records in
Id 0 and Id 1 fluctuate considerably: entries are removed when they
are no longer present in the import source, promoted from "0" to "1"
when they meet certain criteria, and demoted from "1" to "0" when they
no longer meet those criteria. Entries are promoted manually from "0"
or "1" to one of the published DNSWL Ids.
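
The lifecycle above can be sketched as a small state machine. The
message does not spell out the promotion criteria, so `meets_criteria`
is a hypothetical boolean stand-in; `in_import_source` is likewise
simplified to a flag (entries learned from DNS logs have no import
source), and the order of the checks is an assumption:

```python
from enum import Enum


class Record(Enum):
    REMOVED = "removed"
    ID0 = "DNSWL Id 0"            # unpublished, many "bad apples"
    ID1 = "DNSWL Id 1"            # unpublished, fewer "bad apples"
    PUBLISHED = "published DNSWL Id"


def next_state(state: Record, in_import_source: bool,
               meets_criteria: bool, manually_promoted: bool = False) -> Record:
    """One update step for an unpublished (Id 0 / Id 1) record.

    `meets_criteria` is a hypothetical stand-in for the unspecified
    criteria; the order of the checks below is also an assumption.
    """
    if state not in (Record.ID0, Record.ID1):
        return state                      # this sketch leaves other states alone
    if manually_promoted:
        return Record.PUBLISHED           # promotion to a published Id is manual
    if not in_import_source:
        return Record.REMOVED             # no longer in the import source
    # Automatic promotion (0 -> 1) or demotion (1 -> 0) on the same criteria.
    return Record.ID1 if meets_criteria else Record.ID0
```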

Possibly 80k IPs in each of these two unpublished records (Id 0 and
Id 1) may be considered "good" (as in: not operated by spammers or bot
herders), which together with the ~200k published IPs would give
~360k mostly legitimate SMTP-sending IPs.
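
The ~360k figure is the published records plus the guessed "good"
portion of each unpublished record. A quick tally in Python using the
approximate figures from the message (all in thousands of IPs; the 80k
values are the message's own rough guesses):

```python
# Approximate record sizes from the message, in thousands of IPs.
published_k = 200      # all published DNSWL records
id0_total_k = 300      # DNSWL Id 0 (unpublished)
id1_total_k = 100      # DNSWL Id 1 (unpublished)

# Rough guess of "good" IPs in each unpublished record.
id0_good_k = 80
id1_good_k = 80

legit_k = published_k + id0_good_k + id1_good_k
print(f"~{legit_k}k mostly legitimate SMTP-sending IPs")
print(f"Id 0: {id0_good_k / id0_total_k:.0%} good, "
      f"Id 1: {id1_good_k / id1_total_k:.0%} good")
```

The second line shows why Id 1 is described as having fewer "bad
apples": about 27% of Id 0 versus 80% of Id 1 under these guesses.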


> Assuming that you don't want to put all your eggs in one basket, how many
> white lists would you need and/or how would you decide the order to check
> them?

One? Definitely too risky. Two? Not sufficient, in my view. Three?
That may work. Four? Could improve diversity. Five? At some point,
adding more lists brings diminishing returns (e.g. higher latency,
larger overlaps).

-- Matthias