Re: [Endymail] where's the end, was spam versus cleartext

"John R Levine" <johnl@taugh.com> Sun, 07 September 2014 18:27 UTC

Date: 7 Sep 2014 14:27:30 -0400
Message-ID: <alpine.BSF.2.11.1409071424080.15242@joyce.lan>
From: "John R Levine" <johnl@taugh.com>
To: "Watson Ladd" <watsonbladd@gmail.com>
In-Reply-To: <CACsn0cmoZY7Peqashw-UEamtH5tWz0ohcRpBJCjg7ni2gBLxOw@mail.gmail.com>
References: <CB73389C55B1C9BC50D5E016@cyrus-3.local> <20140907175424.15182.qmail@joyce.lan> <CACsn0cm_xQriBp3cvvMHiAZM92KWeg2KJWfB7hUpQUAdQhasWA@mail.gmail.com> <alpine.BSF.2.11.1409071403410.15200@joyce.lan> <CACsn0cmoZY7Peqashw-UEamtH5tWz0ohcRpBJCjg7ni2gBLxOw@mail.gmail.com>
User-Agent: Alpine 2.11 (BSF 23 2013-08-11)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Archived-At: http://mailarchive.ietf.org/arch/msg/endymail/OhyyY15Zq645ZHzQ2xRA2MAbQCc
Cc: endymail <endymail@ietf.org>
Subject: Re: [Endymail] where's the end, was spam versus cleartext

>> I see you're a gmail user.  You have no idea how much spam Gmail is
>> rejecting or discarding before it ever gets anywhere near your inbox.
>
> It doesn't matter how much spam is rejected overall: what matters is
> how much ham is rejected, and how early the spam can be rejected, and,
> in the case of the phone, how much more tightly we can draw the
> filtering compared to a desktop machine. ...

Really, you cannot assume that the mail you get at your Gmail account is
typical of anything.  You also have no idea how much spam Gmail doesn't 
even put in your spam folder, based on content analysis they didn't tell 
you about.

> This can be easily quantified in the case of naive Bayesian
> classifiers, by looking at the entropy gain of each signal, and doing
> the usual sort of threshold picking analysis.

Um, have you ever talked to people who run large mail systems about the 
way their spam filtering really works? Many of us here have done so, 
and it's a lot more complicated than it might seem.

R's,
John