Re: [Asrg] Summary of junk button discussion

"Chris Lewis" <clewis@nortel.com> Sat, 27 February 2010 06:20 UTC

Return-Path: <CLEWIS@nortel.com>
X-Original-To: asrg@core3.amsl.com
Delivered-To: asrg@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id 31A7428C387 for <asrg@core3.amsl.com>; Fri, 26 Feb 2010 22:20:16 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -6.458
X-Spam-Level:
X-Spam-Status: No, score=-6.458 tagged_above=-999 required=5 tests=[AWL=-0.016, BAYES_00=-2.599, RCVD_IN_DNSWL_MED=-4, SUBJECT_FUZZY_TION=0.156]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id rlwfZ-qqgQup for <asrg@core3.amsl.com>; Fri, 26 Feb 2010 22:20:10 -0800 (PST)
Received: from zrtps0kp.nortel.com (zrtps0kp.nortel.com [47.140.192.56]) by core3.amsl.com (Postfix) with ESMTP id B487028C2B6 for <asrg@irtf.org>; Fri, 26 Feb 2010 22:20:03 -0800 (PST)
Received: from zrtphxs1.corp.nortel.com (zrtphxs1.corp.nortel.com [47.140.202.46]) by zrtps0kp.nortel.com (Switch-2.2.6/Switch-2.2.0) with ESMTP id o1R6MHD02335 for <asrg@irtf.org>; Sat, 27 Feb 2010 06:22:18 GMT
Received: from zrtphx5h0.corp.nortel.com ([47.140.202.65]) by zrtphxs1.corp.nortel.com with Microsoft SMTPSVC(6.0.3790.3959); Sat, 27 Feb 2010 01:22:02 -0500
Received: from [47.130.64.55] (47.130.64.55) by zrtphx5h0.corp.nortel.com (47.140.202.65) with Microsoft SMTP Server (TLS) id 8.1.340.0; Sat, 27 Feb 2010 01:22:02 -0500
Message-ID: <4B88BA09.7050700@nortel.com>
Date: Sat, 27 Feb 2010 01:22:01 -0500
From: Chris Lewis <clewis@nortel.com>
Organization: Nortel
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.23) Gecko/20090812 Lightning/0.9 Thunderbird/2.0.0.23 Mnenhy/0.7.6.666
MIME-Version: 1.0
To: asrg@irtf.org
References: <20100225054546.16850.qmail@simone.iecc.com> <4B86172D.2080702@nortel.com> <4B86AD93.1050800@tana.it> <4B86DD80.8060508@nortel.com> <4B87BD07.9000502@tana.it>
In-Reply-To: <4B87BD07.9000502@tana.it>
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 7bit
X-OriginalArrivalTime: 27 Feb 2010 06:22:02.0761 (UTC) FILETIME=[2D017F90:01CAB775]
Subject: Re: [Asrg] Summary of junk button discussion
X-BeenThere: asrg@irtf.org
X-Mailman-Version: 2.1.9
Precedence: list
Reply-To: Anti-Spam Research Group - IRTF <asrg@irtf.org>
List-Id: Anti-Spam Research Group - IRTF <asrg.irtf.org>
List-Unsubscribe: <http://www.irtf.org/mailman/listinfo/asrg>, <mailto:asrg-request@irtf.org?subject=unsubscribe>
List-Archive: <http://www.irtf.org/mail-archive/web/asrg>
List-Post: <mailto:asrg@irtf.org>
List-Help: <mailto:asrg-request@irtf.org?subject=help>
List-Subscribe: <http://www.irtf.org/mailman/listinfo/asrg>, <mailto:asrg-request@irtf.org?subject=subscribe>
X-List-Received-Date: Sat, 27 Feb 2010 06:20:17 -0000

On 2/26/2010 7:22 AM, Alessandro Vesely wrote:
> On 25/Feb/10 21:28, Chris Lewis wrote:
>> On 2/25/2010 12:04 PM, Alessandro Vesely wrote:
>>> On 25/Feb/10 07:22, Chris Lewis wrote:
>>>> On 2/25/2010 12:45 AM, John Levine wrote:
>>>>
>>>>> The only thing other than a junk button that appears useful is a
>>>>> not-junk button to display when looking at stuff in a junk folder. I
>>>>> suppose we could do that, but then we'd have to define what a junk
>>>>> folder is.
>>
>>> I don't think John meant a "general" definition here... :-/
>>
>> John seemed to be implying you can't have a "non-junk" button without a junk folder (eg: in an IMAP sense).
>
> That seems reasonable...
>
>> I was just pointing out Thunderbird's "not junk" implementation, which functions independently of the existence of any kind of foldering mechanism, IMAP or otherwise.
>
> Yup. The JunQuilla extension makes that even more manageable. It is an
> interactive tool running on the end-user's box, though.

It doesn't have to be.

>> Heck, SpamAssassin even manages to tune Bayesian without having any end-user feedback at all.

> I never adventured into such esoteric settings. Are there howtos or
> any docs about it?

I think it's called "Autolearn".  I think it works by treating SA scores
above <threshold> as "spam", and scores below <possibly a different
threshold> as "ham", and tunes Bayesian from that.  IOW: the existing SA
rules train Bayesian, and in the long term this allows Bayesian to
cross-correlate across individual emails, and to score stuff that the SA
rules don't necessarily even see.
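
A minimal sketch of that idea, in Python. This is not SpamAssassin's code: the class, the function names, and the tokenised input are hypothetical, and the two threshold values are assumptions chosen for illustration (I believe the corresponding SpamAssassin settings are bayes_auto_learn_threshold_spam and bayes_auto_learn_threshold_nonspam). It only shows the shape of the mechanism: the rule score alone decides whether a message is fed to the Bayes trainer as spam, as ham, or not at all.

```python
from collections import Counter

# Assumed thresholds, for illustration only.
SPAM_THRESHOLD = 12.0   # rule score above this: learn as spam
HAM_THRESHOLD = 0.1     # rule score below this: learn as ham


class TinyBayes:
    """Hypothetical per-token counter standing in for a Bayes database."""

    def __init__(self):
        self.spam_tokens = Counter()
        self.ham_tokens = Counter()
        self.spam_msgs = 0
        self.ham_msgs = 0

    def learn(self, tokens, is_spam):
        # Count each token once per message.
        if is_spam:
            self.spam_tokens.update(set(tokens))
            self.spam_msgs += 1
        else:
            self.ham_tokens.update(set(tokens))
            self.ham_msgs += 1

    def spamminess(self, token):
        # Fraction of this token's (normalised) occurrences seen in spam.
        s = self.spam_tokens[token] / self.spam_msgs if self.spam_msgs else 0.0
        h = self.ham_tokens[token] / self.ham_msgs if self.ham_msgs else 0.0
        return s / (s + h) if (s + h) else 0.5  # 0.5 means "no evidence"


def autolearn(bayes, rule_score, tokens):
    """Train Bayes from the rule score alone, with no end-user feedback."""
    if rule_score > SPAM_THRESHOLD:
        bayes.learn(tokens, is_spam=True)
        return "spam"
    if rule_score < HAM_THRESHOLD:
        bayes.learn(tokens, is_spam=False)
        return "ham"
    return "skipped"  # ambiguous score: do not train on it
```

Messages whose score falls between the two thresholds are deliberately left out, so only the mail the rules are most sure about ends up training the Bayes database.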

>>> To recap, junk buttons can be embedded within a more sophisticated
>>> architecture (as for IMAP). But not the other way around: anti-spam
>>> filter training cannot (in general) be based upon junk buttons and
>>> abuse reporting.
>>
>> Of course you can train spam filters based on abuse reports. We've been
>> doing precisely that for 13 years in several different incarnations.
>
> Hm... I've been tinkering with my server's settings based on users'
> reports as well, but not automatically.

I've been doing it for years.

> There are various mechanisms,
> e.g. Vipul's Razor, that allow users to share their verdicts about the
> spamminess of a given message. However, in order to attach to junk
> buttons a meaning of "filter messages /like/ this" we would need to
> define what that means in rather unambiguous terms.

No, you don't.  What the report handler does with a report is up to
its implementer.

Just as it is with Bayes.

Why are you treating this any differently from spam/ham training in
Bayes?  It's no different.

>> It may well make sense to include a "tickle IMAP" server as part of a
>> spec, but, also having an abuse reporting mechanism makes sure that you
>> have just about all implementations covered, IMAP or otherwise.

>> We could spec both, and leave it up to an installation or user to decide
>> which (or both) to use in any particular instance.

> I'd lean toward specifying just how to deliver abuse reports. Neither
> junk buttons nor their color should be mandated.

Who is trying to specify buttons or their color?

I'm aiming for a specification that permits a single <user action> to
communicate upstream for _both_ filtering and reporting purposes.
Whether it's used for filtering, reporting, or both in any given
instance is up to the site admin and/or end-user.
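
To make that concrete, here is a rough Python sketch of one upstream handler receiving a single user action and applying a locally configured policy. Everything in it is hypothetical: the names Use, Upstream, and handle, the "junk"/"not-junk" verdict strings, and the choice to send abuse reports only for "junk" verdicts are assumptions for illustration, not part of any proposed spec.

```python
from dataclasses import dataclass, field
from enum import Flag, auto


class Use(Flag):
    """What a site (or user) has chosen to do with junk-button actions."""
    NONE = 0
    FILTER = auto()   # feed the verdict to filter training
    REPORT = auto()   # forward an abuse report


@dataclass
class Upstream:
    policy: Use
    trained: list = field(default_factory=list)    # stand-in for a trainer
    reported: list = field(default_factory=list)   # stand-in for a reporter

    def handle(self, message_id, verdict):
        """Process one user action; verdict is 'junk' or 'not-junk'."""
        if Use.FILTER in self.policy:
            self.trained.append((message_id, verdict))
        # Assumption: only 'junk' verdicts generate an abuse report.
        if Use.REPORT in self.policy and verdict == "junk":
            self.reported.append(message_id)


# The client sends the same action either way; only the policy differs.
both = Upstream(policy=Use.FILTER | Use.REPORT)
both.handle("msg-1", "junk")
both.handle("msg-2", "not-junk")

report_only = Upstream(policy=Use.REPORT)
report_only.handle("msg-1", "junk")
```

The client side stays identical in every case, which is the point: the decision about filtering versus reporting lives in configuration at the receiving end, not in the button.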