Re: [Asrg] Adding a spam button to MUAs

Jose-Marcio Martins da Cruz, Mon, 21 December 2009 14:33 UTC

Ian Eiloart wrote:


>> These error rates are most of the time bigger than what can be achieved
>> by spam filters. So it's probably a bad idea to consider that user
>> feedback is reliable. User interface shall be as simple as possible.
> But wait, we're talking about messages that the spam filter hasn't 
> rejected. Additional data *has* to be useful.

Sure, I've just mentioned that user feedback isn't reliable. There are many reasons. One way to
mitigate the unreliability of user feedback is to make the user interface as "good" as possible.

> The false positive rates are only a problem if the admin is stupid enough 
> to consider a single report as definitive. If you deliver a message to 
> 100 users, and three report it as spam, then you probably take no 
> action. If 20 report it as spam, then you need to take a closer look.

Agreed, but if 20 report it as spam, it matters whether the other 80 reported it as ham or didn't
report anything: these are different situations.
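To make that distinction concrete, here is a minimal sketch, not anything from an existing tool. The `ReportTally` class and its field names are hypothetical. It only shows that "20 spam, 80 ham" and "20 spam, 80 silent" look identical if you count spam reports per recipient, but very different if you count them per explicit vote.

```python
from dataclasses import dataclass


@dataclass
class ReportTally:
    """Hypothetical per-message tally of user feedback."""
    delivered: int  # recipients who received the message
    spam: int       # recipients who reported it as spam
    ham: int        # recipients who explicitly reported it as ham

    @property
    def silent(self) -> int:
        # Recipients who gave no feedback at all.
        return self.delivered - self.spam - self.ham

    def spam_share_of_recipients(self) -> float:
        # Spam reports relative to everyone who got the message.
        return self.spam / self.delivered if self.delivered else 0.0

    def spam_share_of_votes(self) -> float:
        # Spam reports relative to those who expressed an opinion.
        votes = self.spam + self.ham
        return self.spam / votes if votes else 0.0


# Same number of spam reports, different behaviour of the other 80 users.
contested = ReportTally(delivered=100, spam=20, ham=80)
unopposed = ReportTally(delivered=100, spam=20, ham=0)
```

Both tallies give the same share of recipients, while the share of votes separates a contested message from one that nobody defended. Which of the two a filter should trust more is a policy question this sketch doesn't answer.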

> I certainly don't think a 7% error rate is enough to determine that 
> users should not be given the opportunity to distinguish between 
> unwanted mail and reportable junk.

Sure, but... I'm not thinking of a binary decision (using user feedback or not), nor of trying to 
correct user feedback until it is perfectly clean. The idea behind Cormack's paper is to handle 
user feedback the same way as a noisy signal: try to correct its trivial errors, but accept the 
existence of noise and make your system as insensitive to noise as possible.
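One simple way to read "noisy signal", offered only as an illustration and not as the method of the paper: assume each user flips the true label with the same probability (symmetric noise), and correct the observed spam-report rate for that. The function name and the symmetric-noise assumption are mine. The 7% figure is just the error rate quoted in this thread, used as an example input.

```python
def denoise_spam_rate(observed_rate: float, error_rate: float) -> float:
    """Estimate the true spam-vote rate from a noisy observed rate.

    Assumes symmetric label noise: each report is wrong with probability
    error_rate, regardless of the true class. Under that assumption
        observed = true * (1 - e) + (1 - true) * e
    which is inverted below. The result is clamped to [0, 1].
    """
    if not 0.0 <= error_rate < 0.5:
        raise ValueError("error_rate must be in [0, 0.5)")
    estimate = (observed_rate - error_rate) / (1.0 - 2.0 * error_rate)
    return min(1.0, max(0.0, estimate))


# 3 reports out of 100 is below the assumed 7% noise floor.
low = denoise_spam_rate(0.03, 0.07)
# 20 reports out of 100 is clearly above it.
high = denoise_spam_rate(0.20, 0.07)
```

Under these assumptions, a handful of reports is indistinguishable from noise, while a larger share survives the correction. Real error rates are unlikely to be symmetric or constant, so this is a starting point at best.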

I ran an experiment with some users recently. In this particular experiment, I noticed some obvious 
things: people are concerned about wanted and unwanted mail (ham versus spam isn't their problem); 
the assessment of wanted/unwanted messages varies not only from user to user but also with the 
moment. The latter means that, at different times, the same message won't be judged the same way by 
the same user. Also, error rates aren't the same for different kinds of messages, even inside the 
same class.

I don't generalize this and, unfortunately, I don't have numbers reliable enough to publish, but I'd 
like to have them. I don't believe in opinions which aren't supported by reliable numbers.

> You can also combine reporting rates with your bayesian content analyser 
> or spamassassin score, or with your reputational score for the sender 
> domain, etc.

Sure, but even after that, you still won't have perfectly clean user feedback.
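For what it's worth, combining such signals could be sketched as below. This is a naive-Bayes-style sum of log-odds, which assumes the signals are independent (they usually aren't) and that each one has already been turned into a spam probability. The function names and the clamping constant are hypothetical, and nothing here is how SpamAssassin or any particular filter actually does it.

```python
import math

EPS = 1e-6  # hypothetical clamp to keep probabilities away from 0 and 1


def logit(p: float) -> float:
    """Log-odds of a probability, clamped to avoid infinities."""
    p = min(1.0 - EPS, max(EPS, p))
    return math.log(p / (1.0 - p))


def combine_signals(probabilities: list[float], prior: float = 0.5) -> float:
    """Combine per-signal spam probabilities into one probability.

    Each signal contributes its log-odds relative to the prior; the
    contributions are summed as if the signals were independent.
    """
    total = logit(prior) + sum(logit(p) - logit(prior) for p in probabilities)
    return 1.0 / (1.0 + math.exp(-total))


# e.g. user-report rate, content score and sender reputation, each
# already mapped to a probability by some step not shown here.
agreeing = combine_signals([0.8, 0.8])
conflicting = combine_signals([0.8, 0.2])
```

Agreeing signals reinforce each other and conflicting ones cancel out, but the output is only as good as its inputs: noisy user reports stay noisy after being combined.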