[Asrg] Brad Templeton's C/R Guidelines

Yakov Shafranovich <research@solidmatrix.com> Tue, 27 May 2003 16:52 UTC

Date: Tue, 27 May 2003 12:48:00 -0400

Here is a list of C/R guidelines compiled by Brad Templeton, who wrote one 
of the early C/R systems (from 
http://www.templetons.com/brad/spam/challengeresponse.html). To summarize:

o Never challenge any mail that's a reply to a private message you sent.
o Avoid challenging replies to public messages
o Use multiple addresses
o Never challenge mailing list mail
o Never challenge a challenge!
o Make the "From" on your challenge match the address mailed to
o Put an In-reply-to header on your challenge
o Include the subject of the original message in the challenge
o Present a regular summary of all blocked mail
o Make the challenge as easy as you can make workable.
o Don't force users to re-send mail
o Detect all attempts to subscribe to mailing lists
o Detect mailing lists subscribed to in the user's mail archives
o Detect patterns of possible incoming mailing lists
o Think about anonymous E-mail


-----------snip-------------
Proper principles for Challenge/Response anti-spam systems

Back in 1997 I wrote what is probably the first of the challenge/response 
(C/R) spam-blocking systems. These are systems that, when they see an 
E-mail from somebody you've never corresponded with before, hold the mail 
and e-mail back a "challenge" to confirm that the person is a real sender 
and not a mailing robot, in particular a spammer. The other person gets the 
challenge, and responds to it in some way. If they do this properly, your 
system releases the mail that was held, and from then on they can mail 
without challenge.

There are a number of these systems springing up -- it's a very effective 
system and a fairly obvious idea -- but not everybody is doing it right, so 
I thought I would lay out some "best practices" based on my 6 years of 
experience. I don't even do all of these things, because I wrote my system 
before they became necessary, but if I were writing a new version, I would.

Never challenge any mail that's a reply to a private message you sent.

If you send somebody private mail (from any address you have), and they 
reply to you with any mailer, you should accept their mail and not send 
them a challenge. This is true even if they reply from a different address 
than you sent the mail to. Many people have mail aliases, and receive mail 
on one address and send on another. Some people use other anti-spam systems 
that generate new addresses every time they mail.

What this means is that simply whitelisting all addresses you mail to is 
not enough, though it is of course an important thing to do.

One of the easiest ways to do this, by the way, is to have multiple 
addresses yourself. Send out private mail from an address that does not do
challenges: an old-fashioned unfiltered E-mail box (though you may want to
note the addresses on incoming mail to whitelist them). However, you must
be sure this address won't get out to spammers, or you will have to switch 
it to another. (You must be prepared to do this.)

In general, you should probably put an un-challenged address on business 
cards too. Save filtered addresses for public use: postings to mailing
lists, listings on web pages, listings in conference directories, etc.

Avoid challenging replies to public messages

If you can do it, avoid challenging replies to your public messages to 
mailing lists and newsgroups. With private mailing lists (not archived in 
public) you can of course accept any replies with reasonable safety based 
on subject line and in-reply-to. With public postings, consider accepting 
replies unchallenged for a few days to weeks after postings, then add a 
challenge for late replies which are more likely to be spammers.
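
A minimal Python sketch of that time window, for illustration only. The
14-day figure and the function name are assumptions; the text above only
says "a few days to weeks".

```python
from datetime import datetime, timedelta

# Assumed policy value, not taken from the text.
REPLY_WINDOW = timedelta(days=14)


def late_reply(posted_at, received_at, window=REPLY_WINDOW):
    """Return True when a reply to a public posting arrives after the window."""
    return received_at - posted_at > window


posted = datetime(2003, 5, 1, 12, 0)
print(late_reply(posted, datetime(2003, 5, 3, 9, 0)))    # False: accept unchallenged
print(late_reply(posted, datetime(2003, 6, 20, 9, 0)))   # True: send a challenge
```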

Use multiple addresses

Any good spam filtering system will support giving the user multiple 
aliases under which to receive mail. This has two functions. One, you can 
filter some aliases more than others. For example, you might have "public"
addresses used in newsgroup postings and on a web site, and private 
addresses used only in mails to private parties, replies etc. You would use 
less filtering, and perhaps no challenge/response, on private addresses.

It's also handy to provide a gamut of addresses to use so that you can use 
a different address every time you give out an address. For example, if 
entering data on a web page that asks for your E-mail address, use a 
different one each time. That way if any address gets on spammers' lists,
you can delete it or give it very high spam filtering with minimal risk to 
mail from others.

The best plan is to have your own subdomain for mail, allowing an infinite 
space of addresses. However, if that is not available to you, sendmail 
treats mail to "userid+anything" as mail to the given userid. For example, 
if a sendmail user has the address fbaggins@shire.org, then 
fbaggins+ring@shire.org and fbaggins+bagend@shire.org and all other such 
addresses will be delivered to the main address. Qmail has a similar
system, using a dash instead of a plus. That's better, since unfortunately 
there are huge numbers of badly coded web forms that, because they map "+" 
to a space, don't accept fully legal e-mail addresses with a plus in them.
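
To make the "userid+anything" convention concrete, here is a small Python
sketch that maps a tagged address back to its base address. The function
name is made up for this example, and it assumes a well-formed address
containing an "@". Pass "-" as the separator for the qmail-style form.

```python
def base_address(address, separator="+"):
    """Strip the tag: 'fbaggins+ring@shire.org' becomes 'fbaggins@shire.org'."""
    local, _, domain = address.rpartition("@")
    user, _, _tag = local.partition(separator)
    return f"{user}@{domain}"


print(base_address("fbaggins+ring@shire.org"))           # fbaggins@shire.org
print(base_address("fbaggins-bagend@shire.org", "-"))    # fbaggins@shire.org
print(base_address("fbaggins@shire.org"))                # fbaggins@shire.org
```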

The personal domain is also best because spammers can easily guess the root
address on a plus-sign based address. If you use plus addressing, you must
filter the base address, and have unfiltered addresses use the plus.

Some systems generate a new address for every mail sent, using a special 
random string in the address itself. Some use a cryptographically secure 
hash to generate the string so they can immediately identify any address 
they generated, without having to remember each one.
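
One way such a scheme could work is sketched below in Python. This is an
assumption on my part, not a description of any particular system: it uses
an HMAC over a label as the "string", a made-up secret, and a plus-style
address layout of user+label.tag@domain.

```python
import hashlib
import hmac

# Hypothetical secret; a real system would keep its own private key.
SECRET = b"replace-with-a-private-key"


def _tag(label, secret):
    """Short keyed hash of the label (first 10 hex digits of HMAC-SHA256)."""
    return hmac.new(secret, label.encode(), hashlib.sha256).hexdigest()[:10]


def make_address(user, domain, label, secret=SECRET):
    """Generate an address whose tag can be re-derived later."""
    return f"{user}+{label}.{_tag(label, secret)}@{domain}"


def is_own_address(address, secret=SECRET):
    """Recognise an address we generated, without storing it anywhere."""
    local = address.rpartition("@")[0]
    _user, plus, rest = local.partition("+")
    label, dot, tag = rest.rpartition(".")
    if not (plus and dot):
        return False
    return hmac.compare_digest(tag.encode(), _tag(label, secret).encode())


generated = make_address("fbaggins", "shire.org", "ring")
print(is_own_address(generated))                              # True
print(is_own_address("fbaggins+ring.0000000000@shire.org"))   # False
```

The point of the keyed hash is that a spammer who sees one generated
address cannot produce another valid one without the secret.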

Be aware, however, that in generating many addresses, you may mesh badly 
with other whitelist systems expecting your mail to come from the same 
address. One option is to use the base address in the "From:" and put any 
generated address, especially an unfiltered one, in the Reply-to. Beware 
that there are mailers that botch Reply-to out there.

Never challenge mailing list mail

For decades, all good mail responders have known not to respond to mailing 
list mail. An unofficial standard has indicated that bulk mail of various 
forms would have a header like "Precedence: bulk" or sometimes "Precedence: 
list" to mark it as bulk. "Precedence: junk" is rarely used for it would 
declare things to be spam!

You can also test to see that none of the addresses in the "To" and "CC" 
lines is an address for the person getting the mail, though that does 
present a maintenance problem since there is no automatic way to know all 
those addresses. However, you definitely should not challenge any mail with 
the above precedence headers.
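
Both checks can be sketched with Python's standard email package. The
function names and the sample message are invented for illustration, and
the list of the user's own addresses has to be supplied by hand, which is
exactly the maintenance problem noted above.

```python
from email import message_from_string
from email.utils import getaddresses

BULK_PRECEDENCE = {"bulk", "list", "junk"}


def has_bulk_precedence(msg):
    """True if the message carries one of the bulk Precedence values."""
    precedence = (msg.get("Precedence") or "").strip().lower()
    return precedence in BULK_PRECEDENCE


def names_user(msg, user_addresses):
    """True if any To/Cc address belongs to the user."""
    known = {address.lower() for address in user_addresses}
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    return any(address.lower() in known for _name, address in recipients)


list_post = message_from_string(
    "From: someone@example.org\n"
    "To: asrg@ietf.org\n"
    "Precedence: bulk\n"
    "Subject: [Asrg] test\n"
    "\n"
    "body\n"
)
print(has_bulk_precedence(list_post))                  # True: never challenge
print(names_user(list_post, ["fbaggins@shire.org"]))   # False: probably list mail
```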

Who to challenge?

There are three possible addresses you can challenge. They are the 
"Envelope From", the regular "From:" and the address in a "Reply-to:" header.

The strongest case is for challenging the Envelope From, which is the
address you would send bounce errors to. The "From:" is the person who
wrote the message (and thus in most cases, though not all, the person you
are trying to confirm is a human being). The "Reply-to" is the address to
which the sender expects replies to go.

Unfortunately, you definitely should not challenge more than one of these.

A challenge is similar to a bounce error, but unfortunately in many cases
it is not handled by a human -- it was in fact designed to be not handled 
by a human. Most such cases are list mail, which you should not be 
challenging at all. In the case of list mail, the Envelope-From always 
identifies the list manager itself, not the particular poster to the 
mailing list. Sometimes it is a unique address, so that programs can 
automate detecting bounces without having to parse them to try to figure 
out what mail bounced.

The From is often the actual person who posted to the mailing list, or the 
real sender of a person to person mail. Some lists have all mail come 
"from" the list manager, however. Some lists have the list address be in 
the Reply-to.

You must not challenge individual mailers to a list, so only challenge the 
From or Reply-to when you are sure it is not list mail. If you challenge
individual mailers you'll get bounced off the list very quickly.

The answer here thus depends on how good your detection of list mail is. If 
it's reliable, you may decide to challenge the From or Reply-to, since that 
is more assured to be a human. On the other hand, challenging the 
Envelope-From has many merits. The worst case is that it's not a human (or 
it's a list that is not tagging list mail as such) and this mail will 
appear in the digest, hopefully near the top.
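
The decision could be sketched as follows. This is only one reading of the
trade-off: the function and its two flags are invented, and preferring
"From:" over "Reply-to" when list detection is trusted is my own assumption.
It returns exactly one address, or None for list mail.

```python
from email import message_from_string
from email.utils import parseaddr


def challenge_target(msg, envelope_from, is_list_mail, trust_list_detection):
    """Pick the single address to challenge, or None if nothing should be sent."""
    if is_list_mail:
        return None                      # never challenge list mail
    if trust_list_detection:
        for header in ("From", "Reply-To"):
            address = parseaddr(msg.get(header) or "")[1]
            if address:
                return address           # most likely a human
    return envelope_from or None         # same place a bounce would go


msg = message_from_string(
    "From: Gandalf <gandalf@example.org>\n"
    "Subject: Fireworks\n"
    "\n"
    "Hello\n"
)
print(challenge_target(msg, "bounces@example.org", False, True))
print(challenge_target(msg, "bounces@example.org", False, False))
```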

Never challenge a challenge!

The other person might have a C/R program or a whitelist of their own. If
each side challenges the other's challenge, neither message is ever
delivered.

Make the "From" on your challenge match the address mailed to

When they send out their mail they will have whitelisted the address they 
sent to, so any challenge From that address should get through.

Put an In-reply-to header on your challenge

The challenge should refer to the message-id of the mail being challenged. 
A good whitelist program should remember the message-id of every mail the 
user sends out, and every challenge sent out. If a challenge comes back 
with an in-reply-to, you can identify it as a valid challenge. In the end, 
this may become the main technique, once spammers try to guess the names of 
your friends and send spam disguised as challenges. They can't fake this 
message-id.

The other reason to record the outgoing message-id is to be sure you never 
challenge anybody replying to mail you sent out. If mail has an in-reply-to 
that matches an outgoing message-id of a private mail of yours, you let it in.
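
A rough Python sketch of the message-id log. The class and method names
are hypothetical, the store is an in-memory set rather than anything
persistent, and checking "References" as well as "In-Reply-To" is an
addition of mine.

```python
from email import message_from_string


class SentLog:
    """Remembers the Message-ID of every mail and challenge we send."""

    def __init__(self):
        self._ids = set()

    def record(self, message_id):
        self._ids.add(message_id.strip())

    def is_reply_to_ours(self, msg):
        """True if the incoming mail refers to a Message-ID we sent."""
        refs = " ".join([msg.get("In-Reply-To") or "", msg.get("References") or ""])
        return any(ref in self._ids for ref in refs.split())


log = SentLog()
log.record("<abc123@shire.org>")

reply = message_from_string(
    "From: gandalf@example.org\n"
    "In-Reply-To: <abc123@shire.org>\n"
    "\n"
    "Thanks\n"
)
forged = message_from_string(
    "From: gandalf@example.org\n"
    "In-Reply-To: <guess@shire.org>\n"
    "\n"
    "Buy now\n"
)
print(log.is_reply_to_ours(reply))    # True: let it in unchallenged
print(log.is_reply_to_ours(forged))   # False
```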

Include the subject of the original message in the challenge

C/R programs should also log outgoing subjects, so that they can detect 
replies (and challenges) to the user's messages.

Present a regular summary of all blocked mail

No system is perfect, so the system must present a summary, at some
reasonable interval, of mail that was blocked by the system. This would
include mailing list mail that was unchallenged, and mail whose
challenge was never answered.

This should be presented as a summary digest, which allows a quick scan of 
all these messages. The summary should show a minimal set of relevant 
headers (From, To, Subject, CC etc.) and a few lines from the body. It 
should also show a "spam score" calculated for the message, and the digest 
should be sorted by spam-score, so the lowest scores appear at the top.
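
As an illustration of the ordering, here is a small Python sketch of a
digest builder. The record type, field names and line layout are all
assumptions; the spam scores would come from whatever scoring system is
in use.

```python
from dataclasses import dataclass


@dataclass
class HeldMessage:
    sender: str
    subject: str
    spam_score: float
    preview: str


def build_digest(held):
    """Format held mail as text, lowest spam score first."""
    lines = []
    for message in sorted(held, key=lambda m: m.spam_score):
        lines.append(f"[{message.spam_score:5.1f}] {message.sender}: {message.subject}")
        lines.append(f"        {message.preview[:60]}")
    return "\n".join(lines)


held = [
    HeldMessage("win@spam.example", "WIN NOW", 9.5, "Click here"),
    HeldMessage("gandalf@example.org", "Fireworks", 0.4, "See you Thursday"),
]
print(build_digest(held))
```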

With each message in the digest, the user should be able to select the 
message to define what to do with it, including delivering it, whitelisting 
the sender, whitelisting the mailing list it came from, and combinations. 
It can also offer options like blacklisting the sender, tuning the 
spam-score, and reporting the spam to collaborative filters.

Any existing spam scoring system can be used. The fact that the challenged 
address did not exist or the mail to it bounced may give a high spam-score, 
but one should be wary of the effect of this on anonymous mail.

The summary can be e-mailed every so often (once a day typically, or less 
frequently for people who read mail less frequently) or a web option should 
be available to see the latest summary. Normally messages would not appear 
in the summary until they have had some period of time to get a response to 
the challenge -- typically a daily digest will have the prior day's 
messages in it.

This step is vital. If this is not done, users will miss mail for mailing 
lists they joined, mail from people who decide not to answer challenges, 
and mail from people whose mail software is incompatible with the challenge.

Understand mail/postings to public vs. private addresses

As noted, the best practice is to use an address that does not have C/R on 
mail to private parties. It is important however to use a C/R filtered 
address if the mail/posting will go out in public. This includes all 
newsgroup postings, and any mail to mailing lists which have public 
archives. An ideal system would modify outgoing mail, using a non-filtered 
address on private mail, and a public address on mail that may be exposed 
in public.

Make the challenge as easy as you can make workable.

Spammers are not currently trying to fake responses to spam challenges, but
they will. Until they do, asking for any reply at all actually works well 
as a challenge. Once they do, challenges must require some special action 
from the responder, something to prove they are human. Even so, try to make 
it as easy as possible, and provide several means of responding to the 
challenge.

For example, send your challenge as a multipart/alternative with plain text 
and HTML. In both, include a link the user can click on to make their 
response via a browser. However, since many people read mail offline or 
without a browser handy, always allow the response to come in E-mail.
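
A minimal sketch of such a challenge using Python's standard email
package. The function name, the wording of the challenge and the example.org
URL are placeholders. It also sets the "From", "In-Reply-To" and subject the
way the earlier guidelines ask, and leaves the choice of target address to
the caller.

```python
import html
from email import message_from_string
from email.message import EmailMessage


def _one_line(value):
    """Collapse folded header text so it can be reused in a new header."""
    return " ".join((value or "").split())


def build_challenge(original, mailed_to, target, confirm_url):
    challenge = EmailMessage()
    challenge["From"] = mailed_to          # the address the sender wrote to
    challenge["To"] = target
    challenge["Subject"] = "Re: " + _one_line(original["Subject"])
    if original["Message-ID"]:
        challenge["In-Reply-To"] = _one_line(original["Message-ID"])
    # Plain-text part: readable offline, answerable by a simple reply.
    challenge.set_content(
        "Your message is being held. To release it, reply to this\n"
        "message or visit:\n" + confirm_url + "\n"
    )
    # HTML alternative with a clickable link and no images.
    challenge.add_alternative(
        "<p>Your message is being held. To release it, reply to this "
        'message or <a href="{0}">confirm here</a>.</p>'.format(
            html.escape(confirm_url, quote=True)
        ),
        subtype="html",
    )
    return challenge


original = message_from_string(
    "From: gandalf@example.org\n"
    "To: fbaggins@shire.org\n"
    "Subject: Fireworks\n"
    "Message-ID: <abc123@example.org>\n"
    "\n"
    "Hello\n"
)
challenge = build_challenge(
    original, "fbaggins@shire.org", "gandalf@example.org",
    "https://example.org/confirm",
)
print(challenge.get_content_type())
```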

Don't require the user to be online to see the challenge, i.e. don't use
inlined image files unless absolutely necessary.

While the challenge must come "From:" the address that was mailed, it can 
have a Reply-to that sends the response to a specific handler with a unique 
address that lets you know what challenge is being answered. Since some 
users will not deal properly with the Reply-to, it is advised you also 
detect responses at the address which was in the From: of the challenge. In 
your challenge, put a magic token in the Subject line, Message-id and body, 
and if that token appears in any part of the response -- Subject, 
In-Reply-To or body, you will be able to identify the response, no matter 
what address it comes from.
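
The token matching might look like this in Python. The "CR-" prefix, the
token length and the function names are invented for the example. As a
simplification, bodies are searched as stored, without decoding base64 or
quoted-printable parts.

```python
import re
import secrets
from email import message_from_string

# Assumed token format: "CR-" followed by 16 lowercase hex digits.
TOKEN_RE = re.compile(r"CR-[0-9a-f]{16}")


def new_token():
    """Create an unguessable token to embed in an outgoing challenge."""
    return "CR-" + secrets.token_hex(8)


def find_token(msg, pending):
    """Return the pending token found in Subject, In-Reply-To or body, else None."""
    texts = [msg.get("Subject") or "", msg.get("In-Reply-To") or ""]
    for part in msg.walk():
        if not part.is_multipart():
            payload = part.get_payload()
            if isinstance(payload, str):
                texts.append(payload)
    for text in texts:
        for token in TOKEN_RE.findall(text):
            if token in pending:
                return token
    return None


token = new_token()
pending = {token}
response = message_from_string(
    "From: someone-else@example.org\n"
    "Subject: Re: please confirm [" + token + "]\n"
    "\n"
    "yes\n"
)
print(find_token(response, pending) == token)   # True
```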

If you ask the user to answer a question, be as forgiving as possible in
finding it in the body or subject of the response. If the user makes a bad
response, send them an error so they know their mail is not yet delivered.

Don't force users to re-send mail

Some challenges indicate the original mail was not delivered, and ask the 
user to send it again. Users will balk at this, and if they felt they were 
doing the recipient a favour (such as answering a question they asked in a 
public forum) they often will not bother to jump through any hoops to 
respond to challenges or re-send mail. You must make it as easy as possible.

Detect all attempts to subscribe to mailing lists

Watch outgoing mail and look for any attempts by the user to subscribe to a 
mailing list. This includes mail to "-subscribe" or "-request" addresses 
especially with "subscribe" in the subject or at the start of a line in the 
body. Try to understand the subscribe requests of most major mailing list 
systems, such as majordomo, listserv, topica, yahoo egroups, etc.
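
A simple Python sketch of the address and keyword test described above.
The rule it applies is my own reading: any "-subscribe" address counts, and
a "-request" address counts only when "subscribe" appears in the subject or
at the start of a body line. It does not attempt the system-specific
request formats.

```python
import re
from email import message_from_string
from email.utils import getaddresses

SUBJECT_KEYWORD = re.compile(r"\bsubscribe\b", re.IGNORECASE)
BODY_KEYWORD = re.compile(r"^\s*subscribe\b", re.IGNORECASE | re.MULTILINE)


def subscription_targets(msg):
    """Return recipient addresses of an outgoing mail that look like subscribe requests."""
    subject = msg.get("Subject") or ""
    body = "" if msg.is_multipart() else str(msg.get_payload())
    asks = bool(SUBJECT_KEYWORD.search(subject) or BODY_KEYWORD.search(body))
    targets = []
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    for _name, address in recipients:
        local = address.rpartition("@")[0].lower()
        if local.endswith("-subscribe") or (local.endswith("-request") and asks):
            targets.append(address)
    return targets


outgoing = message_from_string(
    "From: fbaggins@shire.org\n"
    "To: asrg-request@ietf.org\n"
    "Subject: subscribe\n"
    "\n"
    "\n"
)
print(subscription_targets(outgoing))   # ['asrg-request@ietf.org']
```

The addresses returned are only hints. Identifying the list's own
posting and envelope addresses, so they can be whitelisted, still has to
be done from the mail the list sends back.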

When the user subscribes to a list, you need to identify the list and 
whitelist it.

You can subscribe to lists via the web, though many then do a 2nd 
confirmation of the subscribe -- usually also by web -- which you may be 
able to look for. You must also avoid challenging these confirmations, even 
though they will not come with a Precedence bulk. In some cases users may 
have to avoid signing up for lists via the web without telling the C/R system.

Detect mailing lists subscribed to in the user's mail archives

Most C/R systems do a pre-scan of the user's archived mail folders, 
outgoing and incoming, as well as address books, to whitelist all proper 
correspondents in advance. Detect the presence of mailing lists in these 
archives to whitelist them in advance. You can't challenge mailing list 
mail so this is important. You will need to extract the Envelope From, as 
opposed to the "From:" header, in many cases, to properly spot mailing 
lists. Of course, you must avoid scanning spam to avoid whitelisting it.

Detect patterns of possible incoming mailing lists

Fortunately most spammers don't actually maintain real mailing lists that 
send multiple mailings to a user with the same Envelope From, and they 
don't use Precedence headers. You should, however, look for patterns in 
these headers on incoming list mail. (List mail to be identified by 
Precedence header and lack of the user's address in To/Cc headers.)

For example, if you get a sudden surge of messages, all with the same 
Envelope-From for the target user, this may be a mailing list the user has 
subscribed to. This is especially true if the messages have low spam scores.
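
The surge test could be sketched like this in Python. Both thresholds and
the function name are arbitrary assumptions, and the input is taken to be
(Envelope-From, spam score) pairs for one user's held mail. List managers
that put a per-message token in the Envelope-From would need the address
normalised first.

```python
from collections import defaultdict


def possible_lists(held, min_messages=5, max_average_score=2.0):
    """Return Envelope-From addresses that look like unrecognised mailing lists."""
    scores = defaultdict(list)
    for envelope_from, spam_score in held:
        scores[envelope_from.lower()].append(spam_score)
    return sorted(
        sender
        for sender, values in scores.items()
        if len(values) >= min_messages
        and sum(values) / len(values) <= max_average_score
    )


held = [("owner-hobbits@lists.example.org", 0.5)] * 6 + [("win@spam.example", 9.0)] * 2
print(possible_lists(held))   # ['owner-hobbits@lists.example.org']
```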

In this event, consider placing a special note at the top of the digest 
summary, or in a special message, saying something like, "You have recently 
received 6 mailing list messages from a list identifying itself as XYZ" and 
provide a means to say they wish to whitelist the list or perhaps blacklist 
it. If they whitelist it, deliver the mail. Give them a way to examine the 
potential list mail.

This is needed because you won't catch every mailing list subscription they
make, especially since in many cases you can subscribe to lists via the web.

Be warned however, that some mailing list managers put magic tokens in the 
envelope-from, to more easily track bounces. However many popular list 
managers also put in special "list" headers that help you identify the 
list. This includes headers like List-ID, and a "Sender" header.

Think about anonymous E-mail

Anonymous E-mail is still a useful thing. In part, you allow it by
providing the daily digest of mail whose challenges went unanswered, with
low spam scores coming first. Of course two-way remailers let you send a
challenge
and get a response by E-mail. If you insist on response by web you make it 
a little harder. Offering both lets the anonymous mailer select the best 
way to protect her identity.

Other systems (e-stamps etc.) which may not work on their own may still be
useful for letting anonymous mailers get through C/R systems.

Spammers may try to fake the things you detect

Spammers will eventually try to fake out all things you look for in order 
to avoid challenging or filtering e-mail. However, they will not do this 
right away. Since all things you do that make it harder for mail to get in 
will increase your risk of blocking desired mail, don't apply any stricter 
test until it actually becomes necessary.

Among the tests I have listed here, risks exist in the following areas.

o Spammers will eventually try to guess what mailing lists you are on, or
  what correspondents you have whitelisted, and they will forge mail to
  appear like that. This is especially true with any publicly archived
  mailing list you post to. Lists will eventually need digital signatures
  if this attack becomes common.
o If you allow replies to your messages to come in based on subject, then
  spammers will form replies to your public messages. To avoid this, you
  may wish to allow unchallenged replies only for a limited time on public
  messages.

Try to be liberal at first, and only close down when spammers abuse the
liberty. Don't try to prevent something that's not yet happening if it has
a risk of blocking legitimate mail.

C/R may, over time, lose its utility if most spammers try to target it 
directly. However, it still has several years of life. It can also be 
combined with other techniques. For example, if you have a good spam 
filter, you might decide to challenge only messages with high spam scores 
or other reasons to suspect they are spam, and let through other mail.
-----------snip-------------
