[Asrg] 2.a.1 Analysis of Actual Spam Data - next steps
"Peter Kay" <peter@titankey.com> Wed, 20 August 2003 17:31 UTC
Received: from optimus.ietf.org (ietf.org [132.151.1.19] (may be forged)) by ietf.org (8.9.1a/8.9.1a) with ESMTP id NAA21304 for <asrg-archive@odin.ietf.org>; Wed, 20 Aug 2003 13:31:32 -0400 (EDT)
Received: from localhost.localdomain ([127.0.0.1] helo=www1.ietf.org) by optimus.ietf.org with esmtp (Exim 4.20) id 19pWnb-0008F8-1C for asrg-archive@odin.ietf.org; Wed, 20 Aug 2003 13:31:08 -0400
Received: (from exim@localhost) by www1.ietf.org (8.12.8/8.12.8/Submit) id h7KHV7v7031680 for asrg-archive@odin.ietf.org; Wed, 20 Aug 2003 13:31:07 -0400
Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by optimus.ietf.org with esmtp (Exim 4.20) id 19pWna-0008Et-UK for asrg-web-archive@optimus.ietf.org; Wed, 20 Aug 2003 13:31:06 -0400
Received: from ietf-mx (ietf-mx.ietf.org [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id NAA21282; Wed, 20 Aug 2003 13:31:00 -0400 (EDT)
Received: from ietf-mx ([132.151.6.1]) by ietf-mx with esmtp (Exim 4.12) id 19pWnY-00044T-00; Wed, 20 Aug 2003 13:31:04 -0400
Received: from ietf.org ([132.151.1.19] helo=optimus.ietf.org) by ietf-mx with esmtp (Exim 4.12) id 19pWnY-00044Q-00; Wed, 20 Aug 2003 13:31:04 -0400
Received: from localhost.localdomain ([127.0.0.1] helo=www1.ietf.org) by optimus.ietf.org with esmtp (Exim 4.20) id 19pWmX-0008AD-Bq; Wed, 20 Aug 2003 13:30:01 -0400
Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by optimus.ietf.org with esmtp (Exim 4.20) id 19pWlu-00089O-T6 for asrg@optimus.ietf.org; Wed, 20 Aug 2003 13:29:22 -0400
Received: from ietf-mx (ietf-mx.ietf.org [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id NAA21192 for <asrg@ietf.org>; Wed, 20 Aug 2003 13:29:16 -0400 (EDT)
Received: from ietf-mx ([132.151.6.1]) by ietf-mx with esmtp (Exim 4.12) id 19pWls-000438-00 for asrg@ietf.org; Wed, 20 Aug 2003 13:29:20 -0400
Received: from imail.centuryc.net ([216.30.168.20]) by ietf-mx with esmtp (Exim 4.12) id 19pWlr-00042y-00 for asrg@ietf.org; Wed, 20 Aug 2003 13:29:19 -0400
Received: from cybercominc.com [66.91.134.126] by imail.centuryc.net (SMTPD32-8.00) id A02EA100DA; Wed, 20 Aug 2003 07:30:22 -1000
Received: from a66b91n134client123.hawaii.rr.com (66.91.134.123) by cybercominc-zzt with SMTP; Wed, 20 Aug 2003 17:36:23 GMT
X-Titankey-e_id: <650110b2-42b1-42d9-a308-27c825c225f8>
content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0
Message-ID: <DD198B5D07F04347B7266A3F35C42B0B0D94FD@io.cybercom.local>
Thread-Topic: 2.a.1 Analysis of Actual Spam Data - next steps
Thread-Index: AcNnLdZ2gtjlrIv2SwuHO/2+bQxGGQADDTkg
From: Peter Kay <peter@titankey.com>
To: asrg@ietf.org
Subject: [Asrg] 2.a.1 Analysis of Actual Spam Data - next steps
Sender: asrg-admin@ietf.org
Errors-To: asrg-admin@ietf.org
X-BeenThere: asrg@ietf.org
X-Mailman-Version: 2.0.12
Precedence: bulk
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/asrg>, <mailto:asrg-request@ietf.org?subject=unsubscribe>
List-Id: Anti-Spam Research Group - IRTF <asrg.ietf.org>
List-Post: <mailto:asrg@ietf.org>
List-Help: <mailto:asrg-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/asrg>, <mailto:asrg-request@ietf.org?subject=subscribe>
List-Archive: <https://www1.ietf.org/mail-archive/working-groups/asrg/>
Date: Wed, 20 Aug 2003 07:30:54 -1000
OK, gang, another summary here. So far, one person has volunteered to assist w/ the project in terms of "owning" an email address. Mahalo Nui to Selby Hatch for standing up. Terry and gang have done an admirable job of setting up the outline of a plan, which I've edited/summarized below. Very much appreciated.

So what we have now is 1 email owner volunteer and a plan that calls for 60 email addresses, possibly spread out across N domains. One call to use 60 addresses per domain potentially gives us hundreds of addresses to deal with. And if we're going to do a Terry-resilient test, we will need to treat these email addresses as normally as possible, "normal" meaning that we open them, click on them, etc.

Unless we get more people to agree to assist, we can't do this experiment. Last chance to stand up and help. If you're going to do this, do it now. The 8/21 deadline (for this project to gather up a sufficiently-sized group) is fast approaching, and 1 volunteer ain't gonna cut it.

Peter

============== Experimental Design ================

550 Response Experiment

What we're trying to determine is: will "hard bounce" handling of spam (such as 550 no such user) reduce the amount of "spam attacks" to a given email address over time, versus an email address that does not employ such tactics?

If you're confident of your stat background, skip this first paragraph and go straight to the bullet points. If not, the background material in this paragraph provides context for them.

Viewed through the lens of inferential statistics, there are only two kinds of variance in the world: explained/unexplained, aka between-group/within-group, or systematic/random. Robust experimental design exerts as much control as possible over everything BUT the independent variable, which is the only thing that's allowed to vary systematically between the groups. Everything else that varies, however slightly, between the two conditions hurts the chances of meaningful results.
(If the extraneous variance is systematic, then the design is confounded, and the results--whatever they are--are meaningless; if the extraneous variance is random, then statistical power is compromised.)

- Ensure *crisp* separation of the independent variables. If the analytical goal is to study the effects of 550s, then have that be the *only* source of systematic variance. DO NOT "dilute" your systematic variance by confounding it with other variables (visibility, phase of the moon, eye color, etc.).

- (Under the heading of "Hey doc, it hurts when I do this...") If daily spam volume is too noisy (and the DATA, not the statistician, "say" that it is), then pick a dependent measure that's more naturally noise-resistant (say, monthly spam volume, or even quarterly, if need be). Reliability of initial measurement is always preferable to _post hoc_ "noise reduction."

- Studiously ensure and maintain homogeneity of the experimental conditions throughout the course of the experiment.

Mechanics:

- Create an absolute minimum of 60 *pairs* of email addresses. (The "magic number 30" assumes data to be noise-free. Statistical power is a function of the number of "subjects," not the number of measurements.) The use of "otherwise identical" *pairs* of addresses allows a little more statistical power to be squeezed from the data at analysis time. If the one-TLD experiment uses 60 pairs of addresses, then a multi-TLD experiment must use 60 pairs for each TLD. Simple as that. Going to even a small number of TLDs (e.g. 3 TLDs) while keeping the original number of addresses, as you suggest, is going to be a disaster if the TLD does have some effect, as it reduces by a factor of three the amount of data that can tell you about the effects of the 550 responses where they are the only independent variable. It would be helpful not to restrict the TLDs to those where English is the prime language, as in the three you list. Maybe use .com, .uk, .fr, .de (plus .org and .net maybe).
There are four potential gains to using several TLDs, provided that enough data is collected to make a valid experiment within each individual TLD. First, we can see whether the 550 method has different effects in different domains; second, we can get some idea of the effect of TLD on spam volume (anecdotal evidence conflicts here, and I've seen no solid numbers); third, if the TLD does in fact make no difference, we have several times as much data to work with; fourth, if the 550 response does indeed have an effect, we will be able to see if part of that effect is a reduction or increase in the unexplained variance.

- Randomly assign each address in a pair to an experimental condition; the addresses in the experimental group never (repeat: never) do anything but throw 550s; addresses in the control group "take anything."

  * Cautionary aside: If it were me, I would zealously protect from the general public any knowledge of which addresses were in which experimental group. As proof against experimenter mortality, I'd ensure that 3 different people knew which-was-which, so that the study could continue if I got hit by a bus. But I'd also limit that knowledge to *just* those 3 folks. (In experimental-design jargon, the study is referred to as "blind.")

- In a perfect universe, all addresses in both groups are served from one and only one mail server. That way, "server status" affects all address pairs in both groups identically.

- Insofar as possible, ensure that each address within a pair achieves/receives "identical" visibility. Each control address within a pair should "shadow" its experimental counterpart as precisely as possible.

  * If one address signs up for a list, posts to a newsgroup, appears on a Web page, or whatever, the other one should do it too, on the very same day. Perhaps an acceptable automated approach would be to publish the addresses on the same newsgroups on the same day, and also publish the email addresses on the same Web site.
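The pair-wise random assignment and restricted "blind" key described above could be sketched as follows (a minimal Python sketch; the address names and seed are hypothetical placeholders, not part of the actual plan):

```python
import random

def assign_pairs(n_pairs, seed):
    """Randomly assign one address of each pair to the 550-throwing
    experimental group and the other to the take-anything control group.
    Address names here are hypothetical placeholders."""
    rng = random.Random(seed)
    key = {}
    for i in range(n_pairs):
        a = f"pair{i:02d}-a@example.com"
        b = f"pair{i:02d}-b@example.com"
        # Coin flip decides which member of the pair throws 550s
        if rng.random() < 0.5:
            key[i] = {"experimental": a, "control": b}
        else:
            key[i] = {"experimental": b, "control": a}
    return key

# 60 pairs, as the one-TLD design calls for; the seed (or the returned
# key itself) is the which-was-which secret shared by only the three
# designated people.
key = assign_pairs(60, seed=20030820)
```

Keeping the seed or key restricted to those three people is what makes the study "blind": everyone else handles the addresses without knowing their group membership.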
- While waiting for time to pass, order a copy of Kanji, G. (1999). One hundred statistical tests. ISBN: 0-7619-6151-8. (This is a very handy "cookbook" that contains raw-score formulae for just about every inferential statistical test there is.)

- At the end of the experiment, pull the pairs apart and compute a regression equation for each experimental group.

- Some folks may recall me saying that the slope of the regression line is not intrinsically informative. (And it isn't; dispersion, not slope, expresses the degree of relatedness between regression variables.) However, the *difference between two slopes* of otherwise "identical" conditions can be informative.

  * If the beta weight of the regression line for the control group is smaller (even slightly) than that of the experimental group, stop. Fail to reject the null hypothesis and move on to something else.

  * Differences between the slopes can be compared via t-test. If that difference-in-slopes doesn't make at least 0.01 (TWO-tailed), stop. Fail to reject the null hypothesis and move on to something else. (Remember, getting "doubles" when throwing a pair of dice is "statistically significant" at p=0.05.)

- Having determined the "direction" of the effect, the magnitude of the effect can be estimated via paired t-test. Again, the goal is 0.01 or bust (though 1-tailed 0.01 is now "within reach").

_______________________________________________
Asrg mailing list
Asrg@ietf.org
https://www1.ietf.org/mailman/listinfo/asrg
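The slope-comparison step described above can be sketched in Python. This is one common form of the two-slope t-test (difference of slopes over the root sum of squared standard errors, df = n1 + n2 - 4), not necessarily the exact formula the author had in mind, and the spam counts below are invented:

```python
import math

def slope_and_se(x, y):
    """OLS slope and its standard error for one group's
    time-vs-spam-volume regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope
    a = my - b * mx                    # intercept
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)                 # residual variance
    return b, math.sqrt(s2 / sxx)

def slope_difference_t(x1, y1, x2, y2):
    """t statistic for H0: the two regression slopes are equal,
    with df = n1 + n2 - 4."""
    b1, se1 = slope_and_se(x1, y1)
    b2, se2 = slope_and_se(x2, y2)
    t = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    return t, len(x1) + len(x2) - 4

# Invented monthly spam counts for one control/experimental pairing
months = [1, 2, 3, 4, 5, 6]
control = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
experimental = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]
t, df = slope_difference_t(months, control, months, experimental)
```

Per the rule above, one would stop immediately if t came out negative (control slope smaller than experimental), and otherwise look up t against the two-tailed 0.01 critical value for the computed df.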