RE: [Asrg] 4. Survey of Solutions - Consent Model
gep2@terabites.com Tue, 15 July 2003 04:58 UTC
From: gep2@terabites.com
Message-ID: <B0000024653@nts1.terabites.com>
MIME-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Subject: RE: [Asrg] 4. Survey of Solutions - Consent Model
To: asrg@ietf.org
X-Mailer: SPRY Mail Version: 04.00.06.17
Date: Tue, 15 Jul 2003 00:25:16 -0500

>> CONSENT - an expression of wanting to receive specific email
>> LACK OF CONSENT - an expression of not wanting to receive
>> specific email or absence of prior CONSENT

I think that at the various stages of incoming message triage there are various things that need to be done, and that there is at least some natural ordering of those operations.

First and foremost, I think there need to be checks that are strictly envelope/header related. There might be some specific and familiar senders whose messages are NEVER to be delivered, or whose messages are ALWAYS going to be delivered (although perhaps with subsequent modifications!), and there might be some messages (many, in practice) which can't be decided or handled on the basis of the headers alone. Sometimes individual message header lines will need examination and qualification.

Second, messages often consist of multiple parts. Those parts can require individual, further attention... decoding, name and format testing, content scanning, or whatever. In some cases, a multipart message needs to have one or more parts removed; it's even possible that an originally multipart message might be reduced to a single-part message before it is released to the mail client software or the next MTA.

Third, both within individual parts and within entire messages I think there needs to be provision to call external processing modules (which might be user- or corporate-written) to provide additional decision-making options. These might be individual batch-type processes, or DLLs, or Web services, or any of a variety of other technologies. The important thing is that the processing of messages can be customized by user-written code in a multitude of ways.
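
As a rough illustration of that ordering (header checks, then per-part handling, then external modules), here is a minimal Python sketch. It is not a proposed implementation: the sender lists, the banned content type, the "hold" default and the hook signature are all assumptions made up for the example, and only the standard-library email package is used.

```python
from email.message import EmailMessage, Message
from email.utils import parseaddr
from typing import Callable, List, Optional

# Hypothetical per-recipient lists; a real filter would load these from policy.
ALWAYS_DELIVER = {"friend@example.org"}
NEVER_DELIVER = {"pest@example.net"}
BANNED_TYPES = {"application/x-msdownload"}

# An external hook returns "deliver", "reject", or None for "no opinion".
Hook = Callable[[Message], Optional[str]]


def header_stage(msg: Message) -> Optional[str]:
    """Stage 1: decide on the From: header alone where that is possible."""
    sender = parseaddr(msg.get("From", ""))[1].lower()
    if sender in NEVER_DELIVER:
        return "reject"
    if sender in ALWAYS_DELIVER:
        return "deliver"
    return None  # cannot be decided on headers alone


def part_stage(msg: Message) -> int:
    """Stage 2: remove banned top-level parts; return how many were removed."""
    if not msg.is_multipart():
        return 0
    parts = msg.get_payload()
    kept = [p for p in parts if p.get_content_type() not in BANNED_TYPES]
    msg.set_payload(kept)
    return len(parts) - len(kept)


def triage(msg: Message, hooks: List[Hook]) -> str:
    """Run the stages in order; parts are cleaned even for trusted senders."""
    verdict = header_stage(msg)
    if verdict == "reject":
        return "reject"
    part_stage(msg)
    if verdict == "deliver":
        return "deliver"
    for hook in hooks:  # Stage 3: user- or site-written modules
        opinion = hook(msg)
        if opinion is not None:
            return opinion
    return "hold"  # undecided: left to whatever default the recipient chooses
```

Note that an ALWAYS-deliver sender still goes through the part stage, which matches the "perhaps with subsequent modifications" remark. The sketch only looks at top-level parts and does not collapse a message that is left with a single part.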

>> I agree with you that CONSENT has not been defined properly, I am wondering how we should redefine it properly. Maybe something like this:
>>
>> CONSENT - an expression of wanting to receive email from a specific SENDER
>> LACK OF CONSENT - an expression of not wanting to receive email from a specific SENDER or absence of prior CONSENT for that SENDER

I think that "LACK OF CONSENT" needs to be further qualified, distinguishing DENIAL (an expression of NOT wanting to receive mail from a specific, known sender) from simple "not (yet) authorized".

>> However, we need to take into account filters which check not [only just] for specific senders, but rather for specific types of email. Perhaps the two definitions above should be combined.

> In our R&D of Message Sniffer we've developed a model for consent which fits very well in this discussion. I recommend that the ASRG adopt this generalization of our model for defining consent:
>
> There seem to be really 4 cases, so perhaps CONSENT should be defined within these 4 cases:
>
> 1. CONSENT - a direct expression of wanting to receive email from a sender.
> 2. SOFT CONSENT - an indirect expression of wanting to receive email from a sender.

I'm not sure I understand here (yet?) the point being made with "direct" versus "indirect" on these first two cases. Is #2 a "default - unrecognized sender" situation and #1 a known sender from which mail is ALWAYS desired (i.e. "whitelist")?

> 3. SOFT DENIED CONSENT - an indirect expression of wanting not to receive email from a sender.
> 4. DENIED CONSENT - a direct expression of wanting not to receive email from a sender.

Likewise...?

> NOTE: 2 and 3 are required to handle anonymous senders.

I think that 2 and 3 also might be required to handle cases where a qualitative decision is made (i.e. messages not specifically "whitelisted" or "blacklisted").

But there are further cases that you're not considering here, and they are at least equally important.
It's not just whether a message is actually desired for delivery, but also how to handle cases where a message is NOT to be delivered to the original addressee. Is the message to be simply blackholed? Is a forged "destination mailbox unknown" reply to be returned? Is a polite reply to be returned requesting that the message be resent without attachments or HTML or encoding or whatever? Is the message simply to be bounced back?

> In the above also:
>
> Direct Expression = an explicit white rule or black rule.
>
> Indirect Expression = expression by the evaluation of some mechanism chosen by the recipient including any kind of filtering engine.

Ah, okay.

> Sender = defined by any combination of Sender IP or email route,

Might it be useful to provide for checking the coherency of message routing? E.g. a message with a From: address at HOTMAIL.COM or AOL.COM but which has passed through a mail server or relay in (say) China or Korea or some other distant country? Or, say, a message with wildly incoherent dates in its Received: headers or Date: header?

> Sender address, or other Authentication mechanisms. The recipient may define any of these that may be required to define a particular sender, and may also define which authentication mechanisms (if any) are acceptable.

Note that this might not be based only on the headers. In particular, one might require that a message from a certain original sender MUST be signed with that sender's specific sig file or PGP key or something in order to be considered "authenticated".

Certain senders (say, specific Yahoogroups mailing lists) for instance might NEVER legitimately send attachments. If someone spoofs a popular Yahoogroups mailing list as the "sender" but actually still sends an attachment (which that group should NEVER be sending) then that's evidence of a forged From: address and/or an unauthenticated sender.
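
The Received:/Date: coherency idea raised above could be checked roughly as follows. This is only a sketch under stated assumptions: the function names and the 12-hour tolerance are invented for the example, the timestamp is assumed to follow the last ";" in each Received: header, and real-world clock skew would need a more careful policy.

```python
from datetime import datetime, timedelta, timezone
from email.message import Message
from email.utils import parsedate_to_datetime
from typing import List, Optional


def parse_stamp(text: str) -> Optional[datetime]:
    """Parse an RFC 2822 style date; return None if it cannot be parsed."""
    try:
        stamp = parsedate_to_datetime(text.strip())
    except (TypeError, ValueError):
        return None
    if stamp.tzinfo is None:  # a "-0000" zone yields a naive value; treat as UTC
        stamp = stamp.replace(tzinfo=timezone.utc)
    return stamp


def hop_times(msg: Message) -> List[datetime]:
    """Timestamps of the Received: headers, oldest hop first."""
    times = []
    # Received: headers are prepended in transit, so the last one is the oldest.
    for value in reversed(msg.get_all("Received", [])):
        _, sep, tail = str(value).rpartition(";")
        stamp = parse_stamp(tail) if sep else None
        if stamp is not None:
            times.append(stamp)
    return times


def dates_incoherent(msg: Message,
                     tolerance: timedelta = timedelta(hours=12)) -> bool:
    """True if hop times run backwards, or Date: is far from the first hop."""
    times = hop_times(msg)
    for earlier, later in zip(times, times[1:]):
        if earlier - later > tolerance:  # a later hop claims a much earlier time
            return True
    sent = parse_stamp(str(msg.get("Date", "")))
    if sent is not None and times:
        return abs(times[0] - sent) > tolerance
    return False
```

A result of True would be one input to an "indirect expression" of consent rather than a verdict in itself, since misconfigured clocks on legitimate relays are common.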

> If all of the above are acceptable then a policy of consent could be established and utilized in a very clear way:
>
> FIRST: Each inbound message is evaluated first against the sender policy to define "Sender" for the sake of evaluation. This definition may include the "Unknown Sender" which would limit policy evaluation to "SOFT" policies (2 and 3 above). Note also that "authentication mechanisms" may be defined by the recipient to support DNSbl or other services such as Bonded Sender mechanisms, or any other mechanisms that may arise.
>
> SECOND: The message is then evaluated against the consent policy to define the case that matches the message. This includes (in case 2 and 3) the application of any evaluation mechanisms that the user may define to evaluate the content of the message or evaluate its other characteristics.
>
> THIRD: A specific action mapped to the identified case in the policy should be executed. For example, reject the message, submit the message to some process, redirect the message to some mailbox, some combination of actions.
>
> As SOFT evaluations can be difficult to quantify and must be open to new mechanisms that become available in future I recommend that a "Consent Definition Language" be developed that provides for specific actions based on the evaluation of the message against the policy, and that in SOFT cases (2 and 3) the "Consent Definition Language" be extensible to take into account results that may be returned from the soft evaluation mechanisms.

In the ultimate case of this, you really end up wanting to write a program... do we need yet another programming language? Personally, I'd favor the use of SPITBOL for stuff like this... it's probably about the most powerful language there is for text processing and pattern recognition and data structure manipulation... and that's what this whole process is really all about.
SPITBOL has the additional nice property that one can bring in new program segments dynamically according to stages already passed or decisions already made... so that the program can be easily extended at runtime to add new rules or whatever, based on (say) specific senders or specific types of message content.

It's hard to imagine how one would devise a specific "Consent Definition Language" that doesn't end up being in essence a "programming language" (and that's NOT a bad thing, necessarily, but I'd hate to see the "standard" set to use a specific language, especially if that language ends up being less satisfactory for this use than something already existing such as SPITBOL or whatever).

> For example, some "tests" may return weights, others may return probabilities, others may return categories of content, others may return specific heuristics that fail.

Sure. And some processes might need the intermediate results of prior processing. Others might be independent, and in that case it would also be nice to allow multiple tests on the message to proceed in parallel (multitasking/multiprocessing/whatever) so as to reduce the overall time spent in processing each message.

> However, in general the structure of a working policy model tends to be hierarchical so an XML based framework can be very efficient.

I've heard XML called a LOT of things, although I don't think that I can remember too many times that it was accused of being "efficient". :-)

Just because XML is presently trendy is NO reason IMHO to impose that degree of overhead onto anything as core as E-mail processing. In another list I'm on we've been discussing the XML overhead/performance issues and found that in a typical situation XML record descriptors take twice as much time (or more) to process as a simple delimited data representation. (And I think that it often can take a LOT more than even that...)
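
To make the quoted FIRST/SECOND/THIRD scheme concrete without committing to XML or to any particular "Consent Definition Language", here is a minimal Python sketch that uses ordinary data structures. Everything named in it is an assumption for illustration: the rule table, the action table, the single soft test, the 0.5 threshold, and above all the simplification that every soft test returns one score between 0 and 1 (the quoted text allows weights, probabilities, categories and failed heuristics). Independent soft tests are run in parallel with a thread pool, as suggested above.

```python
from concurrent.futures import ThreadPoolExecutor
from enum import Enum
from typing import Callable, Dict, List


class Consent(Enum):
    CONSENT = 1        # direct: explicit white rule
    SOFT_CONSENT = 2   # indirect: soft tests found nothing objectionable
    SOFT_DENIED = 3    # indirect: soft tests objected
    DENIED = 4         # direct: explicit black rule


# Hypothetical direct rules, keyed by sender address.
DIRECT_RULES: Dict[str, Consent] = {
    "friend@example.org": Consent.CONSENT,
    "pest@example.net": Consent.DENIED,
}

# Hypothetical mapping from case to action (the THIRD step).
ACTIONS: Dict[Consent, str] = {
    Consent.CONSENT: "deliver",
    Consent.SOFT_CONSENT: "deliver",
    Consent.SOFT_DENIED: "quarantine",
    Consent.DENIED: "reject",
}

# A soft test maps the message body to a score in [0, 1]; higher is worse.
SoftTest = Callable[[str], float]


def has_html(body: str) -> float:
    """Toy soft test: flag bodies that contain an HTML tag."""
    return 0.8 if "<html" in body.lower() else 0.0


def classify(sender: str, body: str, tests: List[SoftTest],
             threshold: float = 0.5) -> Consent:
    """FIRST look for a direct rule; SECOND fall back to the soft tests."""
    direct = DIRECT_RULES.get(sender.lower())
    if direct is not None:
        return direct
    with ThreadPoolExecutor() as pool:  # independent tests run in parallel
        scores = list(pool.map(lambda test: test(body), tests))
    worst = max(scores, default=0.0)
    return Consent.SOFT_DENIED if worst >= threshold else Consent.SOFT_CONSENT


def act(sender: str, body: str, tests: List[SoftTest]) -> str:
    """THIRD: map the identified case to its action."""
    return ACTIONS[classify(sender, body, tests)]
```

Taking the worst score is only one way to combine results; tests that depend on the intermediate results of earlier tests would need to be sequenced rather than mapped in parallel.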

> As XML is naturally extensible then so would a "Consent Definition Language (CDL)" based on XML.

Honestly, there is nearly NOTHING that is especially more "extensible" about XML than there is with MANY other data representations. I have a number of objections to XML in principle, largely based on the fact that individual field names have to be parsed again and again for EVERY record handled... and not just field names, but also the higher-level issues of ensuring that every required field is present for every record...

> In the interests of sharing, proofing, and evaluating CDL based policies there should be a mechanism for defining standardized representations of tests.

Ultimately, again, this is going down a slippery slope to defining a new programming language... and I simply don't think that's necessary here. There are a number of languages that could be used, from primitive languages like C to braindead RegEx-based things like Perl or "real" pattern-matching languages like SNOBOL or SPITBOL. Ultimately, I suspect that individual implementations are likely to be done in whatever language the implementor is most comfortable with. That's perhaps the way it should be. Despite the fact that I *personally* think that SPITBOL would be *wonderful* for writing stuff like this, I recognize that a lot of people aren't familiar with it and would probably pick Perl or something instead just because that's all they know.

I'm not even sure, really, that we have to go all that far in terms of defining what the actual consent definition language or corresponding data representations are... I'm not all that convinced that we'll ever see (or even that we SHOULD see) a single standardized worldwide agreement for stuff like this, and different mail filtering systems and tools are likely to develop their own approaches and techniques. (And if someone does a distinctly "better" one, hopefully it will win out even over a "standard" one.)

> For example, specific DNSbl tests that are "well known" may have names defined for them where those names would be adopted by the community in the same way well known ports are adopted for services.

I think it probably makes more sense to simply provide a mechanism (or better, several) for calling external processing units. Then the script (or whatever) can add whatever steps a person wishes. Again, though, a lot of these are implementation issues within a particular filter; I'm not sure we have to produce anything to that level.

> Any such naming conventions should be an enhancement rather than a requirement in the CDL. For example, if a recipient wishes to leverage a DNSbl (or other service) that does not have a "well known name" then the definition of the test in the policy should be clear and consistent and no more difficult to implement in the CDL than any other DNSbl.
>
> Similar guidelines should be in place for the implementation of other SOFT mechanisms that might be used for: authentication (defining the sender), or developing SOFT CONSENT (for example, filtering systems such as Message Sniffer, Spam Assassin, Bogo Filter, and others...)

Good examples of external procedures which might be used within the consent model.

> Based on personal experience, the framework defined above _should_ be able to encompass all of the current and proposed mechanisms used for curbing abuse without significant difficulty or complexity.

Perhaps, although it sounds awfully complex to me (and specifically I really don't see why we need to jump onto the currently-trendy XML bandwagon here).

The real issue, I think, is how far we're going to go toward writing the actual filtering application as part of the consent model standard (and even, for that matter, whether we NEED a standard consent description).

Even just simple "whitelists" or "blacklists" don't always tell the story...
for example, I might have a Yahoogroup I'm a subscriber to, but that group (which I might whitelist) should **never** send me a message containing (say) an executable attachment. If it does, I definitely want to (at a minimum) trash the untrusted attachment. Likewise, the mere presence of a blacklisted domain reference in a message may not be enough to justify t-canning the message... for instance, the messages I get from the suespammers.org domain might refer to a particularly heinous spammer or quote from one of their E-mail spams, and I wouldn't want to t-can the message just because of that.

I guess I personally feel that what we need to do more is to establish that there are certain broad areas that will typically be used to perform triage on incoming E-mails, whether at the user level or at the domain or ISP service level. These areas include header-level coherency and tests (acceptable user identity, no routing through known open relays, etc) as well as content-based tests (no HTML-burdened content, no obscured URLs, no bogus HTML tags, no obscured content tricks, no embedded images, no known-disreputable URLs or domains or IP addresses, no attachments (or maybe no executable attachments), etc etc). We also need to provide for different sender-specific rulesets for specific authenticated familiar senders, specific familiar disreputable senders, and unfamiliar senders.

I still think that it is absolutely essential that HTML-burdened content (or at least large classes of frequently-abused HTML) and the presence of attachments or encoded message text should be offered as an optional (and probably recommended!) cause for denial of delivery of messages from unfamiliar senders.

Gordon Peterson                  http://personal.terabites.com/
1977-2002  Twenty-fifth anniversary year of Local Area Networking!
Support the Anti-SPAM Amendment!  Join at http://www.cauce.org/
12/19/98: Partisan Republicans scornfully ignore the voters they "represent".
12/09/00: the date the Republican Party took down democracy in America.