RE: [Asrg] 4. Survey of Solutions - Consent Model

"Pete McNeil" <madscientist@microneil.com> Tue, 15 July 2003 14:35 UTC

From: Pete McNeil <madscientist@microneil.com>
To: asrg@ietf.org
Subject: RE: [Asrg] 4. Survey of Solutions - Consent Model
Message-ID: <003401c34add$fc080e00$0a2458d8@MNR01>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook, Build 10.0.2627
In-reply-to: <B0000024653@nts1.terabites.com>
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4910.0300
X-Declude-Spoolname: D109a112.SMD
Sender: asrg-admin@ietf.org
Errors-To: asrg-admin@ietf.org
X-BeenThere: asrg@ietf.org
X-Mailman-Version: 2.0.12
Precedence: bulk
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/asrg>, <mailto:asrg-request@ietf.org?subject=unsubscribe>
List-Id: Anti-Spam Research Group - IRTF <asrg.ietf.org>
List-Post: <mailto:asrg@ietf.org>
List-Help: <mailto:asrg-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/asrg>, <mailto:asrg-request@ietf.org?subject=subscribe>
List-Archive: <https://www1.ietf.org/pipermail/asrg/>
Date: Tue, 15 Jul 2003 10:32:57 -0400

>I think that at various stages of incoming message triage 
>there are various 
>things that need to be done, and there is at least some 
>natural ordering of 
>those operations that ought to take place.

I agree. However, the ordering may not follow the processing of an
incoming message. We may need to use a subsumption architecture to allow
policies to synchronize with the way the email system implementing that
policy operates.

>First and foremost, I think there needs to be things that are strictly 
>envelope/header related.  There might be some specific and 

The basis of a consent policy is most naturally expressed as a hierarchy
of rules rooted in the sender - so the first step in implementing a
policy is to define "SENDER". Envelope and header information are
generally the right place to start, but they don't always establish the
SENDER by themselves.

>Second, messages often consist of multiple parts.  Those parts 
>can require 
>individual, further attention... decoding, name and format 
>testing, content 
>scanning, or whatever.  In some cases, a multipart message 
>needs to have one or 
>more parts removed;  it's even possible that an originally 
>multipart message 
>might be reduced to a single-part message before releasing the 
>message to the 
>mail client software or ongoing MTA.  

I'm not sure it makes sense for the consent policy to get involved in
modifying the message in any way although I do agree that establishing
the correct policy to execute may require some decoding.

>Third, both within individual parts and within entire messages 
>I think there 
>needs to be provision to call external processing modules 

In order for a consent policy to be an open document it must be possible
to establish a collection of "tests" to establish the inputs required
for executing the policy, and also some "actions" for responding to
policy decisions. Both of these will be somewhat implementation
specific.

In an open architecture, which I believe can be expressed in an XML
compatible format, "tests" and "actions" would be defined with
implementation specific definitions.

(The following introduces the idea that CONSENT can be based not only on
SENDER but also on other "features" that can be defined in the POLICY.
The new, obvious feature here is content... with the idea being that a
RECEIVER might grant CONSENT to a SENDER only for specific CONTENT. It's
important to realize that the rules are only formally based on specific
tests and that CONTENT is only a policy-specific concept implemented by
a combination of tests... if required.)

Completely artificial example/illustration:

<TESTS>
  <TEST>
    <NAME VALUE="PORNOGRAPHIC" />
    <IMPLEMENTATION>
      <EXECUTE>
        <PROGRAM VALUE="/usr/bin/heuristic-test" />
        <PARAM1 VALUE="$MESSAGE" />
      </EXECUTE>
      <VALUE>
        <TYPE VALUE="BOOLEAN" />
        <TRUE>
          $RESULT = 54
        </TRUE>
      </VALUE>
    </IMPLEMENTATION>
  </TEST>
</TESTS>

...

I had to bake that on the fly, I hope it makes sense.

Presumably, the above defines a TEST for the content of a message based
on a program that uses heuristics. The program
"/usr/bin/heuristic-test" is called with one parameter which is the
message itself ($MESSAGE). The TEST is TRUE if the $RESULT code of the
program is equal to 54.
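
As a rough sketch of how an engine might bind a named TEST to an external program, the Python below treats the test as TRUE when the program's exit code equals an expected value. The helper name make_exit_code_test is hypothetical, and the Python interpreter is used only as a stand-in for "/usr/bin/heuristic-test", which is itself an artificial example.

```python
import subprocess
import sys


def make_exit_code_test(argv, expected_code):
    """Build a test that is TRUE when the program exits with expected_code.

    argv is the command to run; the message is appended as the last
    argument, mirroring PARAM1 = $MESSAGE in the example above.
    """
    def run(message):
        completed = subprocess.run(argv + [message], check=False)
        return completed.returncode == expected_code
    return run


# Stand-in for /usr/bin/heuristic-test (an assumption for illustration):
# exits with 54 when the message contains "xxx", and with 0 otherwise.
FAKE_HEURISTIC = [
    sys.executable, "-c",
    "import sys; sys.exit(54 if 'xxx' in sys.argv[1] else 0)",
]

# The policy refers to the test only by name; the binding is local.
TESTS = {"PORNOGRAPHIC": make_exit_code_test(FAKE_HEURISTIC, 54)}
```

Another system could bind the same name to a DNSbl lookup or an MTA feature without changing any policy that refers to PORNOGRAPHIC.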

The test could just as easily have been based on some feature of the MTA
software (such as a DNSbl lookup), some environment variable, etc...

The concept is that defining the SENDER and the CONTENT requires that
some TESTs be executed. These TESTs should be defined in a way that is
implementation specific. Similarly the ACTIONS should also be defined
this way. This allows the POLICIES section to be shared anywhere that
the TESTs and ACTIONs named in the policy can be executed, and allows
those TESTs and ACTIONs to be defined for each system independently.

There should also be some "well known" tests that MUST be implemented in
order to be compliant... these would be tests such as the connecting IP,
envelope information, header information, etc.

>> 1. CONSENT - a direct expression of wanting to receive email from a
>sender.
>
>> 2. SOFT CONSENT - an indirect expression of wanting to receive email
>from a sender.
>
>I'm not sure I understand here (yet?) the point being made 
>with "direct" versus 
>"indirect" on these first two cases.  Is #2 a "default - 
>unrecognized sender" 
>situation and #1 a known sender from which mail is ALWAYS 
>desired (i.e. 
>"whitelist")?

Yes. Ultimately these are only conceptual constructs. A policy always
establishes a boolean CONSENT or DENIED CONSENT for a given message;
however, in establishing a definition of how a policy document works it
is useful to distinguish between CONSENT which is based on a known sender
and SOFT CONSENT which is based on an evaluation of a message from an
unknown sender. The primary difference between them is the relative error
rate.

With SOFT CONSENT / SOFT DENIED CONSENT the policy is more likely to err
so the ACTIONs that are taken in these cases should be "softer". Note
the careful use of _should_.

While it is possible for a given policy implementation to delete any
messages that fail a given DNSbl or heuristic test if that fits the
needs of the RECEIVER, the RECEIVER would be wise to consider that such a
harsh policy might have unintended consequences due to the error rates
of these tests. As a result they _should_ choose a less aggressive ACTION
in these cases such as rejecting the message or holding the message for
review.
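
One way to picture the four conceptual cases and the "softer" defaults is the sketch below. The action names (DELIVER, HOLD-FOR-REVIEW, BLACKHOLE) and the particular mapping are assumptions chosen for illustration; a RECEIVER is free to pick differently.

```python
from enum import Enum


class Decision(Enum):
    CONSENT = "consent"                  # known sender, explicitly wanted
    SOFT_CONSENT = "soft consent"        # inferred from tests, may be wrong
    SOFT_DENIED = "soft denied consent"  # inferred from tests, may be wrong
    DENIED = "denied consent"            # known sender, explicitly unwanted


# Hypothetical action names; a real system would use its own ACTIONS.
DEFAULT_ACTIONS = {
    Decision.CONSENT: "DELIVER",
    Decision.SOFT_CONSENT: "DELIVER",
    Decision.SOFT_DENIED: "HOLD-FOR-REVIEW",  # softer: a person can recover it
    Decision.DENIED: "BLACKHOLE",
}


def is_granted(decision):
    """Collapse the four conceptual cases into the boolean a policy yields."""
    return decision in (Decision.CONSENT, Decision.SOFT_CONSENT)
```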

>> 3. SOFT DENIED CONSENT - an indirect expression of wanting not to
>receive email from a sender.
>
>> 4. DENIED CONSENT - a direct expression of wanting not to receive 
>> email
>from a sender.
>
>Likewise...?

Consider cases 1 and 4 (CONSENT / DENIED CONSENT) like a white and black
rule respectively. There is no room for interpretation (save that which
determines the SENDER).

Consider cases 2 and 3 (SOFT: CONSENT / DENIED CONSENT) like a collection
of tests that attempt to evaluate the message based on content and other
rules... These tests will have a varying degree of confidence in all
cases.

>
>> NOTE: 2 and 3 are required to handle anonymous senders.
>
>I think that 2 and 3 also might be required to handle cases 
>where a qualitative 
>decision is made (i.e. messages not specifically "whitelisted" or 
>"blacklisted").

Yes. Even if you know the sender, you may want to decide CONSENT based
also on some other grounds... The policy model must support this.

>But there are further cases that you're not considering here, 
>and that is at 
>least equally important.  It's not just whether a message is 
>actually desired 
>for delivery, but also how to handle cases where a message is 
>NOT to be 
>delivered to the original addressee.  Is the message to be 
>simply blackholed?  
>Is a forged "destination mailbox unknown" reply to be 
>returned?  Is a polite 
>reply to be returned requesting that the message be resent 
>without attachments 
>or HTML or encoding or whatever?  Is the message simply to be 
>bounced back?

A policy will define ACTIONS that can be taken when consent is granted
or denied. The ACTIONS would be defined in an open way just like the
tests and would be dependent on the facilities available with that
implementation.

The policy would be able to define any of these actions once consent was
granted or denied, and would presumably define these actions based on
parameters that were established by the tests defined in policy.

something like:

<TESTS>
  ... # Definitions for test implementations on this system.
</TESTS>

<ACTIONS>
  ... # Definitions for action implementations on this system.
</ACTIONS>

<SENDERS>
  ... # Definitions of senders (by applying tests)
</SENDERS>

<CONSENT>
  <GRANTED>
    <ACTION-POLICY>
      <USE-ACTION VALUE="DELIVER" />
      ... # Definitions of sender cases.
    </ACTION-POLICY>

    <ACTION-POLICY>
      <USE-ACTION VALUE="REDIRECT" />
      ... # parameters.
      ... # Definitions of sender cases.
    </ACTION-POLICY>

    <ACTION-POLICY>
      <USE-ACTION VALUE="COPY" />
      ... # parameters
      ... # Definitions of sender cases.
    </ACTION-POLICY>
  </GRANTED>

  <DENIED>
    <ACTION-POLICY>
      <USE-ACTION VALUE="BOUNCE" />
      ... # Definitions of sender cases.
    </ACTION-POLICY>

    <ACTION-POLICY>
      <USE-ACTION VALUE="BLACKHOLE" />
      ... # Definitions of sender cases.
    </ACTION-POLICY>
  </DENIED>

</CONSENT>

>Might it be useful to provide for coherency of message 
>routing?  e.g. a message 
>with a From: address of HOTMAIL.COM or AOL.COM but which has 
>passed through a 
>mail server or relay (say) in China or Korea or some other 
>distant country?
>
>Or, say, a message with wildly incoherent dates in Received: 
>headers or Date: 
>header?

Such as: (Another half baked example)

<SENDERS>
  <SENDER>
    <NAME VALUE="FORGED-AOL" />
    <TEST>
      <SENDER-DOMAIN> AOL.COM </SENDER-DOMAIN>
      <OR> # Tests use boolean AND by default
        <ROUTED-THRU> China </ROUTED-THRU>
        <ROUTED-THRU> Korea </ROUTED-THRU>
      </OR>
    </TEST>
  </SENDER>
</SENDERS>
...
<CONSENT>
  ...
  <DENIED>
    <ACTION-POLICY>
      <USE-ACTION VALUE="BLACKHOLE" />
      <USE-ACTION VALUE="LOG-DENIED" />
      <CASE>
        <SENDER VALUE="FORGED-AOL" />
        <SENDER VALUE="FORGED-HOTMAIL" />
      </CASE>
    </ACTION-POLICY>
    ...
  </DENIED>
</CONSENT>
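
To make the "AND by default, OR when grouped" reading of the FORGED-AOL definition concrete, here is a small sketch. The message is modelled as a plain dict, and the keys sender_domain and route_countries are assumptions for illustration, not anything the proposal defines.

```python
def all_of(*checks):
    """Boolean AND over checks (the default for sibling tests)."""
    return lambda msg: all(check(msg) for check in checks)


def any_of(*checks):
    """Boolean OR over checks (an explicit <OR> group)."""
    return lambda msg: any(check(msg) for check in checks)


def sender_domain(domain):
    # Compare case-insensitively, since domain names are not case sensitive.
    return lambda msg: msg["sender_domain"].lower() == domain.lower()


def routed_thru(country):
    return lambda msg: country in msg["route_countries"]


# SENDER-DOMAIN AND (ROUTED-THRU China OR ROUTED-THRU Korea)
FORGED_AOL = all_of(
    sender_domain("AOL.COM"),
    any_of(routed_thru("China"), routed_thru("Korea")),
)
```

How route_countries would be derived from Received: headers is left open here; that would be one of the implementation specific TESTs.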

>Certain senders (say, specific Yahoogroups mailing lists) for 
>instance might 
>NEVER legitimately contain attachments.  If someone spoofs a 
>popular Yahoogroups 
>mailing list as the "sender" but actually still sends an 
>attachment (which that 
>group should NEVER be sending) then that's evidence of a 
>forged From: address 
>and/or an unauthenticated sender.

The existence of an attachment on the message could be a test and that
test would be used to define the SENDER for this yahoo group. If the
message had the attachment then the SENDER definition would not match
and the policy would be denied by default - more specifically the
UNKNOWN sender would be selected and the action policy for that case
would be executed.
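
A minimal sketch of that fallback, assuming the message is a dict with hypothetical keys list_id and attachments, and assuming a made-up sender name EXAMPLE-GROUP with made-up actions:

```python
def resolve_sender(message, sender_definitions):
    """Return the first SENDER whose tests all pass, else "UNKNOWN"."""
    for name, tests in sender_definitions:
        if all(test(message) for test in tests):
            return name
    return "UNKNOWN"


def from_example_group(message):
    return message["list_id"] == "example-group"


def has_no_attachment(message):
    return not message["attachments"]


# The group is only recognised when the message carries no attachment.
SENDERS = [("EXAMPLE-GROUP", [from_example_group, has_no_attachment])]

# Hypothetical action policy: one action per sender case.
ACTION_POLICY = {"EXAMPLE-GROUP": "DELIVER", "UNKNOWN": "HOLD-FOR-REVIEW"}


def action_for(message):
    return ACTION_POLICY[resolve_sender(message, SENDERS)]
```

A spoofed list message with an attachment simply fails to match EXAMPLE-GROUP and is handled as UNKNOWN.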

>In the ultimate case of this, you really end up wanting to 
>write a program... do 
>we need yet another programming language?  Personally, I'd 

I think what we want to do is establish a standard policy document in
XML format and then allow the implementor to include any tools,
programming languages, tests, etc... that they wish to.

The purpose of a "compliant" consent policy is to standardise the
definition so that consent policies can be aggregated, shared, and
discussed, and so that there is a "standard" framework for developing
"abuse management" tools for email.


>favor the use of 
>SPITBOL for stuff like this... it's probably about the most 
>powerful language 
>there is for text processing and pattern recognition and data 
>structure 
>manipulation... and that's what this whole process is really 
>all about.  SPITBOL 

I may wish to use Java, someone else may wish to use Lisp, and so on.
The standard consent policy document should allow for any or all of
these to be implemented as features of the local system. In my current
thinking this is done in the TESTS and ACTIONS section where the
implementor can define any number of external or internal program
features that they might make available.

>has the additional nice property that one can bring in new 
>program segments 
>dynamically according to stages already passed or decisions 
>already made... so 
>that the program can be easily extended at runtime to add new 
>rules or whatever, 
>based on (say) specific senders or specific types of message content.

Presumably an advanced implementation of a "standard consent policy"
would optimize test execution dynamically at runtime and would take
advantage of any features that are available in their software.

One caveat to this, however, is that highly specialized implementations
may not be easily transported to other systems and may not be shareable
beyond the confines of the local system.

This has implications regarding the aggregation of consent policies -
which can be a very powerful feature of a "standard consent policy"
framework. Aggregated consent policies would allow common user policy
decisions to be moved toward the gateway MTAs of a large system with
profound implications for resource utilization.

Highly proprietary implementations may not travel well... the result
would be that policies dependent on TESTS or ACTIONS that are not
available to the common policy would not be aggregated.
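
As a sketch of that aggregation rule, assume each user policy lists the TEST and ACTION names it depends on, and the gateway knows which names it can execute. The function name and the dict layout are assumptions for illustration.

```python
def split_for_aggregation(policies, gateway_tests, gateway_actions):
    """Separate policies the gateway can run from those that must stay local.

    A policy can be aggregated only if every TEST and ACTION it names is
    available at the gateway.
    """
    aggregable, local_only = [], []
    for policy in policies:
        tests_ok = set(policy["tests"]) <= set(gateway_tests)
        actions_ok = set(policy["actions"]) <= set(gateway_actions)
        (aggregable if tests_ok and actions_ok else local_only).append(policy["name"])
    return aggregable, local_only


POLICIES = [
    {"name": "alice", "tests": ["CONNECTING-IP"], "actions": ["DELIVER"]},
    # Depends on a proprietary test the gateway does not have.
    {"name": "bob", "tests": ["MY-PRIVATE-SCANNER"], "actions": ["BOUNCE"]},
]

AGGREGABLE, LOCAL_ONLY = split_for_aggregation(
    POLICIES,
    gateway_tests={"CONNECTING-IP", "SENDER-DOMAIN"},
    gateway_actions={"DELIVER", "BOUNCE"},
)
```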

This open architecture allows for highly specific implementations -
however those who use proprietary TESTS or ACTIONS will need to
understand the implications of those decisions.

>It's hard to imagine how one would devise a specific "Consent 
>Definition 
>Language" that doesn't end up being in essence a "programming 
>language" (and 

I think it's possible... in fact it's probably done (see above). I'm
fairly certain that the language I'm proposing can be defined without
defining any specific programming language - and, in fact, that would be
the point.

>Others might be independent, and in that case it would also be 
>nice to allow 
>multiple tests on the message to perhaps proceed in parallel 
>(multitasking/multiprocessing/whatever) so as to reduce 
>overall time spent in 
>processing each message.

My current thinking is that the language "presumes" all required tests
are executed before a SENDER is defined (based on the results of those
tests). Then the correct ACTION-POLICY is implemented based on those
results.

The reality would be implementation specific. One would presume, for
example, that a highly efficient implementation would execute only those
tests required to establish each case, and that the order of test
execution and policy selection would be established so that the least
amount of processing would be required for each case.

For example, if there is a sender defined by the connection IP and that
sender has a normal delivery action policy then the engine would first
test for the connection IP and having discovered that the IP matches
this policy would immediately deliver the message - eliminating all
other tests.
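
One possible shape for such an engine is sketched below: tests run only when a rule first asks for their result, and results are cached. The class and function names, the rule layout, and the sample tests are all assumptions for illustration.

```python
class LazyTests:
    """Run each named test at most once, and only when it is asked for."""

    def __init__(self, tests, message):
        self._tests = tests
        self._message = message
        self._results = {}
        self.executed = []  # order in which tests actually ran

    def __getitem__(self, name):
        if name not in self._results:
            self.executed.append(name)
            self._results[name] = self._tests[name](self._message)
        return self._results[name]


def choose_action(lazy, sender_rules, default_action):
    """Return the action of the first rule whose required tests all pass."""
    for required_tests, action in sender_rules:
        # all() stops at the first failing test, so later tests never run.
        if all(lazy[name] for name in required_tests):
            return action
    return default_action


TESTS = {
    "TRUSTED-IP": lambda m: m["connecting_ip"] == "192.0.2.10",
    "HEURISTIC": lambda m: "spam" in m["body"],  # imagine this is expensive
}
# Cheap, decisive rules are listed first.
RULES = [(["TRUSTED-IP"], "DELIVER"), (["HEURISTIC"], "HOLD-FOR-REVIEW")]
```

For a message from 192.0.2.10 the first rule matches and the HEURISTIC test is never executed.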

HOWEVER - all of that would need to be implementation specific.
It is very important for the policy definition language to remain
agnostic on implementation and programming languages if there is to be
any hope for wide adoption.

>I've heard XML called a LOT of things although I don't think 
>that I can remember 
>too many times that it was accused of being "efficient".  :-)

Well, it does at least fit the bill here. It is extensible by design,
and it is fairly well understood and usually easy to grasp (look how
well HTML caught on), and it translates well into databases, parsers,
etc. So, on the whole I think it's the most efficient for this task.

Mind you - I'm not saying that everyone, or ANYONE, should learn XML and
the consent policy definition language... Presumably those who build
software that implements these policies would make nice, simple, slick
GUI tools for users and admins to establish policies etc. The important
thing is that those policies can be expressed in a standard way so that
they can be shared, aggregated, and discussed easily.

>Just because XML is presently trendy is NO reason IMHO to 
>impose that degree of 
>overhead onto anything as core as E-mail processing.  In 
>another list I'm on 
>we've been discussing the XML overhead/performance issues and 
>found that a 
>typical situation results in XML record descriptors taking 
>twice as much time 
>(or more) than a simple delimited data representation.  (And I 
>think that it 
>often can take a LOT more than even that...)

One would presume that the XML would be parsed once when the policy
changes or the system starts and after that a highly optimized engine
would simply execute the tests and actions defined in the document.
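
A sketch of that "parse once, then look up" idea, using Python's standard xml.etree.ElementTree. The element names follow the half-baked examples earlier in this message; the sender name KNOWN-FRIEND and the compiled table layout are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

POLICY_XML = """
<CONSENT>
  <GRANTED>
    <ACTION-POLICY>
      <USE-ACTION VALUE="DELIVER" />
      <CASE><SENDER VALUE="KNOWN-FRIEND" /></CASE>
    </ACTION-POLICY>
  </GRANTED>
  <DENIED>
    <ACTION-POLICY>
      <USE-ACTION VALUE="BLACKHOLE" />
      <USE-ACTION VALUE="LOG-DENIED" />
      <CASE><SENDER VALUE="FORGED-AOL" /></CASE>
    </ACTION-POLICY>
  </DENIED>
</CONSENT>
"""


def compile_policy(xml_text):
    """Parse the policy once into a dict: sender name -> (outcome, actions)."""
    root = ET.fromstring(xml_text)
    table = {}
    for outcome in ("GRANTED", "DENIED"):
        for policy in root.findall(f"{outcome}/ACTION-POLICY"):
            actions = [a.get("VALUE") for a in policy.findall("USE-ACTION")]
            for sender in policy.findall("CASE/SENDER"):
                table[sender.get("VALUE")] = (outcome, actions)
    return table


# Done at startup or when the policy changes; per-message work is a lookup.
COMPILED = compile_policy(POLICY_XML)
```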

HTML, for example, is not necessarily a very efficient way of defining
documents when measured in some ways... however it is very easy to
deploy and leverage... that's what I'm going for here.

>I have a number of objections to XML in principle, largely 

Make another recommendation then and have it compete on its own
merits. I'm not particularly "fond" of XML either, nor am I any expert
on it, nor do I have any investment in it.

HOWEVER, as an engineer I recognize that there are a lot of tools out
there to work with XML, that it is a widely accepted standard, and that
it has a number of characteristics that are beneficial to this task.

If there is a better choice then we should go with it. Right now, I'm
sticking with this because at the very least I think the structures I'm
proposing with this language are sound and would likely be implemented
in whatever language is selected to represent them.

>Ultimately, again, this is going down a slippery slope to 
>defining a new 
>programming language... and I simply don't think that's 
>necessary here.  There 
>are a number of languages that could be used, from primitive 
>languages like C to 
>braindead RegEx-based things like Perl or "real" 
>pattern-matching languages like 
>SNOBOL or SPITBOL.

I think you're missing an important point. All of the languages you are
talking about are procedural in nature. A policy should not be
procedural because its purpose is to capture a policy, not a procedure.

Specifically the "root" construct of the problem at hand is:

I want mail that is like this.
I don't want mail that is like that.

A procedural language is required _under the covers_ to implement the
policy, but the policy itself should not be defined procedurally.

This is the same kind of separation that makes SQL powerful for managing
data. The user of SQL does not need to understand how the data is
stored, nor how it can be accessed, nor what the most efficient search
mechanisms might be for their system.

Rather, they simply say: "Give me these records"... or more precisely:

Select * where x="these records";

Does that make sense?

---

In the same way, a system administrator can establish a number of TESTS
based on the features at their disposal... Some of these might even
execute perl scripts, C++ programs, SPITBOL or other engines. But in the
end, the user (or administrator) defining the policy only needs (or
wants) to say:

"I want mail that passes these tests."

Some of the tests should be on the _MUST IMPLEMENT_ list for a compliant
system. Specifically those that are "well known" and defined by this
group (and the group(s) that eventually manage and contribute to a cdl).

For example the test for a specific connecting IP or network.
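
A "well known" connecting IP test could look roughly like the sketch below, using Python's standard ipaddress module. The helper name and the connecting_ip key are assumptions; the addresses are documentation ranges.

```python
import ipaddress


def make_connecting_ip_test(network):
    """Build a test that is TRUE when the connecting IP is inside network.

    network may be a single address ("192.0.2.10") or a CIDR block
    ("192.0.2.0/24").
    """
    net = ipaddress.ip_network(network)

    def run(message):
        return ipaddress.ip_address(message["connecting_ip"]) in net

    return run


FROM_PARTNER_NETWORK = make_connecting_ip_test("192.0.2.0/24")
```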

>I'm not even sure, really, that we have to go all that far in 
>terms of defining 
>what the actual consent definition language or corresponding data 
>representations are... I'm not all that convinced that we'll 
>ever see (or even 
>that we SHOULD) a single standardized worldwide agreement for 
>stuff like this, 
>and different mail filtering systems and tools are likely to 
>develop their own 
>approaches and techniques.  (And if someone does a distinctly 
>"better" one, 
>hopefully it will win out even over a "standard" one.)  

You may be right, but I think it's a good idea. If for no other reason
than it will provide us with a "common language" for discussing the
different policies we may have or may propose, and the implications of
implementing those policies.

I've seen a lot of chaos on this (and other) lists because there is not
yet a common language for defining policies. The cdl I've proposed does
that, and more - it provides the groundwork for what could become a
common standard for implementing policies. Whether it succeeds in the
latter or not, the benefits of the former are significant.

>> Based on personal experience, the framework defined above _should_ be
>able to encoumpass all of the current and proposed mechanisms 
>used for curbing abuse without significant difficulty or complexity.
>
>Perhaps, although it sounds awfully complex to me (and 
>specifically I really 
>don't see why we need to jump onto the currently-trendy XML 
>bandwagon here).

"trendiness" is no reason to reject a possible solution.

>The real issue, I think, is how far we're going to go toward 
>writing the actual 
>filtering application as part of the consent model standard 
>(and even, for that 
>matter, whether we NEED a standard consent description).  
>
>Even just simple "whitelists" or "blacklists" don't always 
>tell the story... for 

The point of defining a CDL is not to specify, recommend, or define any
specific whitelists, blacklists, or other tests. Rather it is to define
a standard framework within which those tests can be defined and how,
once defined, they can be applied.

>example, I might have a Yahoogroup I'm a subscriber to but 
<snip>
>Likewise, the mere presence of a blacklisted domain reference 
<snip>

One goal of a CDL is to provide a framework where *almost* any policy
could be described - even the complex policies you describe.

>I guess I personally feel that what we need to do more is to 
>establish that 
>there are certain broad areas that will typically be used to 
>perform triage on 
>incoming E-mails, whether at the user level or at the domain 
>or ISP service 
>level.  These areas include header-level coherency and tests 
>(acceptable user 
>identity, no routing through known open relays, etc) as well 
>as content-based 
>tests (no HTML-burdened content, no obscured URLs, no bogus 

In the CDL I propose, all of these would be TEST definitions. With a CDL
in place it would be easier for us to discuss, contrast, and explore
these definitions.

At the same time, we would be able to explore the implications of
policies based on these tests.

In the same way we can use the CDL to explore TESTS we can also use it to
explore ACTIONS.

That's what the CDL is really for... It is only a hope that the CDL
would be translated into actual systems that implement policies
researched, recommended, and documented by this and other groups using
CDL.

>I still think that it is absolutely essential that 
>HTML-burdened content (or at 
>least large classes of frequently-abused HTML) and presence of 
>attachments or 
>encoded message text should be offered as an optional (and probably 
>recommended!) cause for denial of delivery of messages from 
>unfamiliar senders.

That is a policy proposal. With a CDL, you could be _very precise_ about
what you mean by this proposal and everyone would be able to discuss it
in a common framework.

I hope I've cleared things up... It was not my intent to be combative if
any of my comments appeared that way.

Thanks,

_M

