Re: draft-freed-sieve-in-xml status?

"Robert Burrell Donkin" <robertburrelldonkin@gmail.com> Wed, 07 January 2009 01:14 UTC

Message-ID: <f470f68e0901061657g3aa2d63dwcdbee2ae91b3bbda@mail.gmail.com>
Date: Wed, 07 Jan 2009 00:57:51 +0000
From: Robert Burrell Donkin <robertburrelldonkin@gmail.com>
To: Ned Freed <ned.freed@mrochek.com>
Subject: Re: draft-freed-sieve-in-xml status?
Cc: ietf-mta-filters@imc.org
In-Reply-To: <01N3MQQCAYL000007A@mauve.mrochek.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <f470f68e0812041225x318bfdccg1bf9201b53ce8c2e@mail.gmail.com> <493E908E.70504@isode.com> <f470f68e0812090956j56c29f17s77fc554adaab1350@mail.gmail.com> <f470f68e0812140301r7ef04460t24f2ea9e6d2ff7a0@mail.gmail.com> <01N32VHWP1EK00SE3A@mauve.mrochek.com> <f470f68e0812141304i228b5a03s890e0f101b76b07e@mail.gmail.com> <01N335CY3MAQ00SE3A@mauve.mrochek.com> <f470f68e0812150500r5d1916f4obbd941434295fe07@mail.gmail.com> <01N3MQQCAYL000007A@mauve.mrochek.com>
Sender: owner-ietf-mta-filters@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-mta-filters/mail-archive/>
List-ID: <ietf-mta-filters.imc.org>
List-Unsubscribe: <mailto:ietf-mta-filters-request@imc.org?body=unsubscribe>

On Sun, Dec 28, 2008 at 10:31 PM, Ned Freed <ned.freed@mrochek.com> wrote:
>
>> On Sun, Dec 14, 2008 at 9:59 PM, Ned Freed <ned.freed@mrochek.com> wrote:
>> >> On Sun, Dec 14, 2008 at 6:06 PM, Ned Freed <ned.freed@mrochek.com> wrote:
>> >> >> i note that the draft describes the infoset rather than defining it in
>> >> >> the standard way. is there a reason for this decision?
>> >> >
>> >> > I don't know what "the standard way" is you're referring to. Perhaps you
>> >> > could provide a reference to an RFC where this has been used?
>> >
>> >> AIUI XML is maintained by w3c (rather than IETF) so is a
>> >> recommendation. http://www.w3.org/TR/xml-infoset/ is the current
>> >> document.
>> >
>> > Quite true, however, the IETF has its own specification for how XML is supposed to
>> > be used in RFCs: RFC 3470. And while infosets are mentioned as one approach to
>> > specifying things about an XML format, there's no recommendation, let alone
>> > requirement, that they be used.
>> >
>> >> > This document is a little unusual in that it's defining a mapping of, if you
>> >> > will, a non-XML infoset onto XML. As such, the natural approach seemed to be to
>> >> > first discuss the structure of the language being mapped, then explain the
>> >> > mapping, and finish up with additional unique-to-XML semantics.
>> >
>> >> i agree that most of this arrangement is natural. it's just jumping to
>> >> a schema seems - to me - a little premature and inflexible.
>> >
>> > First of all, the use of XML Schema is in fact too inflexible to be allowed
>> > to continue. The next revision will use Relax instead.
>
>> XML schema is flexible but the flexibility comes at the price of
>> readability. one of the relax variants would be a better choice.
>
>> however (in my experience) the generative tools commonly used for XML
>> and web service binding, and editor generation tend not to offer good
>> relax support. IMO the draft should offer secondary informative XML
>> Schema or Schemata to assist developers using these tools.
>
> The problem is that the unique particle attribution limitation in XML Schema
> effectively precludes using it without some compromises. I am therefore opposed
> to continuing to include it.

adopting a standard prefix - sieve, say - is all that is required

is this really too much to ask?

>> > But I'm still a little confused as to what you're asking for here. If you're
>> > asking for removal of the explicit inline XML syntax examples in favor of a
>> > more abstract approach, I'd be fine with that if there's a WG consensus to make
>> > such a change.
>
>> no - i'm very happy with the syntax examples
>
>> i would like to see the approach used in RFC 5023 (and others)
>> adopted, adding a normative description of the XML and making the
>> schema only informative.
>
> Personally, I find the RFC 5023 approach, like the XOPEN object descriptions it's
> similar to, to be almost totally unreadable. Maybe it's the only reasonable way
> to do it when the element structure is quite complex, but that's not the case
> here.
>
> So, absent some fairly strong support for this from others in the group, I'm
> not going to pursue this.

is there anyone else in the group - excepting you and myself - who cares
enough to contribute at all to this discussion?

>> >> sieve). there is a large and growing requirement for integration
>> >> between mail and enterprise systems (typically coding in Java and .NET
>> >> but also ruby and python). developers from enterprise backgrounds are
>> >> typically strong on web+xml but very weak on mail.
>> >
>> > Yep, I've seen a lot of this as well. And the problem encompasses far more than
>> > Sieve: For example, a lot of people who are unfamiliar with email don't
>> > understand very basic concepts such as the separation between envelope and
>> > message content. (This particular issue actually pokes through into Sieve in
>> > the form of whether an envelope or header test is appropriate.)
>
>> i beg to differ slightly on this one
>
>> some enterprise mail processing may happen during the SMTP transaction
>> but it is more typical for the mail processing to happen after storage. not all
>> mail stored arrives through SMTP and so it is typical for any envelope
>> information to be reduced to simple MIME headers.
>
> Robert, with all due respect, you may have substantial expertise on the XML
> front, but your comments here are actually doing little more than illustrate
> the validity of my argument that there's a general issue with people not
> getting how email works that isn't going to get fixed by anything we do here.

ned - with all due respect - your comments illustrate a lack of
understanding of this class of mail server

> If you want this addressed the place to look is the email architecture
> specifications being worked on by Dave Crocker.
>
> And it is NOT typical for envelope information to be stored as headers.

for the class of application (enterprise mail servers is a name that's
sometimes used but quite possibly that's not familiar to others in the
group), unfortunately it is

> There are several reasons for this:
>
> (1) Envelopes only exist between the time of submission and final delivery.
>    Transport actions do record certain bits of envelope information in
>    trace header fields and final delivery is supposed to copy some additional
>    envelope information into a couple of header fields, but these are NOT
>    a message envelope and it is a mistake to assume they are.
>
> (2) During the time the envelope exists it is highly mutable, often changing
>    form at every hop. This makes header storage of envelope information
>    somewhat problematic.

true

> (3) There are several SMTP extensions that add to the envelope in various ways,
>    requiring negotiation of what envelope information can and cannot be
>    passed from one system to another. This tends to interact badly with
>    schemes that store envelope information as a static part of the message.

(when SMTP delivery is just the first step in mail processing, this
isn't such a problem)

> (4) The fact that protocols other than SMTP are used for various email
>    operations doesn't necessarily impact header/envelope separability.
>    Other protocols maintain this separation and at least one of them, X.400,
>    actually has a far greater degree of separation than SMTP does.

(not all protocols maintain this separation and if any mail enters
through those protocols, this information will not be available)

> (5) Because there are effectively no controls on what ends up in headers, it
>    is fairly easy for the separation between "header" headers and "envelope"
>    headers to get lost. Among other things, this can create serious
>    security vulnerabilities.

(only during SMTP delivery)
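
(to make the envelope/header distinction in the list above concrete, here
is a minimal python sketch. the Envelope class, the addresses and the
message text are all made up for illustration - nothing here comes from
the draft or from any particular server. it only shows that the
recipient a transport actually delivers to need not appear anywhere in
the header fields.)

```python
from dataclasses import dataclass, field
from email import message_from_string
from email.utils import getaddresses, parseaddr


@dataclass
class Envelope:
    """Transport-level data. Hypothetical container, not a standard API."""
    mail_from: str
    rcpt_to: list = field(default_factory=list)


# Illustrative message content: this is all a header-only processor sees.
RAW = (
    "From: Alice <alice@example.com>\r\n"
    "To: list@example.org\r\n"
    "Subject: status\r\n"
    "\r\n"
    "body\r\n"
)


def header_recipients(msg):
    """Return the addresses named in the To and Cc header fields."""
    fields = msg.get_all("To", []) + msg.get_all("Cc", [])
    return [addr for _name, addr in getaddresses(fields)]


msg = message_from_string(RAW)

# Assumed envelope for one hop: a list expander has rewritten the sender
# and the actual recipient is a subscriber who is not named in the header.
envelope = Envelope(mail_from="list-bounces@example.org",
                    rcpt_to=["bob@example.net"])

header_sender = parseaddr(msg["From"])[1]
header_rcpts = header_recipients(msg)

# Recipients that exist only in the envelope and vanish if it is dropped.
lost = [r for r in envelope.rcpt_to if r not in header_rcpts]
```

(an archive that keeps only the parsed message above could not tell that
bob@example.net ever received it.)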

> Now, this is not to say there aren't various ad-hoc schemes in use where active
> envelope information ends up getting stuffed into the header. Such schemes date
> back to BITNET's use of X-Envelope-To: to work around the 8x8 limit and
> probably long before. But in my experience at least these things invariably
> fail to provide a full and correct mapping for all of the possible information
> that can exist in an SMTP (or X.400) envelope. And as a consequence they
> invariably cause problems because of their inability to truly express envelope
> semantics.
>
> Indeed, if you have to capture envelope information in a static form - the main
> current use-case for this is compliance archiving - in most cases you're better
> off NOT using header-based schemes. We even have a standard format defined for
> this: Batch SMTP. Although the format that's probably used the most is the one
> Exchange generates that they call "envelope journaling", which puts the
> envelope in the first text part of a MIME multipart. (On a side note, if anyone
> knows where there's a precise and complete specification of the syntax used for
> envelope journaling, I'd appreciate a pointer.)

i agree that stuffing this information into headers is a bad idea (i
intended to observe not advocate above)
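
(for readers who have not met Batch SMTP: the idea, as i understand it,
is to record the envelope as the SMTP commands that would have carried
it, followed by the message. the python below is a rough sketch of that
idea only. the function name and addresses are invented, and it has not
been checked against the Batch SMTP specification, so treat the exact
command set as an assumption.)

```python
def to_batch_smtp(helo, mail_from, rcpt_to, message):
    """Serialise an envelope plus message as a batch of SMTP commands.

    Hypothetical helper for illustration; `message` is the raw message
    text (header and body).
    """
    lines = [f"HELO {helo}", f"MAIL FROM:<{mail_from}>"]
    lines += [f"RCPT TO:<{rcpt}>" for rcpt in rcpt_to]
    lines.append("DATA")
    for line in message.splitlines():
        # SMTP dot-stuffing: a leading "." in the data is doubled so it
        # cannot be mistaken for the end-of-data marker.
        lines.append("." + line if line.startswith(".") else line)
    lines += [".", "QUIT"]
    return "\r\n".join(lines) + "\r\n"


batch = to_batch_smtp(
    "archive.example.org",
    "list-bounces@example.org",
    ["bob@example.net"],
    "Subject: x\r\n\r\n.hidden\r\nbody",
)
```

(the envelope sender and recipient survive here without being mixed into
the header fields of the message itself.)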

i disagree that this is a protocol problem with a protocol solution -
it's simply a poor choice of data representation by the designers.
more modern approaches to meta-data use namespacing and this prevents
loss of information. (this is often then compounded by confusing a
dead MIME document with a live email.)

>> most developers in
>> these mail processing environments do not need to understand the
>> difference between envelope and message content because - for them -
>> there is no difference.
>
> Yeah, that's what a lot of them think. The problem is they're quite simply
> wrong, and it isn't a harmless thing to be wrong about. I get plenty of
> support calls from customers who got screwed by this lack of understanding.
>
> And it is NOT a minor detail when someone sets up a compliance archiving system
> that ends up in many cases not being able to determine who actually sent or
> received a given message. (I only wish I was making up this example.)

:-)

(that's why containers are now often used in the enterprise)

>> Sieve works very well as a general MIME document processing language.
>
> Actually that's not Sieve's purpose at all and it isn't something Sieve is
> currently good at. In fact we've only recently taken the first fairly tentative
> step down the MIME processing path with the MIME loops extension and possibly
> the convert extension. We'll see how well that turns out, but I have to say I'm
> not optimistic that it will replace existing ad-hoc MIME processing facilities
> like MIMEdefang.

AIUI these limitations only apply to multipart documents. outside
email, multipart documents are not so common and sieve works fine on
those.

>> the envelope tests are - in many ways - peculiar since the rest of the
>> specification really isn't mail specific. there are potentially some
>> very interesting applications in this area so it would be a shame - i
>> think - for the expert group to focus too strongly on SMTP at the
>> expense of other IMHO equally valid Sieve use cases.
>
> I don't object to the use of Sieve in other contexts - in principle. But the
> devil is in the details. A good example of this is the use of Sieve in an IMAP
> server as defined in draft-ietf-lemonade-imap-sieve-05.txt. This doesn't seem
> like too much of a stretch from existing usage, but when I reviewed this
> document a while back I found all sorts of semantic mismatches, some of them
> quite serious.
>
>> > But here's the dilemma: This stuff is complicated and in some cases fairly
>> > subtle. This in turn means that the reiteration of even a subset of the
>> > underlying design principles that implementors need to know takes up a lot of
>> > space and will still fall short of the mark of giving the necessary guidance.
>> > But it may lead to the belief that reading this specification (or for that
>> > matter this one and RFC 5228) is in fact sufficient to understand how to use
>> > Sieve. It quite simply isn't.
>
>> again, i beg to differ
>
>> sieve is very similar structurally to the guerrilla standards used in
>> enterprise mail systems for more than 5 years now. for most mail
>> processing applications, only the container builders need to have a
>> good understanding of the protocols. application developers are
>> offered a safe environment and an OOP interface. i see no reason why
>> sieve should be any different.
>
> Understanding of the protocols isn't necessary, but I'm very much afraid that
> there's no avoiding an understanding of email semantics if you want things to
> work properly. We may wish it were otherwise, but it just isn't.

depends on what you mean by that

if you're talking about processing as part of SMTP processing (or any
other protocol) then i agree

but often when people think they are processing email, the use case
boils down to essentially processing a dead MIME document with
meta-data that has been previously delivered through some protocol or
other. they may do some other protocol stuff - such as forwarding some
result - but that's essentially acting as an email client. this is
usually within the capability of most general developers without a
good understanding of email semantics.
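
(a small python sketch of what i mean by a dead MIME document with
meta-data. the stored record layout, the namespaced meta-data keys and
the action strings are all invented for illustration. the header test is
only a rough analogue of a sieve "header :contains" test, not an
implementation of it.)

```python
from email import message_from_string


def header_contains(msg, name, needle):
    """Case-insensitive substring match over every value of one header."""
    return any(needle.lower() in value.lower()
               for value in msg.get_all(name, []))


# Hypothetical stored record: the container keeps delivery meta-data
# beside the document, under namespaced keys, instead of in its headers.
stored = {
    "meta": {
        "ingest:protocol": "http-upload",
        "ingest:received-by": "archive-1",
    },
    "mime": (
        "From: alice@example.com\r\n"
        "Subject: Quarterly Status\r\n"
        "\r\n"
        "body\r\n"
    ),
}

doc = message_from_string(stored["mime"])

# The application developer only asks questions of the document itself.
if header_contains(doc, "subject", "status"):
    action = "fileinto reports"
else:
    action = "keep"
```

(no envelope is consulted at any point: the document arrived by some
protocol or other and whatever the container recorded about that sits in
the meta-data.)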

- robert