Re: Multiple "To:" and "Cc:" header lines in SMTP messages

Ned Freed <Ned.Freed@innosoft.com> Sun, 06 October 1996 06:37 UTC

Received: from cnri by ietf.org id aa28578; 6 Oct 96 2:37 EDT
Received: from list.cren.net by CNRI.Reston.VA.US id aa02970; 6 Oct 96 2:37 EDT
Received: from localhost (localhost.0.0.127.in-addr.arpa [127.0.0.1]) by list.cren.net (8.7.6/8.6.12) with SMTP id BAA14543; Sun, 6 Oct 1996 01:59:40 -0400 (EDT)
Received: from THOR.INNOSOFT.COM (THOR.INNOSOFT.COM [192.160.253.66]) by list.cren.net (8.7.6/8.6.12) with ESMTP id BAA14530 for <ietf-smtp@list.cren.net>; Sun, 6 Oct 1996 01:59:28 -0400 (EDT)
Received: from INNOSOFT.COM by INNOSOFT.COM (PMDF V5.0-7 #8694) id <01IA9IACMFWG9OCV93@INNOSOFT.COM>; Sat, 05 Oct 1996 11:49:36 -0700 (PDT)
Message-Id: <01IAA6X9B4VK9OCV93@INNOSOFT.COM>
Date: Sat, 05 Oct 1996 10:48:39 -0700 (PDT)
Sender: owner-ietf-smtp@list.cren.net
Precedence: bulk
From: Ned Freed <Ned.Freed@innosoft.com>
To: Pete Resnick <presnick@qualcomm.com>
Cc: Ned Freed <Ned.Freed@innosoft.com>, ietf-smtp@list.cren.net
Subject: Re: Multiple "To:" and "Cc:" header lines in SMTP messages
In-Reply-To: "Your message dated Tue, 01 Oct 1996 00:23:58 -0500" <v03010436ae7634c72e45@resnick1.isdn.uiuc.edu>
References: <v03010432ae75d89baa6b@resnick1.isdn.uiuc.edu> <14386.843812494@domen.uninett.no> <c=US%a=telemail%p=dg%l=GROUCHO-960926092252Z-413@groucho.webo.dg.com> <01IA1WOQ216Q8Y55C6@INNOSOFT.COM> <01IA3FD2A8LI9OCV1R@INNOSOFT.COM>
MIME-version: 1.0
Content-type: text/plain; charset=us-ascii
Content-transfer-encoding: 7BIT
X-Listprocessor-Version: 8.1 -- ListProcessor(tm) by CREN

> But now comes the question which I posed at the bottom which did not get
> answered: Are you saying that there are sites that are forced by law to
> expand lists (note the emphasis here) into the *recipient headers* of
> messages?

My understanding is that the rules (they are not laws because violation of them
doesn't land you in jail, you just lose your contract and go out of business)
basically say that the recipient list needs to reflect the actual recipients
that received the message and that this list has to be available in machine
readable form for use in auditing operations. Insertion of this material
into the message body as unformatted text would clearly not be acceptable.

Note also that in X.400 (and these rules are clearly written with X.400 in
mind, not Internet mail) this requirement is met by keeping a copy of the
entire list of recipients at the envelope level in every copy of the message;
headers aren't used for this function at all. In X.400 various chunks of the
far-more-extensive envelope structure end up getting put into the "delivery
information" entity; this is then accessible to the user agent.

> Are you saying that laws are specific enough to require that SMTP
> recipient header fields are used as opposed to putting the expansion into
> the body of the message? Why can't you just expand into the body and leave
> "reasonable" headers on the messages?

I know of no requirement anywhere that the headers be used for this function
when Internet mail is employed, but if there's an alternative standardized
and parseable structure available for use I'm not aware of it, so the
effect is basically the same.

Of course you could do something nasty like put in a separate message/rfc822
body part with all this information in it, but I'm afraid I still don't see the
point in doing such a thing. The question you're effectively asking here is
whether or not an alternative place to store this material is superior to using
multiple headers at the top level. So let's see; if we use multiple headers we
end up with:

+ Doesn't break things during transport
+ Is relatively easy to implement, since all it requires is insertion of
  headers
+ Is auditable using existing mechanisms
+ Some agents support it fully already
- Some agents (e.g. Eudora) don't support it and thus don't get replies right

Whereas if we use the body of the message, we get:

+ Doesn't break things at the transport level
- Is extraordinarily difficult to implement since it requires considerable
  sideline information storage and significant understanding of message
  structure
+ Could be made auditable
- Isn't currently auditable and much work would be needed to make it so
- No agents support it currently

In other words, I believe use of multiple header fields is superior in
every way to the approach you have suggested.
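
To make the multiple-header approach concrete, here is a minimal sketch in
Python of splitting one long recipient field into several shorter fields of
the same name. The function name split_recipient_field and the 200-character
limit are assumptions made for illustration only; they do not describe any
particular product, and the limit a given old agent tolerates will differ.

```python
from email.utils import formataddr, getaddresses


def split_recipient_field(field_name, field_value, max_chars=200):
    """Split one long address field into several fields of the same name.

    max_chars is a hypothetical per-field limit chosen for illustration.
    A single address longer than the limit still gets a field of its own.
    """
    # Parse the field into (display name, address) pairs, then re-render each.
    addresses = [formataddr(pair) for pair in getaddresses([field_value])]
    fields, current = [], []
    for addr in addresses:
        candidate = ", ".join(current + [addr])
        # +2 accounts for the ": " between the field name and its value.
        if current and len(field_name) + 2 + len(candidate) > max_chars:
            fields.append(f"{field_name}: " + ", ".join(current))
            current = [addr]
        else:
            current.append(addr)
    if current:
        fields.append(f"{field_name}: " + ", ".join(current))
    return fields


long_value = ", ".join(f"user{i}@example.com" for i in range(50))
for line in split_recipient_field("To", long_value):
    print(line)
```

Every address survives the split, and the result is still a set of ordinary
recipient fields, so existing tools that read those fields see nothing new
except the repetition.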

> Or put them in some other header
> (multiple "Expanded-List:" headers) such that mailers on the other end
> don't have to deal with hundreds or thousands of destination addresses to
> parse and deliver to? The current state of affairs just seems like a poor
> solution.

I disagree. First of all, there is no way to implement this because there is no
way for agents to tell that a given address is the result of a list expansion.
You are again missing the point I've tried to make several times -- these
addresses are already in a regular header field when we receive the message.
The only question for us is how we handle the case where the field is long.
There is no clever little indicator anywhere in any of this that some or all of
the addresses are the result of a list expansion. For all we know the
originator sat there in Pine and typed them in, one by one.

Second, you are acting like someone who hides a stain on the tablecloth by
covering it with a dish -- it doesn't work because someone is sure to move the
dish. Similarly, even supposing you could magic up the information necessary
for us to start generating such a field, all you've now done is create a new
set of things for all of us to do -- we have to parse this new field and
process it in exactly the same way that we process the old fields. And people
are going to want to be able to reply to this multiple field, which means that
you're back to having to support replies to multiple fields. Everyone
loses with this approach.

> (Note that this is not equivalent to truncating the header and putting it
> into the body as "Overflow headers". What I am suggesting is not generating
> the long recipient headers in the first place.)

This is not a viable option for us because we're not the ones who generate
the long fields to begin with.

> >As such, we have customers who encounter messages with *hundreds* or even
> >*thousands* of recipients listed in the header. Our product doesn't do this
> >expansion, it is already a done deal by the time the message reaches us.

> Are you here saying that you are gatewaying messages which come in with
> these large numbers of recipients to SMTP and therefore split up the
> recipient lists into multiple fields yourself to deal with the broken
> sendmail's? Or are you getting SMTP messages which you are then fixing for
> broken sendmail's?

I am getting these messages from every source imaginable: SMTP systems,
LAN email systems, Message Router systems, X.400 systems.

Pete, you are falling into an insidious trap here that I routinely have to
admonish my customers about. Specifically, you are attempting to base decisions
on where a message comes from. This is often an incredibly plausible thing to
do -- the current problem originates by definition from someplace specific,
fixing it often means making some sort of compromise that you may not want
to make when dealing with all the mail you receive, so why not treat the
problem in an origin-specific fashion?

The reality is that such approaches tend in practice to fail spectacularly. Far
from being a tidy, predictable thing, the patterns of message flow can be
extremely complex and very unpredictable. Unexpected routes between systems are
routinely missed in an analysis, a software upgrade brings new capabilities
to a large user population that nobody expected them to have, someone else
sets up a new connection with unexpected consequences, and so on.

Here's a specific example for you to think about. Digital sells a user agent
for PCs called Teamlinks, which happens to be one of those that expand list
addresses into header fields. Teamlinks began life as an agent you hang onto
a VMS MailWorks server. MailWorks in turn talks to the outside world using
Message Router, and one of the things our product set does is act as a
gateway from Message Router to SMTP/RFC822/MIME.

Fine so far -- this is certainly a specific source that you can readily
identify. But then Digital developed a MailWorks server for Digital UNIX.
It gateways to SMTP directly. So now people that use MailWorks servers
on UNIX are generating SMTP messages with long header fields.

But this only applies to MailWorks servers on UNIX, so all you have to
do is deal with that source, right? Wrong -- Digital then found that
they couldn't get people to migrate from ALL-IN-1 to Teamlinks because
of legacy message store issues, so they added the ability to talk to
ALL-IN-1 as well. But that's still no problem, because ALL-IN-1 only
talks to Message Router. But then ALL-IN-1 got a direct X.400 connection of
its own, so now this stuff originates from the X.400 world as well.

And then Digital got really fancy and put support for POP3, MIME, and SMTP
directly into Teamlinks. The result is a user agent that is deployed on around
a million desktops (at least according to the figures published in EMMS) that
produces RFC 822 messages directly that potentially have these sorts of header
fields in them.

> Now this is a different issue than the one sited above. Before we were
> talking about places where it was considered desireable to expand lists
> into headers. Here we're talking about places where there is no list name
> which could replace the many recipients. Is your claim is that the number
> of times that such expanded messages hit old broken sendmail's (sendmail's
> which will not be fixed) is high?

You bet it is high. The problem here is psychological: There's an established
system that has been running sendmail for years without any problems. A LAN
email system is installed, and then a gateway from it to SMTP is installed.
The old SMTP server now sometimes rejects the messages this new system sends.
Whose fault is it? Answer: The new system, because it is the only thing
that changed and things were working before.

Try fighting this sort of thing sometime. I have, and it is basically
impossible to win. And waving standards around accomplishes exactly
nothing. In fact I've seen hundreds of cases where far more egregious
standards violations were involved and nobody would consider fixing the
actual problem.

> The thought of changing the standard to accomodate the combination of
> behavior of 3 broken SMTP acts (the act of removing headers, the act of
> looking in the body, and the act of adding headers back on) is pretty
> disheartening.

I don't deny it. But it is really our own fault for tolerating such
botched behavior on a wide scale for so long.

> >(4) In quite a few cases this problem has been "botched away" before you
> >    even see it, and thus ends up being counted as an entirely different
> >    sort of problem. I routinely see messages with truncated header fields
> >    where the truncated content of those fields got tacked on to the message
> >    *body*, typically prefixed with a tag such as "overflow headers". This
> >    happens because some other vendors don't take the multiple field
> >    approach and instead prefer to make it impossible for any agent to
> >    do reply-to-all properly.

> I understand that experiences may differ, but though I used to see lots of
> these, I haven't seen one in quite a long time.

I don't see lots of these any more -- last week I only got about
10 of them ;-)

> > I also never said that these things abound throughout Internet mail. They have
> > been and will continue to be a problem in some segments of the community, but
> > that's all.

> Which is why I'm inclined to not prop up what is currently broken behavior
> if we don't have to.

I never said we have to, only that I think it would be a good idea.

I'm going to give in on this issue at this point. I'm always more comfortable
coding my way around these sorts of incompatibilities, and I can code my way
around this one by providing an option to merge multiple fields back into one
for clients that cannot interpret them. In fact I believe I'll call the option
the eudora option in your honor. (Smiley omitted because I'm completely serious
about doing all of this, even though it has its humorous aspect.)
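
A rough sketch of what such a merge option might look like, written in Python
with the standard email package. This is an illustration under stated
assumptions, not a description of how any actual product does it; the function
name merge_recipient_fields is hypothetical.

```python
from email import message_from_string
from email.utils import formataddr, getaddresses


def merge_recipient_fields(msg, field_name):
    """Collapse repeated fields named field_name into a single field.

    Hypothetical helper for illustration. The merged field is appended
    after the remaining header fields, so its position may change.
    """
    values = msg.get_all(field_name, [])
    if len(values) < 2:
        return msg  # nothing to merge
    # Parse every occurrence, keeping the addresses in their original order.
    addresses = getaddresses(values)
    del msg[field_name]  # removes all occurrences of the field
    msg[field_name] = ", ".join(formataddr(pair) for pair in addresses)
    return msg


raw = (
    "To: a@example.com\n"
    "To: Bob <b@example.com>\n"
    "Cc: c@example.com\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
msg = merge_recipient_fields(message_from_string(raw), "To")
print(msg.get_all("To"))
```

The merged field can of course be long again, which is exactly what the split
was meant to avoid, so an option like this only makes sense on the final hop
to a client that is known to need it.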

				Ned