Re: Questions about RFCs 1494, 1495.

"Carl S. Gutekunst" <csg@hideji.worldtalk.com> Tue, 07 June 1994 15:43 UTC

Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa13393; 7 Jun 94 11:43 EDT
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa13389; 7 Jun 94 11:43 EDT
Received: from survis.surfnet.nl by CNRI.Reston.VA.US id aa25681; 7 Jun 94 11:42 EDT
Received: from relay3.UU.NET by survis.surfnet.nl with SMTP (PP) id <24293-0@survis.surfnet.nl>; Tue, 7 Jun 1994 17:31:13 +0200
Received: from uucp4.uu.net by relay3.UU.NET with SMTP (rama) id QQwtgw26459; Tue, 7 Jun 1994 11:31:08 -0400
Received: from worldtlk.UUCP by uucp4.uu.net with UUCP/RMAIL ; Tue, 7 Jun 1994 11:31:07 -0400
Received: from hideji.worldtalk.com by worldtalk.com with SMTP (1.38.193.5/16.2) id AA27009; Tue, 7 Jun 1994 08:24:50 -0700
Received: by hideji.worldtalk.com (5.61/1.5) id AA22323; Tue, 7 Jun 94 08:29:21 -0700
Date: Tue, 7 Jun 94 08:29:21 -0700
Sender: ietf-archive-request@IETF.CNRI.Reston.VA.US
From: "Carl S. Gutekunst" <csg@hideji.worldtalk.com>
Message-Id: <9406071529.AA22323@hideji.worldtalk.com>
To: Ned Freed <NED@sigurd.innosoft.com>
Cc: wg-msg@rare.nl, mime-mhs@surfnet.nl
Subject: Re: Questions about RFCs 1494, 1495.
In-Reply-To: Your message of Mon, 06 Jun 1994 21:00:39 PDT <01HD8ITQT1UC96W8QW@SIGURD.INNOSOFT.COM>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Id: <22321.771002960.1@hideji.worldtalk.com>

>(1) Everyone doing Internet has an RFC822 header parser around. These tend to
>    be quite reliable. Parsing the interior of an extremely complex header, on
>    the other hand, is something that's hard to get right.

This touches on one of my purely selfish reasons.  :-)  I don't know if others
do something similar.

My parser uses two tables, one for message headers and one for MIME multipart
headers.  When scanning message headers, the Content-* lines cannot be parsed
until the entire header has been examined, since the MIME-Version header
dictates whether the message follows MIME or RFC-1049/1154 body semantics.
So, the Content-* headers are saved in their own structure, scanned but not
parsed; if it turns out there is no MIME-Version header, they are marked as
"unknown." When scanning multipart headers, the Content-* headers are valid,
but all of the RFC-822 header lines are "unknown."
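Below is a minimal Python sketch of the two-table scan described above. It assumes the header lines are already unfolded into (name, value) pairs. The table contents and function names are hypothetical and chosen only for illustration; this is not the actual parser. Real parsing of the Content-* values is left as a comment.

```python
# Hypothetical sketch of the two-table scheme: one table for message
# headers, one for multipart body-part headers. Input is assumed to be
# already-unfolded (name, value) pairs.

CONTENT_HEADERS = {"content-type", "content-transfer-encoding",
                   "content-id", "content-description"}
RFC822_HEADERS = {"from", "to", "cc", "subject", "date",
                  "message-id", "mime-version"}

# Message-header table: RFC 822 fields plus every Content-* field.
MESSAGE_TABLE = RFC822_HEADERS | CONTENT_HEADERS
# Multipart-header table: only the Content-* fields mean anything here,
# so the Content-* entries end up replicated in both tables.
PART_TABLE = set(CONTENT_HEADERS)


def scan_message_headers(lines):
    known, deferred, unknown = {}, {}, {}
    for name, value in lines:
        key = name.lower()
        if key in CONTENT_HEADERS:
            deferred[key] = value      # scanned, but not parsed yet
        elif key in MESSAGE_TABLE:
            known[key] = value
        else:
            unknown[key] = value
    # Only after the whole header is seen do we know the body semantics.
    if "mime-version" in known:
        known.update(deferred)         # MIME: Content-* would be parsed here
    else:
        unknown.update(deferred)       # no MIME-Version: mark them "unknown"
    return known, unknown


def scan_part_headers(lines):
    known, unknown = {}, {}
    for name, value in lines:
        key = name.lower()
        # In a body part, RFC 822 lines such as Subject are "unknown".
        (known if key in PART_TABLE else unknown)[key] = value
    return known, unknown
```

For simplicity, a repeated header name just overwrites the earlier value. A real parser would keep every occurrence.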

So, all Content-* headers are replicated in both tables.  Hence my purely
selfish reason for wanting to keep the number of Content-* headers small.  :-)

That said, it did take vastly more work to get my interior parsing functions
right, although now they are perfect.  ;-)

A better implementation -- regardless of how MIME<->FTBP works -- would be for
the message header scan to give the Content-* header lines no special
treatment, letting them drop into the unknown pool.  After the entire header has
been read and the body semantic determined, make a second pass over the
headers in the unknown pool to pick off those that now mean something.  The
exception would be the Content-Type header itself, whose presence (in the
absence of a Mime-Version header) indicates RFC-1049, so it would remain in
both tables.
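Here is a rough Python sketch of that two-pass approach, under the same assumption of pre-unfolded (name, value) pairs. The table contents, the "semantics" labels, and the function name are hypothetical. Content-Type stays in the first-pass table because its presence without MIME-Version signals RFC-1049. Every other Content-* line waits in the unknown pool until the second pass.

```python
# Hypothetical two-pass scan: Content-* lines (except Content-Type) get no
# special treatment on the first pass and are rescued afterwards.

MESSAGE_TABLE = {"from", "to", "cc", "subject", "date", "message-id",
                 "mime-version", "content-type"}
# Content-* fields that only mean something once MIME semantics are known.
MIME_CONTENT_HEADERS = {"content-transfer-encoding", "content-id",
                        "content-description"}


def scan_headers(lines):
    known, unknown = {}, {}
    # First pass: one table lookup per line, no deferred structure.
    for name, value in lines:
        key = name.lower()
        (known if key in MESSAGE_TABLE else unknown)[key] = value

    # Decide the body semantic from the complete header.
    if "mime-version" in known:
        semantics = "mime"
    elif "content-type" in known:
        semantics = "rfc1049"
    else:
        semantics = "rfc822"

    # Second pass: pick off unknown-pool lines that now mean something.
    if semantics == "mime":
        for key in [k for k in unknown if k in MIME_CONTENT_HEADERS]:
            known[key] = unknown.pop(key)
    return semantics, known, unknown
```

One practical advantage is that only one Content-* entry, Content-Type, has to be duplicated across the message and multipart tables.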

>(4) I tried it using the second form first in my original draft. It was truly
>    horrible!

To me, this is the most persuasive argument of all.

The only reasonable counter-argument is that, for User Agents, filtering out a
small number of headers is easier than filtering a large number of headers.  I
remember when the first X.400 gateways joined the net, and there was much fuss
in the Usenet comp.mail.* groups from admins who were having to run all over
helping their users add 40 lines of filters to their .mailrc files.

The difficulty would be finding a useful grouping for the complex headers, so
that users could filter out the noise while preserving things they want.
Filtering out a bunch of new headers may be tedious, but in most user agents
extracting one argument from a complex stream is impossible.
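The following Python sketch illustrates the contrast. Hiding separate headers only takes a name match, which is all a .mailrc-style ignore list offers. Pulling one argument out of a complex header requires a small parser that respects quoting. The header name "X-Gateway-Info" and its parameter syntax are hypothetical, not taken from RFC 1494 or 1495. The parser is deliberately simplified and ignores backslash escapes and RFC 822 comments.

```python
# Hypothetical contrast between name-based filtering and
# argument extraction from a single complex header.


def filter_headers(headers, ignored):
    """Drop whole header lines by name, as a simple ignore list does."""
    ignored = {n.lower() for n in ignored}
    return [(n, v) for n, v in headers if n.lower() not in ignored]


def split_params(value):
    """Split 'a=1; b="x; y"' on semicolons outside double quotes.
    Simplified: no backslash escapes, no RFC 822 comments."""
    parts, buf, quoted = [], [], False
    for ch in value:
        if ch == '"':
            quoted = not quoted
        elif ch == ";" and not quoted:
            parts.append("".join(buf).strip())
            buf = []
            continue
        buf.append(ch)
    parts.append("".join(buf).strip())
    return parts


def get_param(value, name):
    """Return one named argument from a complex header value, or None."""
    for part in split_params(value):
        key, sep, val = part.partition("=")
        if sep and key.strip().lower() == name.lower():
            return val.strip().strip('"')
    return None
```

For example, `get_param('priority=urgent; originator="C=US; O=Example"', "originator")` returns `C=US; O=Example`. A user agent that can only match header names has no way to do this.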

I rest.  :-)

<csg>