Re: Comments on draft-resnick-2822upd-02.txt

"Charles Lindsey" <chl@clerew.man.ac.uk> Fri, 17 August 2007 23:04 UTC

To: ietf-822@imc.org
Xref: clerew local.mime:5065
Path: clerew!chl
From: Charles Lindsey <chl@clerew.man.ac.uk>
Subject: Re: Comments on draft-resnick-2822upd-02.txt
Message-ID: <JMxqxG.FA6@clerew.man.ac.uk>
X-Newsreader: NN version 6.5.2 (NOV)
References: <JMu4Du.B7q@clerew.man.ac.uk> <01MK8ESGFC4M005BGY@mauve.mrochek.com>
Date: Fri, 17 Aug 2007 20:44:04 +0000
Lines: 446

In <01MK8ESGFC4M005BGY@mauve.mrochek.com> ned+ietf-822@mrochek.com writes:

>> >1.  Introduction
>> >
>> >1.1.  Scope
>> >
>> >   This document specifies a syntax only for text messages.  In
>> >   particular, it makes no provision for the transmission of images,
>> >   audio, or other sorts of structured data in electronic mail messages.
>> >   There are several extensions published, such as the MIME document
>> >   series ([RFC2045], [RFC2046], [RFC2049]), which describe mechanisms
>> >   for the transmission of such data through electronic mail,........

>> No mention of RFC 2047, or of RFC 2231?

>RFC 2047 specifies a means to include non-ASCII text in email headers and has
>essentially nothing to do with the transmission of structured data through
>email. So why should it be mentioned in this context?

OK, if this paragraph is intended for Content-Type stuff in bodies. And
there is a mention of RFC 2047 later on in a more relevant context (but
then RFC 2231 should probably get a mention at that place).



>> >1.2.  Notational conventions
>> >
>> >1.2.3.  Structure of this document
>> >

>> Can we be clear about the _intent_ of this obs-syntax?

>> Is the intent to be able to read/display/print ancient messages which
>> people still have on file? In which case, please can we say that there is
>> no longer any expectation that obs messages can still be transmitted and
>> delivered (by RFC2821 or otherwise), and hence only MUAs (but not MTAs)
>> are REQUIRED to accept them.

>> Or, alternatively, is the intent that some ancient software still
>> generates messages using the obs-syntax, and hence MTAs MUST still accept
>> them? In which case, for how much longer?

>I see little if any justification for additional elaboration of the intent
>here. As far as I'm concerned the intent covers both your "alternatives" and
>quite a few other things as well.

>More generally, the problem with trying to nail down intent is that once you do
>so the ability of the construct to meet other, as-yet-unplanned needs may be
>compromised. See below for an example of another possible use of the obs-
>syntax - helping with interop requirements - should we make our own lives
>harder down the road simply because this document didn't say that one
>intent of this was to make feature enumeration easier?

It's a question of the remote possibility that some future benefit of
keeping the obs-syntax in place will appear; versus the certainty that
implementors will for ever be REQUIRED to accept some things that are
notoriously difficult to parse (such as bare CR or LF or NULL) and which
will surely never be encountered. Keeping these things off the wire and
with no requirement for agents that take things off the wire to continue
to accept them would be a good start.
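
For illustration, here is a minimal Python sketch of the check that an agent
taking messages off the wire could apply to the constructs mentioned above
(bare CR, bare LF and NUL). The name `has_bare_ctl` is hypothetical, and
whether such input is rejected or merely logged is a policy choice that this
sketch does not make.

```python
import re

# Bare CR (not followed by LF), bare LF (not preceded by CR), or NUL.
_BARE = re.compile(rb"\r(?!\n)|(?<!\r)\n|\x00")

def has_bare_ctl(raw: bytes) -> bool:
    """Return True if the raw message bytes contain a bare CR, bare LF or NUL."""
    return _BARE.search(raw) is not None
```

A message that uses only CRLF line endings passes; anything else is flagged
without any attempt to parse it further.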

Being cautious in what changes you introduce at Draft Standard stage is a
fine thing, but it has to be set against the fact that this is your last
chance to remove things which have outlived their usefulness and would not
be missed. Beyond Draft Standard, the concrete in which these things are
set becomes so hard that it can NEVER be broken.


>> >2.  Lexical Analysis of Messages
>> >

>> >2.1.1.  Line Length Limits
>> >
>> >   There are two limits that this specification places on the number of
>> >   characters in a line.  Each line of characters MUST be no more than
>> >   998 characters, and SHOULD be no more than 78 characters, excluding
>> >   the CRLF.

>> Can we de-emphasise that SHOULD, and make it clear that this is a matter
>> of good practice (in the sense of BCP) rather than a normative feature?

...............

>Besides, I think a SHOULD is actually appropriate here. SHOULD means you should
>do it unless you have a really good reason not to.

>> Perhaps s/SHOULD/should/? Too many agents have used this as an excuse to
>> rewrite lines en route (maybe there should be a SHOULD NOT for that).

>If so, they are relying on a flagrant misreading of the document. Attempting to
>prevent such exercises is itself an exercise in futility - the best you can
>ever do is point out that the claim isn't supported by the actual language.

But people DO flagrantly misread documents, and have done so in this case.
So that SHOULD has caused actual harm. RFC 2119 itself admits that the
interpretation of MUST and SHOULD might well be different in BCP and other
informational documents, but this document is clearly intended to be
normative, except where it chooses to make it clear that it is only giving
advice.
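
To make the difference between the two limits concrete, here is a minimal
Python sketch that treats the 998 limit as a hard error and the 78 limit as
advice only. The function name `classify_line` and its return labels are
hypothetical; lengths are counted in characters excluding the CRLF, as in the
quoted draft text.

```python
HARD_LIMIT = 998  # MUST be no more than this (excluding CRLF)
SOFT_LIMIT = 78   # SHOULD be no more than this (excluding CRLF)

def classify_line(line: str) -> str:
    """Return 'ok', 'long' (over 78) or 'invalid' (over 998) for one line."""
    # The limits exclude the terminating CRLF, so strip it if present.
    if line.endswith("\r\n"):
        line = line[:-2]
    n = len(line)
    if n > HARD_LIMIT:
        return "invalid"
    if n > SOFT_LIMIT:
        return "long"
    return "ok"
```

Under this reading, an agent that merely relays a message has grounds to
object only to 'invalid' lines; a 'long' line is passed on untouched.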

>As for having a SHOULD NOT about agents altering messages in transit, IMO the
>place for that - assuming it makes sense to do - would be the SMTP
>specification, not here.

That would be a good thing to say in 2821bis. But it should apply equally
to ANY transport mechanism - old or new.


>> >   The more conservative 78 character recommendation is to accommodate
>> >   the many implementations of user interfaces that display these
>> >   messages which may truncate, or disastrously wrap, the display of
>> >   more than 78 characters per line, in spite of the fact that such
>> >   implementations are non-conformant to the intent of this
>> >   specification (and that of [I-D.klensin-rfc2821bis] if they actually

>> Where did that '78' come from? I am aware of lots of systems that do
>> horrid things such as you mention if there are 80 characters in a line,
>> but I am aware of none where problems arise with exactly 79. In other fora
>> where I have seen this discussed, the consensus was that exceeding '79'
>> was the signal for troubles to start.

>I've always felt that the 78 character limit was one byte lower than it really
>needed to be. But I am a long way from convinced that now is the time to change
>this.

It's your last chance :-( . And existing systems that use 78 would still
be compliant, or at least as compliant as they were before.

>> >2.2.3.  Long Header Fields
>> >
>> >   ......  Each header field should be
>> >   treated in its unfolded form for further syntactic and semantic
>>                                              ^^^^^^^^^
>> >   evaluation.

>> 'Semantic' yes, but why is that 'syntactic' there?

>Don't you have to parse things like address fields in order to then perform
>semantic analysis? .........

Ah! you mean that you have

   To: <a-very-long-name-such-as-frederickickickick
        @example.com>

(which is ugly but allowed). So you have to unfold before you can
recognize that you have an <addr-spec>. Point taken.
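
As a small illustration, unfolding as RFC 2822 describes it (remove any CRLF
that is immediately followed by WSP) can be sketched in Python as follows. The
function name `unfold` is hypothetical, and the sketch assumes the field is
held as a string with CRLF line endings.

```python
import re

def unfold(field: str) -> str:
    """Remove every CRLF that is immediately followed by a space or tab."""
    return re.sub(r"\r\n(?=[ \t])", "", field)

folded = ("To: <a-very-long-name-such-as-frederickickickick\r\n"
          "     @example.com>")
print(unfold(folded))
```

Note that the white space itself survives the unfolding, so the parser still
has to cope with it when it looks for the <addr-spec>.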



>> >3.2.2.  Quoted characters
>> >
>> >
>> >      Note: The "\" character may appear in a message where it is not
>> >      part of a quoted-pair.  A "\" character that does not appear in a
>> >      quoted-pair is not semantically invisible.  The only places in
>> >      this specification where quoted-pair currently appears are
>> >      ccontent, qcontent, dcontent, no-fold-quote, and no-fold-literal.

>> .... But,
>> as I have pointed out in a separate thread, you would remove a severe
>> interoperability problem with Netnews if you removed it from <dcontent> as
>> well (allowing just a "\" to appear as a normal character).

>As I believe I stated in an earlier response, I am opposed to removing it. It is
>simply not possible to know everything that's out there and just because we
>don't know about something is no excuse to break it.  I could live with moving
>it to the obsolete syntax but that's as far as I'll go.

Indeed, moving it to obs-syntax is all I am asking for (though allowing
"\" in <dtext> might be tricky - I shall respond to Pete's remarks on
that).

>> >3.2.3.  Folding white space and comments
>> >

>> Do you _really_ want to permit NO-WS-CTL in a <comment>?

>RFC 2822 did, so the question becomes one of do we want to change
>this away from what 2822 said?

>Like it or not, control characters have long been allowed in a lot of places
>where they really don't belong. I remain to be convinced that this one
>narrow case is worth worrying about.

It (and other similar cases) is a good candidate for the obs-syntax, then.
There may be a few cases where they may be meaningful in the protocol (so
it is up to 2821bis to make the final pronouncement), but this is not one
of them. We had a purge on them in USEFOR, notably in Message-ID where
characters that are not visible on the screen could provide a golden
opportunity for all sorts of scams.




>> <phrase>s, <unstructured>s and <comment>s are the places where RFC 2047
>> raises its ugly head. It is the most confusingly written RFC I have
>> encountered (and it could be considered as separate from the rest of
>> the MIME standards, since it can be used without the MIME-Version header).

>> For a truly outrageous suggestion, we might incorporate the whole of RFC
>> 2047 into here, cleaning it up in the process. No, that is too much to
>> propose at this juncture, but there are a couple of lesser things we might
>> do to help:

>> 1. Include <encoded-word> in the syntax at all the proper places (which
>> might at least encourage inventors of new extension headers to follow
>> suit). It would need a convincing explanation, of course.

>I am strongly opposed to this. If RFC 2047 is confusing, the time to argue that
>is when it is revised. We cannot fix its problems (assuming there actually are
>any to fix) by incorporating some subset of references to it in another
>document.

Yes, I didn't expect that one to fly ;-( .

>> 2. And if that is a step too far, we could still point out that sequences
>> of the form "=? ... ? ... ? ... ?=" have a special significance within RFC
>> 2047 (whether they exceed that 76 character limit or not), and that such
>> sequences SHOULD NOT be used within <phrase>s, <unstructured>s and
>> <comment>s unless that special significance is intended.

>An informational reference to RFC 2047 would be OK with me.

"Within 'comment's, 'phrase's and 'unstructured's, sequences of the form
"=? ...  ? ... ? ... ?=" have a special significance within RFC 2047 for
encoding characters outside the range of US-ASCII. Such sequences SHOULD
NOT therefore be used unless that special significance is intended."

3.2.6 might be a possible home for such a remark. Possibly as a Note.
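
To show what such a sequence looks like to software, here is a rough Python
sketch of a detector for the "=? ... ? ... ? ... ?=" shape. The pattern is
deliberately looser than the actual RFC 2047 grammar (it ignores the length
limit and the exact character sets allowed in each part), and the name
`looks_like_encoded_word` is hypothetical.

```python
import re

# "=?" charset "?" encoding "?" encoded-text "?=", with no "?" or white
# space allowed inside any of the three parts.
_ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[^?\s]+\?[^?\s]+\?=")

def looks_like_encoded_word(text: str) -> bool:
    """Return True if text contains something shaped like an encoded-word."""
    return _ENCODED_WORD.search(text) is not None
```

A generator that emits such a sequence by accident risks having it decoded by
the recipient, which is the reason for the proposed SHOULD NOT.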

>> >3.3.  Date and Time Specification

>> why not "within the range -2359 through +2359"?

>I have no objection to restricting the range, but whatever we do needs to agree
>with other specifications that deal in time zones. RFC 3339 appears to allow
>-2459 through +2459. 

That would be fine (apparently funny things can happen around the Date
Line). Though I didn't actually find anything about that in RFC 3339.
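
Either range is easy to enforce. As a minimal Python sketch, assuming the
zone is written as a sign followed by four digits (hhmm): the function name
`zone_in_range` and its `max_hours` parameter are hypothetical, the latter
being there only so that both candidate ranges can be tried.

```python
import re

def zone_in_range(zone: str, max_hours: int = 23) -> bool:
    """Check a "+hhmm"/"-hhmm" zone: hh <= max_hours and mm <= 59."""
    m = re.fullmatch(r"[+-]([0-9]{2})([0-9]{2})", zone)
    if m is None:
        return False
    hours, minutes = int(m.group(1)), int(m.group(2))
    return hours <= max_hours and minutes <= 59
```

With the default, "+2359" is accepted and "+2400" is not; passing
`max_hours=24` gives the -2459 through +2459 variant instead.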


>> >3.4.1.  Addr-spec specification
>> >
>> >   .......  A liberal syntax
>> >   for the domain portion of addr-spec is given here; it is left to
>> >   other specifications (e.g., [RFC1034], [RFC1035], [RFC1123],
>> >   [I-D.klensin-rfc2821bis]) to give more precise limitations on the
>> >   syntax.

>> Can we strengthen that by saying that the 'liberal syntax' MUST be further
>> restricted to conform to some published specification such as the ones you
>> have listed (without precluding further such specifications in the future,
>> of course)?

>No, because that would usurp the prerogative of other specifications to
>specify what conformance criteria apply to their additional restrictions.

>> I have already pointed out, in a separate thread, the severe
>> interoperability problems with Netnews of this definition of <dcontent>
>> (at least insofar as its use within <msg-id> is concerned).....

>And IMO you failed to achieve sufficient support to result in a specification
>change. As I said previously, I can live with making the use of quoted-pairs in
>dtext part of the obsolete syntax, but that's as far as I can see us going.

Yes, that would be the best way IMHO.

>The alternative approach I actually favor is one I have previously described:
>Add some text that says that domain literals in message ids should be generated
>using the most restrictive syntax and with well-defined semantics, i.e. an IPv4
>or IPv6 literal. To my mind the bigger problem here is that someone will generate
>something like [foobar] here instead of putting in an actual global IPv4 or
>IPv6 address. If we encourage people to use domain literals with defined
>semantics we solve several problems at one go.

Sure, but that was the intent of my suggested "'liberal syntax' MUST be further
restricted to conform to some published specification" which you did not
like. But I see that Pete has suggested a possible text which achieves
much the same effect.


>> >3.6.  Field definitions

>> I have already pointed out, in a separate thread, the severe
>> interoperability problem that arises with Netnews if you do not require a
>> SP after the colon. Since every MUA I am aware of routinely inserts that
>> SP, I cannot see that anything would be lost by requiring it here.

>And as I commented previously, not everything that has a submission client in
>it is an MUA. There are quick and dirty submission clients embedded in all
>sorts of places - one of the advantages of SMTP is that you can code a quick
>and dirty client very easily - and leaving out every possible character is
>exactly the sort of thing these gizmos do. Heck, they even do it when they've
>actually got plenty of space to spare - unnecessary optimization is RAMPANT in
>the embedded systems world.

But surely anybody writing a script to do a "dirty submission" is going to
write something like:

   sprintf(buffer, "Foo-Header: %s, %d, ...\n", stuff1, stuff2)

either in C, or in one of the many scripting languages (e.g. Perl) which
supports constructs like that, because that is the easiest way to generate
such things. And they will tend to put that SP in simply because that is
how they always expect to see headers. It would take a conscious
decision to be "different" for them to do otherwise.
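
The receiving side of the same question can be sketched just as briefly. In
Python, assuming one unfolded header field per string (the function name
`has_sp_after_colon` is hypothetical):

```python
def has_sp_after_colon(field_line: str) -> bool:
    """True if the first colon in the field line is followed by a space."""
    name, sep, rest = field_line.partition(":")
    return sep == ":" and rest.startswith(" ")
```

A gateway into Netnews could use a test of this kind to decide whether a
field needs the SP inserted before the article is passed on.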


>> >3.6.2.  Originator fields

>> >   The originator fields indicate the mailbox(es) of the source of the
>> >   message.  The "From:" field specifies the author(s) of the message,
>> >   that is, the mailbox(es) of the person(s) or system(s) responsible
>> >   for the writing of the message....

>> Are those sentences intended to be normative, BCP (or even deliberately
>> vague :-) )?

>Don't see any capitalized words there, do you? So I guess there are
>no compliance implications.

>> For example, some people 'munge' their From: addresses in order to appear
>> anonymous, or to confuse address harvesters. ...

>> The wording currently proposed by the USEFOR WG for this is:

>>    Contrary to [RFC2822], which implies that the mailbox or mailboxes in
>>    the From header field should be that of the poster or posters, a
>>    poster who does not, for whatever reason, wish to use his own mailbox
>>    MAY use any mailbox ending in the top level domain ".invalid"
>>    [RFC2606].

>> But if RFC2822 does not actually imply that, then we might have to think
>> again.

>And IMO it should not imply that. The last thing email needs at this point is
>more license to use invalid addresses.

No, you misread what I said. "If RFC2822 does not actually imply that
munged_addresses_etc_are_disallowed" (and you seem to be saying that it
does not), then the USEFOR WG needs to review that wording which says
"Contrary to [RFC2822]...", because it would not be contrary.

But, since people seem to be reading it both ways, it needs to be
clarified.

>> >3.6.4.  Identification fields
>> >
>> >   The "Message-ID:" field contains a single unique message identifier.
>> >   The "References:" and "In-Reply-To:" field each contain one or more
>> >   unique message identifiers, optionally separated by CFWS.
>>                                 ^^^^^^^^^^

>> Interoperability with Netnews would be improved without that "optionally".

>Perhaps, but like it or not Netnews compatibility is not our primary goal here.

It is a very common practice to gateway mailing lists into Netnews (you
can find this list on Usenet if you look for it). And it is also common
for people to both mail and post the same message/article. Therefore,
interoperability with Netnews is an important goal, especially where it
can be achieved with zero or minimal disruption to present practices.

Actually, the CFWS in the References header is one of the less urgent
problems. A "SHOULD include" it would be strong enough, given that USEFOR
felt able to say that its absence "SHOULD be accepted", so that eventually
the two will come into line.


>> >   The "References:" field will contain the contents of the parent's
>> >   "References:" field (if any) followed by the contents of the parent's
>> >   "Message-ID:" field (if any). ...

>> It would be useful to mention that when the References field gets too long
>> it MAY be pruned (the minimum requirement being to retain the first and
>> the last two entries - including the one just being added). I have known
>> of cases where References fields grew to such a length (and MUAs in the
>> followup chain had failed to introduce folding, or even removed folding
>> already present) that the 998 limit was breached with disastrous
>> consequences.

>Adding such a suggestion would be fine with me were it not for the context of
>this effort - every such change increases the likelihood of a problem getting
>to draft.

All it needs is a MAY. Given that the lack of it has actually caused breakage
in the past, and that allowing it does no harm, it would seem wise to allow it.
USEFOR intends to allow (and even to encourage) it.
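
As a sketch of what such pruning might look like, here is a short Python
example. The function name `build_references` and the threshold `max_ids` are
hypothetical (nothing in the draft fixes a number); the only rule taken from
the suggestion above is that the first entry and the last two entries,
including the one just being added, are always retained.

```python
def build_references(parent_refs, parent_msg_id, max_ids=20):
    """Parent's References plus parent's Message-ID, pruned if too long."""
    ids = list(parent_refs) + [parent_msg_id]
    # Nothing to prune if the list is short enough already.
    if len(ids) <= max_ids or len(ids) <= 3:
        return ids
    # Always keep the first entry and at least the last two.
    keep_tail = max(2, max_ids - 1)
    return [ids[0]] + ids[-keep_tail:]

refs = ["<%d@example.com>" % i for i in range(1, 6)]
print(" ".join(build_references(refs, "<6@example.com>", max_ids=3)))
```

Joining the result with a single SP, as in the last line, also supplies the
white space between identifiers that Netnews would like to see.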

>> >3.6.5.  Informational fields
>> >

>> >   ....  When used in a reply, the field body MAY start with the
>> >   string "Re: " (from the Latin "res", in the matter of) followed by
>> >   the contents of the "Subject:" field body of the original message.

>> If we are going to discuss Latin Grammar, then please let us do so
>> correctly. "Res" is the nominative form of the fifth declension noun
>> meaning "thing", "matter", "issue", etc.  "Re" is an abbreviation of the
>> phrase "in re" meaning "in the matter of", and in which "re" is the
>> ablative form of the same noun (the preposition "in" is always followed by
>> an ablative in static cases such as this, though it takes the accusative
>> form - e.g. "in rem" - in dynamic cases where the meaning is "into").

>> so if, instead of
>>     the string "Re: " (from the Latin "res", in the matter of)
>> you write
>>     the string "Re: " (an abbreviation of the Latin "in re", meaning "in
>>     the matter of")
>> all will be correct.

>Yep, now that I think about it you're correct. This is a reasonable change.
>Alternately, the  whole thing about the Latin could be omitted.

I think it is needed because too many people imagine that it is an
abbreviation of "reference", and then try to use an abbreviation of an
equivalent word in their own language. And that definitely causes things
to break.
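
A minimal Python sketch shows the sort of breakage I mean. It assumes a
replying agent that adds "Re: " unless the subject already starts with it,
compared case-insensitively; that policy and the name `reply_subject` are
assumptions for the sake of illustration, not something the draft specifies.

```python
def reply_subject(original: str) -> str:
    """Build a reply Subject body, adding "Re: " unless already present."""
    if original[:4].lower() == "re: ":
        return original
    return "Re: " + original

print(reply_subject("Comments on the draft"))
print(reply_subject("Re: Comments on the draft"))
print(reply_subject("AW: Comments on the draft"))
```

An agent written this way does not recognize a prefix in some other language
(such as the "AW: " in the last call), so it adds "Re: " in front of it and
the prefixes pile up as the thread grows.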




>> >Appendix A.  Example messages
>> >
>> >   Messages are delimited in this section between lines of "----".  The
>> >   "----" lines are not part of the message itself.

>> That is indeed an excellent notation. The Bad News is that you have
>> nowhere used it :-( .

>Yes, and so this sentence would best be removed.

I would prefer that the feature be used as intended.

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl@clerew.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5