Re: [apps-discuss] Review of: draft-ietf-appsawg-malformed-mail-03

Ned Freed <ned.freed@mrochek.com> Tue, 07 May 2013 02:06 UTC

Message-id: <01OTC53YHHWQ000054@mauve.mrochek.com>
Date: Mon, 06 May 2013 14:14:55 -0700 (PDT)
From: Ned Freed <ned.freed@mrochek.com>
In-reply-to: "Your message dated Sun, 05 May 2013 23:42:16 -0700" <CAL0qLwb-Aj+Te2uYJZo8g5LR4B6JREPFATTPSLGf_L4LvgMrZQ@mail.gmail.com>
References: <51657E80.8070208@bbiw.net> <CAL0qLwb-Aj+Te2uYJZo8g5LR4B6JREPFATTPSLGf_L4LvgMrZQ@mail.gmail.com>
To: "Murray S. Kucherawy" <superuser@gmail.com>
Cc: Dave Crocker <dcrocker@bbiw.net>, Apps Discuss <apps-discuss@ietf.org>
Subject: Re: [apps-discuss] Review of: draft-ietf-appsawg-malformed-mail-03

I reviewed this quite some time ago, and as I recall my assessment was mostly
favorable.

I just reviewed it again, and surprised myself considerably by finding this
document to be quite problematic in its present form. It's now my position that
a number of additions and changes need to be made.

The overarching problem here isn't what this document does; it's what it
doesn't do. It presents a small set of malformation issues, a set which is
necessarily only informed by past and present issues encountered in the field.
This is fine as far as it goes, but the document offers no general guidance on
how to handle malformations that aren't on its list. Things have already changed
in the short time since this document was written: we're not seeing several of
the listed malformations nearly as often as we used to, and we're starting to
see new ones that aren't on the list. That is a good indication that the lack of
such guidance is a serious omission, especially in what is supposed to be a
stable reference for such matters.

So what would such guidance look like? I don't pretend to have the final answer
here, but these three points strike me as a good starting point:

(1) Whenever possible, mitigation of syntactic malformations should be guided
    by an assessment of the most likely semantic intent. For example,
    it is reasonable to conclude that multiple sets of angle brackets
    around an address are superfluous and can simply be dropped.

(2) When the intent is unclear, or alternately, when it is clear but is
    impractical to change things to express it, mitigation should be limited
    to cases where not doing anything would clearly lead to a worse outcome.

(3) Security issues, when present, need to be addressed and may force
    mitigation strategies that are otherwise suboptimal.
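
To make the example in guideline (1) concrete, here is a minimal sketch in
Python. The function name and the regular expression are illustrative
assumptions, not anything the draft specifies, and the sketch only handles a
lone addr-spec wrapped in brackets. Anything it does not recognize is left
alone, in the spirit of guideline (2).

```python
import re

# Hypothetical pattern: one or more '<', a bare addr-spec, one or more '>'.
_WRAPPED = re.compile(r"\s*<+\s*([^<>\s]+@[^<>\s]+)\s*>+\s*")

def collapse_angle_brackets(addr: str) -> str:
    """Reduce repeated angle brackets around a single address to one pair."""
    match = _WRAPPED.fullmatch(addr)
    if match is None:
        # Intent is unclear, so leave the value untouched.
        return addr
    return "<" + match.group(1) + ">"
```

For instance, `collapse_angle_brackets("<<joe@example.com>>")` yields
`<joe@example.com>`, while a value such as `Joe` comes back unchanged.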

Now let's consider the various recommendations the document currently makes in
light of these guidelines.

Section 5 really needs to distinguish between semantic and syntactic
invariance, especially in light of syntactic variations caused by things like
MIME downgrade/update, to say nothing of the explicit advice given in sections
8.3 and 8.4 of this document.

Line termination (section 7) is fine. True, the intent of a bare CR might be to
overstrike a line, but this is increasingly a minority taste these days and a
failure to deal with bare CRs and LFs likely leads to a worse outcome overall.
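
One way of dealing with bare CRs and LFs can be sketched as follows. This
assumes every bare CR or LF was meant as a line break (so the overstrike reading
is deliberately ignored); the function name is hypothetical and the sketch makes
no claim to match the draft's exact wording.

```python
import re

# Match a proper CRLF first so it is not split into two terminators.
_LINE_END = re.compile(rb"\r\n|\r|\n")

def normalize_line_endings(data: bytes) -> bytes:
    """Rewrite CRLF, bare CR, and bare LF so that all become CRLF."""
    return _LINE_END.sub(b"\r\n", data)
```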

Malformed address fields (section 8.1.*) suffers from a problem separate from
consideration of these guidelines: The failure to deal with the possible
presence of multiple addresses. Unfortunately such cases are quite common
and some thought needs to be given to them in at least some of these sections.
For example, in section 8.1.3 it might be good to point out that a comma
may reasonably be interpreted as ending an address:

       To: <third@example.net, fourth@example.net> -->
       To: third@example.net, fourth@example.net
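
The rewrite above could be sketched roughly like this in Python. The helper
names are hypothetical, and the sketch is deliberately naive: it splits on every
comma (so a comma inside a quoted display name would confuse it) and only drops
stray brackets when what remains looks like a bare address.

```python
def _repair_piece(piece: str) -> str:
    """Drop unbalanced angle brackets from one comma-separated piece."""
    piece = piece.strip()
    if piece.count("<") != piece.count(">"):
        bare = piece.replace("<", "").replace(">", "").strip()
        # Only accept the repair if nothing but a single token is left.
        if bare and " " not in bare:
            return bare
    return piece

def split_on_commas(value: str) -> str:
    """Treat each comma as ending an address and repair the pieces."""
    pieces = (_repair_piece(p) for p in value.split(","))
    return ", ".join(p for p in pieces if p)
```

Balanced forms such as `Joe <joe@example.com>` pass through unchanged.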

Beyond that, and now considering the guidelines, I find the advice given in
8.1.5 to be incorrect. I'm sorry, but the chances that something like:

   "Joe <joe@example.com>"@example.net

is going to work are remote in the extreme. All this is going to do is create a
situation where a problem occurs in a context that's far removed from the actual
cause. The only case where I'd actually consider such a "mitigation" would be
where it is essential to pull something out. Otherwise leaving the field alone
is the better bet, because that will make it easier to diagnose the actual
problem.

I'll also note that the failure to consider the fairly common case of 
a field that just says:

   To: Joe

seems to me to be an omission that should be corrected, especially since this
is attempting to cover SUBMIT server fixups.

Non-Header Lines (section 8.2) is OK. (I happen to think the advice given in
8.2 is flat-out wrong, but that's a matter of experience coupled with
implementation practicalities; it's not something I can justify based on
the guidelines.)

Unusual spacing (section 8.3) and Header Malformations (section 8.4) are both
spot on.

Header Field Counts (section 8.5) needs work because it fails to take specific
field semantics into account in what it recommends. It is of course true that
when performing some sort of validation check on an originator field it's
essential to pick one and be consistent about it. But there are many other
fields where such checks are rarely if ever applied, e.g., To: and Cc:. Given
that there are agents out there that employ a separate field for each address,
surely a viable mitigation is to combine all recipient fields of a given type
into a single field?
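
A minimal sketch of that mitigation, assuming the header has already been parsed
into a list of (name, value) pairs. The function name and the default set of
field names are illustrative assumptions; originator fields are intentionally
not touched.

```python
def merge_recipient_fields(headers, names=("to", "cc")):
    """Combine repeated recipient fields of the same type into one field."""
    # First pass: collect the non-empty values for each recipient field type.
    values = {}
    for name, value in headers:
        key = name.lower()
        if key in names:
            bucket = values.setdefault(key, [])
            if value.strip():
                bucket.append(value.strip())

    # Second pass: emit the merged field where the first instance appeared.
    merged, seen = [], set()
    for name, value in headers:
        key = name.lower()
        if key not in names:
            merged.append((name, value))
        elif key not in seen:
            seen.add(key)
            merged.append((name, ", ".join(values[key])))
    return merged
```

Other fields keep their original order, and the merged field takes the position
and spelling of its first occurrence.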

Missing header fields (section 8.6) is spot on.

Eight-Bit Data (section 8.7) needs a bunch of work, and once again it's a
failure to consider the semantics of where the 8bit shows up. 8bit appearing in
a header is a very different proposition from 8bit in a body. EAI deserves a
shout-out in the former case because it's new, and the latter case is now so
much of a ho-hum for many agents that a recommendation to reject 8bit is
nothing short of silly.

I'm also far from convinced that rejecting messages because of a single null is
a good idea. I think a fair assessment of the likely intent in this case is that
the presence of a null or two is simply a message construction error, and silently
dropping them is a much better bet.
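
As a rough sketch of that approach: the function below silently drops a small
number of nulls and hands the decision back to the caller otherwise. The name
and the `max_nulls` threshold are assumptions made for illustration; nothing
here or in the draft fixes a particular number.

```python
def drop_stray_nulls(data: bytes, max_nulls: int = 2):
    """Remove NUL bytes when there are only a few; return None otherwise."""
    count = data.count(b"\x00")
    if count > max_nulls:
        # Too many to call a construction slip; let the caller decide.
        return None
    return data.replace(b"\x00", b"")
```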

Header Field Names (section 9.1) is certainly cute, but MIME is actually
quite clear that the first place a boundary can occur is in the following
body, not the associated header. I'm not really comfortable with
this document acting on an assessment of intent of material that is in fact
completely valid. My suggestion is therefore to refrain from talking about
rejecting such messages.

I'm afraid "Missing MIME-Version Field" (section 9.2) is flat-out incorrect.
The intent of a message that has a valid content-type and possibly
content-transfer-encoding is pretty darned clear, and perhaps more to the
point, failing to interpret such a message as MIME, given that many other
agents, e.g., metamail, are going to, is exceedingly poor practice from a
security standpoint.
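
The check argued for here could look something like the following sketch, which
uses Python's standard `email` package. The function name is hypothetical, and
"valid content-type" is approximated by a simple type/subtype test, which is an
assumption rather than a full grammar check.

```python
from email import message_from_string

def should_treat_as_mime(raw: str) -> bool:
    """Decide whether a message should be interpreted as MIME."""
    msg = message_from_string(raw)
    if msg.get("MIME-Version") is not None:
        return True
    content_type = msg.get("Content-Type")
    if content_type is None:
        return False
    # Approximate validity: a non-empty type and subtype around one '/'.
    media = content_type.split(";", 1)[0].strip()
    main, sep, sub = media.partition("/")
    return bool(main and sep and sub)
```

A message carrying `Content-Type: text/html` but no MIME-Version field is
treated as MIME, which matches what other agents are likely to do with it.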

(If we're up for additions at this late date, it might also be good to add a
section on how to handle bogus base64 or quoted-printable.)

And finally, Oversized Lines (section 10.1) is spot on.

Lastly, a couple of comments about Dave's review. I'm not especially fussed
about the use or non-use of compliance language. If there's a consensus to
remove it, that's fine; if not, that's fine with me too.

As for this:

> > {BTW, I believe Postel's Law was not the motivating reason for email
> > format deviations.  Rather, I think that receiver's were accountable to
> > their users -- the recipients -- while having no control over the
> > misbehaving senders.  So they/we hacked receiving code when necessary, to
> > appease the users. /d }

This is more or less correct. Not only do receivers rarely have any way to
correct sender behavior, in a depressingly large number of cases some of the
flagrant violations came from widely deployed software from large companies. In
other cases it's the receiving user agent that's busted, and intermediaries get
stuck with trying to accommodate their bustedness. And both are especially bad
when deployed in hardware with no viable upgrade path.

				Ned