Re: [apps-discuss] Review of: draft-ietf-appsawg-malformed-mail-03

Ned Freed, Wed, 15 May 2013 18:09 UTC

Date: Tue, 14 May 2013 10:07:28 -0700 (PDT)
From: Ned Freed
In-reply-to: "Your message dated Sun, 12 May 2013 23:47:12 -0700"
To: "Murray S. Kucherawy"
Cc: Ned Freed, Dave Crocker, Apps Discuss
Subject: Re: [apps-discuss] Review of: draft-ietf-appsawg-malformed-mail-03

> Hi Ned, thanks for your comments.  I've worked in most of your suggestions,
> but have the following to ask further on some of your points:


> On Mon, May 6, 2013 at 2:14 PM, Ned Freed wrote:

> > Eight-Bit Data (section 8.7) needs a bunch of work, and once again it's a
> > failure to consider the semantics of where the 8bit shows up. 8bit appearing
> > in a header is a very different proposition from 8bit in a body. EAI
> > deserves a shout-out in the former case because it's new, and the latter
> > case is now so much of a ho-hum for many agents that a recommendation to
> > reject 8bit is nothing short of silly.
> >
> > I'm also far from convinced that rejecting messages because of a single
> > null is a good idea. I think a fair assessment of the likely intent in this
> > case is that the presence of a null or two is simply a message construction
> > error, and silently dropping them is a much better bet.
> >

> Eight bit data is far from something I'm expert on.  Could I ask you to
> write up a paragraph or three to include here, or replace what's here?

This is going to take a lot more than three paragraphs to do properly. I'd
suggest something like this (apologies in advance - this is *very* rough, but
I'm in the middle of an office move and don't have time right now to wordsmith
the text the way I normally would):


8.7 Missing or incorrect charset information

MIME provides the means to include textual material employing charsets other
than US-ASCII. Such material is required to have an identifiable charset.
Charset identification is done in one of three ways: a charset parameter in
the content-type header field, a charset label within the MIME entity itself,
or a charset implicitly specified by the content-type [RFC 6657].

It is unfortunately fairly common for required charset information to be
missing or incorrect in textual MIME entities. As such, processing agents
should perform basic sanity checks, such as:

(1) US-ASCII is 7bit only, so 8bit material is necessarily not US-ASCII.
(2) UTF-8 has a very specific syntactic structure that other 8bit charsets
    are unlikely to follow.
(3) Nulls (ASCII 0x00) are not allowed in either 7bit or 8bit data.
(4) Not all 7bit material is US-ASCII. The presence of the various escape
    sequences used for character switching may be used as an indication
    of the various iso-2022-* charsets.
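The four checks above can be sketched in Python roughly as follows. This is a minimal illustration rather than a complete detector: the function names are hypothetical, the escape-sequence table is a deliberately partial list, and treating a missing label as US-ASCII (the text/plain default) is an assumption.

```python
from typing import List, Optional

# Partial, illustrative list of ISO-2022 designation escape sequences.
ISO2022_ESCAPES = (
    b"\x1b$@",   # JIS C 6226-1978 (iso-2022-jp)
    b"\x1b$B",   # JIS X 0208-1983 (iso-2022-jp)
    b"\x1b(J",   # JIS X 0201 Roman (iso-2022-jp)
    b"\x1b$)C",  # KS C 5601 (iso-2022-kr)
    b"\x1b$)A",  # GB 2312 (iso-2022-cn)
)


def is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def sanity_check(body: bytes, declared: Optional[str]) -> List[str]:
    """Return suspected charset problems (empty list if none were found)."""
    # Assumption: a missing label means US-ASCII.
    charset = (declared or "us-ascii").strip().lower()
    has_8bit = any(b > 0x7F for b in body)
    problems = []
    # (3) Nulls are not allowed in 7bit or 8bit data.
    if b"\x00" in body:
        problems.append("contains NUL bytes")
    # (1) 8bit material cannot be US-ASCII.
    if charset == "us-ascii" and has_8bit:
        problems.append("8bit data labeled US-ASCII")
    # (2) UTF-8 has a distinctive byte structure: labeled UTF-8 must follow it,
    # and 8bit data that happens to follow it is probably UTF-8.
    if charset == "utf-8" and not is_valid_utf8(body):
        problems.append("labeled UTF-8 but not valid UTF-8")
    elif charset != "utf-8" and has_8bit and is_valid_utf8(body):
        problems.append("8bit data parses as UTF-8; label may be wrong")
    # (4) 7bit data with ISO-2022 escapes is not really US-ASCII.
    if charset == "us-ascii" and not has_8bit and any(e in body for e in ISO2022_ESCAPES):
        problems.append("ISO-2022 escapes in data labeled US-ASCII")
    return problems
```

The checks only report; deciding what to do with a problem is a separate step.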

When a charset error is detected, processing agents should either:

(a) apply heuristics to determine the most likely charset and, if successful,
    proceed using that information, or
(b) refuse to process the malformed MIME entity.
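Options (a) and (b) might look like the following minimal Python sketch. The candidate list (UTF-8, then windows-1252) and the names `decode_with_fallback` and `CharsetError` are hypothetical choices for illustration; a real agent would pick candidates based on its user population. Note that a charset such as iso-8859-1, which assigns a character to every byte, always "succeeds" and would make option (b) unreachable if placed in the list.

```python
from typing import Optional, Sequence, Tuple


class CharsetError(Exception):
    """Raised when no plausible charset is found (option (b))."""


# Hypothetical candidate order; avoid charsets that accept every byte sequence.
DEFAULT_CANDIDATES: Tuple[str, ...] = ("utf-8", "windows-1252")


def decode_with_fallback(body: bytes, declared: Optional[str],
                         candidates: Sequence[str] = DEFAULT_CANDIDATES) -> Tuple[str, str]:
    """Try the declared charset first, then each candidate (option (a))."""
    tried = []
    for name in ([declared] if declared else []) + list(candidates):
        name = name.strip().lower()
        if name in tried:
            continue
        tried.append(name)
        try:
            return body.decode(name), name
        except (UnicodeDecodeError, LookupError):
            # Wrong guess or unknown charset name: try the next one.
            continue
    raise CharsetError(f"no usable charset among {tried}")
```

Strict decoding is what makes the heuristic meaningful: a candidate is accepted only if every byte decodes under it.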

A null (ASCII 0x00) byte inside a textual MIME entity can cause typical string
processing functions to misidentify the end of a string, which can be
exploited to hide malicious content from analysis processes. Accordingly, nulls
require additional special handling.

A few isolated nulls are likely to be the result of poor message
construction practices. Such nulls should be silently dropped.

Large numbers of nulls are usually the result of binary material that is
improperly encoded, labeled, or both. Such material is likely to be damaged
beyond hope of recovery, so the best course of action is to refuse to
process it.

Finally, the presence of nulls may be used as an indication of possible
malicious intent.
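The null-handling policy above can be sketched in a few lines of Python. The text distinguishes "a few" nulls from "large numbers" without giving figures, so the threshold `MAX_STRAY_NULS` and the names used here are hypothetical. The returned flag reflects the last point: even silently repaired content may warrant closer scrutiny.

```python
from typing import Tuple

# Hypothetical cutoff between "a few" stray nulls and "large numbers".
MAX_STRAY_NULS = 4


class RefuseEntity(Exception):
    """Raised when an entity should not be processed at all."""


def handle_nuls(body: bytes, max_stray: int = MAX_STRAY_NULS) -> Tuple[bytes, bool]:
    """Return (cleaned body, suspicious flag), or raise RefuseEntity."""
    count = body.count(b"\x00")
    if count > max_stray:
        # Probably mislabeled or misencoded binary; recovery is unlikely.
        raise RefuseEntity(f"{count} null bytes in textual entity")
    # A few nulls: drop them silently, but report their presence so the
    # caller can treat the message with extra suspicion.
    return body.replace(b"\x00", b""), count > 0
```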

8.8 8bit data

Standards-compliant mail messages do not contain any non-ASCII data
without indicating that such content is present by means of published
[SMTP] extensions.  Absent that, [MIME] encodings are typically used
to convert non-ASCII data to ASCII in a way that can be reversed by
other handling agents or end users.

The best way to handle noncompliant 8bit material depends on its location.

Noncompliant 8bit in MIME entity content should simply be processed as if the
necessary SMTP extensions had been used to transfer the message. Note that
improperly labeled 8bit material in textual MIME entities may require treatment
as described in section 8.7.

Noncompliant 8bit in message or MIME entity header fields can be handled
as follows:

(1) Occurrences in unstructured text fields, comments and phrases
    can be converted into encoded-words [RFC 2047] if a likely charset can
    be determined. Alternatively, 8bit characters can be removed or replaced
    with some other character.

(2) Occurrences in header fields whose syntax is unknown may be handled
    by dropping the field entirely or by removing/replacing the 8bit
    character as in (1).

(3) Occurrences in addresses are especially problematic. Agents supporting
    [EAI] may, if the 8bit material conforms to UTF-8 syntax, elect to treat
    the message as an EAI message and process it accordingly. Otherwise, it is
    in most cases best to exclude the address from any sort of processing
    (which may mean dropping it entirely), since any attempt to fix it is
    unlikely to help.
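Items (1) and (3) could be sketched in Python as below, using the standard library's `email.header.Header` class to produce RFC 2047 encoded-words. The function names, the choice to re-encode as UTF-8, and the use of "?" as the replacement character are assumptions made for illustration; how the charset guess is obtained is left to the caller.

```python
import email.header
from typing import Optional


def is_ascii(raw: bytes) -> bool:
    return all(b < 0x80 for b in raw)


def fix_unstructured(raw: bytes, charset_guess: Optional[str]) -> str:
    """Item (1): encode 8bit text as encoded-words, or replace the 8bit bytes."""
    if is_ascii(raw):
        return raw.decode("ascii")
    if charset_guess:
        try:
            text = raw.decode(charset_guess)
        except (UnicodeDecodeError, LookupError):
            pass  # the guess was wrong; fall through to replacement
        else:
            return email.header.Header(text, charset="utf-8").encode()
    # No likely charset: replace each 8bit byte with "?".
    return bytes(b if b < 0x80 else 0x3F for b in raw).decode("ascii")


def fix_address(raw: bytes, eai_supported: bool) -> Optional[str]:
    """Item (3): accept UTF-8 addresses under EAI, otherwise exclude them."""
    if is_ascii(raw):
        return raw.decode("ascii")
    if eai_supported:
        try:
            return raw.decode("utf-8")  # treat as an EAI message address
        except UnicodeDecodeError:
            pass
    return None  # exclude the address from further processing
```

Returning `None` for an address mirrors the advice not to attempt a repair; callers decide whether exclusion means dropping the field.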


One thing I'm strongly inclined to add is the option of presenting material
to a user using an interface that lets them select the charset. Like it or not,
this is a successful strategy employed by UAs in the field. 

> >
> > Header Field Names (section 9.1) is certainly cute, but MIME is actually
> > quite clear that the first place a boundary can occur is in the following
> > body, not the associated header. I'm not really comfortable with
> > this document acting on an assessment of intent of material that is in fact
> > completely valid. My suggestion is therefore to refrain from talking about
> > rejecting such messages.
> >

> It's a real attack seen in the wild.  I see your point about MIME and the
> location of boundaries, however, so I think this section can just go.

It's an attack using a valid message on egregiously broken software. In
fact, it's likely an indicator that an even more fundamental problem is present:
searching forward into the message for data rather than first picking out the
header and processing it separately.
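The more fundamental fix amounts to splitting off the header before looking for boundaries at all. A minimal Python sketch, assuming CRLF or LF line endings and that the header ends at the first empty line; the function name is hypothetical:

```python
from typing import List


def first_delimiter_line(raw: bytes, boundary: bytes) -> int:
    """Return the body line index of the first boundary delimiter, or -1.

    Only the body is searched; the header is split off first, so
    boundary-like text in header fields can never match.
    """
    lines: List[bytes] = raw.replace(b"\r\n", b"\n").split(b"\n")
    try:
        end_of_header = lines.index(b"")  # first empty line ends the header
    except ValueError:
        return -1  # no body at all
    delimiter = b"--" + boundary
    for i, line in enumerate(lines[end_of_header + 1:]):
        # RFC 2046 permits trailing whitespace after a delimiter.
        stripped = line.rstrip(b" \t")
        if stripped in (delimiter, delimiter + b"--"):
            return i
    return -1
```

A naive scan for `--XYZ` over the whole message would stop inside the header in the test below; this version cannot.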

Given that the underlying problem may well be exploitable in other ways, the
only real solution in such cases is to fix the broken software. Nothing else
really suffices. That said, the issue of exploitable errors in software is
beyond the scope of this document, so I agree the best thing is to remove this
section.

> >
> > I'm afraid "Missing MIME-Version Field" (section 9.2) is flat-out incorrect.
> > The intent of a message that has a valid content-type and possibly
> > content-transfer-encoding is pretty darned clear, and perhaps more to the
> > point, failing to interpret such a message as MIME, given that many other
> > agents, e.g., metamail, are going to, is exceedingly poor practice from a
> > security standpoint.
> >

> This is also something real I dealt with at a previous employer.  More
> often than not it was seen as part of a spam attack that targeted specific
> MUAs.  I'm fine with removing this section however if the advice is
> controversial; I don't have access to data to back up how important or
> useful this change was.

Not sure what you're talking about here. The section describes no specific
attack of any sort; it simply offers the observation that a message with MIME
structure and no MIME-Version is more likely to be spam than a message that
does contain a MIME-Version:, and then recommends that MIME structure not be
processed or be removed when this occurs.

It has not been my observation that missing MIME-Version: correlates all that
well with spam; plenty of legitimate agents omit it and spam has evolved to
include it. (This should be seen as natural and inevitable; after all, spam
agents are under more evolutionary pressure than legitimate agents.)

But more to the point, lots of software, including metamail, the original MIME
software, pays no attention to MIME-Version: at all. And the chances of this
changing are, IMO, remote.

As such, the direct consequence of this recommendation is to enable a far worse
attack: One where malicious content sneaks by agents written to comply with
this specification and gets to software that will interpret the MIME structure
and process the malicious content.

I therefore believe that not only is what the section says incorrect, it's
also inappropriate to remove it. The correct thing to do is say that the
MIME structure should always be interpreted when present, especially by
agents checking for malicious material.

You could also add that it may be appropriate to use the lack of a MIME-Version
as an indicator that malicious content may be present and to take additional
precautions in processing the message. I believe that's the best way to address
the concern this section was originally aimed at.
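The recommended behavior can be sketched with Python's standard `email` package, which builds the MIME tree from Content-Type whether or not MIME-Version is present. The function name and the idea of returning a "missing MIME-Version" flag for extra scrutiny are illustrative assumptions.

```python
import email
from typing import List, Tuple


def inspect_structure(raw: bytes) -> Tuple[List[str], bool]:
    """Return (content types of every MIME part, MIME-Version-missing flag)."""
    msg = email.message_from_bytes(raw)
    # Interpret the MIME structure whether or not MIME-Version is present,
    # so content cannot slip past this agent to a less strict consumer.
    part_types = [part.get_content_type() for part in msg.walk()]
    # A missing MIME-Version is only a hint that extra caution may be warranted.
    missing_version = msg.get("MIME-Version") is None
    return part_types, missing_version
```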

> >
> > (If we're up for additions at this late date, it might also be good to add a
> > section on how to handle bogus base64 or quoted-printable.)
> >

> Yup, we are.  Do you have anything specific in mind?

It's basically the same "exploit different interpretations by different
decoders" issue. For example, the original base64 defined in RFCs 1113-1115
allowed comments inside parentheses. Other implementations will silently ignore
parentheses and interpret whatever is inside as data. The possible exploits of
such differences are obvious.
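The divergence is easy to demonstrate. The Python sketch below compares three decoders on the same input: a hypothetical comment-aware decoder modeling the parenthesized-comment behavior described above, the standard library's `base64.b64decode` with `validate=False` (which discards non-alphabet bytes such as parentheses and decodes what was inside them), and the same call with `validate=True` (which rejects them). Flagging input on which such decoders disagree is offered as one possible defensive check, not an established practice.

```python
import base64
import binascii
import re

_WS = b" \t\r\n"
# Hypothetical model of the old behavior: "(...)" is a comment, not data.
_COMMENT = re.compile(rb"\([^)]*\)")


def decode_strict(encoded: bytes) -> bytes:
    # Line breaks are normal in MIME base64; anything else outside the
    # alphabet makes validate=True raise binascii.Error.
    return base64.b64decode(encoded.translate(None, _WS), validate=True)


def decode_comment_aware(encoded: bytes) -> bytes:
    return decode_strict(_COMMENT.sub(b"", encoded))


def decode_lenient(encoded: bytes) -> bytes:
    # validate=False silently discards non-alphabet bytes, parentheses included.
    return base64.b64decode(encoded, validate=False)


def decoders_disagree(encoded: bytes) -> bool:
    """Flag input that different decoders would interpret differently."""
    results = set()
    for decode in (decode_comment_aware, decode_lenient, decode_strict):
        try:
            results.add(decode(encoded))
        except binascii.Error:
            results.add(None)
    return len(results) > 1
```

For example, `SGkh(YmFk)` decodes to `Hi!` when the parenthesized text is treated as a comment, but to `Hi!bad` under the lenient decoder, so an analysis agent and an end-user agent could see different content.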