[apps-discuss] Review of: draft-ietf-appsawg-malformed-mail-03

Dave Crocker <dhc@dcrocker.net> Wed, 10 April 2013 15:00 UTC

Message-ID: <51657E9D.4050900@dcrocker.net>
Date: Wed, 10 Apr 2013 08:00:45 -0700
From: Dave Crocker <dhc@dcrocker.net>
Organization: Brandenburg InternetWorking
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130328 Thunderbird/17.0.5
MIME-Version: 1.0
To: "Murray S. Kucherawy" <superuser@gmail.com>, "Gregory N. Shapiro" <gshapiro@sendmail.com>
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 7bit
Cc: Apps Discuss <apps-discuss@ietf.org>
Subject: [apps-discuss] Review of: draft-ietf-appsawg-malformed-mail-03

Review of:    Advice for Safe Handling of Malformed Messages

I-D:          draft-ietf-appsawg-malformed-mail-03

Reviewer:     D. Crocker

Review Date:  10 April 2013


Summary:

       Internet Mail has always been marked by an unfortunate degree of 
regular and permitted non-conformance to its formal specifications.  The 
current draft seeks to categorize and discuss common types of 
non-conformance and to provide some guidance for how it should be 
handled.  The document is explicit in stating that it does not have the 
goal of standardizing this guidance.

      The document is reasonably clear and complete. I believe a 
document like this can provide very helpful guidance for email 
developers and operators.  It would be useful in its current form, but 
could greatly benefit from some modification.

      One major concern, which is easily remedied, is the draft's use of 
normative language.  The document is often unusually careful to use 
qualifying language that precisely limits the scope of the normative 
text to "a module compliant with this memo".  However, I think this is 
too subtle for most readers, and the use of normative language 
defeats the stated intent of not defining a standard.  Hence I suggest 
changing all such language and, instead, using language that is 
clearly only modest "advice", such as:

    *  a common handling is...
    *  it is best to...
    *  it will typically be safe and helpful to...

and so on.



Detailed Comments:


> Abstract
>
>    The email ecosystem has long had a very permissive set of common
>    processing rules in place, despite increasingly rigid standards
>    governing its components, ostensibly to improve the user experience.

      Although Internet mail formats have been precisely defined since 
the 1970s, authoring and handling software often show only mild 
conformance to the specifications.  The distributed and non-interactive 
nature of email has often prompted adjustments to receiving software, to 
handle these variations, rather than trying to gain better conformance 
by senders, since the receiving operator is primarily driven by 
complaining recipient users and has no authority over the sending side 
of the system.


>    The handling of these come at some cost, and various components are

      Processing with such flexibility comes at some cost, since mail 
software is faced with...


>    faced with decisions about whether or not to permit non-conforming
>    messages to continue toward their destinations unaltered, adjust them
>    to conform (possibly at the cost of losing some of the original
>    message), or outright rejecting them.

      A core requirement for interoperability is that both sides of an 
exchange work from the same details and semantics.  When receivers are 
flexible beyond the specifications, there can be, and often has been, a 
good chance that a message will not be fully interoperable.  Worse, 
a well-established pattern of tolerance for variations can sometimes be 
used as an attack vector.


>    This document includes a collection of the best advice available
>    regarding a variety of common malformed mail situations, to be used
>    as implementation guidance.  It must be emphasized, however, that the
>    intent of this document is not to standardize malformations or
>    otherwise encourage their proliferation.  The messages are manifestly
>    malformed, and the code and culture that generates them needs to be
>    fixed.  Therefore, these messages should be rejected outright if at
>    all possible.  Nevertheless, many malformed messages from otherwise
>    legitimate senders are in circulation and will be for some time, and,
>    unfortunately, commercial reality shows that we cannot always simply
>    reject or discard them.  Accordingly, this document presents
>    alternatives for dealing with them in ways that seem to do the least
>    additional harm until the infrastructure is tightened up to match the
>    standards.


>
> 1.  Introduction
>
> 1.1.  The Purpose Of This Work
>
>    The history of email standards, going back to [RFC822] and beyond,

{ here I actually suggest citing RFC 733, since it managed to establish 
the solid foundation, with 822 being a relatively small modification. 
733 was not the first formal standard, but the first had poor adoption. /d}


>    contains a fairly rigid evolution of specifications.  But
>    implementations within that culture have also long had an
>    undercurrent known formally as the robustness principle, but also
>    known informally as Postel's Law: "Be conservative in what you do, be
>    liberal in what you accept from others."

     Jon Postel's directive is often misinterpreted to mean that any 
deviance from a specification is acceptable.  Rather, it was intended 
only to account for legitimate variations in interpretation within 
specifications, as well as basic transit errors, like bit errors.  Taken 
to its unintended extreme, excessive tolerance would imply that there 
are no limits to the liberties that a sender might take, while presuming 
a burden on a receiver to "correctly" guess at the meaning of any such 
variation.

{BTW, I believe Postel's Law was not the motivating reason for email 
format deviations.  Rather, I think that receivers were accountable to 
their users, the recipients, while having no control over the 
misbehaving senders.  So they/we hacked receiving code when necessary, 
to appease the users. /d }


>    In general, this served the email ecosystem well by allowing a few
>    errors in implementations without obstructing participation in the
>    game.  The proverbial bar was set low.  However, as we have evolved
>    into the current era, some of these lenient stances have begun to
>    expose opportunities that can be exploited by malefactors.  Various
>    email-based applications rely on strong application of these
>    standards for simple security checks, while the very basic building
>    blocks of that infrastructure, intending to be robust, fail utterly
>    to assert those standards.
>
>    This document presents some areas in which the more lenient stances
>    can provide vectors for attack, and then presents the collected
>    wisdom of numerous applications in and around the email ecosystem for
>    dealing with them to mitigate their impact.
>
> 1.2.  Not The Purpose Of This Work
>
>    It is important to understand that this work is not an effort to
>    endorse or standardize certain common malformations.  The code and
>    culture that introduces such messages into the mail stream needs to
>    be repaired, as the security penalty now being paid for this lax
>    processing arguably outweighs the reduction in support costs to end
>    users who are not expected to understand the standards.  However, the
>    reality is that this will not be fixed quickly.
>
>    Given this, it is beneficial to provide implementers with guidance
>    about the safest or most effective way to handle malformed messages
>    when they arrive, taking into consideration the tradeoffs of the
>    choices available especially with respect to how various actors in
>    the email ecosystem respond to such messages in terms of handling,
>    parsing, or rendering to end users.
>
> 1.3.  General Considerations
>
>    Many deviations from message format standards are considered by some
>    receivers to be strong indications that the message is undesirable,
>    i.e., is spam or contains malware.  Such receivers quickly decide
>
>
>
> Kucherawy & Shapiro      Expires April 12, 2013                 [Page 4]
> 
> Internet-Draft             Safe Mail Handling               October 2012
>
>
>    that the best handling choice is simply to reject or discard the
>    message.  This means malformations caused by innocent
>    misunderstandings or ignorance of proper syntax can cause messages
>    with no ill intent also to fail to be delivered.
>
>    Senders that want to ensure message delivery are best advised to
>    adhere strictly to the relevant standards (including, but not limited
>    to, [MAIL], [MIME], and [DKIM]), as well as observe other industry
>    best practices such as may be published from time to time either by
>    the IETF or independently.
>
> 2.  Document Conventions
>
> 2.1.  Key Words
>
>    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
>    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
>    document are to be interpreted as described in [KEYWORDS].  However,
>    they only have that meaning in this document when they are presented
>    entirely in upper case.


{ While the document is clear that its use of normative language is 
meant to apply only to those implementations choosing to conform to this 
document, the document itself says -- appropriately, IMO -- that it is 
not trying to standardize these behaviors.  It's therefore confusing and 
probably counter-productive to use normative language.  I strongly urge 
dropping all such language and, instead, only offering modest "advice" 
with language like:

    *  a common handling is...
    *  it is best to...
    *  it will typically be safe and helpful to...

    and so on.   /d}


> 2.2.  Examples
>
>    Examples of message content include a number within braces at the end
>    of each line.  These are line numbers for use in subsequent
>    discussion, and are not actually part of the message content
>    presented in the example.
>
>    Blank lines are not numbered in the examples.
>
> 3.  Background
>
>    The reader would benefit from reading [EMAIL-ARCH] for some general
>    background about the overall email architecture.  Of particular
>    interest is the Internet Message Format, detailed in [MAIL].
>    Throughout this document, the use of the term "messsage" should be

{ Freud possibly at work for this missspellling? /d}


>    assumed to mean a block of text conforming to the Internet Message
>    Format.
>
> 4.  Internal Representations
>
>    Any agent handling a message could have one or two (or more) distinct

      As an agent parses and processes a message, it might create a 
number of distinct representations for the message.


>    representations of a message it is handling.  One is an internal
>    representation, such as a block of storage used for the header and a
>    block for the body.  These may be sorted, encoded, decoded, etc., as
>    per the needs of that particular module.  The other is the
>    representation that is output to the next agent in the handling
>    chain.  This might be identical to the version that is input to the
>    module, or it might have some changes such as added or reordered
>    header fields, body modifications to remove malicious content, etc.
>
>    In some cases, advice is provided only for internal representations.
>    However, there is often occasion to mandate changes to the output as
>    well.

{ What does this last sentence mean?  "Mandate"?  Perhaps what is meant 
is: /d}

      However, it is sometimes necessary to make changes between the 
input and output versions, as well.


>
> 5.  Invariate Content

      Invariant {?}


>
>    Experience has shown that it is beneficial to ensure that, from the
>    first analysis agent at ingress into the destination Administrative
>    Management Domain (ADMD; see [EMAIL-ARCH]) to the agent that actually
>    affects delivery to the end user, the message each agent sees is

{ This is an artfully-crafted sentence, but it would be easier to read 
if broken into parts.  Perhaps: /d}

      An especially interesting handling sequence occurs within the 
destination Administrative Management Domain (ADMD; see [EMAIL-ARCH]). 
From ingress to the ADMD, through the boundary agent, until delivery to 
the end user, it is beneficial to ensure that each agent sees an 
identical form for the message.


>    identical.  Absent this, it can be impossible for different agents in
>    the chain to make assertions about the content that correlate.


      the chain to make consistent assertions about the content.


>    For example, suppose a handling agent records that a message had some
>    specific set of properties at ingress to the ADMD, then permitted it
>    to continue inbound.  Some other agent alters the content for some
>    reason.  The user, on viewing the delivered content, reports the
>    message as abusive.  If the report is based on the set of properties

      message as abuse.  However, report processing often takes place 
at, or close to, the original point of ingress and is likely to be based 
on the set of properties recorded there, rather than at the user's system.


>    recorded at ingress, then the complaint effectively references a
>    message different from what the user saw, which could render the
>    complaint inactionable.  Similarly, a message with properties that a
>    filtering agent might use to reject an abusive message could be
>    allowed to reach the user if an intermediate agent altered the
>    message in a manner that alters one of those properties, thwarting
>    detection of the abuse.

{ awkward sentence structure. /d}


>    Therefore, agents comprising an inbound message processing

      comprising an inbound  -> within an integrated message

{or should this simply say 'within an ADMD'? /d}


>    environment SHOULD ensure that each agent sees the same content, and
>    the message reaches the end user unmodified.  An exception to this is
>    content that is identitfied as certainly harmful, such as some kind
>    of malicious executable software included in the message.

{the 'exception' sentence is far too specific.  There are, no doubt, 
many reasons for deviating from this recommendation.  Simpler, safer and 
non-normative wording would be:  /d}

      environment will simplify operational concerns by ensuring that 
each agent receives the same content -- except for the usual handling 
agent trace information additions -- and that this is what reaches the 
end user, unmodified.  Various policies, such as special handling for 
detected message abuse, will make exceptions appropriate.
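The invariance goal can be checked mechanically.  A minimal Python 
sketch follows.  It assumes CRLF line endings and assumes the trace 
fields are prepended as one contiguous block.  Only Received and 
Return-Path are treated as trace fields here, which is an assumption. 
An ingress agent would record the fingerprint, and any later agent in 
the ADMD would recompute it to confirm it sees the same content.  The 
function name is illustrative and does not come from the draft.

```python
import hashlib

# Assumption: only these [MAIL] trace fields are expected to be added by
# handling agents; anything else counts as a change to the content.
TRACE_FIELDS = {"received", "return-path"}


def content_fingerprint(raw: bytes) -> str:
    """SHA-256 of a CRLF message, ignoring the leading block of trace fields."""
    lines = raw.split(b"\r\n")
    i = 0
    while i < len(lines):
        name = lines[i].split(b":", 1)[0].strip().lower().decode("ascii", "replace")
        if name not in TRACE_FIELDS:
            break
        i += 1
        # Skip folded continuation lines (starting with SP or HTAB) of that field.
        while i < len(lines) and lines[i][:1] in (b" ", b"\t"):
            i += 1
    return hashlib.sha256(b"\r\n".join(lines[i:])).hexdigest()
```

Adding a Received field in transit leaves the fingerprint unchanged. 
Any edit to other header fields or to the body changes it.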


> 6.  Mail Submission Agents
>
>    Within the email context, the single most influential component that
>    can reduce the presence of malformed items in the email system is the
>    Mail Submission Agent (MSA).  This is the component that is
>    essentially the interface between end users that create content and
>    the mail stream.

      the Mail Handling Service (MHS) [EMAIL-ARCH]


>    The lax processing described earlier in the document creates a high

{this first sentence is out of place.  the earlier discussion in the 
document established the need for better conformance; it doesn't need to 
be sold here, again. /d}


>    support and security cost overall.  Thus, MSAs MUST evolve to become
>    more strict about enforcement of all relevant email standards,
>    especially [MAIL] and the [MIME] family of documents.
>
>    Relay Mail Transport Agents (MTAs) SHOULD also be more strict;

      Relay -> Relaying

{ This pseudo-normative phrasing does nothing helpful, since it isn't 
actually specifying anything.  Modify to something like: /d}

      Stricter conformance by relaying MTAs will also be helpful.  Although


>    although preventing the dissemination of malformed messages is
>    desirable, the rejection of such mail already in transit also has a
>    support cost, namely the creation of a [DSN] that many end users
>    might not understand.
>
> 7.  Line Terminaton
>
>    The only valid line separation sequence in messaging is ASCII 0x0D

      For interoperable Internet Mail messages, the only valid...


>    ("carriage return", or CR) followed by ASCII 0x0A ("line feed", or
>    LF), commonly referred to as CRLF.  Common UNIX user tools, however,
>    typically only use LF for line termination.  This means the protocol
>    has to convert LF to CRLF before transporting a message.

      for internal line termination.  This means that a protocol engine, 
which converts between Unix and Internet Mail formats, has to convert 
between these two end-of-line representations, before transmitting a 
message or after receiving it.
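A minimal Python sketch of the conversion described above follows.  It 
assumes the whole message is held in memory as text.  Existing CRLF 
pairs are collapsed first, so that input which already contains some 
CRLFs does not come out as CR CR LF.  Naked CRs pass through untouched. 
The function names are hypothetical.

```python
def unix_to_wire(text: str) -> str:
    """LF-terminated (Unix) text -> CRLF-terminated Internet Mail text."""
    # Collapse any existing CRLF to LF first, then expand every LF to CRLF.
    return text.replace("\r\n", "\n").replace("\n", "\r\n")


def wire_to_unix(text: str) -> str:
    """CRLF-terminated Internet Mail text -> LF-terminated (Unix) text."""
    return text.replace("\r\n", "\n")
```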


>    Naive implementations can cause messages to be transmitted with a mix

{ These aren't "naive".  They are quite simply broken!  /d}

      Implementations that do not conform to Internet Mail standards 
sometimes cause messages to be transmitted...


>    of line terminations, such as LF everywhere except CRLF only at the
>    end of the message.  According to [SMTP], this means the entire
>    message actually exists on a single line.

{ also RFC 5322! /d}


>    A "naked" CR or LF in a message has no reasonable justification, and

{ this is wrong.  they have legitimate presentation uses, albeit pretty 
archaic at this point.  Better:  /d }

      Within modern Internet Mail, it is highly unlikely that an isolated 
CR or LF in common ASCII text is valid.  Furthermore, [MIME]...


>    furthermore [MIME] presents mechanisms for encoding content that
>    actually does need to contain such an unusual character sequence.
>
>    Thus, handling agents MUST treat naked CRs and LFs as CRLFs when
>    interpreting the message.

      Thus, it will typically be safe and helpful to treat a naked CR or 
LF as equivalent to a CRLF, when parsing a message.
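The difference between strict and lenient parsing can be sketched in 
Python.  Consider a message that uses LF everywhere except for a final 
CRLF.  Splitting on CRLF alone yields one line, which is the [SMTP] 
situation the draft describes.  The lenient split treats each naked CR 
or LF as a line end.  Both functions are illustrative only.

```python
import re

# CRLF is tried first so it is consumed as one terminator, not two.
_LINE_END = re.compile(rb"\r\n|\r|\n")


def strict_lines(raw: bytes):
    """Split on CRLF only, as [SMTP] and [MAIL] define a line."""
    return raw.split(b"\r\n")


def lenient_lines(raw: bytes):
    """Split treating a naked CR or a naked LF as equivalent to CRLF."""
    return _LINE_END.split(raw)
```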


> 8.  Header Anomalies
>
>    This section covers common syntactical and semantic anomalies found
>    in headers of messages, and presents preferred mitigations.

      in a message header, and


> 8.1.  Converting Obsolete and Invalid Syntaxes
>
>    There are numerous cases of obsolete header syntaxes that can be
>    applied to confound agents with variable processing.  This section

{ The phrasing of the first sentence sounds as if confounding is a goal. 
If it's meant that way, say it more clearly.  If it isn't, perhaps:  /d }

      A message using an obsolete header syntax might confound an agent 
that is attempting to be robust in its handling of syntax variations.


>    presents some examples of these.  Messages including them SHOULD be

{ 'of these'?  of which? /d}

{ Why reject this particular set?  What about others, outside these 
examples?  Again, phrase this non-normatively. /d}


>    rejected; where this is not possible, RECOMMENDED internal
>    interpretations are provided.
>
> 8.1.1.  Host-Address Syntax
>
>    The following obsolete syntax:

      The following obsolete syntax that attempts to specify source routing:

{ explain, or perhaps even cite the old ABNF rule for it /d}


>
>        To: <@example.net:fran@example.com>
>
>    should be interpreted as:

      can safely be interpreted as:


>        To: <fran@example.com>
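The obsolete form is an obs-angle-addr containing an obs-route, defined 
in Section 4.4 of RFC 5322.  Below is a simplified Python sketch that 
drops the route and keeps the final addr-spec.  It assumes the route 
contains no comments or folding whitespace.  The function name is 
hypothetical.

```python
import re

# "<" followed by "@domain[,@domain...]:" at the start of an angle-addr.
# Simplified: comments and folding whitespace inside the route are not handled.
_OBS_ROUTE = re.compile(r"<\s*@[^:<>]*:")


def strip_source_route(value: str) -> str:
    """Drop an obsolete source route, keeping only the final addr-spec."""
    return _OBS_ROUTE.sub("<", value)
```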
>
> 8.1.2.  Excessive Angle Brackets
>
>    The following over-use of angle brackets, e.g.:
>
>        To: <<<user2@example.org>>>
>
>    should be interpreted as:

      can safely be interpreted as:


>        To: <user2@example.org>
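Below is a simplified Python sketch of this interpretation.  It assumes 
the addr-spec contains no angle brackets, so quoted local-parts that 
contain "<" or ">" are out of scope.  Any run of opening brackets and 
any run of closing brackets around the address collapses to a single 
pair.

```python
import re

# One or more "<", an addr-spec with no angle brackets, one or more ">".
_ANGLE_RUN = re.compile(r"<+([^<>]*)>+")


def collapse_angle_brackets(value: str) -> str:
    """Reduce <<<addr>>> (or any run of brackets) to a single <addr>."""
    return _ANGLE_RUN.sub(r"<\1>", value)
```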
>
> 8.1.3.  Unbalanced Angle Brackets
>
>    The following use of unbalanced angle brackets:
>
>        To: <another@example.net
>        To: second@example.org>
>
>    should be interpreted as:

     can usually be treated as:


>        To: <another@example.net>
>        To: second@example.org
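The two repairs above can be sketched in Python.  This is only a 
heuristic.  It assumes one address per field value and at most one 
unmatched bracket.  Anything more complex is returned unchanged and is 
left for other handling, such as rejection.

```python
def balance_angle_brackets(value: str) -> str:
    """Repair a single unmatched '<' or '>' in an address field value."""
    v = value.strip()
    opens, closes = v.count("<"), v.count(">")
    if opens == closes + 1 and not v.endswith(">"):
        return v + ">"  # missing close: supply it at the end
    if closes == opens + 1 and v.endswith(">"):
        return v[:-1]  # stray trailing close with no opener: drop it
    return v
```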
>
> 8.1.4.  Unbalanced Parentheses
>
>    The following use of unbalanced parentheses:
>
>        To: (Testing <fran@example.com>
>        To: Testing) <sam@example.com>
>
>    should be interpreted as:
>
>        To: (Testing) <fran@example.com>
>        To: "Testing)" <sam@example.com>
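Below is a Python sketch of the two interpretations above.  It assumes 
the field value ends in a single angle-addr.  It also assumes the 
display name contains no quoted strings, since parentheses inside 
quotes are not special and this sketch does not account for that.  As 
in the examples, an unclosed comment is closed just before the address, 
and a stray closing parenthesis causes the display name to be quoted.

```python
def balance_parentheses(value: str) -> str:
    """Repair unbalanced parentheses before a trailing "<addr-spec>"."""
    lt = value.rfind("<")
    if lt < 0:
        return value  # no angle-addr; leave for other handling
    name, addr = value[:lt].rstrip(), value[lt:]
    depth = 0
    for ch in name:
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                # A ")" with no opener: quote the whole display name,
                # escaping backslashes and double quotes inside it.
                escaped = name.replace("\\", "\\\\").replace('"', '\\"')
                return f'"{escaped}" {addr}'
            depth -= 1
    if depth:
        # Close every comment left open, just before the address.
        return f"{name}{')' * depth} {addr}"
    return value
```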
>
> 8.1.5.  Unbalanced Quotes
>
>    The following use of unbalanced quotation marks:
>
>        To: "Joe <joe@example.com>
>
>    should be interpreted as:
>
>        To: "Joe <joe@example.com>"@example.net

{ WTF??? And why is this a good fixup, especially given concerns about 
display-name attack vectors?  /d}


>    where "example.net" is the domain name or host name of the handling
>    agent making the interpretation.
>
> 8.2.  Non-Header Lines
>
>    It has been observed that some messages contain a line of text in the

      Some messages contain a line of...


>    header that is not a valid message header field of any kind.  For
>    example:
>
>        From: user@example.com {1}
>        To: userpal@example.net {2}
>        Subject: This is your reminder {3}
>        about the football game tonight {4}
>        Date: Wed, 20 Oct 2010 20:53:35 -0400 {5}
>
>        Don't forget to meet us for the tailgate party! {7}
>
>    The cause of this is typically a bug in a message generator of some
>    kind.  Line {4} was intended to be a continuation of line {3}; it
>    should have been indented by whitespace as set out in Section 2.2.3
>    of [MAIL].
>
>    This anomaly has varying impacts on processing software, depending on
>    the implementation:
>
>    1.  some agents choose to separate the header of the message from the
>        body only at the first empty line (i.e. a CRLF immediately
>        followed by another CRLF);
>
>    2.  some agents assume this anomaly should be interpreted to mean the
>        body starts at line {4}, as the end of the header is assumed by
>        encountering something that is not a valid header field or folded
>        portion thereof;
>
>    3.  some agents assume this should be interpreted as an intended
>        header folding as described above and thus simply append a single
>        space character (ASCII 0x20) and the content of line {4} to that
>        of line {3};
>
>    4.  some agents reject this outright as line {4} is neither a valid
>        header field nor a folded continuation of a header field prior to
>        an empty line.
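The divergence between options 1 and 2 is easy to demonstrate in 
Python.  The sketch assumes the message has already been split into 
lines with the terminators removed.  The header-field test uses the 
[MAIL] field-name characters, which are printable US-ASCII except the 
colon.  The function names are illustrative.

```python
import re

# A field starts with 1*ftext (printable US-ASCII except ":") and a colon.
_FIELD = re.compile(r"^[!-9;-~]+:")


def split_at_empty_line(lines):
    """Option 1: the header ends only at the first empty line."""
    end = lines.index("") if "" in lines else len(lines)
    return lines[:end], lines[end + 1:]


def split_at_first_non_field(lines):
    """Option 2: the header also ends at the first line that is neither a
    field nor a folded continuation (starting with SP or HTAB)."""
    for i, line in enumerate(lines):
        if line == "":
            return lines[:i], lines[i + 1:]
        if not (_FIELD.match(line) or line[:1] in (" ", "\t")):
            return lines[:i], lines[i:]
    return lines, []
```

Applied to the example above, option 1 keeps the Date field in the 
header, while option 2 moves it into the body.  That is the kind of 
disagreement between agents that the next paragraph describes.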
>
>    This can be exploited if it is known that one message handling agent
>    will take one action while the next agent in the handling chain will
>    take another.  Consider, for example, a message filter that searches
>    message headers for properties indicative of abusive of malicious
>    content that is attached to a Mail Transfer Agent (MTA) implementing
>    option 2 above.  An attacker could craft a message that includes this
>    malformation at a position above the property of interest, knowing
>    the MTA will not consider that content part of the header, and thus
>    the MTA will not feed it to the filter, thus avoiding detection.
>    Meanwhile, the Mail User Agent (MUA) which presents the content to an
>    end user, implements option 1 or 3, which has some undesirable
>    effect.
>
>    It should be noted that a few implementations choose option 4 above
>    since any reputable message generation program will get header
>    folding right, and thus anything so blatant as this malformation is
>    likely an error caused by a malefactor.
>
>    The preferred implementation if option 4 above is not employed is to
>    apply the following heuristic when this malformation is detected:
>
>    1.  Search forward for an empty line.  If one is found, then apply
>        option 3 above to the anomalous line, and continue.
>
>    2.  Search forward for another line that appears to be a new header
>        field, i.e., a name followed by a colon.  If one is found, then
>        apply option 3 above to the anomalous line, and continue.
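Below is a Python sketch of the heuristic, with the same line-oriented 
assumptions.  The draft does not say what to do if neither search 
succeeds, or if the anomalous line is the first line.  In both cases 
this sketch ends the header at that point (option 2).  That choice is 
made here; the draft does not specify it.

```python
import re

# A field starts with 1*ftext (printable US-ASCII except ":") and a colon.
_FIELD = re.compile(r"^[!-9;-~]+:")


def unfold_stray_lines(lines):
    """Return (header_lines, body_lines), folding stray lines per the heuristic."""
    header = []
    for i, line in enumerate(lines):
        if line == "":
            return header, lines[i + 1:]
        if _FIELD.match(line) or line[:1] in (" ", "\t"):
            header.append(line)
            continue
        rest = lines[i + 1:]
        # Steps 1 and 2: an empty line or another apparent field lies ahead.
        if header and ("" in rest or any(_FIELD.match(r) for r in rest)):
            header[-1] = header[-1] + " " + line  # option 3
        else:
            return header, lines[i:]  # assumed fallback: option 2
    return header, []
```

On the example from Section 8.2, this produces a single Subject field, 
"This is your reminder about the football game tonight", followed by 
the Date field.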
>
> 8.3.  Unusual Spacing
>
>    The following message is valid per [MAIL]:
>
>        From: user@example.com {1}
>        To: userpal@example.net {2}
>        Subject: This is your reminder {3}
>         {4}
>         about the football game tonight {5}
>        Date: Wed, 20 Oct 2010 20:53:35 -0400 {6}
>
>        Don't forget to meet us for the tailgate party! {8}
>
>    Line {4} contains a single whitespace.  The intended result is that
>    lines {3}, {4}, and {5} comprise a single continued header field.
>    However, some agents are aggressive at stripping trailing whitespace,
>    which will cause line {4} to be treated as an empty line, and thus
>    the separator line between header and body.  This can affect header-
>    specific processing algorithms as described in the previous section.
>
>    Ideally, this case simply ought not to be generated.

{This sentence is entirely gratuitous. Replace it with: /d }

      This example was legal in earlier versions of the Internet Mail 
format standard.

>
>    Message handling agents receiving a message bearing this anomaly MUST
>    behave as if line {4} was not present on the message, and SHOULD emit
>    a version in which line {4} has been removed.

      The best handling of this example is for a message parsing engine 
to behave as if line {4} was not present in the message and for a 
message creation engine to emit the message with line {4} removed.
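Below is a Python sketch of the parsing side.  It assumes the header has 
already been separated from the body and split into lines.  It drops 
only continuation lines that contain nothing but SP or HTAB, so after 
unfolding, the field value differs at most in the amount of whitespace. 
The resulting list can be emitted as-is.  The name is hypothetical.

```python
def drop_blank_continuations(header_lines):
    """Remove folded header lines consisting solely of SP/HTAB."""
    return [
        line for line in header_lines
        if not (line[:1] in (" ", "\t") and line.strip(" \t") == "")
    ]
```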


>
> 8.4.  Header Malformations
>
>    There are various malformations that exist.  A common one is

{ The first sentence is pretty obvious: there are always lots of ways to 
screw up. I suggest dropping it and beginning with something like:  /d}

    Among the many possible malformations, a common one is...


>    insertion of whitespace at unusual locations, such as:
>
>        From: user@example.com {1}
>        To: userpal@example.net {2}
>        Subject: This is your reminder {3}
>        MIME-Version : 1.0 {4}
>        Content-Type: text/plain {5}
>        Date: Wed, 20 Oct 2010 20:53:35 -0400 {6}
>
>        Don't forget to meet us for the tailgate party! {8}
>
>    Note the addition of whitespace in line {4} after the header field
>    name but before the colon that separates the name from the value.
>
>    The acceptance grammar of [MAIL] permits that extra whitespace, so it
>    cannot be considered invalid.  However, a consensus of
>    implementations prefers to remove that whitespace.  There is no
>    perceived change to the semantics of the header field being altered
>    as the whitespace is itself semantically meaningless.  Thus, a module
>    compliant with this memo MUST remove all whitespace after the field
>    name but before the colon, and MUST emit that version of that field
>    on output.

      Therefore, it is best to remove all whitespace after the field 
name but before the colon and to emit the field in this modified form.
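Below is a one-line normalization in Python.  It assumes each header 
line is handled separately and that field names use the [MAIL] 
field-name characters.  Continuation lines begin with whitespace, so 
the pattern does not touch them.

```python
import re

# Field name (printable US-ASCII except ":"), then SP/HTAB, then the colon.
_WSP_BEFORE_COLON = re.compile(r"^([!-9;-~]+)[ \t]+:")


def normalize_field_name(line: str) -> str:
    """Remove whitespace between a header field name and its colon."""
    return _WSP_BEFORE_COLON.sub(r"\1:", line)
```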


> 8.5.  Header Field Counts
>
>    Section 3.6 of [MAIL] prescribes specific header field counts for a
>    valid message.  Few agents actually enforce these in the sense that a
>    message whose header contents exceed one or more limits set there are
>    generally allowed to pass; they may add any required fields that are

    ; they typically add any...


>    missing, however.
>
>    Also, few agents that use messages as input, including Mail User
>    Agents (MUAs) that actually display messages to users, verify that
>    the input is valid before proceeding.  Two popular open source
>    filtering programs and two popular Mailing List Management (MLM)

{ I suggest changing 'two' to 'some', since the number might change; 
there's no reason to make this document get out of date for such a minor 
issue. /d }


>    packages examined at the time this document was written select either

{ hence, remove "examined at the time this document was written" /d }


>    the first or last instance of a particular field name, such as From,
>    to decide who sent a message.  Absent enforcement of [MAIL], an

      Absent strict enforcement


>    attacker can craft a message with multiple fields if that attacker
>    knows the filter will make a decision based on one but the user will
>    be shown the other.
>
>    This situation is exacerbated when a claim of message validity is
>    inferred by something like a valid [DKIM] signature.  Such a
>    signature might cover one instance of a constrained field but not
>    another, and a naive consumer of DKIM's output, not realizing which
>    one was covered by a valid signature, could presume the wrong one was
>    the "good" one.  An MUA, for example could show the first of two From
>    fields as "good" or "safe" while the DKIM signature actually only
>    verified the second.

{ DKIM signatures do not verify addresses outside d=.  While the problem 
you are describing is, of course, real, it's far more complicated than 
you've described here.  Perhaps:  /d }

      when message validity is assessed, such as through enhanced 
authentication methods.  Such methods might cover one instance of a 
constrained field but not another, taking the wrong one as "good" or "safe".


>    Thus, an agent compliant with this specification MUST enact one of
>    the following:

      In attempting to counter this exposure, one of the following can 
be enacted:


>    1.  reject outright or refuse to process further any input message
>        that does not conform to Section 3.6 of [MAIL];
>
>    2.  remove or, in the case of an MUA, refuse to render any instances
>        of a header field whose presence exceeds a limit prescribed in
>        Section 3.6 of [MAIL] when generating its output;
>
>    3.  alter the name of any header field whose presence exceeds a limit
>        prescribed in Section 3.6 of [MAIL] when generating its output so
>        that later agents can produce a consistent result.  Any
>        alteration likely to cause the field to be ignored by downstream
>        agents is acceptable.  A common approach is to prefix the field
>        names with a string such as "BAD-".

{ it would help if there were some rationales or analyses of the 
tradeoffs amongst these kinds of choices, to help the 
implementer/operator decide when to use which.  /d }
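
Below is a minimal sketch of the counting that Section 8.5 relies on, combined with option 3. It assumes the header has already been parsed into an ordered array of field names. The limit table copies the originator, destination, identification, and informational rows of the table in Section 3.6 of [MAIL]. Trace and resent fields follow different rules and are left out. The names rename_excess, ci_equal, and FIELD_MAX are hypothetical. The sketch keeps the first instance and prefixes later ones with "BAD-", which is an assumption: renaming every instance of an over-limit field is an equally plausible reading of option 3.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

#define FIELD_MAX 80   /* hypothetical cap on an (optionally renamed) field name */

struct limit { const char *name; int min; int max; };

/* Occurrence limits from the table in Section 3.6 of RFC 5322. */
static const struct limit limits[] = {
    {"Date", 1, 1},       {"From", 1, 1},        {"Sender", 0, 1},
    {"Reply-To", 0, 1},   {"To", 0, 1},          {"Cc", 0, 1},
    {"Bcc", 0, 1},        {"Message-ID", 0, 1},  {"In-Reply-To", 0, 1},
    {"References", 0, 1}, {"Subject", 0, 1},
};
#define NLIMITS (sizeof limits / sizeof limits[0])

/* Header field names compare case-insensitively (ASCII only). */
static int ci_equal(const char *a, const char *b)
{
    for (; *a && *b; a++, b++)
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
    return *a == *b;
}

/* Copy names[0..n) into out[], prefixing "BAD-" to every instance after
   the first of a field whose count exceeds its maximum.  Stores in
   *missing how many required fields are absent.  Returns the number of
   fields renamed, or -1 if a name does not fit in FIELD_MAX. */
static int rename_excess(const char *const names[], size_t n,
                         char out[][FIELD_MAX], int *missing)
{
    int seen[NLIMITS] = {0};
    int renamed = 0;

    for (size_t i = 0; i < n; i++) {
        int over = 0;
        for (size_t k = 0; k < NLIMITS; k++) {
            if (ci_equal(names[i], limits[k].name)) {
                over = ++seen[k] > limits[k].max;
                break;
            }
        }
        int w = snprintf(out[i], FIELD_MAX, "%s%s", over ? "BAD-" : "", names[i]);
        if (w < 0 || w >= FIELD_MAX)
            return -1;
        renamed += over;
    }

    *missing = 0;
    for (size_t k = 0; k < NLIMITS; k++)
        if (seen[k] < limits[k].min)
            (*missing)++;
    return renamed;
}
```

Take the field sequence From, To, from, Subject, X-Foo. The third field comes out as "BAD-from", and *missing is 1 because Date is absent. Keeping the first instance means every downstream agent that honors the rename acts on the same From. It does nothing to make that From trustworthy.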



> 8.6.  Missing Header Fields
>
>    Similar to the previous section, there are messages seen in the wild
>    that lack certain required header fields.  For example, [MAIL]
>    requires that a From and Date field be present in all messages.

{ I think these aren't 'examples' but constitute the entire list.  If 
there are other required fields that can be classed as 'missing', this 
section should list them.  Also, since Message-ID isn't 'required', the 
phrasing here doesn't quite match what's discussed in the section. Might 
be worth distinguishing "required but missing" vs. "optional but really 
useful and worth synthesizing".  Synthesizing the latter probably isn't 
dangerous.  Synthesizing the former always is... /d }


>
>    When presented with a message lacking these fields, the MTA might
>    perform one of the following:
>
>    1.  Make no changes
>
>    2.  Add an instance of the missing field(s) using synthesized content

      3.  Reject the message


>    Option 2 is RECOMMENDED for handling this case.  Handling agents

{ Wow!  Synthesizing a From: field strikes me as especially dangerous, 
in all cases.  The rationale provided, below, needs to state this and, I 
believe, explain how and why it is worth incurring. The explanation that 
is provided essentially defines this hack as an attack vector... /d}


>    SHOULD add these for internal handling if they are missing, but MUST
>    NOT add them to the external representation.  The reason for this
>    requirement is that there are some filter modules that would consider
>    the absence of such fields to be a condition warranting special
>    treatment (e.g., rejection), and thus the effectiveness of such
>    modules would be stymied by an upstream filter adding them.
>
>    The synthesized fields SHOULD contain a best guess as to what should
>    have been there; for From, the SMTP MAIL command's address can be
>    used (if not null) or a placeholder address followed by an address
>    literal (e.g., unknown@[192.0.2.1]); for Date, a date extracted from
>    a Received field is a reasonable choice.
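
The synthesis described above might look like the following sketch. Per the draft, the resulting values would be used only internally and not added to the external representation. The helper names are hypothetical. The sketch makes three assumptions:

- The caller passes the MAIL FROM address without angle brackets, using an empty string for a null reverse-path.
- The client address is IPv4. An IPv6 address literal would need the "IPv6:" tag.
- The caller chooses which Received field to mine.

It relies on the date-time of a Received field following its last semicolon, as in the trace-field grammar of Section 3.6.7 of [MAIL].

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: From value taken from the SMTP MAIL FROM address,
   or a placeholder plus an IPv4 address literal when that was null. */
static int synth_from(char *buf, size_t len, const char *mail_from,
                      const char *client_ipv4)
{
    int w;
    if (mail_from != NULL && mail_from[0] != '\0')
        w = snprintf(buf, len, "%s", mail_from);
    else
        w = snprintf(buf, len, "unknown@[%s]", client_ipv4);
    return (w < 0 || (size_t)w >= len) ? -1 : 0;
}

/* Hypothetical helper: copy the date-time following the last ';' of a
   Received field value, trimming surrounding whitespace and CRLF. */
static int received_date(char *buf, size_t len, const char *received)
{
    const char *p = strrchr(received, ';');
    if (p == NULL)
        return -1;
    p++;
    while (isspace((unsigned char)*p))
        p++;
    size_t n = strlen(p);
    while (n > 0 && isspace((unsigned char)p[n - 1]))
        n--;
    if (n == 0 || n >= len)
        return -1;
    memcpy(buf, p, n);
    buf[n] = '\0';
    return 0;
}
```

For example, passing "from a.example by b.example; Wed, 10 Apr 2013 08:00:54 -0700 (PDT)" followed by CRLF to received_date() yields "Wed, 10 Apr 2013 08:00:54 -0700 (PDT)". The topmost Received field was added by a nearby, more trusted hop. A lower one is closer to origination but easier for the sender to forge.
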
>
>    One other important case to consider is a missing Message-Id field.
>    An MTA that encounters a message missing this field SHOULD synthesize
>    a valid one using techniques described above and add it to the
>    external representation, since many deployed tools use the content of
>    that field as a common unique message reference, so its absence
>    inhibits correlation of message processing.  One possible synthesis
>    would be based on an encoding of the current date/time and
>    an internal MTA ID (e.g., queue ID) followed by @ and the fully
>    qualified hostname of the machine synthesizing the header value.  For
>    example:
>
>        tm = gmtime(&now);
>        (void) snprintf(buf, sizeof(buf), "%04d%02d%02d%02d%02d.%s@%s",
>                        tm->tm_year + 1900, tm->tm_mon + 1, tm->tm_mday,
>                        tm->tm_hour, tm->tm_min, queueID, fqhn);
>
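
The fragment above could be turned into a self-contained function, as sketched here. The function synth_message_id, its parameters, and the sample values are hypothetical. The sketch differs from the draft's fragment in two deliberate ways:

- It wraps the result in the angle brackets that the msg-id syntax of [MAIL] requires in the field value.
- It checks for both gmtime() failure and snprintf() truncation.

```c
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper following the draft's recipe: UTC date/time to the
   minute, '.', the MTA's queue ID, '@', and the host's FQDN, wrapped in
   angle brackets.  Returns 0 on success, or -1 if the time cannot be
   converted or buf is too small. */
static int synth_message_id(char *buf, size_t len, time_t now,
                            const char *queue_id, const char *fqhn)
{
    const struct tm *tm = gmtime(&now);  /* static storage; not thread-safe */
    if (tm == NULL)
        return -1;
    int w = snprintf(buf, len, "<%04d%02d%02d%02d%02d.%s@%s>",
                     tm->tm_year + 1900, tm->tm_mon + 1, tm->tm_mday,
                     tm->tm_hour, tm->tm_min, queue_id, fqhn);
    return (w < 0 || (size_t)w >= len) ? -1 : 0;
}
```

For example, synth_message_id(buf, sizeof buf, time(NULL), "Q1234", "mx.example.com") would produce a value of the form <YYYYMMDDhhmm.Q1234@mx.example.com>. The timestamp has only minute resolution, so uniqueness depends on the queue ID not repeating on that host within the same minute.
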
> 8.7.  Eight-Bit Data
>
>    Standards-compliant mail messages do not contain any non-ASCII data
>    without indicating that such content is present by means of published
>    [SMTP] extensions.  Absent that, [MIME] encodings are typically used

Overall, the document sometimes mixes transfer issues with data 
representation (object) issues, in ways that can be confusing.  This 
paragraph is one of those.  It's worth the extra verbosity to label each 
clearly and separately.  So, for example:

      Standards-compliant mail messages that contain non-ASCII data are 
required to self-label this through the use of [MIME].  If the 
representation of the non-ASCII data is in an 8-bit mode (rather than 
a special encoding so that it retains a 7-bit base), then this must be 
signaled through the use of [SMTP] extensions.


>    to convert non-ASCII data to ASCII in a way that can be reversed by
>    other handling agents or end users.
>
>    Non-ASCII data otherwise found in messages can confound code that is
>    used to analyze content.  For example, a null (ASCII 0x00) byte
>    inside a message can cause typical string processing functions to
>    mis-identify the end of a string, which can be exploited to hide
>    malicious content from analysis processes.
>
>    Handling agents MUST reject messages containing null bytes that are
>    not encoded in some standard way, and SHOULD reject other non-ASCII
>    bytes that are similarly not encoded.  If rejection is not done, an
>    ASCII-compatible encoding such as those defined in [MIME] SHOULD be
>    used.

{ Hmmm.  It occurs to me that the document might be helped by an early 
discussion about a/the 'philosophy' that guides choosing whether to 
reject a message versus repair it.  But I don't have any clever text to 
suggest for doing this... /d}
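
Below is a sketch of the scan that the MUST and SHOULD above imply. It assumes the agent holds the raw, undecoded message bytes together with their true length. The names classify_bytes and enum body_class are hypothetical. The key point is that the length must be tracked explicitly. Deriving it with strlen() stops at the first NUL, which is exactly the misidentification of a string's end that the draft describes.

```c
#include <stddef.h>

enum body_class { BODY_7BIT, BODY_8BIT, BODY_HAS_NUL };

/* Classify raw (not yet decoded) message bytes.  n must be the true
   length; deriving it with strlen() would stop at the first NUL. */
static enum body_class classify_bytes(const unsigned char *data, size_t n)
{
    enum body_class result = BODY_7BIT;
    for (size_t i = 0; i < n; i++) {
        if (data[i] == 0x00)
            return BODY_HAS_NUL;      /* draft: MUST reject */
        if (data[i] > 0x7F)
            result = BODY_8BIT;       /* draft: SHOULD reject, else encode */
    }
    return result;
}
```

Consider const char msg[] = "Subject: hi\r\n\r\nok\0<script>". Calling classify_bytes((const unsigned char *)msg, sizeof msg - 1) reports BODY_HAS_NUL. Passing strlen(msg) as the length instead reports BODY_7BIT and never sees the bytes after the NUL.
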


>
> 9.  MIME Anomalies
>
>    [MIME], et seq, define a mechanism of message extensions for

{ perhaps quibbling, but since MIME does a variety of things, including 
this one, I suggest:

      define -> includes  /d }


>    providing text in character sets other than ASCII, non-text
>    attachments to messages, multi-part message bodies, and similar
>    facilities.
>
>    Some anomalies with MIME-compliant generation are also common.  This
>    section discusses some of those and presents preferred mitigations.
>
> 9.1.  Header Field Names
>
>    [MAIL] permits header field names to begin with "--".  This means
>    that a header field name can look like a [MIME] multipart boundary.
>    For example:
>
>      --foo:bar
>
>    This is a legal header field, whose name is "--foo" and whose value
>    is "bar".  Thus, consider this header:
>
>        From: user@example.com {1}
>        To: userpal@example.net {2}
>        Subject: This is your reminder {3}
>        Date: Wed, 20 Oct 2010 20:53:35 -0400 {4}
>        MIME-Version: 1.0 {5}
>        Content-Type: multipart/mixed; boundary="foo:bar" {6}
>        --foo:bar {7}
>        Malicious-Content: muahaha {8}
>
>    One implementation could observe that line {7} announces the
>    beginning of the first MIME part while another considers it a part of
>    the message's header.
>
>    If rejection of such messages cannot be done, agents MUST treat line
>    {7} as part of the message's header block and not a MIME boundary.

{ Under what circumstances can rejection /not/ be done??? And what is 
involved in even detecting that it isn't a MIME boundary?  /d }
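
One possible approach to the detection question is sketched below, under stated assumptions. While still inside the header block, a line that parses as a field is treated as a header field, as the draft requires. Here "parses as a field" means a non-empty run of printable US-ASCII other than ':' followed by ':', per the strict field-name grammar of [MAIL] with obsolete syntax ignored. If that same line would also read as "--" plus the declared boundary, the message is ambiguous and a candidate for rejection. Both function names are hypothetical. The sketch assumes the boundary parameter has already been extracted from Content-Type.

```c
#include <stddef.h>
#include <string.h>

/* Length of the field-name at the start of line under the strict RFC 5322
   grammar (printable US-ASCII except ':', then ':'), or 0 if none. */
static size_t field_name_len(const char *line)
{
    size_t i = 0;
    while (line[i] >= 33 && line[i] <= 126 && line[i] != ':')
        i++;
    return (i > 0 && line[i] == ':') ? i : 0;
}

/* Nonzero if line would also read as a MIME delimiter for boundary:
   "--" boundary, optionally "--", then optional spaces/tabs and CRLF. */
static int looks_like_delimiter(const char *line, const char *boundary)
{
    size_t b = strlen(boundary);
    if (strncmp(line, "--", 2) != 0 || strncmp(line + 2, boundary, b) != 0)
        return 0;
    const char *rest = line + 2 + b;
    if (strncmp(rest, "--", 2) == 0)
        rest += 2;
    while (*rest == ' ' || *rest == '\t')
        rest++;
    return *rest == '\0' || strcmp(rest, "\r\n") == 0;
}
```

Applied to line {7} of the example, field_name_len() returns 5 (the name "--foo"), and looks_like_delimiter() with boundary "foo:bar" returns nonzero. An agent seeing both inside the header block knows it has hit the ambiguous case.
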


>
> 9.2.  Missing MIME-Version Field
>
>    Any message that uses [MIME] constructs is required to have a MIME-
>    Version header field.  Without them, the Content-Type and associated
>    fields have no semantic meaning.

      them -> it


>    It is often observed that a message has complete MIME structure, yet
>    lacks this header field.
>
>    As described at the end of Section 8.2, this is not expected from a

      this -> this omission


>    reputable content generator and is often an indication of mass-
>    produced spam or other undesirable messages.
>
>    Therefore, an agent compliant with this specification MUST internally
>    enact one or more of the following in the absence of a MIME-Version
>    header field:
>
>    1.  Ignore all other MIME-specific fields, even if they are
>        syntactically valid, thus treating the entire message as a
>        single-part message of type text/plain;

{ Offhand, this sounds like a potentially-interesting attack vector.  /d}


>
>    2.  Remove all other MIME-specific fields, even if they are
>        syntactically valid, both internally and when emitting the output
>        version of the message;
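
Below is a sketch covering both options, assuming field names are already parsed. The sketch treats MIME-Version, plus any field whose name begins with "Content-", as MIME-specific. That follows the naming convention in Section 9 of RFC 2045. The draft does not itself define the set, so this is an assumption. is_mime_field() could drive option 2 (dropping those fields on output), and effective_type() implements option 1. Both names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive ASCII prefix test. */
static int ci_prefix(const char *s, const char *prefix)
{
    for (; *prefix != '\0'; s++, prefix++)
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
    return 1;
}

/* MIME-Version, or any field named "Content-..." (RFC 2045, Section 9). */
static int is_mime_field(const char *name)
{
    static const char mv[] = "MIME-Version";
    return ci_prefix(name, "Content-") ||
           (ci_prefix(name, mv) && name[sizeof mv - 1] == '\0');
}

/* Option 1: without MIME-Version, ignore any declared Content-Type and
   treat the message as single-part text/plain (also RFC 2045's default
   when Content-Type is absent). */
static const char *effective_type(int has_mime_version, const char *declared)
{
    if (!has_mime_version || declared == NULL)
        return "text/plain";
    return declared;
}
```
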
>
> 10.  Body Anomalies
>
> 10.1.  Oversized Lines
>
>    A message containing a line of content that exceeds 998 characters
>    plus the line terminator (1000 total) violates Section 2.1.1 of
>    [MAIL].  Some handling agents may not look at content in a single
>    line past the first 998 bytes, providing bad actors an opportunity to
>    hide malicious content.
>
>    There is no specified way to handle such messages, other than to
>    observe that they are non-compliant and reject them, or rewrite the
>    oversized line such that the message is compliant.
>
>    Handling agents MUST take one of the following actions:
>
>    1.  Break such lines into multiple lines at a position that does not
>        change the semantics of the text being thus altered.  For
>        example, breaking an oversized line such that a [URI] then spans
>        two lines could inhibit the proper identification of that URI.
>
>    2.  Rewrite the MIME part (or the entire message if not MIME) that
>        contains the excessively long line using a content encoding that
>        breaks the line in the transmission but would still result in the
>        line being intact on decoding for presentation to the user.  Both
>        of the encodings declared in [MIME] can accomplish this.
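
Option 2 with quoted-printable could be sketched as follows. The sketch encodes one logical line on its own, without its CRLF. A real agent would also have to label the rewritten part "Content-Transfer-Encoding: quoted-printable".

- Soft line breaks ("=" CRLF) keep each encoded line within the 76-character limit of RFC 2045. Decoding removes them, so the original line is restored for display.
- '=', control bytes, 8-bit bytes, and a trailing space or tab are written as =XX.

The names qp_encode_line and line_too_long, and the output-buffer convention, are hypothetical.

```c
#include <stddef.h>

#define MAX_LINE 998   /* RFC 5322, Section 2.1.1, excluding CRLF */
#define QP_MAX   76    /* RFC 2045 limit on an encoded line */

static int line_too_long(size_t len) { return len > MAX_LINE; }

/* Quoted-printable encode one logical line (no terminator) into out,
   inserting "=\r\n" soft breaks so no encoded line exceeds QP_MAX chars.
   Returns the encoded length, or -1 if out is too small. */
static long qp_encode_line(const unsigned char *in, size_t n,
                           char *out, size_t cap)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t o = 0, col = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned char c = in[i];
        int literal = (c >= 33 && c <= 126 && c != '=') ||
                      ((c == ' ' || c == '\t') && i + 1 < n);
        size_t t = literal ? 1 : 3;
        if (col + t > QP_MAX - 1) {        /* leave room for the '=' */
            if (o + 3 > cap)
                return -1;
            out[o++] = '='; out[o++] = '\r'; out[o++] = '\n';
            col = 0;
        }
        if (o + t > cap)
            return -1;
        if (literal) {
            out[o++] = (char)c;
        } else {
            out[o++] = '=';
            out[o++] = hex[c >> 4];
            out[o++] = hex[c & 0x0F];
        }
        col += t;
    }
    if (o + 1 > cap)
        return -1;
    out[o] = '\0';
    return (long)o;
}
```

A 100-character line of 'a' encodes as 75 characters, "=" CRLF, then the remaining 25, for 103 bytes in total. Unlike option 1, this never has to decide where a break is semantically safe, such as in the middle of a URI.
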
>
> 11.  Security Considerations
>
>    The discussions of the anomalies above and their prescribed solutions
>    are themselves security considerations.  The practises enumerated in
>    this memo are generally perceived as attempts to resolve security
>    considerations that already exist rather than introducing new ones.

{ Hmmm.  Whereas I think the document introduces quite a few attack 
vectors that probably aren't discussed in other email specifications. /d}


>
> 12.  IANA Considerations
>
>    This memo contains no actions for IANA.
>
>    [RFC Editor: Please remove this section prior to publication.]
>
> 13.  References
>
> 13.1.  Normative References
>
>    [KEYWORDS]    Bradner, S., "Key words for use in RFCs to Indicate
>                  Requirement Levels", BCP 14, RFC 2119, March 1997.
>
>    [MAIL]        Resnick, P., "Internet Message Format", RFC 5322,
>                  October 2008.
>
> 13.2.  Informative References
>
>    [DKIM]        Allman, E., Callas, J., Delany, M., Libbey, M., Fenton,
>                  J., and M. Thomas, "DomainKeys Identified Mail (DKIM)
>                  Signatures", RFC 4871, May 2007.
>
>    [DSN]         Moore, K. and G. Vaudreuil, "An Extensible Message
>                  Format for Delivery Status Notifications", RFC 3464,
>                  January 2003.
>
>    [EMAIL-ARCH]  Crocker, D., "Internet Mail Architecture", RFC 5598,
>                  July 2009.
>
>    [MIME]        Freed, N. and N. Borenstein, "Multipurpose Internet
>                  Mail Extensions (MIME) Part One: Format of Internet
>                  Message Bodies", RFC 2045, November 1996.
>
>    [RFC822]      Crocker, D., "Standard for the Format of Internet Text
>                  Messages", RFC 822, August 1982.
>
>    [SMTP]        Klensin, J., "Simple Mail Transfer Protocol", RFC 5321,
>                  October 2008.
>
>    [URI]         Berners-Lee, T., Fielding, R., and L. Masinter,
>                  "Uniform Resource Identifier (URI): Generic Syntax",
>                  RFC 3986, January 2005.
>
> Appendix A.  Acknowledgements
>
>    The authors wish to acknowledge the following for their review and
>    constructive criticism of this proposal: Tony Hansen and Franck
>    Martin.
>
> Authors' Addresses
>
>    Murray S. Kucherawy
>
>    EMail: superuser@gmail.com
>
>    Gregory N. Shapiro
>
>    EMail: gshapiro@sendmail.com

-- 
  Dave Crocker
  Brandenburg InternetWorking
  bbiw.net
