Re: [apps-discuss] Review of: draft-ietf-appsawg-malformed-mail-03

"Murray S. Kucherawy" <superuser@gmail.com> Mon, 06 May 2013 06:42 UTC

In-Reply-To: <51657E80.8070208@bbiw.net>
References: <51657E80.8070208@bbiw.net>
Date: Sun, 05 May 2013 23:42:16 -0700
Message-ID: <CAL0qLwb-Aj+Te2uYJZo8g5LR4B6JREPFATTPSLGf_L4LvgMrZQ@mail.gmail.com>
From: "Murray S. Kucherawy" <superuser@gmail.com>
To: Dave Crocker <dcrocker@bbiw.net>
Content-Type: multipart/alternative; boundary="047d7b86de326670fd04dc07005f"
Cc: Apps Discuss <apps-discuss@ietf.org>
Subject: Re: [apps-discuss] Review of: draft-ietf-appsawg-malformed-mail-03

Thanks, Dave.  I'm waiting for my co-author to get back to me on two points
in your review that I can't really answer, and then we'll at long last post
a new version for the WG to review.  I'll try to solicit a couple more
reviews and then suggest Salvatore start WGLC on it.

-MSK, hatless this time


On Wed, Apr 10, 2013 at 8:00 AM, Dave Crocker <dcrocker@bbiw.net> wrote:

>
> Review of:    Advice for Safe Handling of Malformed Messages
>
> I-D:          draft-ietf-appsawg-malformed-mail-03
>
> Reviewer:     D. Crocker
>
> Review Date:  10 April 2013
>
>
> Summary:
>
>       Internet Mail has always been marked by an unfortunate degree of
> regular and permitted non-conformance to its formal specifications.  The
> current draft seeks to categorize and discuss common types of
> non-conformance and to provide some guidance for how it should be handled.
>  The document is explicit in stating that it does not have the goal of
> standardizing this guidance.
>
>      The document is reasonably clear and complete. I believe a document
> like this can provide very helpful guidance for email developers and
> operators.  It would be useful in its current form, but could greatly
> benefit from some modification.
>
>      One major concern, which is easily remedied, is the draft's use of
> normative language.  The document is often unusually careful to use
> qualifying language that precisely limits the scope of the normative text
> to "a module compliant with this memo".  However I think this is too subtle
> for most readers and that the use of normative language defeats the stated
> limitation of not wanting to define a standard. Hence I suggest changing all such
> language and, instead, using language that is clearly only modest "advice",
> such as with:
>
>    *  a common handling is...
>    *  it is best to...
>    *  it will typically be safe and helpful to...
>
> and so on.
>
>
>
> Detailed Comments:
>
>
>  Abstract
>>
>>    The email ecosystem has long had a very permissive set of common
>>    processing rules in place, despite increasingly rigid standards
>>    governing its components, ostensibly to improve the user experience.
>>
>
>      Although Internet mail formats have been precisely defined since the
> 1970s, authoring and handling software often show only mild conformance to
> the specifications.  The distributed and non-interactive nature of email
> has often prompted adjustments to receiving software, to handle these
> variations, rather than trying to gain better conformance by senders, since
> the receiving operator is primarily driven by complaining recipient users
> and has no authority over the sending side of the system.
>
>
>     The handling of these come at some cost, and various components are
>>
>
>      Processing with such flexibility comes at some cost, since mail
> software is faced with...
>
>
>     faced with decisions about whether or not to permit non-conforming
>>    messages to continue toward their destinations unaltered, adjust them
>>    to conform (possibly at the cost of losing some of the original
>>    message), or outright rejecting them.
>>
>
>      A core requirement for interoperability is that both sides to an
> exchange work from the same details and semantics.  By having receivers be
> flexible, beyond the specifications, there can be -- and often has been -- a
> good chance that a message will not be fully interoperable.  Worse, a
> well-established pattern of tolerance for variations can sometimes be used
> as an attack vector.
>
>
>     This document includes a collection of the best advice available
>>    regarding a variety of common malformed mail situations, to be used
>>    as implementation guidance.  It must be emphasized, however, that the
>>    intent of this document is not to standardize malformations or
>>    otherwise encourage their proliferation.  The messages are manifestly
>>    malformed, and the code and culture that generates them needs to be
>>    fixed.  Therefore, these messages should be rejected outright if at
>>    all possible.  Nevertheless, many malformed messages from otherwise
>>    legitimate senders are in circulation and will be for some time, and,
>>    unfortunately, commercial reality shows that we cannot always simply
>>    reject or discard them.  Accordingly, this document presents
>>    alternatives for dealing with them in ways that seem to do the least
>>    additional harm until the infrastructure is tightened up to match the
>>    standards.
>>
>
>
>
>> 1.  Introduction
>>
>> 1.1.  The Purpose Of This Work
>>
>>    The history of email standards, going back to [RFC822] and beyond,
>>
>
> { here I actually suggest citing RFC 733, since it managed to establish
> the solid foundation, with 822 being a relatively small modification. 733
> was not the first formal standard, but the first had poor adoption. /d}
>
>
>     contains a fairly rigid evolution of specifications.  But
>>    implementations within that culture have also long had an
>>    undercurrent known formally as the robustness principle, but also
>>    known informally as Postel's Law: "Be conservative in what you do, be
>>    liberal in what you accept from others."
>>
>
>     Jon Postel's directive is often misinterpreted to mean that any
> deviance from a specification is acceptable.  Rather, it was intended only
> to account for legitimate variations in interpretation /within/
> specifications, as well as basic transit errors, like bit errors.  Taken to
> its unintended extreme, excessive tolerance would imply that there are no
> limits to the liberties that a sender might take, while presuming a burden
> on a receiver to "correctly" guess at the meaning of any such variation.
>
> {BTW, I believe Postel's Law was not the motivating reason for email
> format deviations.  Rather, I think that receivers were accountable to
> their users -- the recipients -- while having no control over the
> misbehaving senders.  So they/we hacked receiving code when necessary, to
> appease the users. /d }
>
>
>     In general, this served the email ecosystem well by allowing a few
>>    errors in implementations without obstructing participation in the
>>    game.  The proverbial bar was set low.  However, as we have evolved
>>    into the current era, some of these lenient stances have begun to
>>    expose opportunities that can be exploited by malefactors.  Various
>>    email-based applications rely on strong application of these
>>    standards for simple security checks, while the very basic building
>>    blocks of that infrastructure, intending to be robust, fail utterly
>>    to assert those standards.
>>
>>    This document presents some areas in which the more lenient stances
>>    can provide vectors for attack, and then presents the collected
>>    wisdom of numerous applications in and around the email ecosystem for
>>    dealing with them to mitigate their impact.
>>
>> 1.2.  Not The Purpose Of This Work
>>
>>    It is important to understand that this work is not an effort to
>>    endorse or standardize certain common malformations.  The code and
>>    culture that introduces such messages into the mail stream needs to
>>    be repaired, as the security penalty now being paid for this lax
>>    processing arguably outweighs the reduction in support costs to end
>>    users who are not expected to understand the standards.  However, the
>>    reality is that this will not be fixed quickly.
>>
>>    Given this, it is beneficial to provide implementers with guidance
>>    about the safest or most effective way to handle malformed messages
>>    when they arrive, taking into consideration the tradeoffs of the
>>    choices available especially with respect to how various actors in
>>    the email ecosystem respond to such messages in terms of handling,
>>    parsing, or rendering to end users.
>>
>> 1.3.  General Considerations
>>
>>    Many deviations from message format standards are considered by some
>>    receivers to be strong indications that the message is undesirable,
>>    i.e., is spam or contains malware.  Such receivers quickly decide
>>
>>
>>
>> Kucherawy & Shapiro      Expires April 12, 2013                 [Page 4]
>>
>> Internet-Draft             Safe Mail Handling               October 2012
>>
>>
>>    that the best handling choice is simply to reject or discard the
>>    message.  This means malformations caused by innocent
>>    misunderstandings or ignorance of proper syntax can cause messages
>>    with no ill intent also to fail to be delivered.
>>
>>    Senders that want to ensure message delivery are best advised to
>>    adhere strictly to the relevant standards (including, but not limited
>>    to, [MAIL], [MIME], and [DKIM]), as well as observe other industry
>>    best practices such as may be published from time to time either by
>>    the IETF or independently.
>>
>> 2.  Document Conventions
>>
>> 2.1.  Key Words
>>
>>    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
>>    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
>>    document are to be interpreted as described in [KEYWORDS].  However,
>>    they only have that meaning in this document when they are presented
>>    entirely in upper case.
>>
>
>
> { While the document is clear that its use of normative language is meant
> to apply only to those implementations choosing to conform to this
> document, the document itself says -- appropriately, IMO -- that it is not
> trying to standardize these behaviors.  It's therefore confusing and
> probably counter-productive to use normative language.  I strongly urge
> dropping all such language and, instead, only offering modest "advice" with
> language like:
>
>    *  a common handling is...
>    *  it is best to...
>    *  it will typically be safe and helpful to...
>
>    and so on.   /d}
>
>
>  2.2.  Examples
>>
>>    Examples of message content include a number within braces at the end
>>    of each line.  These are line numbers for use in subsequent
>>    discussion, and are not actually part of the message content
>>    presented in the example.
>>
>>    Blank lines are not numbered in the examples.
>>
>> 3.  Background
>>
>>    The reader would benefit from reading [EMAIL-ARCH] for some general
>>    background about the overall email architecture.  Of particular
>>    interest is the Internet Message Format, detailed in [MAIL].
>>    Throughout this document, the use of the term "messsage" should be
>>
>
> { Freud possibly at work for this missspellling? /d}
>
>
>     assumed to mean a block of text conforming to the Internet Message
>>    Format.
>>
>> 4.  Internal Representations
>>
>>    Any agent handling a message could have one or two (or more) distinct
>>
>
>      As an agent parses and processes a message, it might create a number
> of distinct representations for the message.
>
>
>     representations of a message it is handling.  One is an internal
>>    representation, such as a block of storage used for the header and a
>>    block for the body.  These may be sorted, encoded, decoded, etc., as
>>    per the needs of that particular module.  The other is the
>>    representation that is output to the next agent in the handling
>>    chain.  This might be identical to the version that is input to the
>>    module, or it might have some changes such as added or reordered
>>    header fields, body modifications to remove malicious content, etc.
>>
>>    In some cases, advice is provided only for internal representations.
>>    However, there is often occasion to mandate changes to the output as
>>    well.
>>
>
> { What does this last sentence mean?  "Mandate"?  Perhaps what is meant
> is: /d}
>
>      However it is sometimes necessary to make changes between the input
> and output versions, as well.
>
>
>
>> 5.  Invariate Content
>>
>
>      Invariant {?}
>
>
>
>>    Experience has shown that it is beneficial to ensure that, from the
>>    first analysis agent at ingress into the destination Administrative
>>    Management Domain (ADMD; see [EMAIL-ARCH]) to the agent that actually
>>    affects delivery to the end user, the message each agent sees is
>>
>
> { This is an artfully-crafted sentence, but it would be easier to read if
> broken into parts.  Perhaps: /d}
>
>      An especially interesting handling sequence occurs within the
> destination Administrative Management Domain (ADMD; see [EMAIL-ARCH]). From
> ingress to the ADMD, through the boundary agent, until delivery to the end
> user, it is beneficial to ensure that each agent sees an identical form for
> the message.
>
>
>     identical.  Absent this, it can be impossible for different agents in
>>    the chain to make assertions about the content that correlate.
>>
>
>
>      the chain to make consistent assertions about the content.
>
>
>     For example, suppose a handling agent records that a message had some
>>    specific set of properties at ingress to the ADMD, then permitted it
>>    to continue inbound.  Some other agent alters the content for some
>>    reason.  The user, on viewing the delivered content, reports the
>>    message as abusive.  If the report is based on the set of properties
>>
>
>      message as abuse.  However, report processing often takes place at,
> or close to, the original point of ingress and is likely to be based on the
> set of properties recorded there, rather than at the user's system.
>
>
>     recorded at ingress, then the complaint effectively references a
>>    message different from what the user saw, which could render the
>>    complaint inactionable.  Similarly, a message with properties that a
>>    filtering agent might use to reject an abusive message could be
>>    allowed to reach the user if an intermediate agent altered the
>>    message in a manner that alters one of those properties, thwarting
>>    detection of the abuse.
>>
>
> { awkward sentence structure. d/}
>
>
>     Therefore, agents comprising an inbound message processing
>>
>
>      comprising an inbound  -> within an integrated message
>
> {or should this simply say 'within an ADMD'? /d}
>
>
>     environment SHOULD ensure that each agent sees the same content, and
>>    the message reaches the end user unmodified.  An exception to this is
>>    content that is identified as certainly harmful, such as some kind
>>    of malicious executable software included in the message.
>>
>
> {the 'exception' sentence is far too specific.  There are, no doubt, many
> reasons for deviating from this recommendation.  Simpler, safer and
> non-normative wording would be:  /d}
>
>      environment will simplify operational concerns by ensuring that each
> agent receives the same content -- except for the usual handling agent
> trace information additions -- and that this is what reaches the end user,
> unmodified.  Various policies, such as special handling for detected
> message abuse, will make exceptions appropriate.
>
>
>  6.  Mail Submission Agents
>>
>>    Within the email context, the single most influential component that
>>    can reduce the presence of malformed items in the email system is the
>>    Mail Submission Agent (MSA).  This is the component that is
>>    essentially the interface between end users that create content and
>>    the mail stream.
>>
>
>      the Mail Handling Service (MHS) [EMAIL-ARCH]
>
>
>     The lax processing described earlier in the document creates a high
>>
>
> {this first sentence is out of place.  the earlier discussion in the
> document established the need for better conformance; it doesn't need to be
> sold here, again. /d}
>
>
>     support and security cost overall.  Thus, MSAs MUST evolve to become
>>    more strict about enforcement of all relevant email standards,
>>    especially [MAIL] and the [MIME] family of documents.
>>
>>    Relay Mail Transport Agents (MTAs) SHOULD also be more strict;
>>
>
>      Relay -> Relaying
>
> { This pseudo-normative phrasing does nothing helpful, since it isn't
> actually specifying anything.  Modify to something like: /d}
>
>      More strict conformance by relaying MTAs also will be helpful.
> Although
>
>
>     although preventing the dissemination of malformed messages is
>>    desirable, the rejection of such mail already in transit also has a
>>    support cost, namely the creation of a [DSN] that many end users
>>    might not understand.
>>
>> 7.  Line Termination
>>
>>    The only valid line separation sequence in messaging is ASCII 0x0D
>>
>
>      For interoperable Internet Mail messages, the only valid...
>
>
>     ("carriage return", or CR) followed by ASCII 0x0A ("line feed", or
>>    LF), commonly referred to as CRLF.  Common UNIX user tools, however,
>>    typically only use LF for line termination.  This means the protocol
>>    has to convert LF to CRLF before transporting a message.
>>
>
>      for internal line termination.  This means that a protocol engine,
> which converts between Unix and Internet Mail formats, has to convert
> between these two end-of-line representations, before transmitting a
> message or after receiving it.
>
>
>     Naive implementations can cause messages to be transmitted with a mix
>>
>
> { These aren't "naive".  They are quite simply broken!  d/}
>
>      Implementations that do not conform to Internet Mail standards
> sometimes cause messages to be transmitted...
>
>
>     of line terminations, such as LF everywhere except CRLF only at the
>>    end of the message.  According to [SMTP], this means the entire
>>    message actually exists on a single line.
>>
>
> { also RFC 5322! /d}
>
>
>     A "naked" CR or LF in a message has no reasonable justification, and
>>
>
> { this is wrong.  they have legitimate presentation uses, albeit pretty
> archaic at this point.  Better:  /d }
>
>      Within modern Internet Mail it is highly unlikely that an isolated CR
> or LF is valid, in common ASCII text.  Furthermore [MIME]...
>
>
>     furthermore [MIME] presents mechanisms for encoding content that
>>    actually does need to contain such an unusual character sequence.
>>
>>    Thus, handling agents MUST treat naked CRs and LFs as CRLFs when
>>    interpreting the message.
>>
>
>      Thus, it will typically be safe and helpful to treat a naked CR or LF
> as equivalent to a CRLF, when parsing a message.
>
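The advice above can be sketched in a few lines of Python. This is only an illustration, assuming the parser works on raw bytes and that every isolated CR or LF really was meant as a line break; the function name is made up here and does not come from the draft.

```python
import re

# Match a proper CRLF first so it is left alone, then any isolated CR or LF.
_LINE_END = re.compile(rb"\r\n|\r|\n")

def normalize_line_endings(raw: bytes) -> bytes:
    """Return raw with every bare CR or bare LF rewritten as CRLF."""
    return _LINE_END.sub(b"\r\n", raw)
```

Because the alternation tries CRLF before the single characters, an existing CRLF is never doubled.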
>
>  8.  Header Anomalies
>>
>>    This section covers common syntactical and semantic anomalies found
>>    in headers of messages, and presents preferred mitigations.
>>
>
>      in a message header, and
>
>
>  8.1.  Converting Obsolete and Invalid Syntaxes
>>
>>    There are numerous cases of obsolete header syntaxes that can be
>>    applied to confound agents with variable processing.  This section
>>
>
> { The phrasing of the first sentence sounds as if confounding is a goal.
>  If it's meant that way, say it more clearly.  If it isn't, perhaps:  /d }
>
>      A message using an obsolete header syntax might confound an agent
> that is attempting to be robust in its handling of syntax variations.
>
>
>     presents some examples of these.  Messages including them SHOULD be
>>
>
> { 'of these'?  of which? /d}
>
> { Why reject this particular set?  What about others, outside these
> examples?  Again, phrase this non-normatively. /d}
>
>
>     rejected; where this is not possible, RECOMMENDED internal
>>    interpretations are provided.
>>
>> 8.1.1.  Host-Address Syntax
>>
>>    The following obsolete syntax:
>>
>
>      The following obsolete syntax that attempts to specify source routing:
>
> { explain, or perhaps even cite the old ABNF rule for it /d}
>
>
>
>>        To: <@example.net:fran@example.com>
>>
>>    should be interpreted as:
>>
>
>      can safely be interpreted as:
>
>
>         To: <fran@example.com>
>>
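One way to picture the rewrite is the Python sketch below. It assumes the route always takes the form "<@hop[,@hop...]:" at the start of an angle-bracketed address, and the helper name is hypothetical. A real parser would work from the obsolete route grammar in [MAIL] rather than from a regular expression.

```python
import re

# "<@hop1,@hop2:" at the start of an angle-bracketed address.
_ROUTE = re.compile(r"<@[^,:<>]+(?:,@[^,:<>]+)*:")

def strip_source_route(field: str) -> str:
    """Drop an obsolete source route, keeping only the final mailbox."""
    return _ROUTE.sub("<", field)
```

For the example above, strip_source_route("To: <@example.net:fran@example.com>") gives "To: <fran@example.com>".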
>> 8.1.2.  Excessive Angle Brackets
>>
>>    The following over-use of angle brackets, e.g.:
>>
>>        To: <<<user2@example.org>>>
>>
>>    should be interpreted as:
>>
>
>      can safely be interpreted as:
>
>
>         To: <user2@example.org>
>>
>> 8.1.3.  Unbalanced Angle Brackets
>>
>>    The following use of unbalanced angle brackets:
>>
>>        To: <another@example.net
>>        To: second@example.org>
>>
>>    should be interpreted as:
>>
>
>     can usually be treated as:
>
>
>         To: <another@example.net>
>>        To: second@example.org
>>
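The two angle-bracket cases (excessive and unbalanced) can be handled by one small routine. The sketch below is only a sketch: it assumes the field value is a single bare address with no display name, and it returns anything more complicated unchanged. The function name is invented for illustration.

```python
def repair_angle_brackets(value: str) -> str:
    """Collapse repeated or unbalanced angle brackets around one bare address."""
    v = value.strip()
    core = v.strip("<>")
    # Anything with inner brackets or whitespace (e.g. a display name) is
    # outside the scope of this sketch, so leave it alone.
    if not core or any(c in core for c in "<> \t"):
        return v
    # Keep the bracketed form only if the sender opened a bracket.
    return f"<{core}>" if v.startswith("<") else core
```

A value that opened with "<" comes back with exactly one balanced pair, and a value with only a stray trailing ">" comes back with none, which matches the interpretations listed above.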
>> 8.1.4.  Unbalanced Parentheses
>>
>>    The following use of unbalanced parentheses:
>>
>>        To: (Testing <fran@example.com>
>>        To: Testing) <sam@example.com>
>>
>>    should be interpreted as:
>>
>>        To: (Testing) <fran@example.com>
>>        To: "Testing)" <sam@example.com>
>>
>> 8.1.5.  Unbalanced Quotes
>>
>>    The following use of unbalanced quotation marks:
>>
>>        To: "Joe <joe@example.com>
>>
>>    should be interpreted as:
>>
>>        To: "Joe <joe@example.com>"@example.net
>>
>
> { WTF??? And why is this a good fixup, especially given concerns about
> display-name attack vectors?  /d}
>
>
>     where "example.net" is the domain name or host name of the handling
>>    agent making the interpretation.
>>
>> 8.2.  Non-Header Lines
>>
>>    It has been observed that some messages contain a line of text in the
>>
>
>      Some messages contain a line of...
>
>
>     header that is not a valid message header field of any kind.  For
>>    example:
>>
>>        From: user@example.com {1}
>>        To: userpal@example.net {2}
>>        Subject: This is your reminder {3}
>>        about the football game tonight {4}
>>        Date: Wed, 20 Oct 2010 20:53:35 -0400 {5}
>>
>>        Don't forget to meet us for the tailgate party! {7}
>>
>>    The cause of this is typically a bug in a message generator of some
>>    kind.  Line {4} was intended to be a continuation of line {3}; it
>>    should have been indented by whitespace as set out in Section 2.2.3
>>    of [MAIL].
>>
>>    This anomaly has varying impacts on processing software, depending on
>>    the implementation:
>>
>>    1.  some agents choose to separate the header of the message from the
>>        body only at the first empty line (i.e. a CRLF immediately
>>        followed by another CRLF);
>>
>>    2.  some agents assume this anomaly should be interpreted to mean the
>>        body starts at line {4}, as the end of the header is assumed by
>>        encountering something that is not a valid header field or folded
>>        portion thereof;
>>
>>    3.  some agents assume this should be interpreted as an intended
>>        header folding as described above and thus simply append a single
>>        space character (ASCII 0x20) and the content of line {4} to that
>>        of line {3};
>>
>>    4.  some agents reject this outright as line {4} is neither a valid
>>        header field nor a folded continuation of a header field prior to
>>        an empty line.
>>
>>    This can be exploited if it is known that one message handling agent
>>    will take one action while the next agent in the handling chain will
>>    take another.  Consider, for example, a message filter that searches
>>    message headers for properties indicative of abusive or malicious
>>    content that is attached to a Mail Transfer Agent (MTA) implementing
>>    option 2 above.  An attacker could craft a message that includes this
>>    malformation at a position above the property of interest, knowing
>>    the MTA will not consider that content part of the header, and thus
>>    the MTA will not feed it to the filter, thus avoiding detection.
>>    Meanwhile, the Mail User Agent (MUA) which presents the content to an
>>    end user, implements option 1 or 3, which has some undesirable
>>    effect.
>>
>>    It should be noted that a few implementations choose option 4 above
>>    since any reputable message generation program will get header
>>    folding right, and thus anything so blatant as this malformation is
>>    likely an error caused by a malefactor.
>>
>>    The preferred implementation if option 4 above is not employed is to
>>    apply the following heuristic when this malformation is detected:
>>
>>    1.  Search forward for an empty line.  If one is found, then apply
>>        option 3 above to the anomalous line, and continue.
>>
>>    2.  Search forward for another line that appears to be a new header
>>        field, i.e., a name followed by a colon.  If one is found, then
>>        apply option 3 above to the anomalous line, and continue.
>>
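The heuristic above might look roughly like this in Python. The sketch assumes the message has already been split into lines with the terminators removed. The draft does not say what to do when neither search succeeds, so the choice here (leave the rest of the message untouched) is an assumption, as are all the names.

```python
import re
from typing import List

# A header field start: printable ASCII other than ":" and space, then a colon.
_FIELD = re.compile(r"^[!-9;-~]+[ \t]*:")

def fold_anomalous_lines(lines: List[str]) -> List[str]:
    """Apply 'option 3' (fold into the previous field) to non-header lines."""
    out: List[str] = []
    for i, line in enumerate(lines):
        if line == "":                      # real header/body separator
            out.extend(lines[i:])
            return out
        if _FIELD.match(line) or line[0] in " \t" or not out:
            out.append(line)                # normal field or valid folding
            continue
        rest = lines[i + 1:]
        if any(l == "" or _FIELD.match(l) for l in rest):
            out[-1] = out[-1] + " " + line  # treat as an intended continuation
        else:
            out.extend(lines[i:])           # assumption: give up, change nothing
            return out
    return out
```

Run on the football example, line {4} is appended to the Subject field with a single space and the Date field stays in the header.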
>> 8.3.  Unusual Spacing
>>
>>    The following message is valid per [MAIL]:
>>
>>        From: user@example.com {1}
>>        To: userpal@example.net {2}
>>        Subject: This is your reminder {3}
>>         {4}
>>         about the football game tonight {5}
>>        Date: Wed, 20 Oct 2010 20:53:35 -0400 {6}
>>
>>        Don't forget to meet us for the tailgate party! {8}
>>
>>    Line {4} contains a single whitespace.  The intended result is that
>>    lines {3}, {4}, and {5} comprise a single continued header field.
>>    However, some agents are aggressive at stripping trailing whitespace,
>>    which will cause line {4} to be treated as an empty line, and thus
>>    the separator line between header and body.  This can affect header-
>>    specific processing algorithms as described in the previous section.
>>
>>    Ideally, this case simply ought not to be generated.
>>
>
> {This sentence is entirely gratuitous. Replace it with: d/ }
>
>      This example was legal in earlier versions of the Internet Mail
> format standard.
>
>
>>    Message handling agents receiving a message bearing this anomaly MUST
>>    behave as if line {4} was not present on the message, and SHOULD emit
>>    a version in which line {4} has been removed.
>>
>
>      The best handling of this example is for a message parsing engine to
> behave as if line {4} was not present in the message and for a message
> creation engine to emit the message with line {4} removed.
>
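As a rough Python sketch of that handling: drop any header line that consists only of whitespace, and stop looking once the genuinely empty separator line is reached. It assumes the message is already split into lines without terminators; the function name is hypothetical.

```python
from typing import List

def drop_whitespace_only_header_lines(lines: List[str]) -> List[str]:
    """Remove whitespace-only lines that appear before the header/body separator."""
    out: List[str] = []
    in_header = True
    for line in lines:
        if in_header and line == "":
            in_header = False               # first truly empty line ends the header
        if in_header and line.strip(" \t") == "":
            continue                        # the "line {4}" case: skip it
        out.append(line)
    return out
```

The check must run before any trailing-whitespace stripping, since stripping first is exactly what turns line {4} into a false separator.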
>
>
>> 8.4.  Header Malformations
>>
>>    There are various malformations that exist.  A common one is
>>
>
> { The first sentence is pretty obvious: there are always lots of ways to
> screw up. I suggest dropping it and beginning with something like:  /d}
>
>    Among the many possible malformations, a common one is...
>
>
>     insertion of whitespace at unusual locations, such as:
>>
>>        From: user@example.com {1}
>>        To: userpal@example.net {2}
>>        Subject: This is your reminder {3}
>>        MIME-Version : 1.0 {4}
>>        Content-Type: text/plain {5}
>>        Date: Wed, 20 Oct 2010 20:53:35 -0400 {6}
>>
>>        Don't forget to meet us for the tailgate party! {8}
>>
>>    Note the addition of whitespace in line {4} after the header field
>>    name but before the colon that separates the name from the value.
>>
>>    The acceptance grammar of [MAIL] permits that extra whitespace, so it
>>    cannot be considered invalid.  However, a consensus of
>>    implementations prefers to remove that whitespace.  There is no
>>    perceived change to the semantics of the header field being altered
>>    as the whitespace is itself semantically meaningless.  Thus, a module
>>    compliant with this memo MUST remove all whitespace after the field
>>    name but before the colon, and MUST emit that version of that field
>>    on output.
>>
>
>      Therefore, it is best to remove all whitespace after the field name
> but before the colon and to emit the field in this modified form.
>
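For a single unfolded header line, the cleanup could be as small as the Python sketch below. The character class follows the field-name definition in [MAIL] (printable ASCII except colon and space); the function name is an assumption made for this example.

```python
import re

# Field name, then one or more spaces or tabs, then the colon.
_NAME_WSP_COLON = re.compile(r"^([!-9;-~]+)[ \t]+:")

def tighten_field_name(line: str) -> str:
    """Remove whitespace between a header field name and its colon."""
    return _NAME_WSP_COLON.sub(r"\1:", line, count=1)
```

Only whitespace directly before the first colon is touched, so colons and spaces inside the field value are left as they were.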
>
>  8.5.  Header Field Counts
>>
>>    Section 3.6 of [MAIL] prescribes specific header field counts for a
>>    valid message.  Few agents actually enforce these in the sense that a
>>    message whose header contents exceed one or more limits set there are
>>    generally allowed to pass; they may add any required fields that are
>>
>
>    ; they typically add any...
>
>
>     missing, however.
>>
>>    Also, few agents that use messages as input, including Mail User
>>    Agents (MUAs) that actually display messages to users, verify that
>>    the input is valid before proceeding.  Two popular open source
>>    filtering programs and two popular Mailing List Management (MLM)
>>
>
> { I suggest changing 'two' to 'some', since the number might change;
> there's no reason to make this document get out of date for such a minor
> issue. /d }
>
>
>     packages examined at the time this document was written select either
>>
>
> { hence, remove "examined at the time this document was written" /d }
>
>
>     the first or last instance of a particular field name, such as From,
>>    to decide who sent a message.  Absent enforcement of [MAIL], an
>>
>
>      Absent strict enforcement
>
>
>     attacker can craft a message with multiple fields if that attacker
>>    knows the filter will make a decision based on one but the user will
>>    be shown the other.
>>
>>    This situation is exacerbated when a claim of message validity is
>>    inferred by something like a valid [DKIM] signature.  Such a
>>    signature might cover one instance of a constrained field but not
>>    another, and a naive consumer of DKIM's output, not realizing which
>>    one was covered by a valid signature, could presume the wrong one was
>>    the "good" one.  An MUA, for example, could show the first of two From
>>    fields as "good" or "safe" while the DKIM signature actually only
>>    verified the second.
>>
>
> { DKIM signatures do not verify addresses outside d=.  While the problem
> you are describing is, of course, real, it's far more complicated than
> you've described here.  Perhaps:  /d }
>
>      when message validity is assessed, such as through enhanced
> authentication methods.  Such methods might cover one instance of a
> constrained field but not another, taking the wrong one as "good" or "safe".
>
>
>     Thus, an agent compliant with this specification MUST enact one of
>>    the following:
>>
>
>      In attempting to counter this exposure, one of the following can be
> enacted:
>
>
>     1.  reject outright or refuse to process further any input message
>>        that does not conform to Section 3.6 of [MAIL];
>>
>>    2.  remove or, in the case of an MUA, refuse to render any instances
>>        of a header field whose presence exceeds a limit prescribed in
>>        Section 3.6 of [MAIL] when generating its output;
>>
>>    3.  alter the name of any header field whose presence exceeds a limit
>>        prescribed in Section 3.6 of [MAIL] when generating its output so
>>        that later agents can produce a consistent result.  Any
>>        alteration likely to cause the field to be ignored by downstream
>>        agents is acceptable.  A common approach is to prefix the field
>>        names with a string such as "BAD-".
>>
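Option 3 in the list above (renaming surplus fields) might look like the following C sketch. The names field_is and emit_limited are hypothetical, the caller is assumed to pass unfolded header lines one at a time, and keeping the first instance while renaming later ones is an assumption made for illustration; the draft does not say which instance should survive.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if header line `line` has field name `name` (case-insensitive). */
static int field_is(const char *line, const char *name)
{
    size_t i;

    for (i = 0; name[i] != '\0'; i++) {
        if (tolower((unsigned char)line[i]) != tolower((unsigned char)name[i]))
            return 0;
    }
    /* Tolerate blanks before the colon, the malformation from 8.4. */
    while (line[i] == ' ' || line[i] == '\t')
        i++;
    return line[i] == ':';
}

/*
 * Copy header line `in` to `out`.  Each occurrence of `name` beyond the
 * first `limit` occurrences is emitted with a "BAD-" prefix.  `*seen`
 * carries the running count between calls.  Returns 1 if the line was
 * renamed, 0 if it was copied unchanged.
 */
int emit_limited(const char *in, const char *name, int limit, int *seen,
                 char *out, size_t outlen)
{
    if (field_is(in, name) && ++(*seen) > limit) {
        snprintf(out, outlen, "BAD-%s", in);
        return 1;
    }
    snprintf(out, outlen, "%s", in);
    return 0;
}
```

With limit set to 1 for From, a second "From:" line comes out as "BAD-From:", so a downstream filter and the MUA both see exactly one From field.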
>
> { it would help if there were some rationales or analyses of the tradeoffs
> amongst these kinds of choices, to help the implementer/operator decide
> when to use which.  /d }
>
>
>
>  8.6.  Missing Header Fields
>>
>>    Similar to the previous section, there are messages seen in the wild
>>    that lack certain required header fields.  For example, [MAIL]
>>    requires that a From and Date field be present in all messages.
>>
>
> { I think these aren't 'examples' but constitute the entire list.  If
> there are other required fields that can be classed as 'missing', this
> section should list them.  Also, since Message-ID isn't 'required', the
> phrasing here doesn't quite match what's discussed in the section. Might be
> worth distinguishing "required but missing" vs. "optional but really useful
> and worth synthesizing".  Synthesizing the latter probably isn't dangerous.
>  Synthesizing the former always is... d/}
>
>
>
>>    When presented with a message lacking these fields, the MTA might
>>    perform one of the following:
>>
>>    1.  Make no changes
>>
>>    2.  Add an instance of the missing field(s) using synthesized content
>>
>
>      3.  Reject the message
>
>
>     Option 2 is RECOMMENDED for handling this case.  Handling agents
>>
>
> { Wow!  Synthesizing a From: field strikes me as especially dangerous, in
> all cases.  The rationale provided, below, needs to state this and, I
> believe, explain how and why it is worth incurring. The explanation that is
> provided essentially defines this hack as an attack vector... /d}
>
>
>     SHOULD add these for internal handling if they are missing, but MUST
>>    NOT add them to the external representation.  The reason for this
>>    requirement is that there are some filter modules that would consider
>>    the absence of such fields to be a condition warranting special
>>    treatment (e.g., rejection), and thus the effectiveness of such
>>    modules would be stymied by an upstream filter adding them.
>>
>>    The synthesized fields SHOULD contain a best guess as to what should
>>    have been there; for From, the SMTP MAIL command's address can be
>>    used (if not null) or a placeholder address followed by an address
>>    literal (e.g., unknown@[192.0.2.1]); for Date, a date extracted from
>>    a Received field is a reasonable choice.
>>
>>    One other important case to consider is a missing Message-Id field.
>>    An MTA that encounters a message missing this field SHOULD synthesize
>>    a valid one using techniques described above and add it to the
>>    external representation, since many deployed tools use the content of
>>    that field as a common unique message reference, so its absence
>>    inhibits correlation of message processing.  One possible synthesis
>>    would be based on an encoding of the current date/time and
>>    an internal MTA ID (e.g., queue ID) followed by @ and the fully
>>    qualified hostname of the machine synthesizing the header value.  For
>>    example:
>>
>>        tm = gmtime(&now);
>>        (void) snprintf(buf, sizeof(buf), "%04d%02d%02d%02d%02d.%s@%s",
>>                        tm->tm_year + 1900, tm->tm_mon + 1, tm->tm_mday,
>>                        tm->tm_hour, tm->tm_min, queueID, fqhn);
>>
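For readers who want to compile the fragment above, here is a self-contained variant. It is a sketch, not the draft's code: the function name synth_msgid and its parameters are hypothetical, the time is passed in so the result is reproducible, and the value is wrapped in the angle brackets that [MAIL] requires around a msg-id, which the fragment leaves out. The queue ID and hostname are assumed to contain only characters that are legal in a msg-id.

```c
#include <stdio.h>
#include <time.h>

/*
 * Build "<YYYYMMDDhhmm.queueID@fqhn>" in `buf`, using UTC for the
 * timestamp.  Returns 0 on success, -1 if the time cannot be converted
 * or the result does not fit in `buflen` bytes.
 */
int synth_msgid(char *buf, size_t buflen, time_t now,
                const char *queue_id, const char *fqhn)
{
    struct tm *tm = gmtime(&now);
    int n;

    if (tm == NULL)
        return -1;

    n = snprintf(buf, buflen, "<%04d%02d%02d%02d%02d.%s@%s>",
                 tm->tm_year + 1900, tm->tm_mon + 1, tm->tm_mday,
                 tm->tm_hour, tm->tm_min, queue_id, fqhn);

    /* snprintf reports the length it wanted; treat truncation as failure. */
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return 0;
}
```

Because the timestamp only has minute resolution, uniqueness rests on the queue ID being unique on the synthesizing host.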
>> 8.7.  Eight-Bit Data
>>
>>    Standards-compliant mail messages do not contain any non-ASCII data
>>    without indicating that such content is present by means of published
>>    [SMTP] extensions.  Absent that, [MIME] encodings are typically used
>>
>
> Overall, the document sometimes mixes transfer issues with data
> representation (object) issues, in ways that can be confusing.  This
> paragraph is one of those.  It's worth the extra verbosity to label each
> clearly and separately.  So, for example:
>
>      Standards-compliant mail messages that contain non-ASCII data are
> required to self-label this through the use of [MIME].  If the
> representation of the non-ASCII data is in an 8-bit mode (rather than
> special encoding so that it retains a 7-bit base), then this must be
> signaled through the use of [SMTP] extensions.
>
>
>     to convert non-ASCII data to ASCII in a way that can be reversed by
>>    other handling agents or end users.
>>
>>    Non-ASCII data otherwise found in messages can confound code that is
>>    used to analyze content.  For example, a null (ASCII 0x00) byte
>>    inside a message can cause typical string processing functions to
>>    mis-identify the end of a string, which can be exploited to hide
>>    malicious content from analysis processes.
>>
>>    Handling agents MUST reject messages containing null bytes that are
>>    not encoded in some standard way, and SHOULD reject other non-ASCII
>>    bytes that are similarly not encoded.  If rejection is not done, an
>>    ASCII-compatible encoding such as those defined in [MIME] SHOULD be
>>    used.
>>
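The null-byte problem described in 8.7 comes from treating message data as a C string. A minimal sketch of a length-aware scan follows; the enum and the function name scan_raw_bytes are hypothetical, and the mapping of results to "reject" or "re-encode" is left to the caller.

```c
#include <stddef.h>

enum scan_result { SCAN_CLEAN = 0, SCAN_EIGHT_BIT = 1, SCAN_NUL = 2 };

/*
 * Scan `len` bytes of raw message data.  The length is passed explicitly
 * because string functions such as strlen() stop at the first NUL byte
 * and would never see what follows it.
 * Returns SCAN_NUL if any NUL byte is present, otherwise SCAN_EIGHT_BIT
 * if any byte has its high bit set, otherwise SCAN_CLEAN.
 */
enum scan_result scan_raw_bytes(const unsigned char *buf, size_t len)
{
    enum scan_result r = SCAN_CLEAN;
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] == 0x00)
            return SCAN_NUL;        /* the MUST-reject case in the text */
        if (buf[i] & 0x80)
            r = SCAN_EIGHT_BIT;     /* the SHOULD-reject case in the text */
    }
    return r;
}
```

For the seven bytes 'o' 'k' 0x00 'e' 'v' 'i' 'l', strlen() reports 2 while this scan reports SCAN_NUL, which is the gap an attacker would use to hide the trailing content.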
>
> { Hmmm.  It occurs to me that the document might be helped by an early
> discussion about a/the 'philosophy' that guides choosing whether to reject
> a message versus repair it.  But I don't have any clever text to suggest
> for doing this... /d}
>
>
>
>> 9.  MIME Anomalies
>>
>>    [MIME], et seq, define a mechanism of message extensions for
>>
>
> { perhaps quibbling, but since MIME does a variety of things, including
> this one, I suggest:
>
>      define -> includes
>
>
>     providing text in character sets other than ASCII, non-text
>>    attachments to messages, multi-part message bodies, and similar
>>    facilities.
>>
>>    Some anomalies with MIME-compliant generation are also common.  This
>>    section discusses some of those and presents preferred mitigations.
>>
>> 9.1.  Header Field Names
>>
>>    [MAIL] permits header field names to begin with "--".  This means
>>    that a header field name can look like a [MIME] multipart boundary.
>>    For example:
>>
>>      --foo:bar
>>
>>    This is a legal header field, whose name is "--foo" and whose value
>>    is "bar".  Thus, consider this header:
>>
>>        From: user@example.com {1}
>>        To: userpal@example.net {2}
>>        Subject: This is your reminder {3}
>>        Date: Wed, 20 Oct 2010 20:53:35 -0400 {4}
>>        MIME-Version: 1.0 {5}
>>        Content-Type: multipart/mixed; boundary="foo:bar" {6}
>>        --foo:bar {7}
>>        Malicious-Content: muahaha {8}
>>
>>    One implementation could observe that line {7} announces the
>>    beginning of the first MIME part while another considers it a part of
>>    the message's header.
>>
>>    If rejection of such messages cannot be done, agents MUST treat line
>>    {7} as part of the message's header block and not a MIME boundary.
>>
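On the detection question for 9.1: while a parser is still inside the header block (before the first empty line), it can test whether a line has the shape of a header field before considering any MIME interpretation. The sketch below assumes the field-name rule of [MAIL] (printable ASCII other than the colon) and tolerates blanks before the colon; the function name header_field_name_len is hypothetical.

```c
#include <stddef.h>

/*
 * Return the length of the field name if `line` has the shape of a
 * header field: one or more printable ASCII characters other than ':',
 * optional blanks, then ':'.  Return 0 otherwise, for example for a
 * bare boundary delimiter or a folded continuation line.
 */
size_t header_field_name_len(const char *line)
{
    size_t n = 0, i;

    /* Field-name characters: codes 33..126 except the colon. */
    while (line[n] > 32 && line[n] < 127 && line[n] != ':')
        n++;
    if (n == 0)
        return 0;

    i = n;
    while (line[i] == ' ' || line[i] == '\t')
        i++;
    return line[i] == ':' ? n : 0;
}
```

Line {7}, "--foo:bar", yields 5 (the name "--foo"), so a parser that has not yet seen the empty line ending the header treats it as a field; a delimiter such as "--foo" with no colon yields 0.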
>
> { Under what circumstances can rejection /not/ be done??? And what is
> involved in even detecting that it isn't a mime boundary?  d/ }
>
>
>
>> 9.2.  Missing MIME-Version Field
>>
>>    Any message that uses [MIME] constructs is required to have a MIME-
>>    Version header field.  Without them, the Content-Type and associated
>>    fields have no semantic meaning.
>>
>
>      them -> it
>
>
>     It is often observed that a message has complete MIME structure, yet
>>    lacks this header field.
>>
>>    As described at the end of Section 8.2, this is not expected from a
>>
>
>      this -> this omission
>
>
>     reputable content generator and is often an indication of mass-
>>    produced spam or other undesirable messages.
>>
>>    Therefore, an agent compliant with this specification MUST internally
>>    enact one or more of the following in the absence of a MIME-Version
>>    header field:
>>
>>    1.  Ignore all other MIME-specific fields, even if they are
>>        syntactically valid, thus treating the entire message as a
>>        single-part message of type text/plain;
>>
>
> { Offhand, this sounds like a potentially-interesting attack vector.  /d}
>
>
>
>>    2.  Remove all other MIME-specific fields, even if they are
>>        syntactically valid, both internally and when emitting the output
>>        version of the message;
>>
>> 10.  Body Anomalies
>>
>> 10.1.  Oversized Lines
>>
>>    A message containing a line of content that exceeds 998 characters
>>    plus the line terminator (1000 total) violates Section 2.1.1 of
>>    [MAIL].  Some handling agents may not look at content in a single
>>    line past the first 998 bytes, providing bad actors an opportunity to
>>    hide malicious content.
>>
>>    There is no specified way to handle such messages, other than to
>>    observe that they are non-compliant and reject them, or rewrite the
>>    oversized line such that the message is compliant.
>>
>>    Handling agents MUST take one of the following actions:
>>
>>    1.  Break such lines into multiple lines at a position that does not
>>        change the semantics of the text being thus altered.  For
>>        example, breaking an oversized line such that a [URI] then spans
>>        two lines could inhibit the proper identification of that URI.
>>
>>    2.  Rewrite the MIME part (or the entire message if not MIME) that
>>        contains the excessively long line using a content encoding that
>>        breaks the line in the transmission but would still result in the
>>        line being intact on decoding for presentation to the user.  Both
>>        of the encodings declared in [MIME] can accomplish this.
>>
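Before either action in 10.1 can be taken, the oversized line has to be found. A minimal detection sketch follows, assuming the whole message is in memory and that both CRLF and bare LF end a line; the name first_oversized_line and the MAX_LINE_CHARS macro are hypothetical.

```c
#include <stddef.h>

#define MAX_LINE_CHARS 998      /* limit from [MAIL], excluding the CRLF */

/*
 * Return the 1-based number of the first line in `msg` (of length `len`)
 * whose content, not counting its CRLF or bare LF terminator, exceeds
 * MAX_LINE_CHARS.  Return 0 if every line is within the limit.
 */
size_t first_oversized_line(const char *msg, size_t len)
{
    size_t line_no = 1, run = 0, i;

    for (i = 0; i < len; i++) {
        if (msg[i] == '\n') {
            line_no++;
            run = 0;
        } else if (msg[i] == '\r' && i + 1 < len && msg[i + 1] == '\n') {
            /* CR that belongs to a CRLF terminator: not line content. */
        } else if (++run > MAX_LINE_CHARS) {
            return line_no;
        }
    }
    return 0;
}
```

Scanning the full length, rather than stopping at 998 bytes per line, is the point: an agent that stops early is the one that can be made to miss content placed later in the line.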
>> 11.  Security Considerations
>>
>>    The discussions of the anomalies above and their prescribed solutions
>>    are themselves security considerations.  The practises enumerated in
>>    this memo are generally perceived as attempts to resolve security
>>    considerations that already exist rather than introducing new ones.
>>
>
> { Hmmm.  Whereas I think the document introduces quite a few attack
> vectors that probably aren't discussed in other email specifications. /d}
>
>
>
>> 12.  IANA Considerations
>>
>>    This memo contains no actions for IANA.
>>
>>    [RFC Editor: Please remove this section prior to publication.]
>>
>> 13.  References
>>
>> 13.1.  Normative References
>>
>>    [KEYWORDS]    Bradner, S., "Key words for use in RFCs to Indicate
>>                  Requirement Levels", BCP 14, RFC 2119, March 1997.
>>
>>    [MAIL]        Resnick, P., "Internet Message Format", RFC 5322,
>>                  October 2008.
>>
>> 13.2.  Informative References
>>
>>    [DKIM]        Allman, E., Callas, J., Delany, M., Libbey, M., Fenton,
>>                  J., and M. Thomas, "DomainKeys Identified Mail (DKIM)
>>                  Signatures", RFC 4871, May 2007.
>>
>>    [DSN]         Moore, K. and G. Vaudreuil, "An Extensible Message
>>                  Format for Delivery Status Notifications", RFC 3464,
>>                  January 2003.
>>
>>    [EMAIL-ARCH]  Crocker, D., "Internet Mail Architecture", RFC 5598,
>>                  July 2009.
>>
>>    [MIME]        Freed, N. and N. Borenstein, "Multipurpose Internet
>>                  Mail Extensions (MIME) Part One: Format of Internet
>>                  Message Bodies", RFC 2045, November 1996.
>>
>>    [RFC822]      Crocker, D., "Standard for the Format of Internet Text
>>                  Messages", RFC 822, August 1982.
>>
>>    [SMTP]        Klensin, J., "Simple Mail Transfer Protocol", RFC 5321,
>>                  October 2008.
>>
>>    [URI]         Berners-Lee, T., Fielding, R., and L. Masinter,
>>                  "Uniform Resource Identifier (URI): Generic Syntax",
>>                  RFC 3986, January 2005.
>>
>> Appendix A.  Acknowledgements
>>
>>    The author wishes to acknowledge the following for their review and
>>    constructive criticism of this proposal: Tony Hansen and Franck
>>    Martin.
>>
>> Authors' Addresses
>>
>>    Murray S. Kucherawy
>>
>>    EMail: superuser@gmail.com
>>
>>    Gregory N. Shapiro
>>
>>    EMail: gshapiro@sendmail.com
>>
> --
>  Dave Crocker
>  Brandenburg InternetWorking
>  bbiw.net
>