Re: [ietf-smtp] Stray <LF> in the middle of messages

Viktor Dukhovni <ietf-dane@dukhovni.org> Sun, 05 July 2020 21:58 UTC

Return-Path: <ietf-dane@dukhovni.org>
X-Original-To: ietf-smtp@ietfa.amsl.com
Delivered-To: ietf-smtp@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 63BDA3A0B57 for <ietf-smtp@ietfa.amsl.com>; Sun, 5 Jul 2020 14:58:25 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -0.413
X-Spam-Level:
X-Spam-Status: No, score=-0.413 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, FAKE_REPLY_C=1.486, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id YCfcxVtEGWyT for <ietf-smtp@ietfa.amsl.com>; Sun, 5 Jul 2020 14:58:23 -0700 (PDT)
Received: from straasha.imrryr.org (straasha.imrryr.org [100.2.39.101]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 7E65A3A0B55 for <ietf-smtp@ietf.org>; Sun, 5 Jul 2020 14:58:23 -0700 (PDT)
Received: by straasha.imrryr.org (Postfix, from userid 1001) id 3CE7831EBD1; Sun, 5 Jul 2020 17:58:22 -0400 (EDT)
Date: Sun, 05 Jul 2020 17:58:22 -0400
From: Viktor Dukhovni <ietf-dane@dukhovni.org>
To: ietf-smtp@ietf.org
Message-ID: <20200705215822.GA82270@straasha.imrryr.org>
Reply-To: ietf-smtp@ietf.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <779239c4-ebca-1ca8-da59-bb9af4c38ac8@tana.it> <4efb92cb-9657-15eb-11f5-b6057f691505@pscs.co.uk> <87ftb8p1ii.fsf@llwynog.ekleog.org>
User-Agent: Mutt/1.12.2 (2019-09-21)
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-smtp/2xfR1lsMd-5-eLoASvuASs82dXs>
Subject: Re: [ietf-smtp] Stray <LF> in the middle of messages
X-BeenThere: ietf-smtp@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Discussion of issues related to Simple Mail Transfer Protocol \(SMTP\) \[RFC 821, RFC 2821, RFC 5321\]" <ietf-smtp.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/ietf-smtp>, <mailto:ietf-smtp-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/ietf-smtp/>
List-Post: <mailto:ietf-smtp@ietf.org>
List-Help: <mailto:ietf-smtp-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/ietf-smtp>, <mailto:ietf-smtp-request@ietf.org?subject=subscribe>
X-List-Received-Date: Sun, 05 Jul 2020 21:58:25 -0000

[ Sorry about being late to the party; I read this list only sporadically. ]

On Sat, Jun 06, 2020 at 07:06:29PM +0200, Leo Gaspard wrote:

> However, I notice that every single time I have tried to use `netcat` to
> send emails for demo purposes, it succeeded *without* sending <CRLF> and
> by sending only <LF>. While `telnet` does appear to convert typed <LF>
> into <CRLF>, it looks like (my version of) `netcat` does not. So most of
> the SMTP servers I have met with appear to consider <LF> as a valid line
> ending.

Postfix always sends <CRLF>, but accepts <CR>*<LF> as a line ending.

    /*      smtp_get() reads the named stream up to and including
    /*      the next LF character and strips the trailing CR LF. 

            /*
             * Strip off the record terminator: either CRLF or just bare LF.
             *
             * XXX RFC 2821 disallows sending bare CR everywhere. We remove bare CR
             * if received before CRLF, and leave it alone otherwise.
             */
        case '\n':
            vstring_truncate(vp, VSTRING_LEN(vp) - 1);
            while (VSTRING_LEN(vp) > 0 && vstring_end(vp)[-1] == '\r')
                vstring_truncate(vp, VSTRING_LEN(vp) - 1);

Perhaps you tested at least some Postfix servers.
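For anyone who wants to experiment, the terminator handling in the excerpt above can be approximated by the small self-contained C sketch below.  The function name strip_record_terminator() is hypothetical and is not part of Postfix.  The sketch assumes a NUL-terminated line that was read up to and including the next LF.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, modelled on the Postfix excerpt above: 'line'
 * holds one input line read up to and including the next LF.  Strip
 * the LF plus any CRs immediately before it, so that CRLF, bare LF and
 * CR CR LF all count as the same line ending.  A line without a
 * trailing LF (e.g. at EOF) is left unchanged.  Returns the new length.
 */
static size_t strip_record_terminator(char *line)
{
    size_t len = strlen(line);

    if (len > 0 && line[len - 1] == '\n') {
        len--;                              /* drop the LF */
        while (len > 0 && line[len - 1] == '\r')
            len--;                          /* drop CRs just before it */
        line[len] = '\0';
    }
    return len;
}
```

With this logic "HELO x\r\n", "HELO x\n" and "HELO x\r\r\n" all yield the same command line, which is why a netcat session that sends only LF still works against such a server.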

> However, there is one case where the semantics is important: should one
> escape the <LF>. sequence while in a DATA block?

    https://tools.ietf.org/html/rfc5322#section-2.3

       The body of a message is simply lines of US-ASCII characters.  The
       only two limitations on the body are as follows:

       o  CR and LF MUST only occur together as CRLF; they MUST NOT appear
          independently in the body.
       o  Lines of characters in the body MUST be limited to 998 characters,
          and SHOULD be limited to 78 characters, excluding the CRLF.

          Note: As was stated earlier, there are other documents,
          specifically the MIME documents ([RFC2045], [RFC2046], [RFC2049],
          [RFC4288], [RFC4289]), that extend (and limit) this specification
          to allow for different sorts of message bodies.  Again, these
          mechanisms are beyond the scope of this document.

Since bare LF is invalid, you must not send it.  With Postfix that's
automatic: a bare LF becomes a line ending on input, so it can never
occur in the output.  The alternative would be for Postfix to reject
messages with bare LF, but it is easier to just accept them, because
then (for example) "sendmail -bs" can tolerate newline-terminated input.
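The "bare LF becomes a line ending on input" behaviour can be sketched as a whole-buffer rewrite in C.  This is an illustration under stated assumptions, not Postfix code: the name normalize_to_crlf() is hypothetical, and, following the Postfix comment quoted earlier, a run of CRs before an LF collapses into the line ending while a CR not followed by LF is copied unchanged.

```c
#include <stddef.h>

/*
 * Hypothetical sketch: every LF, together with any run of CRs directly
 * before it, is rewritten as a single CRLF, so no bare LF can reach the
 * output.  CRs not followed by LF are copied as-is (mirroring the
 * "leave it alone otherwise" comment in the Postfix excerpt).
 * 'out' must have room for 2 * inlen bytes; returns bytes written.
 */
static size_t normalize_to_crlf(const char *in, size_t inlen, char *out)
{
    size_t i = 0, o = 0;

    while (i < inlen) {
        if (in[i] == '\r') {
            size_t j = i;
            while (j < inlen && in[j] == '\r')
                j++;                    /* scan the run of CRs */
            if (j < inlen && in[j] == '\n') {
                out[o++] = '\r';        /* CR*LF -> CRLF */
                out[o++] = '\n';
                i = j + 1;
                continue;
            }
            while (i < j)               /* CRs not before LF: copy */
                out[o++] = in[i++];
            continue;
        }
        if (in[i] == '\n') {
            out[o++] = '\r';            /* bare LF -> CRLF */
            out[o++] = '\n';
            i++;
            continue;
        }
        out[o++] = in[i++];             /* ordinary octet */
    }
    return o;
}
```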

> I would guess that the fact that other SMTP servers appear to usually
> accept <LF>.<LF> as a terminator indicates that <LF>. should be escaped
> even though it is not strictly conforming with the RFC, but… I wanted to
> have the opinion of other people on this, before diving too deep in the
> implementation?

I think by escaped you mean "transparency":

    https://tools.ietf.org/html/rfc5321#section-4.5.2

In which case the answer is simply that you MUST NOT send either
<LF>.<LF> or the dot-stuffed <LF>..<LF>, because you MUST NOT send a
bare LF in the first place.
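As an illustration of transparency only, here is a C sketch of the sender side of RFC 5321 section 4.5.2.  It assumes the body has already been converted to CRLF line endings, so no bare LF remains.  The name dot_stuff() and the output-buffer convention are hypothetical.

```c
#include <stddef.h>

/*
 * Sketch of SMTP transparency (RFC 5321, section 4.5.2), assuming the
 * body is already CRLF-normalized.  A '.' at the start of a line is
 * doubled, the last line is terminated if needed, and the end-of-data
 * line "." CRLF is appended.  'out' needs room for 2 * inlen + 5 bytes.
 */
static size_t dot_stuff(const char *in, size_t inlen, char *out)
{
    size_t o = 0;
    int at_line_start = 1;

    for (size_t i = 0; i < inlen; i++) {
        if (at_line_start && in[i] == '.')
            out[o++] = '.';             /* transparency: extra dot */
        out[o++] = in[i];
        at_line_start = (in[i] == '\n');
    }
    /* terminate a non-empty body whose last line lacks CRLF */
    if (o > 0 && (o < 2 || out[o - 2] != '\r' || out[o - 1] != '\n')) {
        out[o++] = '\r';
        out[o++] = '\n';
    }
    out[o++] = '.';                     /* end-of-data marker */
    out[o++] = '\r';
    out[o++] = '\n';
    return o;
}
```

Because the input contains only CRLF line endings, a line consisting of a single "." is sent as ".." and can never be mistaken for the end of the data.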

> Should I understand this paragraph as meaning that if I ever receive
> such an ill-formed message, I… can? should? must? accept it and… can?
> should? must? convert the <LF> into proper <CRLF>?

You can reject the invalid input, or modify it in transit to send
something valid.  Choices are:

    * Convert <LF> to <CRLF>
    * Strip bare <LF> (and perhaps bare <CR>).
    * Apply a MIME quoted-printable or base64 encoding to the body,
      if not already encoded.
    * If already base64, you are at liberty to strip extraneous
      non-base64 characters without changing the payload.
    * If already quoted-printable, you could in principle decode
      and re-encode, but it is saner to either strip the bare LF or
      accept it as EOL.
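To make the base64 option concrete, a hypothetical C sketch follows.  It relies on the rule in RFC 2045 (section 6.8) that decoders must ignore characters outside the base64 alphabet, so dropping stray CR/LF octets and re-wrapping the remaining characters at no more than 76 per line should leave the decoded payload unchanged.  The function names are illustrative, not from any library.

```c
#include <stddef.h>

/* Nonzero for characters of the base64 alphabet and '=' padding. */
static int is_base64_char(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '+' || c == '/' || c == '=';
}

/*
 * Hypothetical sketch: drop every non-base64 octet (bare CR, bare LF,
 * spaces, ...) from an already base64-encoded body, then re-emit it in
 * CRLF-terminated lines of at most 76 characters (RFC 2045 limit).
 * 'out' needs room for inlen + 2 * (inlen / 76 + 1) bytes.
 */
static size_t rewrap_base64(const char *in, size_t inlen, char *out)
{
    size_t o = 0, col = 0;

    for (size_t i = 0; i < inlen; i++) {
        if (!is_base64_char((unsigned char)in[i]))
            continue;                   /* extraneous: strip it */
        out[o++] = in[i];
        if (++col == 76) {              /* end a full encoded line */
            out[o++] = '\r';
            out[o++] = '\n';
            col = 0;
        }
    }
    if (col > 0) {                      /* terminate a partial last line */
        out[o++] = '\r';
        out[o++] = '\n';
    }
    return o;
}
```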

On Sat, Jun 06, 2020 at 07:36:19PM +0100, Paul Smith wrote:

> (To be honest, I'd be tempted to treat a lone LF as a 99.9999% reliable 
> indicator of spam. Similarly with a NULL (0x00) character in the middle 
> of an (RFC5322) message. Legitimate mail will just never have it unless 
> it was generated by something very dodgy).

It may be worth noting that 0x00 (ASCII NUL) is a US-ASCII character,
and so is in fact allowed in RFC5322 message bodies, per section 2.3 of
RFC5322 quoted above.  The prohibition on NULs is a feature of MIME
(RFC2045):

    https://tools.ietf.org/html/rfc2045#section-2.7
    https://tools.ietf.org/html/rfc2045#section-2.8

And so MIME messages are obliged to apply a non-identity transfer
encoding to message bodies that would otherwise contain ASCII NULs.
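The RFC 2045 definitions of "7bit" and "8bit" data (sections 2.7 and 2.8) can be checked mechanically.  The C sketch below, whose enum and function names are hypothetical, classifies a body and reports when a non-identity transfer encoding such as quoted-printable or base64 is required, for instance because of a NUL.

```c
#include <stddef.h>

/* Possible outcomes of the check; names are hypothetical. */
enum body_class { BODY_7BIT, BODY_8BIT, BODY_BINARY };

/*
 * Sketch of the RFC 2045 tests: "7bit" and "8bit" data both require
 * lines of at most 998 octets, CR and LF only as CRLF, and no NULs;
 * "7bit" additionally forbids octets above 127.  Anything else is
 * treated as "binary", i.e. it needs a non-identity transfer encoding.
 */
static enum body_class classify_body(const unsigned char *p, size_t len)
{
    enum body_class cls = BODY_7BIT;
    size_t linelen = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = p[i];

        if (c == '\r') {
            if (i + 1 >= len || p[i + 1] != '\n')
                return BODY_BINARY;     /* bare CR */
            i++;                        /* consume the LF of CRLF */
            linelen = 0;
            continue;
        }
        if (c == '\n' || c == 0)
            return BODY_BINARY;         /* bare LF or NUL */
        if (c > 127)
            cls = BODY_8BIT;            /* still fine for "8bit" */
        if (++linelen > 998)
            return BODY_BINARY;         /* line too long */
    }
    return cls;
}
```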

On Mon, Jun 08, 2020 at 06:22:47PM +0200, Alessandro Vesely wrote:

> CRLF is required for IMAP and POP too.  The inconvenience is having to compute
> the length of the message, in octets.  The FS can only tell how many octets the
> native format takes.  One has to add the number of lines.  Storing that info in
> the file name is a possibility, if you don't have a dedicated FS.

FWIW, Postfix stores each line (or sufficiently long partial line) of
the message as a "record", with each record having a one-byte type and a
variable-width length of at least one byte.  Since that per-record
overhead is at least two bytes, the same as a CRLF, the queue file is
approximately the same size as the message with CRLF line endings and
never smaller.  So Postfix just uses the queue file size.
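The size arithmetic from the last two paragraphs can be sketched in C as follows.  crlf_size() is the "add the number of lines" computation for an LF-terminated file.  record_size() is only a rough model of a record-based queue file: the base-128 encoding of the length field is an assumption made for illustration, not a description of Postfix's actual on-disk format.

```c
#include <stddef.h>

/*
 * Octets a message occupies with CRLF line endings, given its
 * LF-terminated on-disk form: native size plus one CR per line.
 */
static size_t crlf_size(const char *p, size_t len)
{
    size_t n = len;
    for (size_t i = 0; i < len; i++)
        if (p[i] == '\n')
            n++;                        /* each LF becomes CRLF */
    return n;
}

/*
 * Rough model of one queue-file record: a one-byte type, a
 * variable-width length, and the line text without its terminator.
 * ASSUMPTION: the length is stored 7 bits per byte, so it takes one
 * byte for lengths below 128 and more bytes above that.
 */
static size_t record_size(size_t linelen)
{
    size_t lenbytes = 1;
    for (size_t v = linelen >> 7; v != 0; v >>= 7)
        lenbytes++;
    return 1 + lenbytes + linelen;
}
```

Under this model each record costs at least linelen + 2 octets, exactly what the same line costs with a CRLF, which is why the queue file size is a safe stand-in for the CRLF size.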

-- 
    Viktor.