Re: [Extra] Email header / address parsing

Timo Sirainen <timo@sirainen.com> Tue, 01 September 2020 22:15 UTC

On 1. Sep 2020, at 19.41, John Levine <johnl@taugh.com> wrote:
> 
> In article <483BC400-403A-43CE-AEB5-EAE3B5B73080@sirainen.com> you write:
>> Hi,
>> 
>> I was reading https://www.usenix.org/system/files/sec20-chen-jianjun.pdf and started wondering if IMAP should be
>> handling some of this better. Especially for generating ENVELOPE. We could even still have time to add
>> recommendations to IMAP4rev2?
>> 
>> For example:
>> - From: user@attacker.com <user@real.com>
>> - From: <user@attacker.com, <user@real.com>
> 
> But the former is completely valid and occasionally useful, and the
> latter is a syntax error.

My understanding is that the first one is supposed to match name-addr from RFC 5322:

   name-addr       =   [display-name] angle-addr
   display-name    =   phrase
   phrase          =   1*word / obs-phrase
   word            =   atom / quoted-string

And '@' is not allowed in atom, so an unquoted user@attacker.com can't be a display-name. So it's not a valid address.
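
To make the grammar argument concrete, here is a minimal Python sketch of a strict display-name check. It is only an illustration under simplifying assumptions: it ignores comments, folding and obs-phrase (so a bare "." is rejected too), and the name is_valid_display_name is made up for this example.

```python
import re

# atext from RFC 5322 section 3.2.3; note that "@" is not in the set.
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"
ATOM = ATEXT + r"+"
# Simplified quoted-string: backslash escapes allowed, no folding.
QUOTED_STRING = r'"(?:[^"\\\r\n]|\\.)*"'
WORD = rf"(?:{ATOM}|{QUOTED_STRING})"
# phrase = 1*word; this sketch requires whitespace between words.
PHRASE_RE = re.compile(rf"[ \t]*{WORD}(?:[ \t]+{WORD})*[ \t]*")


def is_valid_display_name(text: str) -> bool:
    """Return True if text is a phrase made only of atoms and quoted-strings."""
    return PHRASE_RE.fullmatch(text) is not None


print(is_valid_display_name("Timo Sirainen"))        # plain atoms
print(is_valid_display_name("user@attacker.com"))    # unquoted "@"
print(is_valid_display_name('"user@attacker.com"'))  # quoted-string
```

With this check the unquoted form is rejected, while the same text inside a quoted-string is accepted as a display-name.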

> Don't we already have a "don't do that" rule
> for invalid syntax?

Sure, but invalid syntax is causing security problems nowadays. It would be nice to try to prevent those.

>> 2a) Space preceding the first header name
> 
> That's a continuation line.

Not when it's the first header. There is nothing to continue.

>> 2b) Space after From header: Again
> 
> Not sure what you mean but that's probably valid.

Sorry, I messed up writing this. I mean when the header is "From : foo@example.com": some parsers treat it as a "From" header and some as "From " with a trailing space.
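
As a rough sketch of what a strict parser could do with cases 2a and 2b, the Python below refuses both instead of guessing. The function name split_header_field and the is_first flag are assumptions for illustration only. Note that it is deliberately stricter than RFC 5322's obsolete syntax, which tolerates whitespace before the colon.

```python
import re

# field-name = 1*ftext: printable US-ASCII except ":" (RFC 5322 section 3.6.8).
FIELD_RE = re.compile(r"([\x21-\x39\x3b-\x7e]+):(.*)", re.DOTALL)


def split_header_field(line: str, is_first: bool):
    """Return (name, value), or raise ValueError for an ambiguous line."""
    if line[:1] in (" ", "\t"):
        if is_first:
            # Case 2a: there is no earlier header to continue.
            raise ValueError("leading whitespace on the first header line")
        raise ValueError("continuation line; unfold before splitting")
    m = FIELD_RE.fullmatch(line)
    if m is None:
        # Case 2b: "From : ..." fails here because of the space before ":".
        raise ValueError("no valid field name directly before ':'")
    return m.group(1), m.group(2).strip()


print(split_header_field("From: foo@example.com", is_first=True))
```

Rejecting the line means no parser downstream has to decide between "From" and "From ".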

>> 2c) Folding space before ":"
> 
> Not valid but I don't think I've ever seen it.
> 
> Over in DMARC land people have been arguing for years about what to do
> about misleading display names, and getting nowhere. 

These aren't about display-names though, which the client could handle however it wishes, but about the actual email addresses visible to the client via ENVELOPE.

> While I certainly believe that it's possible to do spam filtering
> looking for patterns that are likely to be phishes, I really don't
> think that the IMAP address parser is the place to do it.

But the IMAP server is the one parsing the addresses into ENVELOPE, which the MUAs can use. We have the same problem here: a mail can look fully DMARC-validated (and/or validated by whatever other method is being used), but because different parsers disagree, the From address shown to the client may not be the address that was actually validated. The only way for a client to avoid this issue now is to not use ENVELOPE at all and parse the headers itself.
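
A toy Python illustration of that kind of parser disagreement, using the malformed From value quoted earlier in this thread. Both functions are hypothetical lenient parsers invented for this example, not the behavior of any real validator or IMAP server.

```python
import re

# Lenient pattern: anything that looks like an address after a "<".
ADDR_RE = re.compile(r"<([^<>,\s]+@[^<>,\s]+)")


def first_address(value: str):
    """Hypothetical validator-side parser: takes the first address it finds."""
    found = ADDR_RE.findall(value)
    return found[0] if found else None


def last_address(value: str):
    """Hypothetical ENVELOPE-side parser: takes the last address it finds."""
    found = ADDR_RE.findall(value)
    return found[-1] if found else None


value = "<user@attacker.com, <user@real.com>"
validated = first_address(value)
displayed = last_address(value)
print(validated, displayed, validated == displayed)
```

The two parsers return different addresses for the same header, so a check that passed for one address says nothing about the one the client ends up seeing.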