Re: [Extra] Email header / address parsing

Timo Sirainen <timo@sirainen.com> Tue, 01 September 2020 22:15 UTC

From: Timo Sirainen <timo@sirainen.com>
Message-Id: <8A38854C-8914-4200-8EB3-4BFA5B03B5E0@sirainen.com>
Content-Type: multipart/alternative; boundary="Apple-Mail=_09FB2C49-0F78-4061-8DC3-B0C2F5AF31F5"
Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.120.23.2.1\))
Date: Wed, 02 Sep 2020 00:15:01 +0200
In-Reply-To: <20200901174151.C75881F59628@ary.qy>
Cc: extra@ietf.org
To: John Levine <johnl@taugh.com>
References: <20200901174151.C75881F59628@ary.qy>
Archived-At: <https://mailarchive.ietf.org/arch/msg/extra/AzXa0QDKKNoZRjyDMzDMXxsh2II>
Subject: Re: [Extra] Email header / address parsing
Precedence: list

On 1. Sep 2020, at 19.41, John Levine <johnl@taugh.com> wrote:
> 
> In article <483BC400-403A-43CE-AEB5-EAE3B5B73080@sirainen.com> you write:
>> Hi,
>> 
>> I was reading https://www.usenix.org/system/files/sec20-chen-jianjun.pdf and started wondering if IMAP should be
>> handling some of this better. Especially for generating ENVELOPE. We could even still have time to add
>> recommendations to IMAP4rev2?
>> 
>> For example:
>> - From: user@attacker.com <user@real.com>
>> - From: <user@attacker.com, <user@real.com>
> 
> But the former is completely valid and occasionally useful, and the
> latter is a syntax error.

My understanding is that the first is supposed to be:

   name-addr       =   [display-name] angle-addr
   display-name    =   phrase
   phrase          =   1*word / obs-phrase
   word            =   atom / quoted-string

And '@' is not in atom. So it's not a valid address.
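
As a rough illustration of that grammar argument, here is a minimal Python sketch that checks whether a string could be a display-name. It is a simplification, not a full RFC 5322 parser: it ignores comments (CFWS) and obs-phrase, and it requires whitespace between words. The names ATEXT, WORD, PHRASE_RE and is_valid_display_name are made up for this example.

```python
import re

# atext per RFC 5322 section 3.2.3: letters, digits and these specials.
# Note that '@' (and '.') are not included.
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"

# Simplified word: an atom or a quoted-string (assumption: no CFWS
# comments and no obs- forms are handled here).
WORD = rf'(?:{ATEXT}+|"(?:[^"\\\r\n]|\\.)*")'

# phrase = 1*word, with words separated by whitespace in this sketch.
PHRASE_RE = re.compile(rf"\s*{WORD}(?:\s+{WORD})*\s*")


def is_valid_display_name(text: str) -> bool:
    """Return True if text matches the simplified phrase grammar above."""
    return PHRASE_RE.fullmatch(text) is not None


print(is_valid_display_name("user@attacker.com"))    # False: bare '@'
print(is_valid_display_name('"user@attacker.com"'))  # True: quoted-string
print(is_valid_display_name("John Levine"))          # True: two atoms
```

Under this reading, the unquoted "user@attacker.com" in front of the angle-addr is rejected, while the same text inside double quotes would be an acceptable display-name.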

> Don't we already have a "don't do that" rule
> for invalid syntax?

Sure, but that's causing security problems nowadays. It would be nice to try to prevent those.

>> 2a) Space preceding the first header name
> 
> That's a continuation line.

Not when it's the first header. There is nothing to continue.

>> 2b) Space after From header: Again
> 
> Not sure what you mean but that's probably valid.

Sorry, I messed up writing this. I mean when the header is "From : foo@example.com": some parsers treat it as a "From" header and some as "From " with a trailing space.
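
To make the disagreement concrete, here is a small Python sketch of two hypothetical header-name parsers. Neither is taken from a real implementation; field_name_strict and field_name_lenient are names invented for this example. The strict one follows the current (non-obsolete) RFC 5322 field-name rule, printable ASCII 33-126 except ":", while the lenient one simply strips surrounding whitespace.

```python
from typing import Optional


def field_name_strict(line: str) -> Optional[str]:
    """Accept only a name made of printable ASCII (33-126) directly
    followed by ':'. Any whitespace in the name is rejected."""
    name, sep, _ = line.partition(":")
    if not sep or not name:
        return None
    if all(33 <= ord(c) <= 126 for c in name):
        return name
    return None


def field_name_lenient(line: str) -> Optional[str]:
    """Hypothetical lenient parser: strip whitespace around the name."""
    name, sep, _ = line.partition(":")
    if not sep:
        return None
    return name.strip() or None


for raw in ("From: foo@example.com",
            "From : foo@example.com",
            " From: foo@example.com"):
    print(repr(raw), field_name_strict(raw), field_name_lenient(raw))
```

The two functions agree on the first line, but for the second and third the strict one returns None while the lenient one returns "From". If a validator and an IMAP server sit on different sides of that split, they can end up disagreeing about which From header a message has.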

>> 2c) Folding space before ":"
> 
> Not valid but I don't think I've ever seen it.
> 
> Over in DMARC land people have been arguing for years about what to do
> about misleading display names, and getting nowhere. 

These aren't about display-names though, which the client could handle however it wishes, but about the actual email addresses visible to the client via ENVELOPE.

> While I certainly believe that it's possible to do spam filtering
> looking for patterns that are likely to be phishes, I really don't
> think that the IMAP address parser is the place to do it.

But the IMAP server is the one parsing the addresses into ENVELOPE, which the MUAs can use. We have the same problem: a mail could look like it's fully DMARC-validated (and/or validated by whatever other method is being used), but because of parsing differences the From address shown to the client may not be the address that was actually validated. The only way for a client to avoid this issue now would be to not use ENVELOPE at all and parse the headers itself.
