Re: [ietf-smtp] How is EAI mail implemented ?

Ned Freed <ned.freed@mrochek.com> Tue, 15 June 2021 18:15 UTC

Cc: ietf-smtp <ietf-smtp@ietf.org>
Message-id: <01S09JUCB7XE0085YQ@mauve.mrochek.com>
Date: Tue, 15 Jun 2021 10:43:49 -0700
From: Ned Freed <ned.freed@mrochek.com>
In-reply-to: "Your message dated Tue, 15 Jun 2021 13:32:34 -0400" <5bb26c2f-a94d-ccaf-8fc1-51684f25f48@taugh.com>
References: <5bb26c2f-a94d-ccaf-8fc1-51684f25f48@taugh.com>
To: John R Levine <johnl@taugh.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-smtp/SLl2rJOmPPSdEVLjT_FbOoRWZ3Q>
Subject: Re: [ietf-smtp] How is EAI mail implemented ?

> This is sort of a followup to the discussion about domain names in Received
> lines in EAI messages.

> RFC 6531 describes a model where EAI mail is conceptually parallel to
> ASCII mail -- clients tag incoming EAI messages with SMTPUTF8 keywords,
> those EAI messages can only be relayed to servers that offer SMTPUTF8, and
> so forth.  Having looked at a bunch of EAI mail software, I find that nobody
> actually implements it that way.

> Since all computers handle 8-bit bytes, mail software generally handles
> 8-bit data without doing anything special.  You have to change the code
> that does DNS lookups to turn U-labels into A-labels, but that's about it.

> On my system I turn addresses into A-labels on the way in, and back into
> U-labels on the way out (only if there is a UTF-8 local part) and handle
> all the local routing and deliveries with A-labels.

It sounds to me like this doesn't allow for UTF-8 domains in aliases. If so,
that's not very friendly.

In any case, this isn't how we do it. We try very hard not to modify addresses,
and instead do various forms of "canonicalization" when looking them up or
comparing them. This is undoubtedly ugly, and to someone like me who also writes
code for AVR microcontrollers it seems pretty inefficient, but as a practical
matter these transformations are down in the noise for modern CPUs.
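
To illustrate the compare-time canonicalization described above, here is a
minimal sketch in Python. It is not our actual code: the function names are
made up for this example, and it leans on Python's built-in "idna" codec, which
implements IDNA 2003 rather than IDNA 2008. Stored addresses are left alone;
only the lookup key is normalized.

```python
def canonical_domain(domain: str) -> str:
    """Return a comparison key for a domain: lowercase A-label form.

    Assumption: Python's built-in "idna" codec (IDNA 2003) is good enough
    for illustration; a production MTA would likely want IDNA 2008.
    """
    return domain.rstrip(".").encode("idna").decode("ascii").lower()


def same_mailbox(a: str, b: str) -> bool:
    """Compare two addresses without rewriting either one.

    Domains are compared in canonical form. Local parts are compared
    exactly, which is a placeholder and not a real answer to the
    local-part comparison problem.
    """
    local_a, _, dom_a = a.rpartition("@")
    local_b, _, dom_b = b.rpartition("@")
    return local_a == local_b and canonical_domain(dom_a) == canonical_domain(dom_b)
```

With this, an alias configured under either the U-label or the A-label form of
a domain matches an incoming address in the other form.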

> But a lot of MTAs
> don't even do that, they just allow any IDN in the domain, and if you want
> the U-label and A-label versions of an address to deliver to the same
> place, you have to configure them both.

In our case configuring either one is sufficient. Of course this leaves open the
question of comparing local parts. We haven't really solved this one yet.

> Nobody I've seen tags messages as EAI in their internal queues. 

We do. In fact our EAI tagging tells us what aspects of EAI are
in use: MAIL FROM address, RCPT TO address, main headers, body.
It turns out to be handy to know this up front.
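
A rough sketch of what this kind of per-message tagging could look like, in
Python. The flag names and the tag_message() helper are hypothetical, invented
for this example, and the "non-ASCII present" test is a simplification of
whatever a real MTA would check.

```python
from enum import Flag, auto
from typing import Sequence


class EaiUse(Flag):
    """Which parts of a queued message use EAI (hypothetical flag names)."""
    NONE = 0
    MAIL_FROM = auto()
    RCPT_TO = auto()
    HEADERS = auto()
    BODY = auto()


def tag_message(mail_from: str, rcpt_to: Sequence[str],
                headers: bytes, body: bytes) -> EaiUse:
    """Compute the EAI tag once, at enqueue time."""
    tags = EaiUse.NONE
    if not mail_from.isascii():
        tags |= EaiUse.MAIL_FROM
    if any(not rcpt.isascii() for rcpt in rcpt_to):
        tags |= EaiUse.RCPT_TO
    if not headers.isascii():
        tags |= EaiUse.HEADERS
    if not body.isascii():
        tags |= EaiUse.BODY
    return tags
```

Later stages can then test, say, `EaiUse.RCPT_TO in tags` instead of rescanning
the message.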

> On
> outgoing mail, they check on the fly to see if it's an EAI message:
> non-ASCII characters in the envelope or message headers (Exim doesn't even
> look at the headers, and says that's not a bug.)

How strictly we follow the RFCs is settable in our case. If someone
wants strict behavior they can have it; if someone is content
to send messages with SMTPUTF8 headers to a non-EAI server they can
have that too.
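
As a sketch of what such a setting might control (Python; the function and
parameter names are hypothetical, and the rule for UTF-8 envelope addresses is
an assumption made for this example rather than a description of our product):

```python
def may_send(envelope_utf8: bool, headers_utf8: bool,
             server_offers_smtputf8: bool, strict: bool = True) -> bool:
    """Decide whether a message may be sent to a given next hop.

    strict=True  : any EAI use requires the server to offer SMTPUTF8.
    strict=False : UTF-8 in headers alone is sent to a non-EAI server anyway.
    """
    if server_offers_smtputf8:
        return True
    if envelope_utf8:
        # Assumption for this sketch: UTF-8 envelope addresses are never
        # sent to a server that does not offer SMTPUTF8.
        return False
    if headers_utf8:
        return not strict
    return True
```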

> A lot of MTAs add the
> SMTPUTF8 MAIL FROM tag to all outgoing mail to servers that offer
> SMTPUTF8, because why not.

Because, regardless of what you have observed, that server may then refuse to
relay an EAI-tagged but actually non-EAI message to a non-EAI server?

> I think they all notice if an EAI message is
> sent to a non-EAI server, and a few in that case do odd things like
> turning a UTF-8 local part into a MIME encoded word in the envelope.

Yuck. Frankly, it would be better to just send UTF-8 in this case than to come
up with a private encoding for an address that likely belongs to the server
you're sending to.

> This approach is a lot easier to code than trying to tag all the queued
> messages, and it can deliver more mail: if, e.g., an incoming message has
> an ASCII bounce address and UTF-8 recipients but is relayed to an ASCII
> recipient, the relay doesn't need EAI.

IME the tagging part was trivial. The canonicalization was considerably harder.
(And writing the tests for all of it was a real PITA.) However, the really
difficult part is balancing standards compliance against the risk of all this
turning into a major support call generator.

> When looking at IMAP and POP servers, again, since computers all handle
> 8-bit data, you get most of the way there for free.  I haven't found any
> IMAP servers with UTF8=ACCEPT or POP with UTF8 that really works, but I've
> found plenty with LOGIN and AUTHENTICATE commands that take UTF-8
> strings, and IMAP searches with the complex old character encoding work
> remarkably well.  It often even seems to find strings in unencoded UTF-8
> headers, which I wasn't expecting, perhaps again something that works by
> mistake.

I don't think it's a mistake, exactly. More like the optimum handling
for invalid messages turns out to align with the proper handling
for SMTPUTF8 messages.

> None of this means the RFCs have to change but it might be time for an
> applicability statement or something about how EAI is likely to coexist
> with ASCII mail for a long time.

I don't have enough feedback from actual use to be able to say anything
definitive, but my guess is the document EAI needs is something that the IETF
would not be willing/able to write: One that says what parts of the standard
should be followed, what parts should be outright violated, and when.

				Ned