Re: [EAI] UTF-8 in Message-IDs

John C Klensin <klensin@jck.com> Wed, 17 August 2011 23:16 UTC

Date: Wed, 17 Aug 2011 19:16:55 -0400
From: John C Klensin <klensin@jck.com>
To: Chris Newman <chris.newman@oracle.com>, Frank Ellermann <hmdmhdfmhdjmzdtjmzdtzktdkztdjz@gmail.com>, ima@ietf.org
Message-ID: <96FFE3C1B209E0FB349436D2@PST.JCK.COM>
In-Reply-To: <18B1642B54C3604C98866093@96B2F16665FF96BAE59E9B90>
References: <CAHhFybo47--0YjCRcvSO4asoV_R89+ULDB3tyij+ba=O_6gKsQ@mail.gmail.com> <18B1642B54C3604C98866093@96B2F16665FF96BAE59E9B90>
Subject: Re: [EAI] UTF-8 in Message-IDs

--On Wednesday, August 17, 2011 15:36 -0700 Chris Newman
<chris.newman@oracle.com> wrote:

>...
>   The Message-ID SHOULD include at least one UTF-8 character.
> 
> By including a UTF-8 character, any gateway to a non-EAI
> system will have to replace the Message-ID with a new one,
> thus correctly indicating that the resulting downgraded
> message is a subsequent revision. Systems wishing to identify
> the original EAI message id for a damaged downgraded message
> can look at the Downgraded-Message-ID header.

Chris, while I agree with your analysis of "subsequent
revision", I don't get from there to your conclusion, for two
reasons:

(1) Other than in the POP/IMAP downgrade path, there is no such
thing as a Downgraded-Message-ID header.   One can imagine a
number of scenarios that would involve a need to send a message
back out in ASCII-only form that do not involve that header or
any downgrading we are going to specify.  Note especially that
the user generating a reply may have sufficient out-of-band
information available to supply deliverable addresses for some
or all of the author, sender and recipients of the message as
appropriate.

(2) Per one of my examples of some days ago, if we allow UTF-8
in Message-IDs at all, it is quite possible to have a message
that requires UTF8SMTPbis extension handling but that contains
no non-ASCII data in either header fields or addresses other
than that UTF-8 Message-ID.  Indeed, one interpretation of the
statement you suggest above virtually guarantees a lot of
messages of that type (probably it can be rewritten to reduce
the number, but that doesn't change things much).  Now we have
an interesting problem: the message is reply-able (and, more
important, forward-able without creating a message/global body
part) and new recipients with all-ASCII addresses can be added.
Certainly in the forwarding case, there is no "subsequent
revision".   It can be handled strictly as a legacy message
except for that pesky Message-ID.  And, again, except for
replacing or otherwise dealing with the Message-ID, there is no
need to downgrade anything.
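
To make the case in (2) concrete, here is a minimal Python sketch of a
check a forwarding system might apply.  The function names and the
sample header values are hypothetical, not taken from any draft; the
sketch only assumes that header fields are available as name/value
pairs of already-decoded strings.

```python
def non_ascii_fields(headers):
    """Return the names of header fields whose values contain non-ASCII characters."""
    return [name for name, value in headers if not value.isascii()]


def only_message_id_is_non_ascii(headers):
    """True if the Message-ID is the sole reason the message needs UTF-8 header handling."""
    fields = non_ascii_fields(headers)
    return bool(fields) and all(name.lower() == "message-id" for name in fields)


# Hypothetical message: every address and header field is ASCII
# except for the domain part of the Message-ID.
headers = [
    ("From", "alice@example.com"),
    ("To", "bob@example.net"),
    ("Subject", "Status report"),
    ("Message-ID", "<1234.abcd@b\u00fccher.example>"),
]

print(non_ascii_fields(headers))
print(only_message_id_is_non_ascii(headers))
```

When the second check is true, the Message-ID is the only thing that
separates the message from ordinary legacy handling.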

As I've said before, I don't think those scenarios (including
Frank's) are adequate to make a case for a protocol restriction
to ASCII.  I think they are sufficient to justify a
recommendation that systems sending mail into environments they
consider fragile should confine themselves to ASCII
Message-IDs.  Requiring non-ASCII characters in Message-IDs would not
only defeat that recommendation, it would run counter to the
core of the "if we tell them ASCII-only, they will ignore us"
observation that Ned, I, and others have been making.  

To review that observation (at least my version), MUAs have ways
to construct Message-IDs, usually by cramming some local
information and a domain name together.  For all sorts of
reasons, they are likely to continue doing whatever they are
doing, using a domain name with U-labels in it if they consider
such a domain name primary.  If we ask them to do anything else,
it is going to create a requirement for extra code and extra
work -- whether that be converting the domain name to use
A-labels or artificially adding a non-ASCII character because
some other header field contains at least one non-ASCII
character.  We can provide them very little motivation for doing
that other than the WG's exploration of transition cases and
edge cases.  Experience indicates that extra work, especially
extra work that creates additional code paths, with no even
slightly compelling motivation, translates into an unimplemented
requirement.
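
As a rough illustration of the extra code path, here is a Python
sketch of a generator that builds a Message-ID from some local
information plus a domain name.  The function name, the ascii_only
flag, and the layout of the left-hand side are all assumptions made
for the example.  Python's built-in "idna" codec implements the older
IDNA2003 rules and stands in here for whatever conversion a real
system would use.

```python
import time
import uuid


def make_message_id(domain, ascii_only=False):
    """Build a Message-ID from local information and a domain name."""
    # Local part: a timestamp plus a random token, both ASCII.
    local = f"{int(time.time())}.{uuid.uuid4().hex}"
    if ascii_only:
        # The extra code path: convert any U-labels to A-labels.
        # (Built-in codec, IDNA2003 rules; illustrative only.)
        domain = domain.encode("idna").decode("ascii")
    return f"<{local}@{domain}>"


# Default behaviour: use the domain name the system considers primary.
print(make_message_id("b\u00fccher.example"))

# What an ASCII-only rule would require instead.
print(make_message_id("b\u00fccher.example", ascii_only=True))
```

The default path needs no knowledge of IDNA at all, which is why it is
the one implementers are likely to keep.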

(personal opinion only)
    john