Re: [EAI] UTF-8 in Message-IDs

John C Klensin <klensin@jck.com> Mon, 15 August 2011 21:22 UTC

Date: Mon, 15 Aug 2011 17:22:37 -0400
From: John C Klensin <klensin@jck.com>
To: Ned Freed <ned.freed@mrochek.com>
Message-ID: <619143DE42BB97B26A53920D@PST.JCK.COM>
In-Reply-To: <01O4VQ5BI2B200VHKR@mauve.mrochek.com>
References: <01O4VQ5BI2B200VHKR@mauve.mrochek.com>
X-Mailer: Mulberry/4.0.8 (Win32)
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Cc: Charles Lindsey <chl@clerew.man.ac.uk>, IMA <ima@ietf.org>
Subject: Re: [EAI] UTF-8 in Message-IDs

Ned,

Let me try again with a note short enough that my conclusion and
intentions are not obscured (I think that happened the last
time; my apologies).

I think several of us have reasoned to the conclusion that, on
balance, Message-IDs should not be restricted to ASCII (in the
formal syntax or more generally).  Some of us find some of those
arguments more persuasive than others; others of us would choose
a different mix, but the conclusion is the same.

Given the multiple reasons for that conclusion; the apparent
consensus about it in email discussions before, during, and
after IETF 80; and the fairly general impression that an ASCII
restriction would be generally ignored because of the way
Message-IDs are often constructed, I believe that anyone who
continues to believe that non-ASCII Message-IDs should be
prohibited needs to persuasively demonstrate to the WG that they
would cause significant harm.

That demonstration has not appeared.  We can create edge cases
that show that messages with Message-IDs with non-ASCII content
are slightly less robust than messages with ASCII-only Message-IDs,
but far more likely cases can be shown to demonstrate that
messages with only ASCII addresses are more robust than messages
that contain non-ASCII text in addresses, etc.  To go down that
path is to argue that any message with UTF-8 strings in _any_
header field is less robust than a corresponding message with
only ASCII in those fields.  While that is undoubtedly true, the
WG (and the IETF by issuing the WG a charter) have decided that
the advantages of having internationalized addressing and header
fields far exceed the disadvantages of that drop in robustness.
Moreover, the marginal drop due to non-ASCII Message-IDs alone
(once the risks of any non-ASCII material are accepted) appears
to be close to trivial... making a persuasive demonstration of
harm even less likely.

I urge Joseph to review the history of this discussion and then
close it out.

      john

p.s. As far as "SHOULD keep Message-IDs in ASCII" is concerned,
I could live with it but would actually oppose it.  The reasons
are implicit in the above, in your recent notes, and in other
recent discussions: (i) restrictions that are unlikely to be
obeyed are just bad for standards and (ii) to whatever extent
Message-IDs (including values in In-Reply-To and other fields)
are ever examined by humans, forcing id-right to use A-labels
for the most obvious and common cases is just inconsistent with
multiple design goals.  So I would prefer a bit of
implementation advice that points the issue out, not a
conformance statement.  YMMV but, if you agree even slightly,
let me try to draft a paragraph that we can then figure out
where to put.
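
To make point (ii) concrete, here is a minimal sketch of what a
human reader sees when id-right is forced into A-label form.  It
assumes Python's built-in "idna" codec (which implements IDNA 2003)
and a hypothetical Message-ID under the made-up domain
bücher.example; neither the values nor the helper names come from
this thread.

```python
# Hypothetical Message-ID parts; neither value comes from the thread.
ID_LEFT = "20110815.1234"
ID_RIGHT_U = "bücher.example"  # U-label form, readable by humans


def to_a_label(domain: str) -> str:
    """Convert a domain to A-label ("xn--") form with the stdlib IDNA 2003 codec."""
    return domain.encode("idna").decode("ascii")


def message_id(id_left: str, id_right: str) -> str:
    """Assemble the msg-id form "<id-left@id-right>"."""
    return f"<{id_left}@{id_right}>"


# The same identifier, once with UTF-8 in id-right and once ASCII-only.
utf8_form = message_id(ID_LEFT, ID_RIGHT_U)
ascii_form = message_id(ID_LEFT, to_a_label(ID_RIGHT_U))

print(utf8_form)             # UTF-8 id-right, as a user would recognize it
print(ascii_form)            # A-label id-right, ASCII-only
print(ascii_form.isascii())  # only the A-label form passes an ASCII-only check
```

The two strings name the same host, but only the first is
recognizable to someone reading an In-Reply-To or References field.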


--On Monday, August 15, 2011 10:32 -0700 Ned Freed
<ned.freed@mrochek.com> wrote:

>...
> That's good, because AFAIK nobody is making that argument. The
> argument we're making is threefold:
> 
> (1) Because of structural issues in the RFC 5322 ABNF, it's
> much easier to make some changes to low-level rules than
> to try and add utf-8 at a higher level. But one consequence
>...