Re: Experiment with UTF-8 in message-IDs

"Charles Lindsey" <chl@clerew.man.ac.uk> Mon, 10 October 2011 11:23 UTC

To: ietf-usefor@imc.org
Xref: clerew local.usefor:25249
Path: clerew!chl
From: Charles Lindsey <chl@clerew.man.ac.uk>
Subject: Re: Experiment with UTF-8 in message-IDs
Message-ID: <LsuI4I.H4@clerew.man.ac.uk>
X-Newsreader: NN version 6.5.2 (NOV)
Date: Mon, 10 Oct 2011 10:21:54 +0000

>Hi all,

>In the IETF working group for IMA (Internationalized eMail Address),
>there is a current thread about UTF-8 in message-IDs:
>    http://www.ietf.org/mail-archive/web/ima/current/threads.html#04330

>Quick references in the thread:

>http://www.ietf.org/mail-archive/web/ima/current/msg04430.html
>http://www.ietf.org/mail-archive/web/ima/current/msg04344.html
>http://www.ietf.org/mail-archive/web/ima/current/msg04345.html
>http://www.ietf.org/mail-archive/web/ima/current/msg04420.html
>http://www.ietf.org/mail-archive/web/ima/current/msg04422.html



>RFC 5536 (USEFOR) currently allows only ASCII characters in message-IDs.

>INN 2.4 and INN 2.5 have always rejected message-IDs containing
>non-ASCII chars.  (I have not looked at INN 2.3 and before.)  When
>a message-ID is not valid per RFC 850/1036/... and now 5536, the
>article is rejected.


>My question is:  should we try right now to relax the check so as to allow
>UTF-8 in message-IDs?
>If yes, is there something else to enforce?  (NFC normalization?)

It looks like UTF-8 Message-IDs in mail will start to appear. They would
"mostly work" in news if they happened to be encountered (and might well
route around sites that did awkward things with them). So I suggest simply
removing the check in INN would be a good idea - and likewise similar
checks on other headers (but not Date:, I think). It is simply a matter of
"being liberal in what you accept", which is a fine thing to do except
when it is obviously going to lead to breakage.

But I wouldn't do anything about normalization at this stage. That problem
only arises if intermediate sites try to rewrite (or "improve") what they
received, and that would likely do more harm than good.
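
To illustrate why rewriting is the risk here, a minimal Python sketch
(the message-ID and the "seen" set are invented for the example, and this
is not taken from INN or any other server):

```python
import unicodedata

# Hypothetical message-ID with an accented character (illustration only).
composed = "<caf\u00e9.1234@example.org>"            # "é" as a single code point
decomposed = unicodedata.normalize("NFD", composed)  # "e" + combining acute accent

# The two forms look the same when displayed but are different octet strings.
composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")
print(composed_bytes == decomposed_bytes)  # prints False

# Suppose the article was injected with the decomposed form and other sites
# recorded exactly those octets (a stand-in for a history database).
seen = {decomposed_bytes}

# An intermediate site that "improves" the ID by normalizing to NFC produces
# an ID that no longer matches what everybody else stored.
rewritten = unicodedata.normalize("NFC", decomposed)
print(rewritten.encode("utf-8") in seen)  # prints False
```

As long as every site passes the octets through untouched, the two forms
are simply two different IDs and nothing breaks.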

And, as the EAI standards seem about to become proposed standards, perhaps
it is time to revive the idea of UTF-8 in Newsgroup names. There are some
early Usefor drafts proposing how it should be done, and they DID contain
severe restrictions on allowed characters and strict NFC normalization
(but essentially to be enforced at submission time, and left strictly
alone thereafter).

>Of course, other requirements from RFC 5536 will remain (that is to say
>no comments in the Message-ID: header field, and no ">" or WSP).
>U+00A0 (&nbsp; in HTML) and other spaces encoded in UTF-8 are allowed,
>aren't they?

Even RFC 5322 does not allow comments or WSP, AFAIR.
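
For concreteness, a rough Python sketch of what a relaxed check might look
like. The function name and the exact rules are my assumptions; it is a
simplification of the RFC 5536 syntax, not INN's actual code:

```python
def is_acceptable_message_id(raw: bytes) -> bool:
    """Hypothetical relaxed check: well-formed UTF-8 is accepted, while
    ASCII whitespace, control characters and stray angle brackets are not."""
    # Reject octet sequences that are not well-formed UTF-8.
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False

    # The whole value must be "<...>"; a trailing or leading comment fails here.
    if not (text.startswith("<") and text.endswith(">")):
        return False

    # Simplification: only require a non-empty part on each side of the first "@".
    core = text[1:-1]
    left, at, right = core.partition("@")
    if not (left and at and right):
        return False

    # No ASCII controls, SP, HTAB or DEL, and no "<" or ">" inside the brackets.
    for ch in core:
        if ch in "<>" or ord(ch) <= 0x20 or ord(ch) == 0x7F:
            return False
    return True


print(is_acceptable_message_id(b"<LsuI4I.H4@clerew.man.ac.uk>"))
print(is_acceptable_message_id("<caf\u00e9@example.org>".encode("utf-8")))
print(is_acceptable_message_id(b"<a b@example.org>"))
print(is_acceptable_message_id(b"<a@example.org> (comment)"))
```

Note that a check written this way lets U+00A0 through, since it only looks
at the ASCII whitespace characters; whether that is wanted is exactly the
question asked above.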

>We plan on releasing INN 2.5.3 soon, so perhaps we can relax the check
>starting from INN 2.5.3.  I will ask in the INN workers mailing-list,
>if naturally there are no complaints in this USEFOR mailing-list against
>going this way.

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131            Web: http://www.cs.man.ac.uk/~chl
Email: chl@clerew.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5