Re: [EAI] UTF-8 in Message-IDs

"Charles Lindsey" <chl@clerew.man.ac.uk> Thu, 06 October 2011 09:37 UTC

Date: Thu, 06 Oct 2011 10:40:46 +0100
To: IMA <ima@ietf.org>
From: "Charles Lindsey" <chl@clerew.man.ac.uk>
Content-Type: text/plain; format=flowed; delsp=yes; charset=iso-8859-1
MIME-Version: 1.0
References: <20111004014257.8027.qmail@joyce.lan> <op.v2viju2m6hl8nm@clerew.man.ac.uk> <34E8E4E5F1CBE344994E3F8B@PST.JCK.COM> <CAHhFybrr0jWaSMxHwnJ4NuKFBJRSzw423aYHnEmta8M+1=2+1Q@mail.gmail.com> <A48F698A08B601A60F5A9719@PST.JCK.COM> <4E8CCDDC.7010908@trigofacile.com> <20E612AF4F05B85D980028A1@PST.JCK.COM>
Content-Transfer-Encoding: 8bit
Message-ID: <op.v2xbt8us6hl8nm@clerew.man.ac.uk>
In-Reply-To: <20E612AF4F05B85D980028A1@PST.JCK.COM>
User-Agent: Opera Mail/9.25 (SunOS)
Subject: Re: [EAI] UTF-8 in Message-IDs

On Thu, 06 Oct 2011 01:11:53 +0100, John C Klensin <klensin@jck.com> wrote:

> I believe it leaves us with the following cases for Message-IDs
> in mail<->news gateways:
>
> Direction news -> mail
>
> 	  Since 5322 Message-IDs are less restrictive than 5536
> 	ones, there should be no need to change anything....

Actually, the difference between 5322 and 5536, as it finally ended up, is  
minuscule (5536 does not permit a '>' inside a Message-ID, but it is  
wildly improbable that one would see such a '>' in a real-world email).

> Direction mail -> news
>
>  Case 1: The Message-ID is all-ASCII and otherwise
> 	conforms to the 5536 requirements.  No transformation
> 	other than copying is necessary.  If a change is made,
> 	it would presumably be to assure uniqueness on the news
> 	side.  Nothing new here.
> 	
>  Case 2: The Message-ID is all-ASCII but does not conform
> 	to the 5536 requirements.  The gateway is required to
> 	change the mail Message-ID to a 5536 conforming one.
> 	Whatever happens, the two Message-IDs are no longer the
> 	same.
> 	
>  Case 3: The Message-ID contains some non-ASCII
> 	characters.  That doesn't conform to 5536, so the
> 	gateway is required to change it to something else,
> 	something that is 5536 conforming.  Whatever happens,
> 	the two Message-IDs are no longer the same.

Actually no. The best thing for a gateway to do (in the sense that it will  
cause the least damage/misrouting/looping/whatever) is to leave the utf-8  
untouched. If some subsequent server gets upset, then its connected  
clients will suffer, but the article will still route its way through  
other servers and thus propagate throughout the network. Maybe not  
standards conforming, but that is what will work best. There might just be  
some slight benefit in NFC normalization at that point. Standards might  
eventually catch up, but in the meantime servers that refused such  
articles would get heavily leant upon to let them through.
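
A minimal sketch of that pass-through policy, assuming Python 3.7+ and a hypothetical helper name (gateway_message_id is not taken from any RFC or existing gateway software). An all-ASCII Message-ID is copied unchanged, and a UTF-8 one is left alone apart from the optional NFC step mentioned above:

```python
import unicodedata


def gateway_message_id(msg_id: str, normalize_nfc: bool = True) -> str:
    """Hypothetical mail -> news helper: pass the Message-ID through.

    Nothing is rewritten or re-encoded. The only optional change is
    NFC normalization of a non-ASCII ID.
    """
    if msg_id.isascii():
        # All-ASCII IDs are simply copied.
        return msg_id
    if normalize_nfc:
        # Assumption: the gateway chooses to normalize at this point.
        return unicodedata.normalize("NFC", msg_id)
    return msg_id


# Made-up example IDs; example.org is a placeholder domain.
ascii_id = "<abc.123@example.org>"
decomposed_id = "<cafe\u0301.123@example.org>"  # 'e' + combining acute accent

print(gateway_message_id(ascii_id) == ascii_id)
print(gateway_message_id(decomposed_id) == "<caf\u00e9.123@example.org>")
print(gateway_message_id(decomposed_id, normalize_nfc=False) == decomposed_id)
```

Whether to normalize at all is a judgment call: it changes the octets of the ID, so the sketch keeps it behind a flag.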

As you say, gatewaying is a messy business, and attempts to fix "broken"  
messages usually make things worse. What I have suggested is well within  
the spirit of the gatewaying guidelines from 5537 that were quoted.

My only complaint is that if EAI had required NFC normalization in  
any utf-8 message-id, then that would have been one issue less for those  
gateways. Indeed it would have also been one issue less for _email_ user  
agents that like to do threading.
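
To illustrate the threading point, here is a small sketch, assuming Python and made-up Message-IDs. The helper thread_key and the dictionary index are hypothetical and do not describe how any particular user agent works. Two IDs that look identical but differ in Unicode normalization form fail a raw comparison, so a reply cannot find its parent unless both sides are normalized first:

```python
import unicodedata


def thread_key(msg_id: str) -> str:
    """Hypothetical helper: NFC-normalize a Message-ID for use as a lookup key."""
    return unicodedata.normalize("NFC", msg_id)


# The same visible ID in two Unicode forms (example.org IDs are made up).
parent_id = "<caf\u00e9.1@example.org>"     # precomposed U+00E9
in_reply_to = "<cafe\u0301.1@example.org>"  # 'e' followed by combining U+0301

# Index of known articles keyed by the raw Message-ID.
index = {parent_id: "parent article"}
print(in_reply_to in index)  # the raw comparison does not find the parent

# The same index keyed by the NFC form of each Message-ID.
nfc_index = {thread_key(k): v for k, v in index.items()}
print(nfc_index.get(thread_key(in_reply_to)))  # the NFC lookup finds it
```

If the normalization were required of whoever generates the ID, the user agent would not need this extra step.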

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131                       Web: http://www.cs.man.ac.uk/~chl
Email: chl@clerew.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5