Re: [EAI] UTF-8 in Message-IDs

Frank Ellermann <> Wed, 05 October 2011 23:18 UTC

In-Reply-To: <A48F698A08B601A60F5A9719@PST.JCK.COM>
References: <20111004014257.8027.qmail@joyce.lan> <> <34E8E4E5F1CBE344994E3F8B@PST.JCK.COM> <> <A48F698A08B601A60F5A9719@PST.JCK.COM>
From: Frank Ellermann <>
Date: Thu, 6 Oct 2011 01:20:39 +0200
Message-ID: <>
To: John C Klensin <>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: IMA <>
Subject: Re: [EAI] UTF-8 in Message-IDs

On 5 October 2011 23:13, John C Klensin wrote:

 [Naive or broken]
> I would not presume to try to categorize all of the
> transformations I've seen, but some of them probably don't
> fall under your description above.

> This isn't trivial.

ACK, but thinking that it's trivial would be covered by naive.

I never had anything to do with MIXER.  All I vaguely recall
is related to "dupes" (= the same message arriving more than
once where it is expected at most once) and "nopes" (= if an
expected message arrives less than once, i.e., never, null,
nada, nope).  For Netnews "same message" means by definition
"same Message-ID"; it is a fundamental concept.
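
A minimal sketch of the "dupes" check described above, assuming a
hypothetical in-memory history (the SeenIds class and its offer()
method are made up for illustration; real news servers keep a
persistent history database):

```python
class SeenIds:
    """Hypothetical history store: remembers Message-IDs already accepted."""

    def __init__(self):
        self._seen = set()

    def offer(self, message_id: str) -> bool:
        """Return True if the message is new, False if it is a dupe.

        "Same message" is decided only by comparing Message-IDs,
        never by looking at the body.
        """
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        return True


history = SeenIds()
first = history.offer("<1@news.example>")   # accepted, first arrival
second = history.offer("<1@news.example>")  # refused, same Message-ID
```

Anything that rewrites the Message-ID in transit defeats this check,
because the second copy then looks like a new message.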

Because I had a hard time grokking this years ago (in FTN the
Message-ID is only optional, the real thing in FTN echos are
the "Seen-By" thingies) I'm now reluctant to touch this "holy
cow" of what used to be the "RFC822" side from my FTN (Fido
technology network) POV.

> Remember that simple transcoding of header fields from
> net-ASCII into, e.g., EBCDIC or what we call ISO-2022-JP
> means that dumb octet-string comparisons of Message-IDs
> (and other things) will fail.

Actually I don't remember this issue, the "gatebau" problems
with Message-IDs were limited to ASCII-compatible networks.

But I can imagine how "ASCII-incompatible" offers wild and
wonderful ways to get it wrong.  If "they" can't do UTF-8 or
at least UTF-EBCDIC they have only one EAI-option:  Reject.

The corner cases where only the EAI-ID causes this "reject",
because everything else happens to work for say EBCDIC, are
too odd to consider.
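
To illustrate the transcoding point, here is a rough sketch using
Python's cp037 codec as a stand-in for "EBCDIC" (an assumption, real
gateways may use other code pages).  The helper names same_id() and
gateway_out() are hypothetical:

```python
def same_id(a: bytes, a_codec: str, b: bytes, b_codec: str) -> bool:
    """Compare two Message-IDs after decoding each from its own charset."""
    return a.decode(a_codec) == b.decode(b_codec)


def gateway_out(message_id: str, codec: str):
    """Transcode a Message-ID for the far side, or return None to reject."""
    try:
        return message_id.encode(codec)
    except UnicodeEncodeError:
        # The target charset cannot represent this ID: reject.
        return None


msg_id = "<123@example.com>"
ascii_octets = msg_id.encode("ascii")
ebcdic_octets = msg_id.encode("cp037")

# A dumb octet-string comparison fails after transcoding ...
octets_equal = ascii_octets == ebcdic_octets
# ... while comparing the decoded characters still works.
chars_equal = same_id(ascii_octets, "ascii", ebcdic_octets, "cp037")

# An ID with non-ASCII characters cannot be transcoded to cp037 at all.
rejected = gateway_out("<δοκιμή@example.com>", "cp037") is None
```

The last line is the "Reject" case: the sketch refuses the message
rather than inventing a replacement Message-ID.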

> If someone asked my opinion about whether a gateway should mess
> with Message-IDs, I'd say "no, unless it is absolutely required
> by the systems on the other end".  But lots of people don't ask
> and even fewer listen.   Neither RFC 5321/5322 nor RFC 5536
> offer any reciprocal guarantees.

RFC 1849 has very good advice:  "Above all, prevent loops."

The Usefor folks, or rather Russ, considered writing a gateway
memo after 5537 (not the same as 5536).  Some of the topics you
didn't find in 5322 & 5536 are covered in 5321 & 5537.  Sadly you
can't put more about this in 5321bis, and actually it should get
its own memo -- 5321 & 5537 aren't the place for gateway details.

>> EAI won't make that worse.

> If there are fragile systems out there -- and there almost
> certainly still are-- how can you possibly know that?

It is a guess:  Gateway operators who manage to get it right for
message/rfc822 <-> something based on EBCDIC could also manage
to get it right for message/global <-> something UTF-EBCDIC, or
simply reject message/global.  In other words, when I'm opposed
to UTF-8 in message-IDs I'm not worried about breaking gateways.

> In spite of being co-author of CRAM-MD5, I don't understand this
> comment at all.  Certainly the CRAM-MD5 spec, regardless of its
> other strengths and weaknesses, doesn't even mention Message-IDs

Quoth 2195:  "The syntax of the unencoded form must correspond to
that of an RFC 822 'msg-id' [RFC822] as described in [POP3]."

APOP and CRAM-MD5 use the syntactical form of a Message-ID for
their challenges.  You didn't write "globally unique forever", but
the Message-ID challenges are still supposed to be "unique".
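
A minimal sketch of what that looks like, assuming a made-up server
host name and hypothetical helper names (make_challenge,
cram_md5_response, MSG_ID_RE).  The regular expression is a loose
approximation of the msg-id shape, not the full RFC 822 grammar:

```python
import hmac
import re
import secrets
import time

# Loose approximation of "<local-part@domain>", for illustration only.
MSG_ID_RE = re.compile(r"^<[^<>@\s]+@[^<>@\s]+>$")


def make_challenge(host: str = "pop.example.org") -> str:
    """Build a challenge that has the syntactical form of a Message-ID.

    A random token plus a timestamp keeps challenges "unique" without
    promising "globally unique forever".
    """
    return f"<{secrets.token_hex(8)}.{int(time.time())}@{host}>"


def cram_md5_response(user: str, secret: str, challenge: str) -> str:
    """Compute the unencoded client response: user, space, HMAC-MD5 hex."""
    digest = hmac.new(secret.encode("ascii"),
                      challenge.encode("ascii"),
                      "md5").hexdigest()
    return f"{user} {digest}"


challenge = make_challenge()
response = cram_md5_response("tim", "tanstaaftanstaaf", challenge)
```

On the wire both the challenge and the response are base64 encoded;
that step is left out here.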