Re: [EAI] UTF-8/MIME

John C Klensin <klensin@jck.com> Thu, 19 August 2010 16:34 UTC

Date: Thu, 19 Aug 2010 12:34:54 -0400
From: John C Klensin <klensin@jck.com>
To: Charles Lindsey <chl@clerew.man.ac.uk>, IMA <ima@ietf.org>
Message-ID: <366A47AA8992A5250A7A8820@PST.JCK.COM>
In-Reply-To: <op.vhotq0c26hl8nm@clerew.man.ac.uk>
References: <E14011F8737B524BB564B05FF748464A0E2374D7@TK5EX14MBXC141.redmond.corp.microsoft.com> <op.vhotq0c26hl8nm@clerew.man.ac.uk>
X-Mailer: Mulberry/4.0.8 (Win32)
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Subject: Re: [EAI] UTF-8/MIME

--On Thursday, August 19, 2010 14:43 +0100 Charles Lindsey
<chl@clerew.man.ac.uk> wrote:

> On Wed, 18 Aug 2010 19:31:52 +0100, Shawn Steele
> <Shawn.Steele@microsoft.com> wrote:
> 
>>> my understanding is that headers are in utf-8,
>>> the body part might still use mime(gb2312).
>> 
>> I'm obviously really confused.  My understanding was that EAI
>> required UTF-8 headers and MIME encoded bodies, which
>> should also be UTF-8, but an appropriate MIME part.
>> Presumably the body could also be GB2312 MIME, though I'd
>> discourage that as much as possible.

The implication of draft-iab-idn-encoding is that we ought to be
gradually deprecating everything but ASCII and UTF-8 (I would
say "everything but UTF-8", but there is specific language in
the MIME specs stating a preference for coding body parts with
no non-ASCII characters as "us-ascii" rather than "utf-8").   So
I agree with Shawn's observation but don't think it is a matter
for EAI to discuss or offer advice about (see note 1 below).

> Now you've got me confused. An EAI message (i.e. one coming
> from an agent asserting a need for UTF8SMTP) might contain
> headers using UTF-8, but the body entirely in ASCII. So no
> MIME stuff anywhere in it. So you cannot say that EAI REQUIRES
> MIME, though for sure it would be stupid to implement it
> without MIME.

But there will always be "MIME stuff" in an EAI-conformant
message with non-ASCII header material.  Remember, UTF8SMTP[bis]
requires 8BITMIME, 8BITMIME requires MIME, and MIME requires at
least a MIME-version header field, so, even if there is only one
body part and the content type is allowed to default to
'text/plain; charset="us-ascii"' (i.e., no Content-Type header
field present), there is "MIME stuff" present.
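To illustrate the point about defaults, here is a minimal sketch using
Python's standard "email" package. The sample message and the helper
name effective_content_type are made up for illustration; the helper
simply applies the MIME default of us-ascii when a text part carries
no charset parameter.

```python
from email import message_from_string

# A message with a MIME-Version header field but no Content-Type field.
RAW = (
    "MIME-Version: 1.0\r\n"
    "Subject: plain test\r\n"
    "\r\n"
    "Hello.\r\n"
)


def effective_content_type(msg):
    """Return (content type, charset), applying the MIME defaults.

    Hypothetical helper: only meaningful for text parts, where the
    default charset is us-ascii.
    """
    ctype = msg.get_content_type()  # falls back to text/plain
    charset = msg.get_content_charset() or "us-ascii"
    return ctype, charset


msg = message_from_string(RAW)
has_mime_version = msg["MIME-Version"] is not None
explicit_content_type = msg["Content-Type"] is not None
ctype, charset = effective_content_type(msg)
print(has_mime_version, explicit_content_type, ctype, charset)
```

Even with no Content-Type field, the MIME-Version field is still there,
which is the "MIME stuff" in question.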

> AIUI, if you assert UTF8SMTP and have UTF8 in the headers, and
> then want to use UTF-8 in the Body (a common situation,
> presumably), then you are supposed to include a Content-Type:
> specifying charset=utf8 and a suitable
> Content-Transfer-Encoding.

Yes.  Since 8BITMIME is needed to transport those UTF-8 headers,
the C-T-E can reasonably be "8bit".
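As a rough sketch of what such a message looks like when built with
Python's standard "email" package: the addresses and text below are
placeholders, and the SMTPUTF8 policy is used here only so that the
non-ASCII Subject is written as raw UTF-8 rather than as encoded
words. It is one way to produce the labeling described, not the only
one.

```python
from email.message import EmailMessage
from email.policy import SMTPUTF8

# Placeholder addresses and content, for illustration only.
msg = EmailMessage(policy=SMTPUTF8)
msg["From"] = "sender@example.com"
msg["To"] = "recipient@example.com"
msg["Subject"] = "Grüße"

# Explicitly label the body: charset utf-8, transfer encoding 8bit.
msg.set_content("Grüße aus Zürich\n", charset="utf-8", cte="8bit")

raw = msg.as_bytes()  # wire form; contains unencoded UTF-8 octets
print(msg["MIME-Version"])
print(msg["Content-Type"])
print(msg["Content-Transfer-Encoding"])
```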

> What we MIGHT do is to state that, for such EAI messages the
> default body charset was UTF-8 and the default CTE was 8bit
> (and most modern MUAs would likely display that correctly).
> That's just another extension to RFC 553[12], and we have
> extended those already. The downside would be that any attempt
> at downgrading would have to put those assumed-by-default
> headers back.

I think this would be certain to cause problems with other MIME
implementations.  Remember that messages are passed outside the
transport system and that making up other defaults --body parts
not requiring Content-Type or C-T-E fields-- would violate the
8BITMIME spec.   Now, if we were to say that use of EAI required
either the normal default Content-Type (text/plain;
charset="us-ascii") or that text types were required to be
charset="utf-8", I think that would be ok.  However, I think it
would accomplish very little in practice that a simple
recommendation to use UTF-8 in body parts where possible
(labeled as the MIME and 8BITMIME specs require) would not.

In particular, if someone were determined to use GB body parts,
I don't think we are going to be able to effectively prohibit
that.  We should just try to insist that they be properly
labeled and identified.
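One way to picture "properly labeled" is a check that flags text
parts containing 8-bit octets but no charset label. The sketch below
uses Python's standard "email" package; the function name
unlabeled_8bit_parts and the two sample messages are hypothetical,
and the check is deliberately simplistic (it looks only at text parts
and only at the charset parameter).

```python
from email import message_from_bytes


def unlabeled_8bit_parts(raw: bytes):
    """Return content types of text parts with 8-bit data and no charset."""
    msg = message_from_bytes(raw)
    problems = []
    for part in msg.walk():
        if part.is_multipart():
            continue
        payload = part.get_payload(decode=True) or b""
        has_8bit = any(octet > 0x7F for octet in payload)
        is_text = part.get_content_maintype() == "text"
        if has_8bit and is_text and part.get_content_charset() is None:
            problems.append(part.get_content_type())
    return problems


# A GB2312 body that is labeled as such.
labeled_gb = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: text/plain; charset="gb2312"\r\n'
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n" + "你好".encode("gb2312") + b"\r\n"
)

# 8-bit content with no Content-Type field at all.
unlabeled = (
    b"MIME-Version: 1.0\r\n"
    b"\r\n" + "你好".encode("utf-8") + b"\r\n"
)

print(unlabeled_8bit_parts(labeled_gb))
print(unlabeled_8bit_parts(unlabeled))
```

The labeled GB2312 message passes the check; the unlabeled one is
reported, since a receiver has no way to know which charset was
intended.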


> In fact, I suspect that default practice is going to happen
> anyway within EAI-only communities, so we might as well make
> it official.

Partially because of the wave of MTAs that, in self-defense,
took very harsh measures to deal with unlabeled 8bit content,
there is, in my experience, little current practice of using
non-ASCII body parts without Content-Type labeling.  I see no
reason at all to try to reintroduce that practice, especially
when we remember how common text/html and various binary
("application") and image type body parts are, all of which
have to be handled by those same MTAs and MUAs.

    john