Re: multiple character sets in GeneralText

"Carl S. Gutekunst" <csg@hideji.worldtalk.com> Tue, 07 June 1994 21:02 UTC

Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa18649; 7 Jun 94 17:02 EDT
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa18645; 7 Jun 94 17:02 EDT
Received: from survis.surfnet.nl by CNRI.Reston.VA.US id aa17954; 7 Jun 94 17:02 EDT
Received: from relay3.UU.NET by survis.surfnet.nl with SMTP (PP) id <28587-0@survis.surfnet.nl>; Tue, 7 Jun 1994 22:52:04 +0200
Received: from uucp4.uu.net by relay3.UU.NET with SMTP (rama) id QQwthr26667; Tue, 7 Jun 1994 16:51:58 -0400
Received: from worldtlk.UUCP by uucp4.uu.net with UUCP/RMAIL ; Tue, 7 Jun 1994 16:52:01 -0400
Received: from hideji.worldtalk.com by worldtalk.com with SMTP (1.38.193.5/16.2) id AA28849; Tue, 7 Jun 1994 12:54:03 -0700
Received: by hideji.worldtalk.com (5.61/1.5) id AA23324; Tue, 7 Jun 94 12:58:43 -0700
Date: Tue, 7 Jun 94 12:58:43 -0700
Sender: ietf-archive-request@IETF.CNRI.Reston.VA.US
From: "Carl S. Gutekunst" <csg@hideji.worldtalk.com>
Message-Id: <9406071958.AA23324@hideji.worldtalk.com>
To: "Kevin E. Jordan" <Kevin.E.Jordan@cdc.com>
Cc: mime-mhs@surfnet.nl
Subject: Re: multiple character sets in GeneralText
In-Reply-To: Your message of Tue, 07 Jun 1994 10:36:09 CDT <2df493ea3898002@mercury.udev.cdc.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Id: <23322.771019122.1@hideji.worldtalk.com>

>Has this question been asked before...  RFC1494 gives no guidance in the case
>where an X.400 GeneralText body part contains multiple ISO-8859-x character
>sets, e.g. ISO-8859-1 and ISO-8859-7.

You would certainly think so, but I don't see any mention in RFC-1502, either.

I am highly averse to codeset switching, by ISO 2022 or any other means.
There are legacy environments where you have no choice, like Japanese; but
for new parts it would be dreadful (IMHO) to introduce ISO 2022 into MIME.
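For readers who haven't run into ISO 2022 switching, here is a small sketch of what it looks like on the wire. It uses Python's built-in "iso2022_jp" codec (ISO-2022-JP, the legacy Japanese mail encoding). The constant and function names are my own, for illustration only. The point is that the bytes are 7-bit, but stateful: escape sequences embedded in the text switch the meaning of everything after them.

```python
# Sketch: ISO 2022-style codeset switching via Python's "iso2022_jp" codec.
ESC_TO_JIS_X0208 = b"\x1b$B"  # ESC $ B: switch to JIS X 0208-1983 (two bytes per kanji)
ESC_TO_ASCII = b"\x1b(B"      # ESC ( B: switch back to ASCII

def encode_iso2022_jp(text: str) -> bytes:
    """Encode text as ISO-2022-JP: 7-bit output with embedded shift escapes."""
    return text.encode("iso2022_jp")

sample = "abc\u65e5\u672cdef"  # "abc" + two kanji + "def"
encoded = encode_iso2022_jp(sample)
print(encoded)  # ASCII, then ESC $ B + kanji bytes, then ESC ( B + "def"
```

A decoder can't interpret any byte in isolation. It has to track the current shift state from the last escape sequence, which is exactly the burden I'd rather not add to new MIME body parts.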

I'd make a strong push for Unicode, aka ISO 10646.  RFC-1502 hints at this
without going into details.  Unicode software seems to be catching on, given
things like native support in Windows NT.  The biggest obstacle is developing
mapping tables and algorithms that provide reasonable performance.  The tables
are largely available, but mapping a 16-bit codeset to and from multiple 8-bit
sets requires either vast quantities of memory or some cleverness; and an
optimal mapping requires two passes over the Unicode input.  (There is no
other way to select between ISO 8859-1 and 8859-9, for example.)
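A rough sketch of the two-pass selection follows, assuming Python's built-in 8859 codecs as the source of the mapping tables. The candidate list, its order, and the helper names are hypothetical choices for illustration. Pass 1 scans the whole input to find the first charset that covers every character. Pass 2 maps the text through that charset's reverse table.

```python
from typing import Dict, Optional, Tuple

def reverse_table(charset: str) -> Dict[str, int]:
    """Build a sparse Unicode -> byte table by decoding each byte value once."""
    table: Dict[str, int] = {}
    for b in range(256):
        try:
            ch = bytes([b]).decode(charset)
        except UnicodeDecodeError:
            continue  # byte value is unassigned in this charset
        table[ch] = b
    return table

# Hypothetical preference order: try 8859-1 first, then 8859-9, then 8859-7.
CANDIDATES = ["iso8859-1", "iso8859-9", "iso8859-7"]
TABLES = {cs: reverse_table(cs) for cs in CANDIDATES}

def select_charset(text: str) -> Optional[str]:
    """Pass 1: look at the whole input before committing to a charset."""
    needed = set(text)
    for cs in CANDIDATES:
        if needed.issubset(TABLES[cs]):
            return cs
    return None  # no single 8-bit set covers this text

def encode_8bit(text: str) -> Optional[Tuple[str, bytes]]:
    """Pass 2: map each character through the chosen reverse table."""
    cs = select_charset(text)
    if cs is None:
        return None
    table = TABLES[cs]
    return cs, bytes(table[ch] for ch in text)

print(encode_8bit("caf\u00e9"))  # e-acute exists in 8859-1
print(encode_8bit("\u011fa"))    # g-breve forces 8859-9
print(encode_8bit("\u03b1"))     # Greek alpha forces 8859-7
```

The 8859-1 vs. 8859-9 choice can hinge on a single character near the end of the text (e.g. a g-breve), so an encoder can't safely emit bytes until pass 1 is done. Keeping the reverse tables as dictionaries of about 256 entries each is one cheap alternative to a full 64K-entry array per 8-bit set.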

There is a mailing list for Unicode in MIME, maintained by Mark Davis
<mark_davis@taligent.com> and David Goldsmith <david_goldsmith@taligent.com>.
They have posted two drafts, general rules for Unicode in MIME, and a 7-bit
encoding of UCS-2 which they have called UTF-7.  I am unhappy with the UTF-7
encoding, but I don't have much expertise in that area, so I am reluctant to
say much.  (I don't know why they didn't base it on the existing UTF-8.)
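To make the trade-off concrete, here is a small comparison using Python's built-in "utf-7" and "utf-8" codecs. Python's utf-7 codec follows the later published UTF-7 specification, not necessarily the exact draft cited below, so treat this as an approximation. The helper name is hypothetical. UTF-8 output contains 8-bit bytes, so it would still need a Content-Transfer-Encoding such as base64 or quoted-printable on a 7-bit mail path. UTF-7 stays 7-bit by shifting non-ASCII characters into a base64 run.

```python
def is_7bit(data: bytes) -> bool:
    """True if every byte is in the US-ASCII range (safe for 7-bit transport)."""
    return all(b < 0x80 for b in data)

sample = "Hi Mom -\u263a-!"  # contains U+263A WHITE SMILING FACE

as_utf7 = sample.encode("utf-7")  # non-ASCII shifted into a "+...-" base64 run
as_utf8 = sample.encode("utf-8")  # U+263A becomes three bytes >= 0x80

print(as_utf7, is_7bit(as_utf7))
print(as_utf8, is_7bit(as_utf8))
```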

The drafts are goldsmith-mime-unicode-00 and goldsmith-mime-utf7-03.

<csg>