Re: [EAI] Unicode vs. UTF-8 / Encoding vs. Representation

John C Klensin <klensin@jck.com> Tue, 28 December 2010 04:15 UTC

Date: Mon, 27 Dec 2010 23:17:32 -0500
From: John C Klensin <klensin@jck.com>
To: dcrocker@bbiw.net
Message-ID: <61939C011F6BB4A93749804C@[192.168.1.128]>
In-Reply-To: <4D19623F.3040804@dcrocker.net>
References: <Pine.OSX.4.64.1012221602490.40683@mac-allocchio3.elettra.trieste.it> <68655A9F86D4BE7ED933F8A6@[192.168.1.128]> <4D192FF8.1030706@dcrocker.net> <9B48F59821946F2EA2DCDEA0@[192.168.1.128]> <4D19623F.3040804@dcrocker.net>
X-Mailer: Mulberry/4.0.8 (Win32)
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Cc: ima@ietf.org
Subject: Re: [EAI] Unicode vs. UTF-8 / Encoding vs. Representation

--On Monday, December 27, 2010 8:06 PM -0800 Dave CROCKER
<dhc2@dcrocker.net> wrote:

>>  "ASCII versus
>> UTF-8" and "ASCII versus Unicode" are ultimately equally
>> incorrect.
> 
> Well, it is essential to make the distinction between legacy
> and new semantics, since it is the essence of the current work.

Yes.  That was what people were trying to do with "ASCII versus
UTF-8".  As you point out, that doesn't work well.  "ASCII
versus Unicode" really doesn't work any better.

> The choice of terms is a separate issue.

Indeed.

>  If you have
> alternative labeling, that would be great to consider.

I hope someone in the WG will have a good suggestion.  If not,
we may be back to "ASCII" versus "non-ASCII" as the disjoint
pair, "Unicode" to describe the international coded character
set (CCS, inclusive of ASCII), and "UTF-8" to describe the
particular encoding that is used for this protocol and
recommended by RFC 2277 (a recommendation that is reinforced by
draft-iab-idn-encoding and other more recent work).
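
To make the three terms concrete, here is a minimal Python sketch.
It is an illustration only, not anything taken from the drafts; the
function name "describe" and the sample string are assumptions made
up for this example.  It reports whether a string is ASCII or
non-ASCII, lists its Unicode code points (the CCS view), and shows
its UTF-8 octets (the encoding view).

```python
def describe(text):
    """Return (is_ascii, code_points, utf8_octets) for a string.

    Hypothetical helper, for illustration only.
    """
    # ASCII versus non-ASCII: the disjoint pair.  A string is ASCII
    # only if every code point is below U+0080.
    is_ascii = all(ord(ch) < 0x80 for ch in text)

    # Unicode as the coded character set: abstract code points,
    # independent of how they are serialized.
    code_points = ["U+%04X" % ord(ch) for ch in text]

    # UTF-8 as one particular encoding of those code points.
    utf8_octets = text.encode("utf-8")

    return is_ascii, code_points, utf8_octets


if __name__ == "__main__":
    # "jos" followed by U+00E9 (e with acute accent)
    for sample in ("john", "jos\u00e9"):
        print(describe(sample))
```

Note that for an all-ASCII string the UTF-8 octets are identical to
the ASCII octets, which is one reason that neither "ASCII versus
UTF-8" nor "ASCII versus Unicode" is a clean opposition.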

    john