Re: [Ltru] Re: UTF-8

Addison Phillips <addison@yahoo-inc.com> Sun, 17 September 2006 19:32 UTC

Message-ID: <450DA2D1.6020706@yahoo-inc.com>
Date: Sun, 17 Sep 2006 12:32:33 -0700
From: Addison Phillips <addison@yahoo-inc.com>
User-Agent: Thunderbird 1.5.0.5 (Windows/20060719)
MIME-Version: 1.0
To: John Cowan <cowan@ccil.org>
Subject: Re: [Ltru] Re: UTF-8
References: <789E617C880666438EDEE30C2A3E8D10EEFC@mailsrvnt05.enet.sharplabs.com> <450B2B75.2F36@xyzzy.claranet.de> <6.0.0.20.2.20060916114849.081056e0@localhost> <450BD347.9EA@xyzzy.claranet.de> <6.0.0.20.2.20060917154808.08a12880@localhost> <20060917183821.GC26073@ccil.org>
In-Reply-To: <20060917183821.GC26073@ccil.org>
Content-Type: text/plain; charset="UTF-8"; format="flowed"
Content-Transfer-Encoding: 7bit
Cc: ltru@ietf.org

John Cowan wrote:
> 
>> Well, and then interpreted as Shift-JIS in the case of my
>> mailer :-(. Our spec as well as the HTTP headers will say
>> that it's UTF-8. That should be enough. People who are
>> okay to see garbage will be as satisfied with Bokm&#xE5;
>> as they are with Bokm?$B%F!&l, or any other weird rendering,
>> but people who like to see the real thing will be best
>> served by UTF-8.
> 
> Bokm&#xE5;l is at least reconstructible without knowing anything
> except the SGML character reference conventions and the
> codepoints of Unicode characters; Bokm?$B%F!&l (which is the
> way it got to me, sans ESC characters) is nothing but rubbish.

Well, that's mojibake: it's the ISO-2022-JP encoding of the Shift-JIS 
characters made by misinterpreting UTF-8 bytes. Lovely.
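
The chain described above can be reproduced roughly as follows. This is only a sketch under assumptions: the function name is made up for illustration, and the NFKC step stands in for the half-width to full-width katakana conversion that a mailer would have to perform before ISO-2022-JP encoding (Python's `iso2022_jp` codec does not carry half-width katakana). The actual mailer involved may have behaved differently.

```python
import unicodedata


def simulate_mojibake(text: str) -> bytes:
    """Hypothetical helper: UTF-8 bytes misread as Shift-JIS, then sent as ISO-2022-JP."""
    utf8_bytes = text.encode("utf-8")
    # Step 1: the receiving mailer wrongly treats the UTF-8 bytes as Shift-JIS.
    # 0xC3 0xA5 (the UTF-8 form of U+00E5) become two half-width katakana.
    misread = utf8_bytes.decode("shift_jis", errors="replace")
    # Step 2 (assumption): half-width katakana are widened to full-width,
    # since plain ISO-2022-JP has no half-width katakana.
    widened = unicodedata.normalize("NFKC", misread)
    # Step 3: the misread text is re-encoded as ISO-2022-JP for transport.
    return widened.encode("iso2022_jp", errors="replace")


wire = simulate_mojibake("Bokm\u00e5l")
# Dropping the ESC bytes shows roughly what a reader sees "sans ESC characters".
print(wire.replace(b"\x1b", b"").decode("ascii"))
```

The printed string contains the `$B%F!&` run seen in the quoted text, which is the JIS X 0208 encoding of the two katakana produced by the misreading.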

> 
> This is an *excellent* example of why we need explicit escaping
> (for which SGML is as good a convention as any) rather than
> encoding, given the present state of email.
> 

But that consideration doesn't apply to the registry file. It only 
applies to the discussion of a registry entry on ietf-languages. 
Admittedly MUAs vary greatly in their support for encodings, but we can 
certainly specify directions for how to send the registration request to 
the list so that everyone can tell what code points are intended and 
other instructions for how to encode those characters in the registry.
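
As one possible illustration of such directions for the list, a request could carry non-ASCII characters as hexadecimal character references, which survive 7-bit mail and still say exactly which code points are intended. The sketch below is an assumption, not a proposed rule; both function names are hypothetical, and `&` is escaped as well so that the conversion round-trips.

```python
import re

# Matches hexadecimal character references such as &#xE5;
_CHAR_REF = re.compile(r"&#x([0-9A-Fa-f]+);")


def to_ascii_with_char_refs(text: str) -> str:
    """Hypothetical helper: replace '&' and every non-ASCII character with &#xHHHH;."""
    return "".join(
        f"&#x{ord(ch):X};" if ch == "&" or ord(ch) > 0x7F else ch
        for ch in text
    )


def from_char_refs(escaped: str) -> str:
    """Hypothetical helper: turn &#xHHHH; references back into characters."""
    return _CHAR_REF.sub(lambda m: chr(int(m.group(1), 16)), escaped)


escaped = to_ascii_with_char_refs("Bokm\u00e5l")
print(escaped)
print(from_char_refs(escaped) == "Bokm\u00e5l")
```

The registry file itself could then store the decoded characters in UTF-8, with the escaped form used only in mail to the list.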

Suggestions to limit the character repertoire strike me as 
counterproductive too. Why would we do that? I could see, for example, that we 
might end up with a few Chinese descriptions or comments in the registry 
to disambiguate Chinese variations. Why artificially restrict it 'ab 
initio'?

I find this fascination with ASCII slightly quaint.

Addison

-- 
Addison Phillips
Globalization Architect -- Yahoo! Inc.

Internationalization is an architecture.
It is not a feature.
