Re: New draft (Was: I-D ACTION:draft-klensin-unicode-escapes-00.txt

John C Klensin <john-ietf@jck.com> Wed, 31 January 2007 17:52 UTC

Date: Wed, 31 Jan 2007 12:52:37 -0500
From: John C Klensin <john-ietf@jck.com>
To: Tim Bray <tbray@textuality.com>
Subject: Re: New draft (Was: I-D ACTION:draft-klensin-unicode-escapes-00.txt
Message-ID: <E4790BD63A92B0F55375CE85@p3.JCK.COM>
Cc: discuss@apps.ietf.org


--On Tuesday, 30 January, 2007 16:45 -0800 Tim Bray
<tbray@textuality.com> wrote:

> Pardon me for being late to this party, I was on vacation in
> Australia.  I think this is a positive contribution.
> 
> First, a detail point:  In section 5.4, it's probably relevant that
> per the Java Language Specification
> (http://java.sun.com/docs/books/jls/third_edition/html/lexical.html#95413p)
> it's clear that a Java character literal or variable represents, not a
> Unicode character, but a UTF-16 code point.   I guess the conclusion
> is that it may be OK in certain circumstances to use \uNNNN, but it's
> not OK to explain that by calling out to Java.
> 
> Second: I think that the discussion shows that the syntax problems
> around representing Unicode characters in ASCII and other
> Unicode-oblivious texts are tricky; witness the issues with delimiters
> and ABNF/case.  This is further evidence, were any needed, that IETF
> Working Groups SHOULD NOT specify Internet protocols which may be used
> to transfer text but are not capable of representing the Unicode
> character set, either by specifying the use of either hard-wired UTF-8
> or alternatively XML, both of which have cracked this nut.
> 
> So here's a proposed recasting of second para of 1.1:
> 
>    When one moves to Unicode [Unicode] [ISO10646], where characters
>    occupy two or more octets and may be coded in several different
>    forms, the question of escapes becomes even more complicated.  In
>    particular, we have seen fairly extensive use of both hexadecimal
>    representations of the UTF-8 encoding [RFC3629] of a character and
>    variations on the U+NNNN[N[N]] notation commonly used in
>    conjunction with the Unicode Standard.
>
>    New protocols that are required to carry textual content SHOULD be
>    designed in such a way that the full repertoire of Unicode
>    characters may be represented in that text; UTF-8 and XML are both
>    good options.
>
>    This document proposes that existing protocols being
>    internationalized SHOULD use some contextually-appropriate
>    variation of the U+NNNN[N[N]] notation unless other considerations
>    outweigh those described here.
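
To make the distinction in the quoted text concrete, here is a minimal
sketch (in Python, not taken from the draft) of three ASCII-only
spellings of one character: the U+NNNN[N[N]] form, the hexadecimal
octets of its UTF-8 encoding, and a Java-style \uNNNN form built from
16-bit UTF-16 units (which, as I read the Java Language Specification,
it calls "code units").  The function name escape_forms and the
dictionary keys are hypothetical, chosen only for illustration.

```python
def escape_forms(ch: str) -> dict:
    """Return several ASCII-only spellings of a single Unicode character.

    Hypothetical helper for illustration; not part of any specification.
    """
    code_point = ord(ch)

    # Split the big-endian UTF-16 encoding into 16-bit units. Characters
    # above U+FFFF yield two units (a surrogate pair).
    utf16 = ch.encode("utf-16-be")
    units = [int.from_bytes(utf16[i:i + 2], "big")
             for i in range(0, len(utf16), 2)]

    return {
        # Code-point notation: one token per character, 4 to 6 hex digits.
        "u_plus": f"U+{code_point:04X}",
        # Hexadecimal octets of the UTF-8 encoding (1 to 4 octets).
        "utf8_hex": " ".join(f"{b:02X}" for b in ch.encode("utf-8")),
        # Java-style escapes: one \uNNNN per 16-bit UTF-16 unit.
        "java_style": "".join(f"\\u{u:04X}" for u in units),
    }


if __name__ == "__main__":
    for ch in ("\u00e9", chr(0x1D11E)):
        print(escape_forms(ch))
```

For a character outside the Basic Multilingual Plane, such as U+1D11E,
the Java-style form needs two \uNNNN escapes while the U+ form stays a
single token, which is the mismatch the first point above describes.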

Tim,

While I think I agree with you about your second proposed
paragraph above ("New protocols..."), I think my instructions
for this document are to keep it narrow and to focus on escapes,
not on general advice to protocol designers about Unicode or
internationalization more broadly.  So I don't want to go so
far as to make specific (or even specific-sounding) suggestions.
This effort, and some others, have convinced me that we are
getting closer to the time at which RFC 2277 / BCP 18 needs to be
reopened, reviewed, and updated, but this document isn't the
right place to do it, at least IMO.

But most of your text is better than mine; let me see what I can
do.  I have adjusted the Java text.

thanks,
     john