Re: New draft (Was: I-D ACTION:draft-klensin-unicode-escapes-00.txt

John C Klensin <john-ietf@jck.com> Wed, 31 January 2007 17:52 UTC

Date: Wed, 31 Jan 2007 12:52:37 -0500
From: John C Klensin <john-ietf@jck.com>
To: Tim Bray <tbray@textuality.com>
Subject: Re: New draft (Was: I-D ACTION:draft-klensin-unicode-escapes-00.txt
Message-ID: <E4790BD63A92B0F55375CE85@p3.JCK.COM>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Cc: discuss@apps.ietf.org
Precedence: list
Errors-To: discuss-bounces@apps.ietf.org


--On Tuesday, 30 January, 2007 16:45 -0800 Tim Bray
<tbray@textuality.com> wrote:

> Pardon me for being late to this party, I was on vacation in
> Australia.  I think this is a positive contribution.
> 
> First, a detail point:  In section 5.4, it's probably relevant that
> per the Java Language Specification
> (http://java.sun.com/docs/books/jls/third_edition/html/lexical.html#95413p)
> it's clear that a Java character literal or variable represents, not a
> Unicode character, but a UTF-16 code point.   I guess the conclusion
> is that it may be OK in certain circumstances to use \uNNNN, but it's
> not OK to explain that by calling out to Java.
> 
> Second: I think that the discussion shows that the syntax problems
> around representing Unicode characters in ASCII and other
> Unicode-oblivious texts are tricky; witness the issues with delimiters
> and ABNF/case.  This is further evidence, were any needed, that IETF
> Working Groups SHOULD NOT specify Internet protocols which may be used
> to transfer text but are not capable of representing the Unicode
> character set, either by specifying the use of either hard-wired UTF-8
> or alternatively XML, both of which have cracked this nut.
> 
> So here's a proposed recasting of second para of 1.1:
> 
>    When one moves to Unicode [Unicode] [ISO10646], where characters
>    occupy two or more octets and may be coded in several different
>    forms, the question of escapes becomes even more complicated.  In
>    particular, we have seen fairly extensive use of both hexadecimal
>    representations of the UTF-8 encoding [RFC3629] of a character and
>    variations on the U+NNNN[N[N]] notation commonly used in conjunction
>    with the Unicode Standard.
> 
>    New protocols that are required to carry textual content SHOULD be
>    designed in such a way that the full repertoire of Unicode
>    characters may be represented in that text; UTF-8 and XML are both
>    good options.
> 
>    This document proposes that existing protocols being
>    internationalized SHOULD use some contextually-appropriate
>    variation of the U+NNNN[N[N]] notation unless other considerations
>    outweigh those described here.

Tim,

While I think I agree with you about your second proposed
paragraph above ("New protocols..."), I think my instructions
with this document are to keep it narrow and to focus on escapes,
not on general advice to protocol designers about Unicode or
internationalization more broadly.   So I don't want to go so
far as to make specific (or even specific-sounding) suggestions.
This effort, and some others, have convinced me that we are
getting closer to the time at which RFC 2277 / BCP 18 needs to be
reopened, reviewed, and updated, but this document isn't the
right place to do it, at least IMO.

But most of your text is better than mine; let me see what I can
do.  I have adjusted the Java text.
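
For anyone following along, the Java point can be seen in a few
lines.  This is only a rough sketch, not text from the draft: the
class and helper names (UnicodeEscapeSketch, toJavaEscapes, toUPlus)
are made up for illustration.  It shows that a Java char holds one
UTF-16 code unit, so a character outside the BMP needs two
backslash-u escapes in the Java style, while a U+NNNN[N[N]]-style
form names the code point once.

```java
final class UnicodeEscapeSketch {

    // Hypothetical helper: render a code point in the U+NNNN[N[N]] style,
    // i.e. at least four and at most six uppercase hex digits.
    static String toUPlus(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a Unicode code point: " + codePoint);
        }
        return String.format("U+%04X", codePoint);
    }

    // Hypothetical helper: render every UTF-16 code unit of s as a
    // Java-style four-digit escape. One escape per char, not per character.
    static String toJavaEscapes(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append(String.format("\\u%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // MUSICAL SYMBOL G CLEF, code point 1D11E (hex), lies outside the BMP.
        String clef = new String(Character.toChars(0x1D11E));

        System.out.println(clef.length());                         // count of UTF-16 code units
        System.out.println(clef.codePointCount(0, clef.length())); // count of code points
        System.out.println(toJavaEscapes(clef));                   // surrogate pair, two escapes
        System.out.println(toUPlus(clef.codePointAt(0)));          // single code point form
    }
}
```

The string length is 2 but the code point count is 1, which is why
the four-digit Java escape cannot, by itself, be described as
naming a Unicode character.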

thanks,
     john
