Re: draft-klensin-unicode-escapes-01

Frank Ellermann <nobody@xyzzy.claranet.de> Sat, 03 February 2007 02:07 UTC

To: discuss@apps.ietf.org
From: Frank Ellermann <nobody@xyzzy.claranet.de>
Subject: Re: draft-klensin-unicode-escapes-01
Date: Sat, 03 Feb 2007 03:03:48 +0100
Organization: <URL:http://purl.net/xyzzy>
Lines: 80
Message-ID: <45C3ED84.70C@xyzzy.claranet.de>
References: <875A124D75A8B481E176CF06@p3.JCK.COM> <uppsr2hs59srbd7eufbcul5a1ekl7i09nl@hive.bjoern.hoehrmann.de> <EF59DA6FD89C4F19750C68C3@p3.JCK.COM> <20070202114658.GX7742@finch-staff-1.thus.net> <45C3371E.330F@xyzzy.claranet.de> <20070202184727.GG68544@finch-staff-1.thus.net> <B7F8733D73E8CC7227785A69@p3.JCK.COM>

John C Klensin wrote:

> Sorry, but, IMO, for a document like this which is merely giving
> examples (at this point), "very widely deployed and used" trumps
> "horrible".

When I mentioned hex. NCRs I meant XML, not SGML and its many ways
to save keystrokes.

>> Probably the I-D should mention that one famous exception from
>> its rule to avoid encoded UTF-8 is the URL form of IRIs.

> Why?  I don't see that (and a few other cases) as "exceptions"
> (famous or not), but as mistakes from which we should learn and,
> I hope, have learned.

The overall RFC 2277 rule is IMO "if you don't know what it is or
can't say what it is, it's UTF-8, and if that theory fails it's
UNKNOWN-8BIT".  And unlike an RFC 2231 parameter, a URL can't say
what it is.
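
A minimal sketch of that fallback rule in Python.  The function name
is made up for illustration, and the logic follows the paraphrase
above rather than the text of RFC 2277 itself:

```python
def label_unlabelled_octets(data: bytes) -> str:
    """Guess a charset label for octets that carry no label of their own.

    Hypothetical helper: assume UTF-8 first, and fall back to
    UNKNOWN-8BIT when the octets are not valid UTF-8.
    """
    try:
        data.decode("utf-8")  # strict decoding; raises on invalid sequences
    except UnicodeDecodeError:
        return "UNKNOWN-8BIT"
    return "UTF-8"
```

Pure ASCII input also comes back as "UTF-8" here, because every
ASCII string is valid UTF-8.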

>> wouldn't it be better to say "21 bits - rather than the 7 bits
>> of ASCII -"?

> No, because of net-ascii and some other issues, probably not.
> Again, because the important issue here is that this stuff is
> about escapes, you are picking nits that belong elsewhere

That nit is closely related to escape mechanisms providing for
31 or 32 bits, and attempts to get rid of leading zeros in these
mechanisms, which could fail without explicit delimiters.
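
To illustrate the delimiter problem, here is a rough Python sketch.
Both escape syntaxes in it are hypothetical (a "\u" followed by four
to six hex digits with no terminator, and a "\u{...}" form with up
to eight digits); neither is taken from the draft:

```python
import re

HEX = re.compile(r"[0-9A-Fa-f]+")

def candidate_parses(text, min_digits=4, max_digits=6):
    """List every way to read an undelimited \\u escape at the start of text.

    Returns (code point, remaining text) pairs, one per digit count
    that yields a value inside the Unicode range.
    """
    if not text.startswith("\\u"):
        raise ValueError("text does not start with \\u")
    body = text[2:]
    results = []
    for n in range(min_digits, max_digits + 1):
        digits = body[:n]
        if len(digits) == n and HEX.fullmatch(digits):
            cp = int(digits, 16)
            if cp <= 0x10FFFF:
                results.append((cp, body[n:]))
    return results

def parse_delimited(text):
    """Read a \\u{...} escape; the braces make the digit count explicit."""
    m = re.match(r"\\u\{([0-9A-Fa-f]{1,8})\}", text)
    if m is None:
        raise ValueError("no delimited escape at start of text")
    return int(m.group(1), 16), text[m.end():]
```

With these assumptions, candidate_parses("\\u00E9AB") yields three
readings (U+00E9 followed by "AB", U+0E9A followed by "B", or
U+E9AB), while parse_delimited("\\u{E9}AB") has exactly one.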

> U+NNNN[N[N]] versus U+[[N]N]NNNN is a matter of taste.

If the idea is to reflect appendix A of Unicode 5, it talks about
stripping leading zeros until four digits are left.  In table A.1
it has U+HHHH vs. U-HHHHHHHH, saying that this is the same as
\uHHHH vs. \UHHHHHHHH.  I haven't seen U-HHHHHHHH before, and I
won't miss it if you don't want to talk about it in the draft.
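
For reference, a small Python sketch of the four notations as they
are described above.  The function names are invented here, and the
padding rules follow this message's summary of appendix A, not a
fresh reading of the Unicode text:

```python
def u_plus(cp: int) -> str:
    """U+ form: leading zeros stripped until four digits are left."""
    return "U+%04X" % cp

def u_minus(cp: int) -> str:
    """U- form: always eight hex digits."""
    return "U-%08X" % cp

def backslash_escape(cp: int) -> str:
    """\\uHHHH when four digits suffice, otherwise \\UHHHHHHHH."""
    if cp <= 0xFFFF:
        return "\\u%04X" % cp
    return "\\U%08X" % cp
```

So code point 0xE9 comes out as U+00E9, U-000000E9 and \u00E9, and
0x1F600 as U+1F600, U-0001F600 and \U0001F600.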

> my editorial judgment and preferences are going to prevail...
> at least until the document gets far enough along that I have
> to start arm-wrestling with the RFC Editor and _their_ judgment
> and preferences.

Oops, sorry, I thought the intended status was BCP, "an attempt
to prohibit escapes for UTF-8 entirely".

> If you don't like my writing style -- and many people don't --
> please take on these efforts yourselves and let me complain
> about your style (or not) some of the time.

I like it; your drafts are almost always very interesting.  I was
really surprised when I stumbled over RFC 2345 some days ago: it
supports UTF-8 for whois.  So that wasn't a DeNIC invention after
all, and it's clearly older than RFC 3912 section 4.

But "interesting" sometimes includes "controversial", and then the
intended status is relevant.  As you say it makes no sense to nit-
pick your personal preferences.

> nit-picking is easy and may be fun, but it tends to block,
> rather than contribute to, progress.

I don't read 99% of all drafts, and I don't try to contribute to
99.9%.  The remaining 0.1% somehow attracted my attention, often
because I like them, though the opposite is also possible.  No idea
how easy it generally is, but this article alone took me about three
hours, because I tried to find out what "net ascii" really is, why
your "net utf-8" mentions an RFC that's no longer available, where
the Unicode 5 notational conventions are, etc.

> unless your purpose is to introduce delays and lay down obstacles

Escaping that with <rant> and adding a qualifier doesn't make it
better.  If you don't want a discussion about the draft, that's
okay; as you say, we're free to submit our own drafts about this
and / or related topics.

Frank