Re: draft-klensin-unicode-escapes-01
John C Klensin <john-ietf@jck.com> Sat, 03 February 2007 17:07 UTC
Date: Sat, 03 Feb 2007 12:06:56 -0500
From: John C Klensin <john-ietf@jck.com>
To: Frank Ellermann <nobody@xyzzy.claranet.de>, discuss@apps.ietf.org
Subject: Re: draft-klensin-unicode-escapes-01
Message-ID: <AF334D6BB0BFF3037B0DE609@p3.JCK.COM>
--On Saturday, 03 February, 2007 03:03 +0100 Frank Ellermann
<nobody@xyzzy.claranet.de> wrote:

> John C Klensin wrote:
>
>> Sorry, but, IMO, for a document like this which is merely
>> giving examples (at this point), "very widely deployed and
>> used" trumps "horrible".
>
> When I mentioned hex. NCRs I meant XML, not SGML and its many
> ways to save keystrokes.

And I was commenting only on the suggestion that all reference to
HTML be removed, however horrible it is. XML is, as far as I
know, already covered. If it isn't covered adequately, I don't
understand your suggestion.

>>> Probably the I-D should mention that one famous exception
>>> from its rule to avoid encoded UTF-8 is the URL form of IRIs.
>
>> Why? I don't see that (and a few other cases) as "exceptions"
>> (famous or not), but as mistakes from which we should learn
>> and, I hope, have learned.
>
> The overall RFC 2277 rule is IMO "if you don't know what it is
> or can't say what it is, it's UTF-8, and if that theory fails
> it's UNKNOWN-8BIT". And unlike RFC 2231 an URL can't say what
> it is.

I think I've understood both of those things. I just haven't
seen the justification or requirement to start exploring existing
protocols in this document, famous or not. I'm willing to be
convinced but note that every time I put something into a
document that is not strictly necessary I get attacked for
excessive length, etc.

>>> wouldn't it be better to say "21 bits - rather than the 7
>>> bits of ASCII -"?
>
>> No, because of net-ascii and some other issues, probably not.
>> Again, because the important issue here is that this stuff is
>> about escapes, you are picking nits that belong elsewhere
>
> That nit is closely related to escape mechanisms providing for
> 31 or 32 bits, and attempts to get rid of leading zeros in
> these mechanisms, which could fail without explicit delimiters.

Sure. But we routinely express ASCII in terms of octets. We
don't use the "7-bit" or "21-bit" language very often.
And wiring it into protocols and conventions just gets us into
trouble when other standards bodies change their minds and decide
that however many bits they thought was certainly enough, wasn't.
Consider the evolution from "6 bits is enough" (with BCD and
other code sets) to "7 bits is enough" (with ASCII and ISO 646),
to "8 bits is enough" (with 8859 (including the
unfortunately-named ASCII-8), EBCDIC, and others), to "16 bits is
certainly enough" (with Unicode 1.0), to "we will never need more
than 21" (current Unicode). Of course, the IETF and its
predecessors have made the same errors, starting with the
conviction that we would never need more than 255 network nodes,
to 32 bits with IPv4, to 128 bits with IPv6, to some suspicions
that, depending on allocation policies, the latter may turn out
to not be enough in a world with sensor networks, multiple
connectivity paths, and many, many small IP-connected devices.

I don't think I disagree with your point -- it is certainly
factual -- but don't yet see the need to open this topic up in
this document (see the comment about length, etc., above).

I have taken \uNNNN and \UNNNNNNNN out of their special status.
I've inserted an explicit discussion about why explicit
delimiters (or terminators) are desirable. And, with the working
draft for -02, I've made additional comments, as an extra bullet
in Section 4, about the inability to tell the difference, just by
looking, between \uNNNN as a short, BMP-only, form of the \uNNNN
and \UNNNNNNNN pair, \uNNNN as a short form for \uNNNNN or
\uNNNNNN, and \uNNNN as an octet encoding for UTF-16, as reasons
why none of those "\u" forms (without delimiters) are attractive.

What else do you suggest and why? Text please.

>> U+NNNN[N[N]] versus U+[[N]N]NNNN is a matter of taste.

> If the idea is to reflect appendix A of Unicode 5, it talks
> about stripping leading zeros until four digits are left.

Which defines the contents and semantics of the NNNN... string,
not the syntax.
I can argue this either way (as I presume you can). I just find
the second form harder to read.

> In
> table A.1 it has U+HHHH vs. U-HHHHHHHHH, saying that this is
> the same as \uHHHH vs. \UHHHHHHHH. I haven't seen U-HHHHHHHH
> before, and I won't miss it if you don't want to talk about it
> in the draft.

Thanks. I don't intend to talk about it unless someone makes a
very persuasive argument.

>> my editorial judgment and preferences are going to prevail...
>> at least until the document gets far enough along that I have
>> to start arm-wrestling with the RFC Editor and _their_
>> judgment and preferences.
>
> Oops, sorry, I thought the intended status was BCP, "an attempt
> to prohibit escapes for UTF-8 entirely".

Well, we clearly can't do that without making the spec
retroactive on existing protocols, especially the famous IRI
situation. But, yes, the target is a standards track document
of some flavor... see below.

>> If you don't like my writing style -- and many people don't --
>> please take on these efforts yourselves and let me complain
>> about your style (or not) some of the time.
>
> I like it, your drafts are almost always very interesting. I
> was really surprised when I stumbled over RFC 2345 some day
> ago, it supports UTF-8 for whois. So that wasn't a DeNIC
> invention after all, and it's clearly older than RFC 3912
> chapter 4.

For whatever it is worth, I've become convinced, as I have delved
further into the history of telnet-based protocols, that it was a
mistake. And I have no reason to believe that DENIC looked at
that document rather than inventing the idea independently. If
they did draw inspiration from it, I think I'd be suitably
chagrined :-(

> But "interesting" sometimes includes "controversial", and then
> the intended status is relevant. As you say it makes no sense
> to nit-pick your personal preferences.
In this case, and probably more generally, the comment was
strictly about editorial preferences, not substantive issues of
protocol design or recommendations. See below.

>> nit-picking is easy and may be fun, but it tends to block,
>> rather than contribute to, progress.
>
> I don't read 99% of all drafts, and I don't try to contribute
> to 99.9%. The remaining 0.1% somehow attracted my attention,
> often because I like them, the opposite is also possible. No
> idea how easy it generally is, but this article alone took me
> about three hours, because I tried to find out what "net
> ascii" really is, why your "net utf-8" mentions an RFC that's
> not more available, where the Unicode 5 notational conventions
> are, etc.

My apologies. And I'm flattered that you thought it worth the
time. I'm typically willing to answer questions about comments
of mine that seem obscure if that is more efficient for you.

>> unless your purpose is to introduce delays and lay down
>> obstacles
>
> Escaping that with <rant> and adding a qualifier doesn't make
> it better. If you don't want a discussion about the draft
> it's okay, as you say we're free to submit our own drafts
> about this and / or related topics.

Frank, I welcome, and want, comments and suggestions. The fact
that I completely revamped the document and what it suggested
between -00 and -01 in response to comments critical of the
recommendation about \u and \U should be evidence of that. On
this document, virtually every substantive suggestion or comment
has resulted in some changes to the text. I count "this sentence
is incomprehensible and needs to be fixed" as substantive in that
regard. As with open source programs and programming, I believe
that having multiple critical eyes on one of these things makes
it better, and that certainly has been the case here.

What I've objected to, here and elsewhere, has been what seemed
to be high-frequency nit-picking about editorial style and
preferences.
From my point of view, that is irritating and it does little or
nothing to advance the work. More important, especially when I'm
working on multiple documents in parallel, I often start feeling
like a Bear of Very Little Brain (apologies if that reference is
not obvious to you): while long words don't bother me, I start
getting scared that I will miss something substantive and
important among flurries of editorial quibbling.

    john
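
The point made in the message about undelimited "\u" forms being
indistinguishable "just by looking" can be illustrated with a
minimal Python sketch. All function names below are hypothetical,
and the delimited \u'...' syntax is only an assumed example of an
escape with explicit terminators; the message does not specify
one. The same input text decodes differently under each reading.

```python
import re

HEX = "[0-9A-Fa-f]"

def fixed_u_escape(text):
    """Reading 1: \\uNNNN is exactly four hex digits (BMP-only form)."""
    return re.sub(r"\\u(%s{4})" % HEX,
                  lambda m: chr(int(m.group(1), 16)), text)

def greedy_u_escape(text):
    """Reading 2: \\u takes four to six hex digits, as many as are present."""
    return re.sub(r"\\u(%s{4,6})" % HEX,
                  lambda m: chr(int(m.group(1), 16)), text)

def utf16_u_escape(text):
    """Reading 3: runs of \\uNNNN are UTF-16 code units, so surrogates pair up.

    A lone surrogate raises UnicodeDecodeError in this sketch.
    """
    def join(m):
        units = re.findall(r"\\u(%s{4})" % HEX, m.group(0))
        raw = b"".join(int(u, 16).to_bytes(2, "big") for u in units)
        return raw.decode("utf-16-be")
    return re.sub(r"(?:\\u%s{4})+" % HEX, join, text)

def delimited_escape(text):
    """Assumed delimited form \\u'N..N': the terminator removes the guesswork."""
    return re.sub(r"\\u'(%s{1,6})'" % HEX,
                  lambda m: chr(int(m.group(1), 16)), text)

def code_points(s):
    """Render a string as a list of U+ code point labels for inspection."""
    return ["U+%04X" % ord(c) for c in s]

if __name__ == "__main__":
    sample = r"\u00e9a"        # escape followed by a letter that is also a hex digit
    print(code_points(fixed_u_escape(sample)))
    print(code_points(greedy_u_escape(sample)))
    pair = r"\ud83d\ude00"     # two escapes that form a UTF-16 surrogate pair
    print(code_points(fixed_u_escape(pair)))
    print(code_points(utf16_u_escape(pair)))
    print(code_points(delimited_escape(r"\u'e9'a")))
```

With the delimited form, a following "a" can never be absorbed
into the escape, which is the practical argument for explicit
delimiters or terminators.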