Re: draft-klensin-unicode-escapes-01

Frank Ellermann <nobody@xyzzy.claranet.de> Tue, 06 February 2007 19:57 UTC

To: discuss@apps.ietf.org
From: Frank Ellermann <nobody@xyzzy.claranet.de>
Subject: Re: draft-klensin-unicode-escapes-01
Date: Tue, 06 Feb 2007 20:53:01 +0100
Organization: <URL:http://purl.net/xyzzy>
Message-ID: <45C8DC9D.3D61@xyzzy.claranet.de>
References: <AF334D6BB0BFF3037B0DE609@p3.JCK.COM>

John C Klensin wrote:

>> When I mentioned hex. NCRs I meant XML, not SGML and its many
>> ways to save keystrokes.

> And I was commenting only on the suggestion that all reference
> to HTML be removed, however horrible it is.

That's one of those mail communication breakdowns: my remark here
about the wonders of SGML (as seen or not in most browsers trying
to display HTML) was in reply to Clive.  It was neither an objection
to nor a suggestion for your I-D.  Of course I support CharMod C044
with explicit delimiters (as in XML but not SGML) and CharMod C043
with hex. escapes (something XML and SGML couldn't do, but your I-D
and RFC 4646 got it right).
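A small Python sketch of why an explicit delimiter (CharMod C044)
matters for hex. escapes.  `xml_hex_ncr` produces XML-style `&#x...;`
references; `undelimited_hex` is a hypothetical variable-length escape
with no terminator, loosely modelled on C's `\x`, and is not the syntax
proposed in the I-D.  All function names here are made up for
illustration.

```python
import re


def xml_hex_ncr(text: str) -> str:
    """Escape non-ASCII characters as XML hex NCRs (&#x...;).

    Sketch only: a real XML serializer must also escape '&' and '<'.
    """
    return "".join(ch if ord(ch) < 0x80 else f"&#x{ord(ch):X};" for ch in text)


def undelimited_hex(text: str) -> str:
    """Hypothetical escape: variable-length hex number, no terminator."""
    return "".join(ch if ord(ch) < 0x80 else f"\\x{ord(ch):X}" for ch in text)


def decode_xml_hex_ncr(escaped: str) -> str:
    # The trailing ';' tells the parser exactly where the hex number ends.
    return re.sub(r"&#x([0-9A-Fa-f]+);", lambda m: chr(int(m.group(1), 16)), escaped)


def decode_undelimited_hex(escaped: str) -> str:
    # Without a terminator a greedy parser swallows any following hex digits.
    return re.sub(r"\\x([0-9A-Fa-f]+)", lambda m: chr(int(m.group(1), 16)), escaped)


sample = "\u00e9a"  # "e acute" followed by the ASCII letter "a"
print(xml_hex_ncr(sample))
print(decode_xml_hex_ncr(xml_hex_ncr(sample)) == sample)
print(undelimited_hex(sample))
print(decode_undelimited_hex(undelimited_hex(sample)) == sample)
```

The delimited form round-trips; the undelimited one reads "\xE9a" back
as the single code point U+0E9A, because the letter "a" is also a hex
digit.  A fixed number of digits would avoid this as well.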

 [unlike RFC 2231, a URL can't say what it is]
> I think I've understood both of those things.  I just haven't
> seen the justification or requirement to start exploring existing
> protocols in this document, famous or not.

IIRC you have a SHOULD.  One accepted justification for violating a
SHOULD is "your new rule came too late for my old implementation",
and so far there's no need to talk about that.  But IMO there can
also be reasons to violate this SHOULD in future protocols, if it's
in a context remotely related to IRIs, or in similar situations where,
say, Base64-encoded UTF-8 is better than ASCII with hex. NCRs.
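Which of the two is "better" depends partly on the text.  A rough
Python sketch comparing both ASCII-only forms; the function names are
hypothetical, and the NCR variant ignores '&' and '<' for brevity.

```python
import base64


def as_hex_ncrs(text: str) -> str:
    # ASCII stays as-is; every other character becomes &#x...;
    return "".join(ch if ord(ch) < 0x80 else f"&#x{ord(ch):X};" for ch in text)


def as_b64_utf8(text: str) -> str:
    # Base64 of the UTF-8 octets: always ASCII, but no longer human-readable
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


samples = ["plain ASCII", "\u00dcbung", "\u65e5\u672c\u8a9e\u306e\u6587\u7ae0"]
for sample in samples:
    ncr = as_hex_ncrs(sample)
    b64 = as_b64_utf8(sample)
    print(f"{sample!r}: NCR {len(ncr)} chars, Base64 {len(b64)} chars")
```

For mostly-ASCII text the NCR form is shorter and stays readable; for
the six CJK characters it needs 48 characters against 24 for Base64.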

If you think that's obvious, that's okay.  Sometimes folks ask why a
SHOULD is "only" a SHOULD and want to know what a _good_ reason to
violate it could be (apart from the obvious "too late"); for that
I thought the IRI example might help.

From a "protocol lawyer" POV, RFC 2324 is "only" informational <eg>

> every time I put something into a document that is not strictly
> necessary I get attacked for excessive length, etc.

If you think that an example for this SHOULD is unnecessary, that's
fine.  By Murphy's law somebody will attack you later, claiming that
the potential exceptions have to be spelled out.

 [21 bits vs. 7 bits]
> Sure.  But we routinely express ASCII in terms of octets.  We
> don't use the "7-bit" or "21-bit" language very often.

Yes, the matter of 21 vs. 31 bits was recently discussed on the
Unicode list in conjunction with a (hypothetical) "UTF-21"; maybe
Clive had that discussion in mind.  I'm also fascinated by such
charset encoding details.  One reason I've not yet published a
"UTF-4" I-D was RFC 4042 with its UTF-9 and UTF-18 "nonets".
Your "net UTF-8" I-D also mentions nonets (without using that name).

Of course you don't need to mention that matter in the "escapes"
I-D, unless you want to explain why old conventions demand _eight_
hex. digits where (today) _six_ should be good enough.  I only
wanted to say that Clive's remark wasn't off topic.
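The eight vs. six digits arithmetic in a few lines of Python: the
highest Unicode code point U+10FFFF needs 21 bits, the old 31-bit
ISO 10646 (UCS-4) code space needs 31, and each hex. digit carries
4 bits.  The helper name is made up for this sketch.

```python
UNICODE_MAX = 0x10FFFF        # highest Unicode code point (also RFC 3629's UTF-8 limit)
UCS4_31BIT_MAX = 0x7FFFFFFF   # old 31-bit ISO 10646 (UCS-4) code space


def bits_and_hex_digits(max_code_point: int) -> tuple[int, int]:
    bits = max_code_point.bit_length()
    hex_digits = (bits + 3) // 4  # round up: one hex digit per 4 bits
    return bits, hex_digits


for name, limit in [("Unicode", UNICODE_MAX), ("31-bit UCS-4", UCS4_31BIT_MAX)]:
    bits, digits = bits_and_hex_digits(limit)
    print(f"{name}: U+{limit:X} needs {bits} bits, {digits} hex digits")

# One surviving "old convention": Python's \U escape takes exactly eight hex digits.
print("\U0001F600" == chr(0x1F600))
```

The first print reports 21 bits and six digits, the second 31 bits
and eight digits.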

> I don't think I disagree with your point -- it is certainly
> factual-- but don't yet see the need to open this topic up in
> this document (see the comment about length, etc., above).

It's perfectly okay if you stick to the "octet layer" in this I-D.

FWIW, in theory "UTF-4" (like the "old" UTF-8) could be extended
to 31 bits (again), but of course that will never happen.  It would
break UTF-16, BOCU-1, and all implementations of STD 66.  A lame
excuse is that any aliens needing more bits are supposed to bring
their own kind of "Intergalacode".
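Why UTF-16 in particular is stuck at 21 bits follows from the
surrogate arithmetic: a high/low surrogate pair carries 10 + 10 bits
on top of an offset of 0x10000, so its ceiling is exactly U+10FFFF.
A minimal Python sketch (function names are illustrative only):

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into a UTF-16 high/low surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:X} cannot be encoded as a surrogate pair")
    offset = cp - 0x10000  # at most 20 bits remain
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def from_surrogates(high: int, low: int) -> int:
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# Largest value a pair can express: both halves at their maximum.
print(f"U+{from_surrogates(0xDBFF, 0xDFFF):X}")
print([f"{u:04X}" for u in to_surrogates(0x1F600)])
```

There is simply no room left in a pair for anything above U+10FFFF,
so a 31-bit code space would need a different UTF-16.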

 [about RFC 2345, unrelated to the Unicode escapes]
> I've become convinced, as I have delved further into the history
> of telnet-based protocols, that it was a mistake.

IBTD, if that's about using UTF-8 in whois.  UTF-8 never uses some
of the octets that are "critical" for telnet, in particular not 0xFF.
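That UTF-8 never produces 0xFF (telnet's IAC octet, RFC 854) can be
checked by brute force over all Unicode scalar values.  A short Python
sketch; the function name is made up here:

```python
TELNET_IAC = 0xFF  # "Interpret As Command" octet in telnet


def octets_used_by_utf8() -> set[int]:
    used: set[int] = set()
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:  # surrogates are not encodable in UTF-8
            continue
        used.update(chr(cp).encode("utf-8"))  # iterating bytes yields ints
    return used


unused = sorted(set(range(256)) - octets_used_by_utf8())
print([f"{b:02X}" for b in unused])  # per RFC 3629: C0, C1 and F5 through FF
print(TELNET_IAC in unused)
```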

RFC 3912 was a huge victory for any anti-1591 cabal, but the whois
battlefield in the war on spam isn't completely lost yet... <beg>

 [about net-UTF-8, unrelated to the Unicode escapes]
> I'm typically willing to answer questions about comments of mine
> that seem obscure if that is more efficient for you.

The rest can wait for net-utf8-03.  If by chance you have an old
copy of RFC 97, let me know: it's AWOL in all the RFC collections
I've heard of.

Frank