Re: Comments on Unicode Format for Network Interchange
Frank Ellermann <nobody@xyzzy.claranet.de> Mon, 18 June 2007 21:23 UTC
Return-path: <discuss-bounces@apps.ietf.org>
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com)
by megatron.ietf.org with esmtp (Exim 4.43)
id 1I0Ogs-0005F7-P9; Mon, 18 Jun 2007 17:23:14 -0400
Received: from discuss by megatron.ietf.org with local (Exim 4.43)
id 1I0Ogr-0005Ey-GO for discuss-confirm+ok@megatron.ietf.org;
Mon, 18 Jun 2007 17:23:13 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org)
by megatron.ietf.org with esmtp (Exim 4.43) id 1I0Ogr-0005Eq-2s
for discuss@apps.ietf.org; Mon, 18 Jun 2007 17:23:13 -0400
Received: from main.gmane.org ([80.91.229.2] helo=ciao.gmane.org)
by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1I0Ogq-000106-Gt
for discuss@apps.ietf.org; Mon, 18 Jun 2007 17:23:13 -0400
Received: from list by ciao.gmane.org with local (Exim 4.43)
id 1I0Ogj-0003pv-9x
for discuss@apps.ietf.org; Mon, 18 Jun 2007 23:23:05 +0200
Received: from dialin-145-254-045-028.pools.arcor-ip.net ([145.254.45.28])
by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
id 1AlnuQ-0007hv-00
for <discuss@apps.ietf.org>; Mon, 18 Jun 2007 23:23:05 +0200
Received: from nobody by dialin-145-254-045-028.pools.arcor-ip.net with local
(Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00
for <discuss@apps.ietf.org>; Mon, 18 Jun 2007 23:23:05 +0200
X-Injected-Via-Gmane: http://gmane.org/
To: discuss@apps.ietf.org
From: Frank Ellermann <nobody@xyzzy.claranet.de>
Subject: Re: Comments on Unicode Format for Network Interchange
Date: Mon, 18 Jun 2007 23:18:14 +0200
Organization: <URL:http://purl.net/xyzzy>
Lines: 99
Message-ID: <4676F696.321@xyzzy.claranet.de>
References: <6bb028490704231048s41deaf57q33ddb21fd0e76f17@mail.gmail.com>
<462E9074.14BD@xyzzy.claranet.de>
<6bb028490706181034r78352061kda89f149d05620a2@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Complaints-To: usenet@sea.gmane.org
X-Gmane-NNTP-Posting-Host: dialin-145-254-045-028.pools.arcor-ip.net
X-Mailer: Mozilla 3.0 (OS/2; U)
X-Spam-Score: 0.0 (/)
X-Scan-Signature: e1b0e72ff1bbd457ceef31828f216a86
X-BeenThere: discuss@apps.ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: general discussion of application-layer protocols
<discuss.apps.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/discuss>,
<mailto:discuss-request@apps.ietf.org?subject=unsubscribe>
List-Post: <mailto:discuss@apps.ietf.org>
List-Help: <mailto:discuss-request@apps.ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/discuss>,
<mailto:discuss-request@apps.ietf.org?subject=subscribe>
Errors-To: discuss-bounces@apps.ietf.org
Markus Scherer wrote:

> Sorry for the very late reply.

No issue, I still have that "ping" dated 2007-03-29 sitting in my
"Todos" folder, waiting for better times when nobody tries to update
the core e-mail RFCs or similar emergencies... ;-)

> I didn't understand from the internet-draft that it was only
> geared towards telnet.
> If it is indeed only about telnet, then the end-of-line definition
> is of course fine, and apologies for causing confusion.

The "only" might be misleading: a bunch of other protocols used to be
based on telnet and inherited some of its features, more or less
clearly.  I guess John has to explain in a new version of this I-D
what exactly he's talking about (a long string of "updates RFCs x,
y, z" in the header), and where he only proposes to update other
protocols (like "whois") in the spirit of his I-D.

Maybe 2821bis (SMTP) is an interesting example: at the moment it
still "allows" control characters in places where I'd prefer to get
rid of them (as recommended in the "net-Unicode" I-D).  Of course
2821bis will insist on CR LF for line ends.  That's just one example;
the same issue exists for many IETF protocols.

An IMO bad case is FTP, where "a" (= ASCII text, with line ends
converted to whatever passes as the "local" convention) is the
default.  The "local" convention supported on my box is "only CR
doesn't work as expected".  Besides, I rarely use FTP for text, and I
get garbage when I forget the "b" (binary).  Fortunately I was never
forced to grab a text file from an EBCDIC FTP server. :-)

> If it is not intended just for telnet, then I believe
> allowing several already-common forms of line endings is
> simply pragmatic.

It would cause havoc for some scripts I use (POP3, HTTP, SMTP, ident,
whois, simple stuff).  The SMTP and whois scripts fix a bare LF in the
body to CR LF on the fly, but they'd break miserably if somebody
decided that a bare ASCII CR is a line end, let alone Latin-* NEL or
weirder scenarios.
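To make the two line-end policies concrete, here is a minimal Python sketch (3.9+). The function names are hypothetical and not taken from any of the scripts mentioned above: `lf_to_crlf` does the on-the-fly fix-up of bare LF to CR LF, `split_strict` treats only CR LF as a line end, and `split_lenient` approximates the "several already-common forms" policy by also accepting bare CR, bare LF, NEL (U+0085), LINE SEPARATOR (U+2028) and PARAGRAPH SEPARATOR (U+2029). It works on already-decoded `str` values.

```python
import re

# Hypothetical helpers illustrating two line-end policies; not taken
# from any specific script or I-D.

def lf_to_crlf(text: str) -> str:
    """Turn every bare LF into CR LF, leaving existing CR LF pairs alone."""
    return re.sub(r"(?<!\r)\n", "\r\n", text)

def split_strict(text: str) -> list[str]:
    """Only the sequence CR LF ends a line; a lone CR stays part of the data."""
    return text.split("\r\n")

# Lenient policy: CR LF (tried first), lone CR, lone LF, NEL, LS and PS.
_LENIENT_EOL = re.compile("\r\n|[\r\n\u0085\u2028\u2029]")

def split_lenient(text: str) -> list[str]:
    """Any of the 'common' line-end forms ends a line."""
    return _LENIENT_EOL.split(text)

if __name__ == "__main__":
    sample = "HELO example\nMAIL FROM:<a@b>\r\nodd\rbytes\r\n"
    fixed = lf_to_crlf(sample)
    print(repr(fixed))
    print(split_strict(fixed))   # the lone CR stays inside "odd\rbytes"
    print(split_lenient(fixed))  # the lone CR splits "odd" from "bytes"
```

The only difference between the two splitters is the lone CR in the sample: the strict policy keeps it as data, the lenient one silently creates an extra line, which is exactly what would confuse a script that expects CR LF only. Note that Python's own `str.splitlines()` is more lenient still: it also breaks on VT, FF, FS, GS and RS.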
So it's not "only" telnet, but it is still something you can
completely ignore unless you need to post a NetNews article with
telnet to the NNTP port, or similar stunts.

[BOM]

> My suggestion was intended to clarify the specification for
> when a Unicode signature is in fact included even when it is
> not legal, not recommended, or simply not necessary.  I find
> it easier, in practice, to specify how to handle a situation
> than to just require that it not occur.

Okay, admittedly I don't like it when I get a mail whose body starts
with a BOM, because my obsolete MUA treats UTF-8 as windows-1252 while
claiming that it's Latin-1.  Something on the sender's side could have
silently removed this BOM.

But the I-D is more general; it has no concept of "body" or SDU.
Maybe it should mention the BOM issue in an example (?)  For telnet
it's IMO pointless: there you have lines, or rather control sequences
plus text defining the content of a screen, and the I-D can't say
that a BOM should be silently removed whenever it would end up at the
beginning of a line.  That's ignoring BiDi; besides, the draft
deprecates all control characters, even HT, allowing only CR LF in
this order.

Is it your idea to eliminate the BOM like the I-D eliminates SUB
(= EOF on some platforms)?  If that's your point, I think it's better
solved in RFC 3629, which is already at STD.  Otherwise the I-D could
arguably also talk about non-characters, surrogates, overlong UTF-8,
more-than-21-bit UTF-8, etc., all of which are addressed in RFC 3629.

> the last sentence of that paragraph, which reads
>    The stability of the Net-Unicode format is thus guaranteed
>    when any implementation that converts text into Net-Unicode
>    format does not permit unassigned characters.
> should be deleted, because with a SHOULD for normalization the
> stability of the Net-Unicode format does not depend on
> normalization stability any more.

I can't judge it; my last attempt to implement something in this
direction ended, once again, with me giving up.
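As a rough illustration of the checks discussed in this thread, here is a minimal Python (3.8+) sketch, assuming text arrives as raw bytes. The names `decode_strict`, `strip_leading_bom` and `check_net_unicode` are hypothetical. It leans on the fact that CPython's strict "utf-8" codec already rejects overlong forms, encoded surrogates and sequences beyond U+10FFFF (the RFC 3629 rules), drops one leading U+FEFF, and then reports control characters other than a CR LF pair plus text that is not in NFC (reported, not rejected, since normalization is only a SHOULD). It does not check for unassigned code points or non-characters.

```python
import unicodedata

def decode_strict(data: bytes) -> str:
    """Decode UTF-8.  CPython's strict codec raises UnicodeDecodeError for
    overlong forms, encoded surrogates and code points above U+10FFFF."""
    return data.decode("utf-8")

def strip_leading_bom(text: str) -> str:
    """Drop one leading U+FEFF (a BOM used as a 'signature'), if present."""
    return text[1:] if text.startswith("\ufeff") else text

def check_net_unicode(text: str) -> list:
    """Return a list of problems; an empty list means none were found."""
    problems = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\r" and text[i + 1:i + 2] == "\n":
            i += 2  # CR LF, in this order, is the only allowed line end
            continue
        # Category "Cc" covers C0 (incl. HT, SUB), DEL and the C1 range.
        if unicodedata.category(ch) == "Cc":
            problems.append(f"control character U+{ord(ch):04X} at index {i}")
        i += 1
    # Normalization is only a SHOULD, so report it instead of rejecting.
    if not unicodedata.is_normalized("NFC", text):
        problems.append("text is not in NFC")
    return problems

if __name__ == "__main__":
    # Overlong "/", encoded surrogate D800, old 5-byte form, U+110000.
    for raw in (b"\xc0\xaf", b"\xed\xa0\x80",
                b"\xf8\x88\x80\x80\x80", b"\xf4\x90\x80\x80"):
        try:
            decode_strict(raw)
        except UnicodeDecodeError as exc:
            print(raw, "rejected:", exc.reason)
    text = strip_leading_bom(decode_strict(b"\xef\xbb\xbfa\tb\r\ne\xcc\x81"))
    print(check_net_unicode(text))
```

Most of the RFC 3629 work is thus already done by a strict decoder, which is one argument for leaving those rules in RFC 3629 rather than repeating them in the I-D; the BOM and control-character policy is the part a protocol would still have to decide for itself.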
IOW, the SHOULD is wishful thinking as far as my scripts are
concerned.  Maybe I'll try again with "level one" later; actually I
was only curious what SASLprep really does beyond normalization (it
uses case-sensitive NFKC plus tons of "prohibited", "mapped to
nothing" and "mapped to space" rules).

Frank