Re: Last Call: draft-klensin-net-utf8 (Unicode Format for Network Interchange) to Proposed Standard
John C Klensin <john-ietf@jck.com> Thu, 10 January 2008 17:19 UTC
Date: Thu, 10 Jan 2008 12:19:12 -0500
From: John C Klensin <john-ietf@jck.com>
To: Frank Ellermann <hmdmhdfmhdjmzdtjmzdtzktdkztdjz@gmail.com>, ietf@ietf.org
Message-ID: <27B4E7C374D57C2B18252533@p3.JCK.COM>
In-Reply-To: <fm59k5$eqf$1@ger.gmane.org>
References: <E1JByfQ-0002Xd-Oh@stiedprstage1.ietf.org> <20080110103311.GA19519@nic.fr> <fm59k5$eqf$1@ger.gmane.org>
--On Thursday, 10 January, 2008 15:21 +0100 Frank Ellermann <nobody@xyzzy.claranet.de> wrote:

>...
> Hopefully somebody can confirm that IND is correct, or not.
> For HT and FF I hope the final version will somehow express
> that both are not really bad, and as far as they're bad FF is
> worse than HT.

I'm open to consensus about changes for either HT or FF, but the theory of "bad" that was used to construct the spec was:

(i) If a "spacing" control has the effect of setting the position of the next character, it is "bad" unless that position is unambiguous. In addition, things are "bad" unless they are necessary in running text (as distinct from faking things that are better handled in markup, followed by either device-specific output or standard page representations, neither of which is normal text).

It is unambiguous for SP. It is unambiguous for CRLF. Independent of the "what is a line-end" problem, it is somewhat ambiguous for CR or LF alone and for IND. It is ambiguous for HT. It would be ambiguous for FF except that FF is assigned fairly clear semantics in NVT -- "FF" is not a line ending (CRLF FF is needed) and, as Bob Braden noted, there is a fairly clear rule that FF is to be interpreted as "top of next page" if one knows what a page is and as "blank line" otherwise. But that rule is ignored often enough to call for considerable caution about FF, and the text now contains a cautionary note for that reason.

There is an interesting demonstration of the law of unintended consequences here. If we could tell that a string was unambiguously UTF-8 (or whatever) by looking at it, even if it contains nothing but ASCII characters, then there would be no reason to try to make net-utf8 a proper superset of NVT. If we could do that, we could also do away with the entire "next line" debate by prohibiting even CRLF and requiring the use of LS (U+2028).
In retrospect, there might have been considerable advantages to forcing the ASCII/UTF-8 distinction by requiring that UTF-8 strings all start with a BOM, but it is far too late for that (and probably not, on balance, a good idea despite its advantages). So I don't see how to get there from here -- we are stuck, for historical reasons, with CRLF on the wire as what The Unicode Standard calls NLF. (Incidentally, Unicode 5.0, Section 5.8, provides significant insight into the complexity of this problem and probably should have been referenced. It would be even more helpful had Table 5-2 identified CRLF as a standard Internet "wire" form of NLF, rather than just binding that form to Windows.)

> My impression from reading the draft was exactly the opposite,
> FF not too bad, HT really bad, that's odd for protocols
> allowing WSP.

See above.

    john
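
For readers who want the classification above in concrete form, the following is a rough sketch only. It is not taken from the draft: the function name `flag_spacing_controls` and the verdict labels are assumptions made for illustration, and the verdicts simply restate the message's informal wording (SP and CRLF unambiguous; bare CR, bare LF, and IND somewhat ambiguous; HT ambiguous; FF usable only with caution). IND is taken to be the C1 control at U+0084.

```python
# Illustrative sketch; names and labels are assumptions, not draft text.
HT, LF, FF, CR = "\x09", "\x0a", "\x0c", "\x0d"
IND = "\x84"  # C1 INDEX control

VERDICTS = {
    CR: ("CR", "somewhat ambiguous"),   # bare CR, not part of CRLF
    LF: ("LF", "somewhat ambiguous"),   # bare LF, not part of CRLF
    IND: ("IND", "somewhat ambiguous"),
    HT: ("HT", "ambiguous"),
    FF: ("FF", "caution"),              # clear NVT semantics, often ignored
}

def flag_spacing_controls(text):
    """Return (offset, name, verdict) for each spacing control that the
    message treats as less than unambiguous. SP and CRLF are skipped."""
    flagged = []
    i = 0
    while i < len(text):
        # CRLF is a single, unambiguous line ending: skip both characters.
        if text[i] == CR and text[i + 1:i + 2] == LF:
            i += 2
            continue
        if text[i] in VERDICTS:
            name, verdict = VERDICTS[text[i]]
            flagged.append((i, name, verdict))
        i += 1
    return flagged

if __name__ == "__main__":
    for item in flag_spacing_controls("a\tb\r\nc\nd\x0c"):
        print(item)
```

Treating CRLF as one unit before looking at CR or LF individually is what keeps the standard wire line ending from being reported as two "somewhat ambiguous" characters.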