Re: Comments on draft-klensin-net-utf8-06
John C Klensin <john-ietf@jck.com> Tue, 16 October 2007 15:51 UTC
Date: Tue, 16 Oct 2007 11:51:27 -0400
From: John C Klensin <john-ietf@jck.com>
To: Marcos Sanz/Denic <sanz@denic.de>, discuss@apps.ietf.org

--On Tuesday, 16 October, 2007 16:45 +0200 "Marcos Sanz/Denic"
<sanz@denic.de> wrote:

> To my eyes the document is in good shape, but it still leaves
> one of the issues at stake partially open:
>
> Section 2, bullet 2 says "CR SHOULD NOT appear except when
> followed by LF". The last paragraph of the section 2 says "CR
> MUST NOT appear unless it is immediately followed by LF (...)
> or NUL". To me the first of these statements is much less
> restrictive than the second.

Marcos (and others who have been trapped by this),

While I would welcome suggestions about other text and ways to
organize this, the two statements are perfectly consistent with
each other, not a more restrictive/less restrictive
contradiction. It won't fit with the existing text and layout
in this form, but what is being said is:

    if there is a CR, it MUST be followed by either LF or NUL;
    however, NUL SHOULD be avoided too, so CR LF is the only
    recommended context in which CR SHOULD be used.

> Nitpicking:
>
> * Section 1.1: s/variable length/variable-length/ for
> coherence with other text appearances

fixed in working draft.

> * Section 3: s/to convert all canonically equivalent sequences
> a single unique form/to convert all canonically equivalent
> sequences into a single unique form/

s/into/to/, but fixed.

> * Section 4: The definition of the normalization stability is
> misleading, actually even wrong.
>
> Old text:
>
>    That is, if a string does not contain any unassigned
>    characters, and it is normalized according to NFC, it will
>    always be normalized according to all future versions of the
>    Unicode Standard.
>
> Suggested text:
>
>    That is, if a string does not contain any unassigned
>    characters for a given version of Unicode, and it is
>    normalized according to the definition of NFC in that
>    version, it will always result in the same normalized
>    string according to all future versions of the Unicode
>    Standard.
The text that was used was supplied by Mark Davis after my first
attempt didn't come out right. I'm happy to put anything in there
that people agree to be correct, but please sort this out with him
and/or other UTC members.

> * Section 4: "the string order of RFC 3629". It's not very
> clear to me what is meant with this. Byte order? Sorting
> order?

3629 specifies a byte order (in section 4). It does not address
or mention sort order except to note (in the introduction) that
UTF-8 preserves it and that sort order based on code point
sequence is likely to be fairly useless.

I _think_ I would welcome text to clarify this, but please note
that it is not likely to be possible to use this spec without
understanding and following 3629 (that is what the normative
reference is all about). So I am loath to cover things that are
well-covered in 3629 lest more confusion be created.

> * Section 4: I would drop the last paragraph, since it is a
> repetition of what is exhaustively explained in section 5.2.
> I got a parsing error at the last sentence of that paragraph
> anyway.

Hmm. It parses for me. But I agree about the redundancy, except
for that last sentence, which makes a normative assertion about
this specification that does not appear in Section 5. That last
sentence could be restated, less formally, as:

    If one encounters a UTF-8 string in a protocol, and its
    syntax and properties are not specifically defined, then it
    is reasonable to assume that it conforms to this
    specification.

That assumption might, of course, be wrong, which is one reason
it is important to be careful about having string-receivers
assume that the string-transmitter normalized it.

I would appreciate input from others about what to do about this.

> * Section 5.2: s/[RFC3454])/[RFC3454]),/

It is correct as written ("..stored in normalized form if...")
although the long parenthetical note is a little obnoxious.
Suggestions welcome.
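A small Python sketch, not part of the original message, of the RFC 3629 property discussed above: sorting UTF-8 strings byte by byte gives the same order as sorting them by code point. For contrast it also sorts by UTF-16 (big-endian) bytes, which does not preserve code-point order, because characters outside the BMP are encoded as surrogates (0xD800-0xDFFF) that sort below U+E000-U+FFFF. The sample strings and the helper name `sort_by_encoding` are illustrative assumptions.

```python
# Sketch: byte-wise sorting of UTF-8 matches code-point order;
# byte-wise sorting of UTF-16 does not (surrogates sort too low).

def sort_by_encoding(strings, encoding):
    """Hypothetical helper: sort strings by the raw bytes of an encoding."""
    return sorted(strings, key=lambda s: s.encode(encoding))

# Illustrative samples: ASCII, Latin-1, FULLWIDTH A (U+FF21), an emoji (U+1F600)
samples = ["z", "a", "Z", "\u00e9", "\uff21", "\U0001F600"]

by_code_point = sorted(samples)  # Python compares str values by code point
by_utf8 = sort_by_encoding(samples, "utf-8")
by_utf16 = sort_by_encoding(samples, "utf-16-be")

print(by_utf8 == by_code_point)   # True
print(by_utf16 == by_code_point)  # False: U+1F600 (D83D DE00) sorts before U+FF21
```

The same samples also show why code-point order is rarely a useful sort order for people: "Z" (U+005A) sorts before "a" (U+0061).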
> * Section 5.2, bullet 4: "This process has been discussed in
> the Unicode Consortium under the name 'Stable NFC'". That
> might very well be but the only hits I get when googling for
> that string are this very draft and some contributions of the
> draft author to some mailing lists. So I cast doubt on the
> utility of introducing this new term here which is a pointer
> to nowhere. Is this again referring to the normalization
> stability policy of unicode?
> http://unicode.org/standard/stability_policy.html

No. I will recheck the terminology.

I'm going to hold the document for a few days before
re-posting in the hope of getting comments from others.

thanks and regards,
    john
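The CR rule restated near the top of this message (CR MUST be followed by LF or NUL; NUL SHOULD be avoided, so CR LF is the only recommended use of CR) can be read as a simple check. The Python sketch below is an illustration under stated assumptions, not text from the draft: the function name `check_cr_usage` is hypothetical, a CR followed by anything other than LF or NUL (including a CR at the end of the text) is reported as an error, and CR NUL is reported only as a warning.

```python
from typing import List, Optional, Tuple

CR, LF, NUL = "\r", "\n", "\x00"

def check_cr_usage(text: str) -> Tuple[List[int], List[int]]:
    """Hypothetical checker for the CR rule as restated in this message.

    Returns (errors, warnings), each a list of offsets of offending CRs:
      - CR LF       -> accepted (the only recommended context for CR)
      - CR NUL      -> warning  (permitted, but NUL SHOULD be avoided)
      - anything else, or CR at end of text -> error (MUST NOT)
    """
    errors: List[int] = []
    warnings: List[int] = []
    for i, ch in enumerate(text):
        if ch != CR:
            continue
        nxt: Optional[str] = text[i + 1] if i + 1 < len(text) else None
        if nxt == LF:
            continue
        if nxt == NUL:
            warnings.append(i)
        else:
            errors.append(i)
    return errors, warnings

if __name__ == "__main__":
    print(check_cr_usage("a\r\nb"))    # ([], [])
    print(check_cr_usage("a\r\x00b"))  # ([], [1])
    print(check_cr_usage("a\rb\r"))    # ([1, 3], [])
```

Treating a trailing CR as an error is a choice made for this sketch; a streaming receiver would instead have to wait for the next character before deciding.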