Re: Comments on draft-klensin-net-utf8-06

"Frank Ellermann" <nobody@xyzzy.claranet.de> Wed, 17 October 2007 06:24 UTC

To: discuss@apps.ietf.org
From: Frank Ellermann <nobody@xyzzy.claranet.de>
Subject: Re: Comments on draft-klensin-net-utf8-06
Date: Wed, 17 Oct 2007 08:21:08 +0200
Lines: 67
Message-ID: <ff49pf$rfn$1@ger.gmane.org>
References: <OF037DA1CA.695DAFC1-ONC1257376.004E5008-C1257376.00511560@notes.denic.de> <1CEEB76FCFC0070A7B2BDEAE@[10.1.0.164]>

John C Klensin wrote:

>> * Section 4: "the string order of RFC 3629". It's not very
>> clear to me  what is meant with this. Byte order? Sorting
>> order?
 
> 3629 specifies a byte order (in section 4).  It does not address
> or mention sort order except to note (in the introduction) that
> UTF-8 preserves it and that sort order based on code point
> sequence is likely to be fairly useless.
 
> I _think_ I would welcome text to clarify this

Please simplify the remark by dropping the RFC 3629 clause and the doubled "that":

-| Were Unicode to be changed in a way that violated these
-| assumptions, i.e., that either invalidated the string order
-| of RFC 3629 or that that changed the stability of NFC as
-| stated above, this specification would not apply.

+| Were Unicode to be changed in a way that violated these
+| assumptions, i.e., that changed the stability of NFC as
+| stated above, this specification would not apply.

UTF-8 as specified in STD 63 is stable, so the clause about the string order of RFC 3629 adds nothing.
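The assumption that remains is the one about NFC. For readers who want to see what NFC actually does, here is a minimal Python sketch using only the standard unicodedata module; the helper name to_nfc is hypothetical. It shows canonical composition and idempotence only. The cross-version stability the draft relies on is a Unicode Consortium policy and cannot be demonstrated by code running against a single Unicode version.

```python
import unicodedata

def to_nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C (canonical composition)."""
    return unicodedata.normalize("NFC", text)

# "e" followed by U+0301 COMBINING ACUTE ACCENT composes to U+00E9.
decomposed = "e\u0301"
composed = to_nfc(decomposed)
print(len(decomposed), len(composed))   # 2 1
print(composed == "\u00e9")              # True

# NFC is idempotent: normalizing an NFC string leaves it unchanged.
print(to_nfc(composed) == composed)      # True

# unicodedata.is_normalized (Python 3.8+) checks without building a copy.
print(unicodedata.is_normalized("NFC", decomposed))  # False
```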

> So I am loath to cover things that are well-covered in
> 3629 lest more confusion be created.

Yes, just don't mention the "byte-value lexicographic sorting
order of UTF-8 strings"; it's covered in STD 63, and besides,
it's not very interesting.
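For anyone unsure what that property means, a minimal Python sketch: sorting strings by their UTF-8 bytes gives the same order as sorting by code point, while sorting by UTF-16 code units does not. The sample strings are arbitrary choices for illustration.

```python
# Byte-wise comparison of UTF-8 encodings agrees with code point order
# (STD 63 / RFC 3629). UTF-16 code units do not: supplementary characters
# are encoded as surrogates (0xD800-0xDFFF), which sort below BMP
# characters in the range U+E000..U+FFFF such as U+FF21.

samples = ["A", "\u00e9", "\uff21", "\U0001F600", "z"]

by_code_point = sorted(samples, key=lambda s: [ord(c) for c in s])
by_utf8_bytes = sorted(samples, key=lambda s: s.encode("utf-8"))
by_utf16_units = sorted(samples, key=lambda s: s.encode("utf-16-be"))

print(by_utf8_bytes == by_code_point)   # True
print(by_utf16_units == by_code_point)  # False
```

In the UTF-16 ordering U+1F600 lands before U+FF21, which is exactly the surrogate effect noted in the comments.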

>> * Section 4: I would drop the last paragraph, since it is a
>> repetition of  what is exhaustively explained in section 5.2.
>> I got a parsing error at  the last sentence of that paragraph
>> anyway.

Indeed, that paragraph is unnecessary.  I also can't parse its
first sentence.

> That last sentence could be restated, less formally, as:
 
> If one encounters a UTF-8 string in a protocol, and its
> syntax and properties are not specifically defined, then
> it is reasonable to assume that it conforms to this
> specification.

I still don't understand this.  What is a UTF-8 string with
"unspecified syntax"?  STD 63 specifies the syntax of UTF-8;
anything not following that syntax is invalid.
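To illustrate that UTF-8 syntax leaves no room for "unspecified" variants, a minimal Python sketch: the standard library's UTF-8 codec in its default strict mode rejects ill-formed input, including overlong forms, encoded surrogates, values above U+10FFFF, and truncated sequences. The helper name is_valid_utf8 is hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if data is well-formed UTF-8 per STD 63 (RFC 3629)."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True

examples = {
    b"caf\xc3\xa9": True,         # "café", well-formed
    b"\xc0\xaf": False,           # overlong encoding of "/"
    b"\xed\xa0\x80": False,       # encoded surrogate U+D800
    b"\xf4\x90\x80\x80": False,   # would be U+110000, beyond U+10FFFF
    b"\xe2\x82": False,           # truncated 3-byte sequence
}
for data, expected in examples.items():
    assert is_valid_utf8(data) == expected, data
```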

The net-utf8 I-D doesn't specify any default properties, so
what is an assumption that "unspecified properties" conform to
net-utf8 supposed to achieve?

If you're talking about unassigned code points, please say so.
In that case it's covered in section 5.2, and you can simply
delete the last paragraph of section 4.
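Note that "unassigned" only has meaning relative to a particular Unicode version. A minimal Python sketch, assuming the Unicode tables bundled with the interpreter (reported by unicodedata.unidata_version) are the version of interest; the helper name unassigned_code_points is hypothetical. General_Category Cn also covers noncharacters such as U+FFFF, so this is a slight over-approximation of "unassigned".

```python
import unicodedata

def unassigned_code_points(text: str) -> list[str]:
    """List code points whose General_Category is Cn (unassigned or
    noncharacter) in the Unicode version bundled with this Python build."""
    return [f"U+{ord(c):04X}" for c in text if unicodedata.category(c) == "Cn"]

print(unicodedata.unidata_version)          # depends on the Python build
print(unassigned_code_points("A\u0378B"))   # ['U+0378']
```

A string that passes this check today may fail it with an older Python build, which is why the version caveat matters for section 5.2.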

> I'm going to hold the document for a few days before 
> re-posting in the hope of getting comments from others.

Please update the [NFC] reference to the version belonging to
TUS 5.0 (The Unicode Standard, Version 5.0): s/March 2005/2006-10-12/

 Frank