Unicode/UTF-8 issues (was: comments on draft-ietf-sasl-anon-00)

"Kurt D. Zeilenga" <Kurt@OpenLDAP.org> Thu, 20 February 2003 18:06 UTC

Message-Id: <5.2.0.9.0.20030220092854.01a0bd18@127.0.0.1>
X-Sender: kurt@127.0.0.1
X-Mailer: QUALCOMM Windows Eudora Version 5.2.0.9
Date: Thu, 20 Feb 2003 10:04:34 -0800
To: Philip Guenther <guenther@sendmail.com>
From: "Kurt D. Zeilenga" <Kurt@OpenLDAP.org>
Subject: Unicode/UTF-8 issues (was: comments on draft-ietf-sasl-anon-00)
Cc: ietf-sasl@imc.org
In-Reply-To: <200302200807.h1K87JV13985@katroo.Sendmail.COM>

At 11:54 PM 2/19/2003, Philip Guenther wrote:
>The syntax for UTF-8 characters in the draft permits "non-shortest form"
>encodings

I'm not exactly sure what you are referring to here.  The
draft says that the trace information is transferred as a
string of UTF-8 encoded Unicode characters.  A non-shortest
form UTF-8 encoding of a Unicode character is invalid per
RFC 2279.  I believe draft-yergeau-rfc2279bis-04.txt is
clearer on this, so I'll change the reference.
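
To make the shortest-form point concrete, here is a minimal sketch in
Python.  It is only an illustration, not text from the draft: the
helper names is_valid_utf8 and naive_two_byte_decode are hypothetical,
and it assumes Python 3's built-in strict "utf-8" codec as the
reference decoder.  The bytes C0 AF are a two-byte (non-shortest)
encoding of U+002F, which a lax decoder would accept and a strict
one must reject.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True only if data decodes under Python's strict UTF-8 codec."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def naive_two_byte_decode(data: bytes) -> int:
    """Hypothetical lax decoder: combines the payload bits of a 2-byte
    sequence without checking that the result needed 2 bytes."""
    return ((data[0] & 0x1F) << 6) | (data[1] & 0x3F)


shortest = b"\x2f"      # U+002F '/' in its only valid (1-byte) form
overlong = b"\xc0\xaf"  # same code point smuggled into a 2-byte form

print(is_valid_utf8(shortest))               # True
print(is_valid_utf8(overlong))               # False
print(hex(naive_two_byte_decode(overlong)))  # 0x2f
```

The lax decoder is shown only to demonstrate why the overlong form is
a problem: it yields the same code point as the valid one-byte form.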

If, however, you mean that the string of Unicode characters
is not normalized using an algorithm that produces the
minimum number of code points, then yes.  This is as intended.
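
The distinction between encoding form and normalization can be
sketched as follows.  This is an assumption-laden illustration: it
uses NFC and NFD from Python's unicodedata module merely as examples
of normalization forms that yield different code point counts; the
draft discussed here does not require either, and NFC is a composed
form, not a guaranteed minimum.  The function name describe is
hypothetical.

```python
import unicodedata


def describe(text: str) -> tuple:
    """Return (number of code points, number of UTF-8 bytes) for text."""
    return (len(text), len(text.encode("utf-8")))


decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)  # U+00E9

# Both strings are well-formed, shortest-form UTF-8 once encoded;
# they differ only in how many code points represent the text.
print(describe(decomposed))  # (2, 3)
print(describe(composed))    # (1, 2)
```

Both strings pass a strict UTF-8 check, so an unnormalized string is
a separate matter from a non-shortest-form encoding.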

>and encodings for characters outside the range of the Unicode
>character set.

Unassigned code points may be transferred.  I will clarify
that, as I noted in another post.  However, I believe the
specification is clear that characters "outside the range",
i.e. not in the repertoire of characters, are invalid.
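
A rough sketch of the difference between "unassigned" and "outside
the range", in Python.  The assumptions are mine, not the draft's:
the range is taken to be U+0000..U+10FFFF, surrogates are treated
separately, and "unassigned" is approximated by general category Cn
in the Unicode 3.2.0 database that Python exposes as
unicodedata.ucd_3_2_0 (Cn also covers noncharacters).  The function
name classify is hypothetical.

```python
import unicodedata

UCD_3_2 = unicodedata.ucd_3_2_0  # Unicode 3.2.0 character database


def classify(cp: int) -> str:
    """Classify an integer code point value under the assumptions above."""
    if cp < 0 or cp > 0x10FFFF:
        return "out-of-range"   # not a Unicode code point at all
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"      # never valid in UTF-8
    if UCD_3_2.category(chr(cp)) == "Cn":
        return "unassigned"     # in range, no character assigned (or noncharacter)
    return "assigned"


print(classify(0x0041))    # assigned
print(classify(0x0378))    # unassigned
print(classify(0x110000))  # out-of-range
```

Under this reading, U+0378 could be transferred even though it is
unassigned, while a value such as 0x110000 has no valid encoding.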

>Neither may be interpreted by a program conforming to
>the Unicode 3.2 specification, which the draft references.

Which is exactly why the specification restricts the string
to a particular repertoire of characters.

Kurt