Re: Unicode/UTF-8 issues (was: comments on draft-ietf-sasl-anon-00)

"Kurt D. Zeilenga" <Kurt@OpenLDAP.org> Thu, 20 February 2003 20:34 UTC

Message-Id: <5.2.0.9.0.20030220123001.026faf68@127.0.0.1>
X-Sender: kurt@127.0.0.1
X-Mailer: QUALCOMM Windows Eudora Version 5.2.0.9
Date: Thu, 20 Feb 2003 12:32:52 -0800
To: Alexey Melnikov <mel@messagingdirect.com>
From: "Kurt D. Zeilenga" <Kurt@OpenLDAP.org>
Subject: Re: Unicode/UTF-8 issues (was: comments on draft-ietf-sasl-anon-00)
Cc: Philip Guenther <guenther@sendmail.com>, ietf-sasl@imc.org
In-Reply-To: <3E553615.C2717C86@messagingdirect.com>
References: <5.2.0.9.0.20030220092854.01a0bd18@127.0.0.1> <5.2.0.9.0.20030220103201.01a11718@127.0.0.1> <5.2.0.9.0.20030220115255.026602a0@127.0.0.1>

At 12:09 PM 2/20/2003, Alexey Melnikov wrote:

>"Kurt D. Zeilenga" wrote:
>
>> Okay, how about I replace the grammar and the paragraph before
>> it as follows:
>
>This mostly works, see below.
>
>>   A formal grammar for the client message using Augmented BNF [ABNF]
>>   is provided below as a tool for understanding this technical
>>   specification.
>>
>>   message     = [ email / token ]
>>                 ;; MUST be prepared in accordance with Section 2
>>
>>   UTF1        = %x00-3F / %x41-7F ;; less '@' (U+0040)
>>   UTF2        = %xC2-DF UTF0
>>   UTF3        = %xE0 %xA0-BF UTF0 / %xE1-EC 2(UTF0) /
>>                 %xED %x80-9F UTF0 / %xEE-EF 2(UTF0)
>>   UTF4        = %xF0 %x90-BF 2(UTF0) / %xF1-F3 3(UTF0) /
>>                 %xF4 %x80-8F 2(UTF0)
>>   UTF0        = %x80-BF
>>
>>   TCHAR       = UTF1 / UTF2 / UTF3 / UTF3 / UTF4
>>               ;; any UTF-8 encoded Unicode character
>>               ;; except '@' (U+0040)
>
>Hmm, this suggests a typo in draft-yergeau-rfc2279bis-04.txt (UTF3 is
>shown twice).

Or an error on my part.  UTF3 should only be listed once here.
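For illustration only, here is a minimal Python sketch that transcribes the TCHAR rule, with UTF3 listed once, into a byte-level regular expression. Listing UTF3 twice would not change which octet strings match, because a repeated ABNF alternative is only redundant. The names TCHARS_RE and is_tchar_sequence are hypothetical, and since the definition of `token` is not quoted here, the sketch only checks for zero or more TCHARs.

```python
import re

# UTF0 = %x80-BF (continuation octet)
UTF0 = rb"[\x80-\xbf]"
# UTF1: single-octet characters except '@' (%x40)
UTF1 = rb"[\x00-\x3f\x41-\x7f]"
# UTF2: two-octet sequences; leads %xC0-C1 are excluded to rule out overlong forms
UTF2 = rb"[\xc2-\xdf]" + UTF0
# UTF3: three-octet sequences, excluding overlongs (E0 80-9F) and surrogates (ED A0-BF)
UTF3 = (rb"\xe0[\xa0-\xbf]" + UTF0
        + rb"|[\xe1-\xec]" + UTF0 + rb"{2}"
        + rb"|\xed[\x80-\x9f]" + UTF0
        + rb"|[\xee-\xef]" + UTF0 + rb"{2}")
# UTF4: four-octet sequences, excluding overlongs (F0 80-8F) and values above U+10FFFF
UTF4 = (rb"\xf0[\x90-\xbf]" + UTF0 + rb"{2}"
        + rb"|[\xf1-\xf3]" + UTF0 + rb"{3}"
        + rb"|\xf4[\x80-\x8f]" + UTF0 + rb"{2}")

# TCHAR = UTF1 / UTF2 / UTF3 / UTF4, repeated zero or more times
TCHARS_RE = re.compile(rb"(?:" + rb"|".join([UTF1, UTF2, UTF3, UTF4]) + rb")*")


def is_tchar_sequence(data: bytes) -> bool:
    """Return True if data consists only of TCHARs (well-formed UTF-8 without '@')."""
    return TCHARS_RE.fullmatch(data) is not None
```

Under this reading, "café" (which ends in C3 A9) is accepted, while an overlong C0 80, a surrogate such as ED A0 80, or anything containing '@' is rejected.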


>Also note that this doesn't have the UTF5 and UTF6 rules that you currently have:
>
>      UTF5        = %xF8-FB 4(UTF0)
>      UTF6        = %xFC-FD 5(UTF0)
>
>(but they have to be cleaned up to prevent overlong sequences).

Yes.  There are a couple of 4-octet versus 6-octet cleanups to be made as well.
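As a hedged sketch of the overlong cleanup mentioned above, the Python below checks tightened forms of the quoted UTF5 and UTF6 rules. A 5-octet sequence is non-overlong only if it encodes at least U+200000, and a 6-octet sequence only if it encodes at least U+4000000; this constrains the second octet after %xF8 or %xFC. The tightened ranges in the comments and all names (UTF5_CLEAN, UTF6_CLEAN, legacy_decode, is_overlong) are derived here for illustration and are not text from either draft.

```python
import re

# UTF0 = %x80-BF (continuation octet)
UTF0 = rb"[\x80-\xbf]"

# Overlong-free variants of the quoted rules (derived for illustration):
#   UTF5 = %xF8 %x88-BF 3(UTF0) / %xF9-FB 4(UTF0)
#   UTF6 = %xFC %x84-BF 4(UTF0) / %xFD 5(UTF0)
UTF5_CLEAN = re.compile(rb"\xf8[\x88-\xbf]" + UTF0 + rb"{3}|[\xf9-\xfb]" + UTF0 + rb"{4}")
UTF6_CLEAN = re.compile(rb"\xfc[\x84-\xbf]" + UTF0 + rb"{4}|\xfd" + UTF0 + rb"{5}")


def legacy_decode(seq: bytes) -> int:
    """Decode one 5- or 6-octet old-style sequence, with no range checks."""
    # A 5-octet lead carries 2 value bits, a 6-octet lead carries 1.
    value = seq[0] & {5: 0x03, 6: 0x01}[len(seq)]
    for octet in seq[1:]:
        value = (value << 6) | (octet & 0x3F)  # 6 value bits per continuation
    return value


def is_overlong(seq: bytes) -> bool:
    """True if a shorter sequence could have encoded the same value."""
    minimum = {5: 0x200000, 6: 0x4000000}[len(seq)]
    return legacy_decode(seq) < minimum
```

For example, F8 87 BF BF BF decodes to U+1FFFFF, which fits in four octets, so it is overlong and the tightened UTF5 rejects it; F8 88 80 80 80 (U+200000) is the smallest 5-octet value it accepts.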