Re: Unicode/UTF-8 issues (was: comments on draft-ietf-sasl-anon-00)

Alexey Melnikov <mel@messagingdirect.com> Thu, 20 February 2003 20:10 UTC

Message-ID: <3E553615.C2717C86@messagingdirect.com>
Date: Thu, 20 Feb 2003 13:09:57 -0700
From: Alexey Melnikov <mel@messagingdirect.com>
Organization: ACI WorldWide / MessagingDirect
X-Mailer: Mozilla 4.79 [en] (Windows NT 5.0; U)
X-Accept-Language: en
MIME-Version: 1.0
To: "Kurt D. Zeilenga" <Kurt@OpenLDAP.org>
CC: Philip Guenther <guenther@sendmail.com>, ietf-sasl@imc.org
Subject: Re: Unicode/UTF-8 issues (was: comments on draft-ietf-sasl-anon-00)
References: <5.2.0.9.0.20030220092854.01a0bd18@127.0.0.1> <5.2.0.9.0.20030220103201.01a11718@127.0.0.1> <5.2.0.9.0.20030220115255.026602a0@127.0.0.1>
Content-Type: text/plain; charset="koi8-r"
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-sasl@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-sasl/mail-archive/>
List-ID: <ietf-sasl.imc.org>
List-Unsubscribe: <mailto:ietf-sasl-request@imc.org?body=unsubscribe>

"Kurt D. Zeilenga" wrote:

> Okay, how about I replace the grammar and the paragraph before
> it as follows:

This mostly works; see below.

>   A formal grammar for the client message using Augmented BNF [ABNF]
>   is provided below as a tool for understanding this technical
>   specification.
>
>   message     = [ email / token ]
>                 ;; MUST be prepared in accordance with Section 2
>
>   UTF1        = %x00-3F / %x41-7F ;; less '@' (U+0040)
>   UTF2        = %xC2-DF UTF0
>   UTF3        = %xE0 %xA0-BF UTF0 / %xE1-EC 2(UTF0) /
>                 %xED %x80-9F UTF0 / %xEE-EF 2(UTF0)
>   UTF4        = %xF0 %x90-BF 2(UTF0) / %xF1-F3 3(UTF0) /
>                 %xF4 %x80-8F 2(UTF0)
>   UTF0        = %x80-BF
>
>   TCHAR       = UTF1 / UTF2 / UTF3 / UTF3 / UTF4
>               ;; any UTF-8 encoded Unicode character
>               ;; except '@' (U+0040)

Hmm, this suggests a typo in draft-yergeau-rfc2279bis-04.txt (UTF3 is
shown twice).

Also note that this doesn't have the UTF5 & UTF6 rules that you currently have:

      UTF5        = %xF8-FB 4(UTF0)
      UTF6        = %xFC-FD 5(UTF0)

(but they have to be cleaned up to prevent overlong sequences).
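
To make the overlong-sequence point concrete, here is a rough sketch that
transcribes the UTF0-UTF4 rules quoted above into a byte-level regular
expression. It is only an illustration, not part of any draft; the name
is_tchar is made up for this example. Because the second byte of each
multi-byte form is range-restricted, overlong encodings and surrogates
fail to match, and there is simply no alternative for 5- or 6-byte forms.

```python
import re

# Byte-level transcription of the quoted ABNF rules. The helper name
# is_tchar is hypothetical and exists only for this illustration.
UTF0 = rb"[\x80-\xBF]"
UTF1 = rb"[\x00-\x3F\x41-\x7F]"  # ASCII less '@' (U+0040)
UTF2 = rb"[\xC2-\xDF]" + UTF0
UTF3 = (rb"(?:\xE0[\xA0-\xBF]" + UTF0
        + rb"|[\xE1-\xEC]" + UTF0 + rb"{2}"
        + rb"|\xED[\x80-\x9F]" + UTF0
        + rb"|[\xEE-\xEF]" + UTF0 + rb"{2})")
UTF4 = (rb"(?:\xF0[\x90-\xBF]" + UTF0 + rb"{2}"
        + rb"|[\xF1-\xF3]" + UTF0 + rb"{3}"
        + rb"|\xF4[\x80-\x8F]" + UTF0 + rb"{2})")

# TCHAR = UTF1 / UTF2 / UTF3 / UTF4
TCHAR = re.compile(rb"(?:" + rb"|".join([UTF1, UTF2, UTF3, UTF4]) + rb")")


def is_tchar(data: bytes) -> bool:
    """Return True if data is exactly one TCHAR per the rules above."""
    return TCHAR.fullmatch(data) is not None


print(is_tchar(b"a"))                     # plain ASCII
print(is_tchar(b"@"))                     # excluded from UTF1
print(is_tchar(b"\xC0\xAF"))              # overlong 2-byte encoding of '/'
print(is_tchar(b"\xF8\x88\x80\x80\x80"))  # 5-byte form, no rule for it
```

The first call should match and the other three should not.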

>   email       = addr-spec
>               ;; as defined in [IMAIL], except with no free
>               ;; insertion of linear-white-space, and the
>               ;; local-part MUST either be entirely enclosed in
>               ;; quotes or entirely unquoted
>
>   token       = 1*255TCHAR
>
> and add an appropriate acknowledgement.
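
For what it's worth, an implementation would not need to hand-code the
byte ranges to check the token rule. A minimal sketch, assuming a strict
UTF-8 decoder is available (Python 3's built-in one rejects overlong
sequences, surrogates and anything above U+10FFFF); the name is_token is
made up here, and the preparation step required by Section 2 is not
shown:

```python
def is_token(data: bytes) -> bool:
    """Check data against token = 1*255TCHAR (hypothetical helper).

    Assumes the Section 2 preparation has already been applied; this
    only checks UTF-8 well-formedness, the absence of '@', and length.
    """
    try:
        text = data.decode("utf-8")  # strict by default
    except UnicodeDecodeError:
        return False
    # The limit counts characters (TCHARs), not octets.
    return 1 <= len(text) <= 255 and "@" not in text


print(is_token(b"sirius"))            # ordinary trace token
print(is_token(b"user@example.com"))  # contains '@', so not a token
print(is_token(b""))                  # token needs at least one TCHAR
```

Note that an empty message is still allowed by the message rule, since
both alternatives are optional there; it just isn't a token.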

Regards,
Alexey Melnikov
__________________________________________
R & D, ACI Worldwide/MessagingDirect
Watford, UK

Work Phone: +44 1923 81 2877
Home Page: http://orthanc.ab.ca/mel

I speak for myself only, not for my employer.
__________________________________________