[EAI] Comments on draft-ietf-eai-imap-utf8-05.txt

Alexey Melnikov <alexey.melnikov@isode.com> Tue, 23 June 2009 09:04 UTC

This version doesn't address most of the issues I raised earlier:
 http://www.ietf.org/mail-archive/web/ima/current/msg02637.html

Here is the list of my original comments, with a couple of addressed 
issues removed. I've also added Pete's comments on some of them.
===========================

> 3.1.  IMAP UTF-8 Quoted Strings

 [...]

>
>    All IMAP servers SHOULD accept UTF-8 in mailbox names and IMAP
>    servers which support the "Mailbox International Naming Convention"
>    described in RFC 3501 section 5.1.3 MUST accept utf8-quoted mailbox
>    names and convert them to the appropriate internal format.  [TBD
>    stringprep for mailbox names?  Can we reuse SASLprep?].

Note that I faced a similar issue in the ManageSieve protocol, where
script names are in UTF-8. After some discussion in the Sieve WG we
decided to go with something like this:

   A Sieve script name is a sequence of Unicode characters encoded in
   UTF-8 [UTF-8].  A script name MUST comply with Net-Unicode Definition
   (Section 2 of [NET-UNICODE]), with the following additional
   restrictions:

   o  0000-001F; [CONTROL CHARACTERS]

   o  007F; DELETE

   o  0080-009F; [CONTROL CHARACTERS]

   o  2028; LINE SEPARATOR

   o  2029; PARAGRAPH SEPARATOR

Where:

[NET-UNICODE] Klensin, J. and M. Padlipsky, "Unicode Format for Network
              Interchange", RFC 5198, March 2008.

[Pete: OK]
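
As an illustration only (not text from either draft), the restriction
list above could be checked roughly as follows. The function name
is_acceptable_name is hypothetical, rejecting the empty string is an
assumption, and the NFC test covers only one part of what RFC 5198
asks for, so this is a sketch rather than a complete Net-Unicode
validator.

```python
import unicodedata

# Code points prohibited on top of Net-Unicode, as listed above.
PROHIBITED_RANGES = [
    (0x0000, 0x001F),  # control characters
    (0x007F, 0x007F),  # DELETE
    (0x0080, 0x009F),  # control characters
    (0x2028, 0x2028),  # LINE SEPARATOR
    (0x2029, 0x2029),  # PARAGRAPH SEPARATOR
]


def is_acceptable_name(name: str) -> bool:
    """Hypothetical helper: reject names containing a prohibited code point."""
    if not name:
        # Assumption: an empty name is never useful, so reject it.
        return False
    for ch in name:
        cp = ord(ch)
        if any(lo <= cp <= hi for lo, hi in PROHIBITED_RANGES):
            return False
    # RFC 5198 recommends NFC; this is only a partial Net-Unicode check.
    return unicodedata.normalize("NFC", name) == name
```

A server would apply such a check after decoding the UTF-8 octets, so
malformed UTF-8 has to be rejected before this point.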

>    IMAP servers MUST NOT accept UTF-8 characters when storing a new
>    message keyword, unless the mailbox is UTF-8 only, in which case IMAP
>    servers SHOULD accept UTF-8 in message keywords.  [TBD stringprep for
>    message keywords?  Can we reuse SASLprep?]

Didn't we decide some time ago to leave keywords as ASCII?
I don't have a strong opinion either way.

But anyway, this should be similar to the above, with some additional 
restrictions imposed by IMAP: spaces and some US-ASCII separator 
characters are not allowed.

[Pete: OK]
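
A minimal sketch of what "similar to the above, plus the IMAP
restrictions" might look like, assuming the extra exclusions are the
atom-specials of the RFC 3501 grammar (keywords are atoms there). The
function name is hypothetical, and whether non-ASCII keywords are
allowed at all is exactly the open question above.

```python
# atom-specials from the RFC 3501 grammar: "(" ")" "{" SP CTL, the list
# wildcards "%" and "*", the quoted-specials DQUOTE and "\", and "]".
# CTL is handled by the range check below.
ATOM_SPECIALS = set('(){ %*"\\]')


def is_acceptable_keyword(keyword: str) -> bool:
    """Hypothetical check for a UTF-8 message keyword."""
    if not keyword:
        return False
    for ch in keyword:
        cp = ord(ch)
        # Same control-character exclusions as for script names.
        if cp <= 0x1F or 0x7F <= cp <= 0x9F:
            return False
        if cp in (0x2028, 0x2029):
            return False
        # Additional IMAP restriction: no spaces or atom separators.
        if ch in ATOM_SPECIALS:
            return False
    return True
```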

> 3.4.  UTF-8 Interaction with IMAP4 LIST Command Extensions
>
>    When an IMAP server advertises both the "UTF8" capability and the
>    "LIST-EXTENDED" [RFC5258] capability, the server MUST support the
>    LIST extensions described in this section.  When an IMAP server
>    advertises the UTF8=ONLY capability and the LIST-EXTENDED capability,
>    the server MUST reject these LIST extensions with a BAD response.

Ok, I might be confused here, but I think this contradicts the
description of the UTF8=ONLY capability in section 7:

   This capability permits an IMAP server to advertise that it does not
   support the international mailbox name convention (modified UTF-7),
   and does not permit selection or examination of any mailbox unless
   the UTF8 parameter is provided.

[Pete: I think the point here is that if you have UTF8=ONLY, then it is 
pointless to use the UTF8 or UTF8ONLY param to LIST, since all of the 
mailboxes only support UTF-8 headers. But I guess the param will do no 
harm either. Again, I'll ask Chris.

Alexey:
Pointless for a client that only supports UTF8=ONLY - probably.
However, I think the intent is to encourage client deployments now. In
the short term there will be no UTF8=ONLY servers, so clients will need
to work when UTF8 and UTF8ONLY are required. I don't think you want to
require two code paths in such clients.]

> 3.4.1.  UTF8 and UTF8ONLY LIST Selection Options
>
>    The UTF8 LIST selection option tells the server to include mailboxes
>    that only support UTF-8 headers in the output of the list command.
>    The UTF8ONLY LIST selection option tells the server to include all
>    mailboxes that support UTF-8 headers and to exclude mailboxes that
>    don't support UTF-8 headers.  Note that UTF8ONLY implies UTF8 so it
>    is not necessary for the client to request both.  Use of either
>    selection option will also result in UTF-8 mailbox names in the
>    result as described in Section 3.3.

This needs to clarify that either selection option implies the UTF8
return option described in section 3.4.2, which in turn means that the
"\NoUTF8" and "\UTF8Only" mailbox attributes need to be returned when
necessary.
The only place where this information can be found is in the IANA
registration itself. I think most readers would miss that.

[Pete: OK]

> 3.4.2.  UTF8 LIST Return Option
>
>    If the client supplies the UTF8 LIST return option, then the server
>    MUST include either the \NoUTF8 or the \UTF8Only mailbox attribute as
>    appropriate.  The \NoUTF8 mailbox attribute indicates an attempt to
>    SELECT or EXAMINE that mailbox with the UTF8 parameter will fail with
>    a [NOT-UTF-8] response code.  The \UTF8Only mailbox attribute
>    indicates an attempt to SELECT or EXAMINE that mailbox without the
>    UTF8 parameter will fail with a [UTF-8-ONLY] response code.  Note
>    that computing this information may be expensive on some server
>    implementations so this return option should not be used unless
>    necessary.
>
>    The ABNF [RFC5234] for these LIST extensions follows:
>
>      list-select-independent-opt =/ "UTF8" / "UTF8ONLY"

This has some implications that you might not have thought about:
extending this production means that these options don't interact with
other options, such as RECURSIVEMATCH. While I think the UTF8 option
should behave in the same way as the REMOTE option (as it might
potentially extend the list of returned mailboxes), I think UTF8ONLY
should be treated more like SUBSCRIBED (as it subsets the list of
returned mailboxes).

So, if you agree, I suggest changing this to:

     list-select-independent-opt =/ "UTF8"
     list-select-base-opt =/ "UTF8ONLY"

[Pete: OK]
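
To make the practical difference concrete, here is a rough sketch of
the RFC 5258 rule that a selection option list must either contain at
least one base option or consist solely of independent options. The
function name is hypothetical, only the options discussed here are
modelled, and UTF8ONLY is placed among the base options as suggested
above (not as in draft -05).

```python
from typing import Iterable

BASE_OPTS = {"SUBSCRIBED", "UTF8ONLY"}   # UTF8ONLY here per the suggestion
INDEPENDENT_OPTS = {"REMOTE", "UTF8"}
MOD_OPTS = {"RECURSIVEMATCH"}


def valid_selection_options(opts: Iterable[str]) -> bool:
    """Hypothetical validator for a LIST selection option list."""
    names = [o.upper() for o in opts]
    known = BASE_OPTS | INDEPENDENT_OPTS | MOD_OPTS
    if any(n not in known for n in names):
        return False
    if not names:
        return True  # "()" is allowed by the grammar
    if any(n in BASE_OPTS for n in names):
        return True  # a base option may be combined with anything
    # No base option present: only independent options may appear,
    # so RECURSIVEMATCH is rejected here.
    return all(n in INDEPENDENT_OPTS for n in names)
```

With this split, (RECURSIVEMATCH UTF8ONLY) is accepted while
(RECURSIVEMATCH UTF8) is not, which mirrors how SUBSCRIBED and REMOTE
already behave.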

>      mbox-list-oflag             =/ "\NoUTF8" / "\UTF8Only"

I think you've copied a typo from an old LIST-EXTENDED draft: please
change "mbox-list-oflag" to "mbx-list-oflag".

[Pete: OK]
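
For what it is worth, a client could use the two attributes from
section 3.4.2 along these lines. This is a sketch under assumptions:
the function name and the client_handles_utf8 flag are hypothetical,
and raising an error for a \UTF8Only mailbox on a non-UTF-8 client is
just one possible policy.

```python
from typing import Iterable


def select_with_utf8(attributes: Iterable[str], client_handles_utf8: bool) -> bool:
    """Hypothetical helper: should SELECT/EXAMINE carry the UTF8 parameter?"""
    # Mailbox attribute names are compared case-insensitively.
    attrs = {a.lower() for a in attributes}
    if "\\noutf8" in attrs:
        # Sending the UTF8 parameter would fail with [NOT-UTF-8].
        return False
    if "\\utf8only" in attrs:
        if not client_handles_utf8:
            # Omitting the parameter would fail with [UTF-8-ONLY].
            raise ValueError("mailbox requires the UTF8 parameter")
        return True
    # Neither attribute present: the client's preference decides.
    return client_handles_utf8
```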

  [...]

> 4.  UTF8=APPEND Capability
>
>    If the UTF8=APPEND capability is advertised, then the server accepts
>    UTF-8 headers in the APPEND command message argument.  A client which
>    sends a message with UTF-8 headers to the server MUST include the
>    UTF8 APPEND parameter.  The ABNF for this APPEND parameter follows:
>
>      append-ext    =/ "UTF8"

This is probably Ok, but note that RFC 4466 requires each label to have
a parameter. I don't currently remember whether this is a bug or
whether there was a reason for it.

[Pete:

OK, we discussed offline, and the above is wrong. Instead, we think we
need to replace it with:

utf8-literal = "UTF8" SP "(" literal8 ")"
append-data =/ utf8-literal

We will also need to update CATENATE (RFC 4469) by adding:

cat-part =/ utf8-literal


Does that look OK to everyone? (And if you don't have the stomach to 
read 4466, your silence will be taken as acceptance.)]
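
For anyone who would rather see bytes than ABNF, here is a small sketch
of the octets the proposed utf8-literal production describes, assuming
the literal8 form from RFC 4466 ("~{" number ["+"] "}" CRLF *OCTET).
The function name is hypothetical. It only shows the layout: with a
synchronizing literal (no "+") a real client has to wait for the
server's continuation request before sending the message octets, and
"+" is only usable when the server supports both LITERAL+ and BINARY.

```python
def utf8_literal(message: bytes, non_sync: bool = False) -> bytes:
    """Hypothetical helper: lay out 'UTF8 (' literal8 ')' for a message."""
    marker = b"+" if non_sync else b""
    # The count is the number of octets, not the number of characters.
    count = str(len(message)).encode("ascii")
    header = b"UTF8 (~{" + count + marker + b"}\r\n"
    return header + message + b")"
```

For example, a message whose header contains UTF-8 text would be
encoded to octets first, e.g. utf8_literal("Subject: \u00e9\r\n\r\n".encode("utf-8")),
and the same layout would serve as a cat-part in CATENATE.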