[apps-discuss] (private) draft review of: draft-ietf-eai-rfc5336bis-07.txt (v3)

Dave CROCKER <dhc@dcrocker.net> Wed, 22 December 2010 18:03 UTC


IETF Applications Review


I have been selected as an Applications Area Review Team reviewer for this
draft.

For background on apps-review, please see:
<http://www.apps.ietf.org/content/applications-area-review-team>

Please resolve these comments along with any other Last Call comments you
may receive. Please wait for direction from your document shepherd or AD
before posting a new version of the draft.



Document: draft-ietf-eai-rfc5336bis-07.txt
Reviewer: Dave Crocker, Brandenburg InternetWorking
Review Date: 2010-12-22


BACKGROUND:

      The document is a specification for an email transport-time option that is 
described in its Abstract as declaring support for "internationalized email 
addresses or header information" and in the Introduction as being "to support an 
internationalized email address".  The extension specifies changes both in the 
transfer protocol and in the message being transferred, including its Body. 
Legacy Internet Mail only supports classic "network ASCII" for data 
representation and for data transfer encoding.

      The charter for the current work cites previous work that was issued as 
Experimental, and summarizes it as having been "... based on the use of an
SMTP extension to enable the use of UTF-8 in envelope address
local-parts, optionally in address domain-parts, and in mail
headers."  This text appears also to serve as the statement of scope for the 
current working group.

      With respect to the mail header, the scope is specified as covering 
<address> and <encoded-word> constructs.  An <encoded-word> is a means of 
mapping Unicode strings onto classic, network ASCII, although the working group 
focus was on binary, UTF-8 support.


RECOMMENDATIONS:

This work represents an extremely important enhancement to Internet mail. It has 
been clear for twenty years that Internet applications need to be able to 
present data in a form that is natural to the user.

The current work benefits from providing a Framework document and from 
distinguishing changes to the email header from changes to the SMTP protocol.

    0.  The documents do raise distinctions between ASCII and Unicode, versus 
between ASCII and UTF-8. However they do not apply them rigorously in the 
documents.  I suggest that the term "internationalization" be used only during 
introductory discussion and never as part of normative text.  ASCII vs. Unicode 
is a distinction between the underlying range of data being represented.  ASCII 
vs. UTF-8 is a distinction between encoding environments. Text should explicitly 
indicate whether it means data representation versus data encoding and it should 
use Unicode for the former and UTF-8 for the latter.

    1.  The Framework document is normative and needs to be completed along with 
the other two specifications, since they quite reasonably state a normative 
dependence on the Framework document.

    2.  The SMTP extension draft needs to focus on SMTP and the direct support 
of UTF-8 in the message header.  It needs to move all discussion and 
specification of Unicode support in the Header to the Header draft.

    3.  There probably needs to be definition of MIME message/uni-rfc822, 
specifying Unicode support within a message contained in MIME.  I'm less clear 
whether there needs to be a MIME Content-Transfer-Encoding form specific to UTF-8.

    4.  The SMTP extension needs to have the client use an explicit signal when 
it is sending a message encoded in UTF-8. This is easiest as a parameter to the 
MAIL command.  The current specification creates a more complex and almost 
heuristic model for distinguishing ASCII from UTF-8 use.

    5.  The SMTP extension needs to remove all restrictions it imposes on MIME 
content-type.  A major reason that MIME was successful was that it was 
transparent to the transfer infrastructure.  The current SMTP extension 
specification changes this model, which actually increases the barrier to 
adoption of Unicode in email.  It needs to be easy for two MUAs that support 
Unicode in the email header to exchange mail even when the infrastructure does 
not support UTF-8.



SUMMARY COMMENTS:

The document conforms to the conventions for defining SMTP options.

There are a number of significant issues with the specification. These are 
covered in detail, below, and are summarized here:

      *  Framework -- The Framework document[1] is (correctly) referenced as 
required reading.  It supplies essential terminology and architecture for this 
specification.  In fact it is a specification, complete with formally normative 
vocabulary.  This means that it must be a normative reference by rfc5336bis.  It 
therefore also means that the Framework document needs to be completed before 
the current specification can be standardized.

      *  Scope -- the specification appears to go significantly beyond the scope 
of the working group's charter, including revisions to basic SMTP that have no 
obvious requirement for support of Unicode during transport. In fact, at least 
one change is likely to /restrict/ system-wide adoption, rather than encourage 
it!  In particular, the specification restricts the conditions under which some 
MIME content is allowed to be sent. (See next bullet.)

      *  Infrastructure Requirement -- Unless this option is in force, carrying 
internationalized email in a MIME part is prohibited.  This is out of scope for 
the working group and it is a counter-productive rule.  Imagine if the same type 
of rule had been specified when MIME was created, saying that MIME could only be 
sent when an "attachments-supported" option were in force. This would have 
prevented the early adoption of MIME use by individual MUAs until the entire 
infrastructure supported MIME.   (As an example of the very high barrier this 
raises, note the difference in real-world support and use of MDN versus DSN.) 
While it is reasonable for the working group to define a new MIME content-type 
that modifies message/rfc822 to support internationalized addresses, it is not 
appropriate for the working group to modify the SMTP transfer model to constrain 
what types of message content can be sent.

      *  Beyond <encoded-word> -- The work, here, appears to have two goals. One 
is to add support for Unicode in a <local-part>; that is, support for 
internationalized addresses.  However note that <encoded-word> and <A-label> 
already accomplish this in a way that is transparent to the existing email 
infrastructure; only the end-systems need to understand it.  The second goal is 
to support Unicode in the more "native" form of UTF-8. (The quotation marks are 
because UTF-8 is not native Unicode, either; it is a highly encoded form of 
Unicode...)  This creates some confusion in the specification.  Given that the 
option for SMTP is "UTF8SMTPbis", then the binary encoding goal seems to 
dominate the work. This is certainly a reasonable goal, but it will help the
clarity of the specification to make these two different goals more clear within 
the document and to apply them more carefully.

      *  Terminology confusion -- The Framework document carefully distinguishes
between ASCII addresses and non-ASCII addresses.  It equates
"internationalized" first with "non-ASCII", but then with "UTF8SMTPbis". A core
problem is that ASCII is part of the actual internationalized set of Unicode
characters. So, to say that "international" characters are non-ASCII is to exclude
part of Unicode from the term "international".  In addition, equating the term 
"internationalization" with UTF-8 encourages confusion between underlying or 
"native" data -- that is, Unicode -- with the way it is represented over the 
wire.  UTF-8 is merely one means of over-the-wire representation.  So, for 
example, <A-label> and <encoded-word> are two other means of encoding.  It's 
clear that this core distinction really is understood by the authors of this and 
the Framework documents.  However the vocabulary choices and their usage create 
a problem in the details of the specification.  "Internationalization" should 
mean Unicode, not a particular binary representation of it within 8-bit chunks. 
  The problem, here, is in using the term "internationalization" to refer to a 
subset of Unicode, that is, the subset that is not ASCII.  I strongly suggest 
saying "Unicode" when intending to refer to the richer set of characters that 
are the goal of this work, and "UTF-8" when referring to the particular binary 
encoding of Unicode that is the focus of the SMTP extension work.

      *  Non-UTF-8 Unicode support -- Can a message support Unicode without 
UTF-8?  The existence of <encoded-word> and <A-label> constructs makes clear 
that the answer is yes.  Hence, it should be possible to support Unicode 
messages, without this SMTP extension.  Perhaps this is out of scope for this 
SMTP extension and perhaps it is handled by the Email Header draft, but I think 
it worth having this document cite this alternative mode, if only to a) make 
clear that the alternative exists, and b) make more clear what the specific and 
strong benefit of this extension is.

      *  Partial enforcement --  Since ASCII is a subset of Unicode, having this 
extension be in force means that /everything/ is Unicode AND, apparently, is 
encoded in UTF-8.  If the environment created by this extension supports UTF-8, 
then it supports UTF-8, meaning both ASCII and non-ASCII.  Defining rules that 
depend on having this extension be in force but then still distinguish between 
ASCII and non-ASCII does not seem to make sense.

      *  Complexity and Heuristics -- In a number of places, the specification 
defines highly contingent action, where one side can use UTF-8 only if the other 
side has done so. This makes the enhancement much more complicated than 
necessary or appropriate. The enhancement needs to work with an all-or-nothing 
model in which UTF-8 is in force or it is not.  And, yes, this appears to be a 
major change in the model of this specification.  My understanding is that these 
issues were discussed in the working group, but I do not understand why a complex and
nearly-heuristic approach was preferred.  Instead I recommend that the client
signal explicitly when UTF8-encoded addresses are present, such as with a
<Mail-parameters> option (<esmtp-param>) to the MAIL command.

      *  5321 vs. 5322 -- The specification seems to confuse -- or at least to 
mix -- some rules from RFC 5321 versus some from RFC 5322.  <mailbox> is the 
major example.

      *  Since an RFC 5322 message can and often does exist outside of the SMTP 
environment, any changes to the RFC 5322 specification should be in a document 
that is separate from this extension specification.  This specification can then 
cite it.  I suggest moving all RFC 5322 changes to the Header document[2]  and 
merely citing it here.

      *  UTF8SMTPbis -- The draft uses the string "UTF8SMTPbis" when referencing 
the SMTP option.  The Framework document explains the choice, but IANA 
Considerations in this document needs to provide explicit handling instructions 
for it, since this is certain NOT to be the actual string that is used.

      *  Redundant Specification - in a number of places, normative language 
from other specifications is repeated.  This invites divergent specification and 
is generally out of scope for the current work. To the extent that the current 
specification needs to refer to normative parts of other specifications, it 
should do only that:  cite it; do not repeat it.  For example to highlight an 
important normative item from another specification, the current specification 
might describe the "topic" the external language covers, without saying what it 
says.  In other cases, the key point is that the current specification is not 
changing requirements from another specification; that is what should be said, 
rather than repeating what that requirement is. In general, these references to 
external, normative behaviors should be reviewed for relevance to the current 
work.  How are internationalized addresses relevant to the normative detail?

      *  Rule Meta-Naming -- The 'u' preface for revised rules is a reasonable 
idea, but appears to be problematic.  These rules replace existing rules in 
other specifications. There needs to be an explicit decision about the
handling of this, and it needs to be applied consistently, either directly 
replacing the rules or else re-naming them consistently, in a fashion that can 
be parsed (similar to the naming template that was done with RFC5322 obs-* 
rules.)  If an initial string is to be used, I suggest UTF-8-* rather than u*, 
in order to make it possible to parse this meta-label more reliably.
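
To illustrate the terminology point above (Unicode is the data; UTF-8, <A-label>
and <encoded-word> are alternative encodings of it), here is a minimal Python
sketch. The sample strings are my own, not from the draft. Note that Python's
built-in "idna" codec implements the older IDNA2003 rules, so it is only a
stand-in for the IDNA definitions the draft cites, and the encoded-word is
shown on unstructured header text rather than on an address.

```python
from email.header import Header, decode_header

# Unicode text: the data being represented (sample values, not from the draft).
domain = "bücher.example"
subject = "Grüße"

# Three different wire encodings of Unicode data.
utf8_bytes = domain.encode("utf-8")               # 8-bit UTF-8 octets
a_label = domain.encode("idna")                   # ASCII-only A-label form
encoded_word = Header(subject, "utf-8").encode()  # ASCII-only RFC 2047 encoded-word

print(utf8_bytes)
print(a_label)
print(encoded_word)

# Each encoding round-trips to the same Unicode text.
decoded_domain_utf8 = utf8_bytes.decode("utf-8")
decoded_domain_idna = a_label.decode("idna")
raw, charset = decode_header(encoded_word)[0]
decoded_subject = raw.decode(charset)
```

Only the first form requires an 8-bit-clean path; the other two are already
legal in a legacy net-ASCII environment.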





DETAILED COMMENTS:

{ I have included text from the draft that provides context, but have skipped 
sequences of text from the draft that do not. }


> 1.  Introduction
>
> The Simple Mail Transfer Protocol [RFC5321] provides a negotiation mechanism
> about service extension with which clients can discover

      with which ->  by which { or } through which

{ Either of these would be my stylistic preference. }


> server capabilities and make decisions for further processing.  This
> document use this mechanism to support an internationalized email

      use -> uses


> address.  An extended overview of the extension model for

{ The option enables use of UTF-8 in the email header, beyond just addresses,
therefore: }

      to support...address
      ->
      to support internationalized email addresses and internationalized
characters for the email header.


> internationalized addresses and headers appears in [RFC4952bis],

      headers -> the email header


> referred to as "the framework document" or just as "framework" elsewhere in
> this specification.  This document specifies an SMTP extension to permit
> internationalized email addresses in envelopes,

      envelopes -> the SMTP envelope

{ 1.  Although 'envelope' is almost certainly unambiguous, its use over the
years has been confusing, so it is worth being particularly clear in a
specification.

   2.  The use of plurals in a specification can be confusing.  I suggest using
the singular form wherever it can work, so that remaining use of plurals becomes
more precise. Also note the particular confusion with the word "headers" in 
email.  Since a single email has only a single header, the use of plural means 
each header from a set of messages... }


> and UNICODE characters (encoded in UTF-8) [RFC3629] in headers.

      headers -> the header


> 1.1.  Role of This Specification
>
> The framework document specifies the requirements for, and describes
> components of, full internationalization of the electronic mail.  A thorough
> understanding of the information in that document and in the base Internet
> email specifications [RFC5321] [RFC5322] is necessary to understand and
> implement this specification.

{ This means that the Framework document is normative; it needs to be cited as 
such.  Since understanding it is a pre-condition of reading the specification, 
it also requires that the Framework document be completed before this document. }


> This document specifies an element of the email internationalization work,
> specifically the definition of an SMTP extension for internationalized email
> address transport delivery.

{ Since it also declares support for internationalization in the message header,
it covers more than delivery.  In effect, it is a transport-time flag for
declaring internationalization of the entire email "environment".  I'm not quite
sure what exact language change to suggest, however.

I also note that it does more than declare support for internationalized 
addresses:  It declares support for encoding them into UTF-8. This is a 
significant, additional requirement and should be stated explicitly. }


> 1.2.  Terminology
...
> This specification defines only those Augmented BNF (ABNF) [RFC5234] syntax
> rules that are different from those of the base email specifications and,
> where the earlier rules are upgraded or extended, gives them new names.
> When the new rule is a small modification to

{ The wording in the first part of this paragraph seems awkward.  I think the
following is clearer: }

      This specification uses Augmented BNF (ABNF) rules [RFC5234], with some
modifications.  The modified rules are defined here and the rest are simply 
imported from [RFC5234].  New names are used, for rules that are upgraded
or enhanced.  When the ...


> the older one, it is typically given a name starting with "u".  Rules

{ The use of "typically" makes the description of this convention problematic. 
It needs to be always true or else not mentioned.  Or perhaps alternative 
phrasing:  }

      When a new...with "u".
      ->
      When a new rule has a name starting with "u", it is a small modification to
an older rule.

{ This asserts what is true and ignores what is not true, such as rules that 
qualify for the "u" but did not get it...

However I now believe that it is important to have a consistent meta-rule for 
naming the revised rules and that it be applied consistently to all rules that 
qualify for it.  Further, the naming convention should be easily parseable, and 
therefore more like what is used in RFC5322, to cover "obsolete" rules.  I 
suggest that new rules, here, be named UTF-8-*.  To repeat:  I believe that 
/all/ rules that resolve to UTF-8 must be renamed, so that implementers know 
what they need to change. }


> that are undefined here may be found in the base email specifications

{ The specifications should be cited here, explicitly, even though they have 
been cited elsewhere.  It is important to leave no ambiguity for the reader. }

      here may be -> here can be

{"may" is a reserved word for normative specification. Normative meaning is not 
based on the use of capitalization. }


> 3.2.  The UTF8SMTPbis Extension

> An SMTP server that announces this extension MUST be prepared to
>    accept a UTF-8 string [RFC3629] in any position in which RFC 5321
>    specifies that a mailbox can appear.  That string MUST be parsed only
>    as specified in [RFC5321], i.e., by separating the mailbox into

{ The 'i.e.' clause is an example of repeating normative text from another 
specification.

The current document is not empowered to give directives about basic SMTP 
parsing, nor is there any internationalization requirement that it do so.  At 
most, it should say that the changes specified in this document do not change 
any other aspect of SMTP processing. }


>         Once isolated by this parsing process, the local part MUST be
>    treated as opaque unless the SMTP server is the final delivery Mail
>    Transfer Agent (MTA).

{ The statement about handling of <local-part> is redundant with the base RFC 
5321 specification.  The statement, here, should be that the handling of 
<local-part> is unchanged from the base specification.  Again, it should not 
repeat the normative language, unless it is changing it.  If it is changing it, 
the change needs to be essential for support of internationalized addresses. }


>                        Any domain names that are to be
>    compared to local strings SHOULD be checked for validity and then
>    MUST be compared as specified in section 3 of [RFC5891].

{ Dictating use of RFC5891 is within scope.  Dictating validation seems not to 
be.  So...}

    -> Any domain name that is to be compared to a local string MUST use Section 
3 of [RFC5891] as the basis for comparison.


> An SMTP client that receives the UTF8SMTPbis extension keyword in response
> to the EHLO command MAY transmit mailbox names within SMTP commands as
> internationalized strings in UTF-8 form.  It MAY send a UTF-8 header
> [RFC5335bis] (which may also include mailbox names in UTF-8).  It MAY
> transmit the domain parts of mailbox names within SMTP commands or the
> message header as A-labels or U-labels

{ I believe that the use of "MAY" is not correct. This would mean that the 
receiver needs a means of distinguishing whether the data are UTF-8 or not. This 
would border on requiring support of a heuristic, but it certainly adds a 
significant processing overhead and additional software complexity.

The only alternative is to specify use of a MAIL command <Mail-parameters> 
option that declares that the message supports internationalized addresses. 
Given the approach in this specification, I believe the intent is also to have 
it mean that UTF-8 encoding is supported.

The core issue here is specifying an option which declares a message to have an 
EAI context for all of the message.  So the processing context is fully 
EAI/UTF-8 or it is legacy net-ASCII.  This is considerably simpler to specify 
and to process, than would be requiring parsing the incoming string and looking 
for non-ASCII UTF-8.

Hence: }

      An SMTP client...U-labels
      ->
      An SMTP client that receives the UTF8SMTPbis extension keyword, in response to
the EHLO command, will transmit <local-part> within SMTP commands as
internationalized strings in UTF-8 form.  It will send the email header in UTF-8
[RFC5335bis] (which can also include <mailbox> names in UTF-8.)  It also will
transmit the domain parts of mailbox names within SMTP commands or the message
header as A-labels or U-labels

{ Note - the term "mailbox names" is not defined here or in RFC 5321.  In RFC 
5321 it appears to be used to mean local-part; however because its precise
meaning is unclear, I strongly urge NOT using it here at all. Instead I suggest 
using whatever ABNF rulename is appropriate.  This guarantees clarity.

   Note:  I changed 'may' to 'can'. Also, when an ABNF rule is being cited within
prose text, such as for <mailbox>, it should be distinguished so that the reader
knows it is a formal term.  I have used <> to bracket the term mailbox. }
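
The explicit-signal model recommended above might look roughly like the
following Python sketch. The parameter name is an assumption: I reuse the
placeholder string "UTF8SMTPbis" as an <esmtp-param> keyword, which the draft
does not define. The helper names are hypothetical as well.

```python
# Hypothetical MAIL parameter keyword; the draft defines no such parameter.
UTF8_PARAM = "UTF8SMTPbis"


def build_mail_command(reverse_path: str, uses_utf8: bool) -> str:
    """Client side: declare the UTF-8 context explicitly on the MAIL command."""
    command = f"MAIL FROM:<{reverse_path}>"
    if uses_utf8:
        command += f" {UTF8_PARAM}"
    return command


def transaction_is_utf8(mail_command: str) -> bool:
    """Server side: pick the processing mode from the parameter alone.

    No scanning of the address for non-ASCII octets is needed.
    Simplification: assumes the reverse-path itself contains no ">".
    """
    _, _, params = mail_command.partition(">")
    keywords = {p.split("=", 1)[0].upper() for p in params.split()}
    return UTF8_PARAM.upper() in keywords
```

With this shape the transaction is either fully UTF-8 or fully legacy
net-ASCII, and the server decides which before it looks at any address.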


> All labels in domain parts of mailbox names which are IDN
>    forms of A-labels or U-labels MUST be valid.

{ This is strange.  Either it is repeating a normative requirement from SMTP or 
it is expanding SMTP to require special validation for IDN forms of domain names 
that is not present for ASCII forms.  Neither interpretation seems like the 
right thing to be doing here. Such a modification to SMTP seems out of scope.

Also, the term <mailbox> has different semantic definitions in RFC 5321 and RFC 
5322.  A strict reading of the differences could be a problem.  A loose reading 
would note that both definitions reduce to include <local-part> and <domain> 
components that are common to both specifications.  I encourage you to review 
this issue carefully and put a note at the beginning of the document stating 
explicitly how you have chosen to handle it.

My best recommendation is that this document should only refer to RFC 5321 ABNF 
and that it should move /all/ RFC 5322 modifications or enhancements to the EAI 
Header document (draft-ietf-eai-rfc5335bis). }


>                    When a Mail User
>    Agent(MUA) submits a message to a Message Submission Server
>    ("MSA")[RFC4409], it is the responsibility of the MSA to ensure that
>    all domain labels are valid.

{ Given that this specification creates broad systemic effects and given that it 
needs to refer to components other than an SMTP client or server, it should cite 
RFC 5598, to give the reader an integrated view of the email service. I'll note 
that citing 5598 has become common for email specifications; so this is not a 
controversial suggestion. }


>The presence of the UTF8SMTPbis
>    extension does not change the requirement of RFC 5321 that servers
>    relaying mail MUST NOT attempt to parse, evaluate, or transform the
>    local part in any way.

{ This sentence replicates normative language from a different specification. 
This is a very bad thing to do, in case the original specification changes its 
language.  I suggest: }

      The presence of...in any way
      ->
      The presence of the UTF8SMTPbis extension does not change RFC 5321 server
relaying behaviors.

{ this retains some text as a flag to the reader, but does not provide the
specific semantics, which might change in the original specification. }


> If the UTF8SMTPbis SMTP extension is not offered by the server, the SMTP
> client MUST NOT transmit an internationalized address and MUST NOT transmit

{ This appears to prohibit the sending of internationalized addresses that are 
encoded in ASCII, rather than in UTF-8.  The purpose of ASCII encoding is to 
eliminate the need for infrastructure support for Unicode characters.  However 
the language here appears to be imposing the barrier of infrastructure support. 
  If the concern is sending UTF-8, then that's the language that needs to be 
used.  Messages with ASCII-encoding of internationalized addresses need to be 
permitted to be sent, without first requiring infrastructure support. }


> a mail message containing internationalized mail headers as described in
> [RFC5335bis] at any level within its MIME structure [RFC2045] and [RFC2047].

{ The prohibition of internationalized mail headers within a MIME structure -- 
"at any level within its MIME structure"  -- is out of scope for the working 
group and is a /major/ change to SMTP.  It is also a really terrible rule!  In 
terms of protocol modeling, it would have been like saying that no one could use 
MIME until the infrastructure supported it!

It is one thing to give the reader a reminder that UTF-8 is illegal in the email 
header of a message and quite another to attempt to prohibit it in attachments. 
  Attachments already carry all sorts of data.  UTF-8 is merely one more type. I 
believe that there are currently no SMTP constraints on the carriage of MIME; I also
believe it essential that there not be, since such constraints impede adoption.
  (That is why MDN support is good and DSN support is poor.)

Also note that MIME objects exist outside of email transport and that directives 
about legal or illegal MIME ought to be separated from SMTP...

Also, perhaps mail with UTF-8 in the header needs a different
MIME type, such as text/UTF-8-message?... }


> 2.  It may either reject the message during the SMTP transaction or accept

      may -> MAY

{ However, I note that this concerns an SMTP client, not a server. }

> 3.3.  Extended Mailbox Address Syntax
>
> RFC 5321, Section 4.1.2, defines the syntax of a mailbox entirely in terms
> of ASCII characters, using the production for a mailbox and those productions

      mailbox ->  <mailbox>


> on which it depends.
>
> The key changes made by this specification are, informally, to

{ I don't understand what it means to change RFC 5321 "informally".  I think it 
is intended to mean that the list is not guaranteed to be complete.  If so, I 
suggest wording such as: }

    -> The key changes made by this specification include:


> o  Change the definition of "Domain" to permit either the RFC 5321 definition
> above or a UTF-8 string representing a DNS label that is conformant with
> IDNA definitions [RFC5890].

{ "either"???  That sounds completely ambiguous.  }


> o  Change the definition of "Local-part" to permit either the definition
> above or a UTF-8 string.  That string MUST NOT contain any of the ASCII
> characters (either graphics or controls) that are not permitted in "atext";
> it is otherwise unrestricted.

{ same concern as above.  }


> According to the description above, the syntax of an internationalized email
> mailbox name (address) is defined in ABNF [RFC5234] as follows.
>
> uMailbox = uLocal-part "@" ( uDomain / address-literal ) ; Replace Mailbox
> in RFC 5321, Section 4.1.2

{  This implies that the option applies only to RFC5321 and not to RFC5322, but 
the later rules make clear that 5322 is also supposed to be covered.

To repeat:  I recommend that all RFC 5322 enhancements and ABNF should be moved 
to the EAI Header draft and that the SMTP extension should merely cite that draft.

Note that RFC5322 is for an object that can and does exist outside of SMTP. 
Enhancements to RFC5322 well might need to apply when there is no SMTP, or at 
least long after it is relevant.  }


> UTF-8-non-ASCII = UTF-8-2 / UTF-8-3 / UTF-8-4
>
> UTF-8-2 =  <See Section 4 of RFC 3629>
>
> UTF-8-3 =  <See Section 4 of RFC 3629>
>
> UTF-8-4 =  <See Section 4 of RFC 3629>

{ These rules are also defined in RFC5335 and draft-ietf-eai-rfc5335bis.  Citing 
RFC 3629 is therefore confusing, and possibly can create specification 
divergence. I suspect that RFC 3629 is not the correct reference, here, since it 
would imply that RFC5335bis' definitions should be ignored... If RFC 3629 /is/ 
the correct reference, then I cannot guess why and I cannot guess why it will 
not create problems with divergent specification. }
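
For reference, the RFC 3629 Section 4 rules that the draft points to (spelled
UTF8-2, UTF8-3 and UTF8-4 there) can be transcribed into a byte-level pattern.
This is a sketch of those ABNF rules only; the function name is hypothetical
and nothing here reflects how RFC5335bis words its own definitions.

```python
import re

# Transcription of the RFC 3629 Section 4 ABNF (UTF8-tail is %x80-BF).
UTF8_2 = rb"[\xC2-\xDF][\x80-\xBF]"
UTF8_3 = (
    rb"\xE0[\xA0-\xBF][\x80-\xBF]"
    rb"|[\xE1-\xEC][\x80-\xBF]{2}"
    rb"|\xED[\x80-\x9F][\x80-\xBF]"
    rb"|[\xEE-\xEF][\x80-\xBF]{2}"
)
UTF8_4 = (
    rb"\xF0[\x90-\xBF][\x80-\xBF]{2}"
    rb"|[\xF1-\xF3][\x80-\xBF]{3}"
    rb"|\xF4[\x80-\x8F][\x80-\xBF]{2}"
)

# UTF-8-non-ASCII = UTF-8-2 / UTF-8-3 / UTF-8-4
UTF8_NON_ASCII = re.compile(rb"(?:" + UTF8_2 + rb"|" + UTF8_3 + rb"|" + UTF8_4 + rb")")


def is_utf8_non_ascii(octets: bytes) -> bool:
    """True if octets are exactly one well-formed multi-octet UTF-8 character."""
    return UTF8_NON_ASCII.fullmatch(octets) is not None
```

The ranges exclude overlong forms and the surrogate block, which is why the
rules are worth defining in one place and citing from everywhere else.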


> The value of "uDomain" SHOULD be verified by IDNA definitions [RFC5890].  If

{  I do not understand what "SHOULD be verified by IDNA definitions" means.

If it means that the uDomain "SHOULD be verified" then it is out of scope for
this specification. There is no reason that SMTP rules concerning verification 
of Unicode-based domains needs to be different from ASCII-based domains.  In 
particular, verification of a Domain name is specified elsewhere.

If it means that the uDomain "should be IDNA compliant" then I do not understand 
having the exceptions that a "should" allows.  For this EAI work, I would think 
that when UTF8SMTPbis is in force, then IDNA compliance needs to be a MUST.  }


> that verification fails, the email address with that uDomain MUST NOT be
> regarded as a valid email address.

    MUST NOT be regarded as a valid
    ->
    MUST be regarded as an invalid

{ Stating this as an affirmative is stronger.  }


> 3.4.  UTF-8 addresses and Response Codes
>
> An internationalized message MUST NOT be sent to an SMTP server that does
> not support UTF8SMTPbis.  Such a message should be rejected by a server if
> it lacks the support of UTF8SMTPbis.

{ The first sentence is restating a normative rule already existing in other 
specifications.  The second sentence is simply confusing.  If the first sentence 
is applied, then the server does not support UTF8SMTPbis.  I suspect what is 
actually meant in the second sentence is something like: }

    If a server receives email encoded in UTF-8, when UTF8SMTPbis is not in 
force, then the server SHOULD reject the message.

{ However, this means that the current specification is trying to dictate 
behavior in the legacy environment, and it cannot do that.

I believe the rule that is actually intended is: }

      An SMTP client MUST only send a message containing UTF-8 to a server that 
supports UTF8SMTPbis.  If the server does not support this option, then the 
client MUST terminate the delivery attempt with a permanent error, or else find 
another path.
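
{ As a minimal sketch of the client-side rule I am proposing, and nothing more 
than that:  the function name, the argument names and the returned strings are 
hypothetical, and I am assuming the EHLO keyword is spelled "UTF8SMTPbis" and is 
compared case-insensitively, as EHLO keywords are in RFC 5321. }

```python
def client_action(ehlo_keywords, message_has_utf8):
    """Decide what an SMTP client does with a message on this connection.

    ehlo_keywords: iterable of extension keywords from the server's EHLO reply.
    message_has_utf8: True if the envelope or header needs UTF8SMTPbis.
    """
    supported = "UTF8SMTPBIS" in {keyword.upper() for keyword in ehlo_keywords}
    if not message_has_utf8 or supported:
        return "send"
    # The server cannot take the message: do not send it here at all.
    # Either fail permanently or look for another path.
    return "fail-or-reroute"
```

{ The point of the sketch is that the decision is made entirely by the client, 
before any UTF-8 is sent, so nothing needs to be said about what a legacy server 
does. }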


> The three-digit reply codes used in this section are consistent with their
> meanings as defined in RFC 5321.

{ "consistent with"?  I hope you actually mean that they are "the same"! Either 
they are the same as in 5321 or they are changed. }


> When messages are rejected because the RCPT command requires an ASCII
> address, the response code 553 is used with the meaning "mailbox name not

    used -> returned,


> allowed".  When messages are rejected for other reasons, such as the MAIL
> command requiring an ASCII address, the response code 550 is used with the

{  RFC 5321 uses the terms "completion code", "reply code" and "response code" 
interchangeably.  That probably makes the use of "response code" here legal. 
However the formal ABNF in RFC 5321 defines the rules:

      <reply-line>

      <reply-code>

For clarity and precision, I strongly recommend using only those terms when 
referring to replies/responses/completion.  Since the current document is 
cross-referencing a formal construct from another document, it will help the 
reader to resolve the reference by using only the most formal term.}


> meaning "mailbox unavailable".  When the server supports enhanced mail

{ "unavailable" sounds like a temporary error.  More generally, the text here 
says that this set of rejections is "for other reasons".  However it does not 
really mean all other rejections.  Since there are many, different "other 
reasons" and some of them are temporary errors, this text needs to be revised. 
It needs to make clear the difference in handling temporary versus permanent 
errors and in particular it needs to define both types in terms that are 
specific to this enhancement, in order to distinguish them from all other SMTP 
reply-codes.}


> system status codes [RFC3463], response code "X.6.7" [RFC5248] is used,
> meaning that "UTF-8 addresses not permitted for that sender/recipient".
>
> If the response code is issued after the final "." of the DATA command, the
> response code "554" is used with the meaning "Transaction failed".  When the
> server supports enhanced mail system status codes [RFC3463], response code
> "X.6.9" [RFC5248] is used, meaning that "UTF-8 header message can not be
> transferred to one or more recipient so the message must be rejected".
>
> 3.5.  Body Parts and SMTP Extensions
>
> There is no ESMTP parameter to assert that a message is an internationalized
> message.  An SMTP server that requires accurate knowledge of whether a
> message is internationalized is required to parse all message header fields
> and MIME header fields [RFC2045] and [RFC2047] in the message body.

{ This is confusing.  There is a rule against sending an internationalized 
message, unless UTF8SMTPbis is in force, but there is no requirement that the 
server be told explicitly when a message is internationalized???

As noted above, the simple and appropriate action to take is, instead, to have 
the MAIL command contain an <esmtp-param> that declares that the message 
supports internationalized addresses.

My understanding is that this was discussed by the working group.  Why was this 
not the choice of the working group? }


> While this specification requires that servers support the 8BITMIME
> extension [RFC1652] to ensure that servers have adequate handling capability
> for 8-bit data and to avoid a number of complex encoding problems, the use
> of internationalized addresses obviously does not require non-ASCII body
> parts in the MIME message [RFC2045] and [RFC2047].  The UTF8SMTPbis extension
> MAY be used with the BODY=8BITMIME parameter if that is appropriate given
> the body content or, with the BODY=BINARYMIME parameter, if the server
> advertises BINARYMIME [RFC3030] and that is appropriate.

{ This last sentence is either saying too much or too little.  In general, this 
option does not specify MIME or Body details.  Nor does there appear to be any 
reason that it should.  As a consequence, I believe that any text about MIME or 
the Body needs to be non-normative.  It's fine to provide a small amount of 
pedagogy about the carriage of MIME, but including normative language about it 
here merely invites confusion and possibly even divergent specifications.

Also note that the reference to "BODY=" values does not explicitly cite RFC 
1652.  So the references require the reader to already know what is being 
referred to.  This should be fixed.}


> Assuming that the server advertises UTF8SMTPbis and 8BITMIME, and
> receives at least one non-ASCII address, the precise interpretation of
> "BODY=8BITMIME", and "BODY=BINARYMIME" in the MAIL command is: 1.  If a

{  My reading of the current specification is that it does not change the 
handling of email Body or MIME mechanisms.  Hence, DO NOT REPEAT NORMATIVE 
LANGUAGE FROM OTHER SPECIFICATIONS.  It invites divergent specification.  If the 
current specification is actually re-defining the semantics or syntax of RFC 
1652, then it needs to say so. However my reading of the current specification 
is that it is quite independent of email Body issues. }


> BODY=8BITMIME parameter is present, the header contains UTF-8 characters,
> and some or all of the body parts contain 8-bit line-oriented data. 2.  If a
> BODY=BINARYMIME parameter is present, the header contains UTF-8 characters,
> and some or all body parts contain binary data without restriction as to
> line lengths or delimiters.

{ Also note that the items numbered 1 and 2 are not complete sentences.  They 
only contain the conditional clause.  "If...." is missing the "then".  Hence, no 
actual "interpretation" is provided, contrary to the promise that is given.}


> 3.6.  Additional ESMTP Changes and Clarifications
>
> The information carried in the mail transport process involves addresses
> ("mailboxes") and domain names in various contexts in addition to the MAIL
> and RCPT commands and extended alternatives to them.  In general, the rule
> is that, when RFC 5321 specifies a mailbox, this specification expects UTF-8

    this specification -> this SMTP extension

{ "this specification" could be misread as meaning RFC 5321, since it is the 
most recently referenced specification. }


> to be used for the entire string; when RFC 5321 specifies a domain name, the
> name SHOULD be in the form of A-label if its raw form is non-ASCII.

{ the more serious problem is the continuing confusion about ASCII vs. non-ASCII 
and UTF-8 vs non-UTF-8.  Again, I believe that having this extension be in force 
means that these data are /always/ UTF-8.  Hence, it is not a matter of 
"expects".  It is a matter of "requires".}


> The following subsections list and discuss all of the relevant cases.
>
> 3.6.1.  The Initial SMTP Exchange
>
> When an SMTP connection is opened, the server normally sends a "greeting"
> response consisting of the 220 response code and some information.  The

{ "normally"?  what are the exceptions?}


> client then sends the EHLO command.  Since the client cannot know whether
> the server supports UTF8SMTPbis until after it receives the response from
> EHLO, the client must send only ASCII (LDH label [RFC5890] or A-label)
> domains in the EHLO command and that, if the server provides domain names in
> the EHLO response, they must be in the form of LDH labels or A-labels.
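
{ A rough sketch of the syntactic check this paragraph implies for the EHLO 
domain, assuming the usual LDH label shape (letters, digits and hyphens, 1 to 63 
octets, no leading or trailing hyphen).  The function name is hypothetical.  An 
A-label is itself an LDH label beginning with "xn--", so the same pattern admits 
it;  the sketch does not check that an "xn--" label decodes to a valid U-label. }

```python
import re

# One LDH label: ASCII letters, digits, hyphen; 1-63 chars; no edge hyphens.
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def ehlo_domain_ok(domain: str) -> bool:
    """Return True if every label of domain is an ASCII LDH label."""
    return all(_LDH_LABEL.fullmatch(label) for label in domain.split("."))
```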
>
> 3.6.2.  Mail eXchangers
>
> Organizations often authorize multiple servers to accept mail addressed to
> them.  For example, the organization may itself operate more than one

      may -> might

or

      may -> can

{ normative vocabulary can only be used for normative statements.  please do a 
global search and replace, where normative vocabulary is used in non-normative 
sentences.}


> server, and may also or instead have an agreement with other organizations to
> accept mail as a backup.  Authorized servers are generally listed in MX
> records as described in RFC 5321.  When more than one server accepts mail for
> the domain-part of a mailbox, it is strongly advised that either all or none
> of them support the UTF8SMTPbis extension.

{ This last sentence sounds normative and I believe it should be. So... }

      it is strongly advised that either all or none of them support
      ->
      all or none of them SHOULD support


> Otherwise, surprising rejections
> can happen during temporary failures, which users might perceive as a serious

{ Probably not just temporary failures.  Having services that are meant to be 
redundant with each other actually provide different semantic behavior is just 
plain dangerous. It is essentially guaranteed that they will cause problems.}


> 3.6.3.  Trace Information
>
>    When an SMTP server receives a message for delivery or further
>    processing, RFC 5321 requires that it MUST insert trace ("time stamp"
>    or "Received") information at the beginning of the message content.
>    For the trace information, this memo updates the time stamp line and
>    the return path line [RFC5321] formally defined as follows:

{ Again, don't repeat normative language.  I suggest simply deleting the first 
sentence. }


> [RFC5321] formally defined as follows:
>
> uReturn-path-line = "Return-Path:" FWS uReverse-path <CRLF> ; Replaces
> Return-path-line in Section 4.4 of RFC 5321

{ On reflection, I think the "u" rule-naming convention is problematic.  If the 
current specification is replacing a rule previously defined elsewhere, it needs 
to use the same rulename.  This is simpler and clearer.  Otherwise, all future 
references to the rule need to use the new name.

Rather, where the draft is replacing a rule from another specification, the rule 
definition includes a comment that cites where the original rule is from; that 
should be sufficient. }


> uReverse-path = uPath / "<>" ; Replace Reverse-path in RFC 5321, section
> 4.1.2
>
> uPath = "<" [ A-d-l ":" ] uMailbox ">" ; Replace Path in RFC 5321, section
> 4.1.2 ; uMailbox is defined in section 3.3 of this document
>
> A-d-l = <See section 4.1.2 of RFC 5321>

{ Small improvement:  Where a rule is defined elsewhere, I suggest having the 
descriptive text in the rule say <Defined in...> rather than <See...>. }


> 3.6.4.  UTF-8 Strings in Replies
>
> 3.6.4.1.  RCPT Commands
>
> If an SMTP client follows this specification and sends any RCPT commands
> containing non-ASCII addresses, the SMTP server is permitted to use UTF-8
> characters in the email address associated with 251 and 551 response codes,
> and the client MUST be able to accept and process them.

{ I assume that "follows this specification" means that the UTF8SMTPbis option 
is in force.  If so, then say that, because it is simpler and more precise. But, 
then, it does require declaring /use/ of the option...

However the meaning of the text here is odd.  It implies that the server is not 
permitted to use UTF-8 unless it has already received UTF-8, even though the 
option is in force.  I suspect that that is not the specification that is 
intended.  Or, at least, I hope it is not.

If it actually /is/ what is intended, it means that the enhanced environment is 
going to have all sorts of /additional/ conditional code, to check on whether a 
specific context in the state machine is allowed to act in one way or another.

The simple and more reasonable model is that when the extension is in force, 
UTF-8 is allowed.  Period.  So... }

      If an SMTP session is using this extension, then the server is permitted 
to use UTF-8 characters in the email address associated with 251 and 551 
reply-codes, and the client MUST be able to accept and process them.

{ I chose "SMTP session" to avoid the more complex discussion of the 
client/server 'negotiation' about using the extension.  So, here it is either in 
force or it isn't and if it is in force, it is for client AND server. }


>      If a given RCPT
> command does not include a non-ASCII envelope address, the server MUST NOT
> return a 251 or 551 response containing a non-ASCII mailbox.  Instead, it
> MUST transform such responses into 250 or 550 responses that do not contain
> non-ASCII addresses.

{  See above.  I very strongly disagree with having the complexity this kind of 
conditional requirement creates.  Or else there is a basic issue involved here 
that the document does not discuss but needs to, to justify the complexity. }
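
{ To make the difference concrete, here is a small sketch of the two models. 
Both function names are hypothetical, and the second one is only my reading of 
the draft text quoted above, not something the draft states in this form. }

```python
def simple_model_allows_utf8(extension_in_force: bool) -> bool:
    """Proposed rule: UTF-8 in 251/551 replies is allowed whenever the
    extension is in force for the session."""
    return extension_in_force


def draft_model_allows_utf8(extension_in_force: bool, rcpt_address: str) -> bool:
    """Reading of the draft: UTF-8 is allowed only if this particular RCPT
    command itself carried a non-ASCII address."""
    return extension_in_force and not rcpt_address.isascii()
```

{ The second form needs per-command state in addition to the session state, 
which is the extra conditional code I am objecting to. }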


> 3.6.4.2.  VRFY and EXPN Commands and the UTF-8REPLY Parameter
>
> If the VRFY and EXPN commands are transmitted with the optional parameter
> "UTF-8REPLY", it indicates the client can accept UTF-8 strings in replies to
> those commands.  This allows the server to use UTF-8 strings in mailbox

{ Is this extension trying to say that a client might support UTF8SMTPbis but 
not be able to accept UTF-8 replies???  Under what circumstance is this 
reasonable? }


> names and full names that occur in replies without concern that the client
> might be confused by them.  An SMTP client that conforms to this

{ What does it mean to be "confused" by them?  Because the extension is in 
force, we know that the client supports UTF-8. }


> specification MUST accept and correctly process replies from the VRFY and
> EXPN commands that contain UTF-8 strings.  However, the SMTP server MUST NOT
> use UTF-8 strings in replies if the SMTP client does not specifically allow
> such replies by transmitting this parameter.  Most replies do not require

{ Why is this constraint required? }


> that a mailbox name be included in the returned text, and therefore UTF-8 is
> not needed in them. Some replies, notably those resulting from successful
> execution of the VRFY and EXPN commands, do include the mailbox, making the
> provisions of this section important.
>
> VERIFY (VRFY) and EXPAND (EXPN) command syntaxes are changed to:
>
> vrfy = "VRFY" SP ( uLocal-part / uMailbox ) [ SP "UTF-8REPLY" ] CRLF ;
> uLocal-part and uMailbox are defined in ; Section 3.3 of this document.
>
> expn = "EXPN" SP ( uLocal-part / uMailbox ) [ SP "UTF-8REPLY" ] CRLF ;
> uLocal-part and uMailbox are defined in ; Section 3.3 of this document.

{ Note that these rules only specify UTF-8 support, without any contingency on 
use.  That is, they do not cover the variability that is described in the prose, 
and that disparity from the prose specification is an example of the complexity 
created by having the use of UTF-8 in replies be so contingent on the actual 
data sent by the client. }


> If a normal success response (i.e., 250) is returned, the response MAY
> include the full name of the user and MUST include the mailbox of the user.
> It MUST be in either of the following forms:
>
> User Name <uMailbox> ; uMailbox is defined in Section 3.3 of this document.
> ; User Name can contain non-ASCII characters.
>
> uMailbox ; uMailbox is defined in Section 3.3 of this document.

{ "User Name" needs to be specified as an abnf rule.  It isn't.

This appears to be intended to invoke the RFC 5322 definition of <mailbox> 
rather than the RFC 5321 definition.  In any event, it is creating a semantic 
change to the response, from RFC 5321, beyond merely allowing UTF-8 characters. }


> If the SMTP reply requires UTF-8 strings, but UTF-8 is not allowed in the
> reply, and the server supports enhanced mail system status codes [RFC3463],
> the enhanced response code is "X.6.8" [RFC5248], meaning "A reply containing
> a UTF-8 string is required to show the mailbox name, but that form of
> response is not permitted by the client".

{ "the mailbox name"?  which string is that? }


> If the SMTP client does not support the UTF8SMTPbisbis extension, but receives
> a UTF-8 string in a reply, it may not be able to properly report the reply

{ If a UTF-8 string is sent in a reply, when this extension is not in force, 
then it is a protocol violation.  Period. }


> [RFC4952bis] Klensin, J. and Y. Ko, "Overview and Framework for
> Internationalized Email", RFC 4952, July 2010.

      RFC 4952
      ->
      draft-ietf-eai-frmwrk-4952bis

{ It is not now, and never will be, RFC 4952; and RFC4952bis is currently dated 
September 2010. When it is an RFC, it will have a new number.

Also, Internet Drafts, including -bis documents, usually follow a different 
citation labeling convention, such as I-D.rfc4952bis, to make clear that they 
are an I-D and not an RFC. }




[1]  draft-ietf-eai-frmwrk-4952bis-10

[2]  draft-ietf-eai-rfc5335bis-03


d/

-- 

   Dave Crocker
   Brandenburg InternetWorking
   bbiw.net