[apps-discuss] (private) draft review of: draft-ietf-eai-rfc5336bis-07.txt (v3)
Dave CROCKER <dhc@dcrocker.net> Wed, 22 December 2010 18:03 UTC
Message-ID: <4D123DC0.2050501@dcrocker.net>
Date: Wed, 22 Dec 2010 10:04:48 -0800
From: Dave CROCKER <dhc@dcrocker.net>
Organization: Brandenburg InternetWorking
To: Apps Discuss <apps-discuss@ietf.org>, ima@ietf.org, draft-ietf-eai-rfc5336bis@tools.ietf.org, SM <sm+ietf@elandsys.com>, Alexey.Melnikov@isode.com
IETF Applications Review

I have been selected as an Applications Area Review Team reviewer for this draft. For background on apps-review, please see:

<http://www.apps.ietf.org/content/applications-area-review-team>

Please resolve these comments along with any other Last Call comments you may receive. Please wait for direction from your document shepherd or AD before posting a new version of the draft.

Document: draft-ietf-eai-rfc5336bis-07.txt
Reviewer: Dave Crocker, Brandenburg InternetWorking
Review Date: 2010-12-22

BACKGROUND:

The document is a specification for an email transport-time option that is described in its Abstract as declaring support for "internationalized email addresses or header information" and in the Introduction as being "to support an internationalized email address". The extension specifies changes both in the transfer protocol and in the message being transferred, including its Body.

Legacy Internet Mail only supports classic "network ASCII" for data representation and for data transfer encoding.

The charter for the current work cites previous work that was issued as Experimental, and summarizes it as having been "... based on the use of an SMTP extension to enable the use of UTF-8 in envelope address local-parts, optionally in address domain-parts, and in mail headers." This text appears also to serve as the statement of scope for the current working group.

With respect to the mail header, the scope is specified as covering <address> and <encoded-word> constructs. An <encoded-word> is a means of mapping Unicode strings onto classic, network ASCII, although the working group focus was on binary, UTF-8 support.

RECOMMENDATIONS:

This work represents an extremely important enhancement to Internet mail. It has been clear for twenty years that Internet applications need to be able to present data in a form that is natural to the user.
The current work benefits from providing a Framework document and from distinguishing changes to the email header from changes to the SMTP protocol.

0. The documents do raise distinctions between ASCII and Unicode, versus between ASCII and UTF-8. However they do not apply them rigorously in the documents. I suggest that the term "internationalization" be used only during introductory discussion and never as part of normative text. ASCII vs. Unicode is a distinction between the underlying range of data being represented. ASCII vs. UTF-8 is a distinction between encoding environments. Text should explicitly indicate whether it means data representation versus data encoding and it should use Unicode for the former and UTF-8 for the latter.

1. The Framework document is normative and needs to be completed along with the other two specifications, since they quite reasonably state a normative dependence on the Framework document.

2. The SMTP extension draft needs to focus on SMTP and the direct support of UTF-8 in the message header. It needs to move all discussion and specification of Unicode support in the Header to the Header draft.

3. There probably needs to be a definition of MIME message/uni-rfc822, specifying Unicode support within a message contained in MIME. I'm less clear whether there needs to be a MIME Content-Transfer-Encoding form specific to UTF-8.

4. The SMTP extension needs to have the client use an explicit signal when it is sending a message encoded in UTF-8. This is easiest as a parameter to the MAIL command. The current specification creates a more complex and almost heuristic model for distinguishing ASCII from UTF-8 use.

5. The SMTP extension needs to remove all restrictions it imposes on MIME content-type. A major reason that MIME was successful was that it was transparent to the transfer infrastructure. The current SMTP extension specification changes this model, which actually increases the barrier to adoption of Unicode in email.
It needs to be easy for two MUAs that support Unicode in the email header to exchange mail even when the infrastructure does not support UTF-8.

SUMMARY COMMENTS:

The document conforms to the conventions for defining SMTP options.

There are a number of significant issues with the specification. These are covered in detail, below, and are summarized here:

* Framework -- The Framework document[1] is (correctly) referenced as required reading. It supplies essential terminology and architecture for this specification. In fact it is a specification, complete with formally normative vocabulary. This means that it must be a normative reference by rfc5336bis. It therefore also means that the Framework document needs to be completed before the current specification can be standardized.

* Scope -- the specification appears to go significantly beyond the scope of the working group's charter, including revisions to basic SMTP that have no obvious requirement for support of Unicode during transport. In fact, at least one change is likely to /restrict/ system-wide adoption, rather than encourage it! In particular, the specification restricts the conditions under which some MIME content is allowed to be sent. (See next bullet.)

* Infrastructure Requirement -- Unless this option is in force, carrying internationalized email in a MIME part is prohibited. This is out of scope for the working group and it is a counter-productive rule. Imagine if the same type of rule had been specified when MIME was created, saying that MIME could only be sent when an "attachments-supported" option were in force. This would have prevented the early adoption of MIME use by individual MUAs until the entire infrastructure supported MIME. (As an example of the very high barrier this raises, note the difference in real-world support and use of MDN versus DSN.)
While it is reasonable for the working group to define a new MIME content-type that modifies message/rfc822 to support internationalized addresses, it is not appropriate for the working group to modify the SMTP transfer model to constrain what types of message content can be sent.

* Beyond <encoded-word> -- The work, here, appears to have two goals. One is to add support for Unicode in a <local-part>; that is, support for internationalized addresses. However note that <encoded-word> and <A-label> already accomplish this in a way that is transparent to the existing email infrastructure; only the end-systems need to understand it. The second goal is to support Unicode in the more "native" form of UTF-8. (The quotation marks are because UTF-8 is not native Unicode, either; it is a highly encoded form of Unicode...) This creates some confusion in the specification. Given that the option for SMTP is "UTF8SMTPbis", then the binary encoding goal seems to dominate the work. This is certainly a reasonable goal, but it will help the clarity of the specification to make these two different goals more clear within the document and to apply them more carefully.

* Terminology confusion -- The Framework document carefully distinguishes between "ASCII addresses" and "non-ASCII addresses". It equates "internationalized" first with "non-ASCII", but then with "UTF8SMTPbis". A core problem is that ASCII is part of the actual internationalized set of Unicode characters. So, to say that "international" characters are non-ASCII is to exclude part of Unicode from the term "international". In addition, equating the term "internationalization" with UTF-8 encourages confusion between underlying or "native" data -- that is, Unicode -- with the way it is represented over the wire. UTF-8 is merely one means of over-the-wire representation. So, for example, <A-label> and <encoded-word> are two other means of encoding.
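To make the data-versus-encoding distinction concrete, here is a minimal Python sketch that takes one Unicode string and produces three different over-the-wire forms of it: UTF-8 octets, an <encoded-word>, and an A-label. The sample string and the function name `wire_forms` are purely illustrative assumptions, and note that Python's built-in "idna" codec implements the older IDNA 2003 rules rather than the IDNA2008 documents cited by the drafts, so it is only an approximation of what the specifications require.

```python
from email.header import Header, decode_header


def wire_forms(text: str) -> dict:
    """Return three wire encodings of the same Unicode string.

    The Unicode string is the underlying data; each value below is
    merely one way of representing that data in octets.
    """
    return {
        # Binary form: the encoding the SMTP extension is about.
        "utf8": text.encode("utf-8"),
        # RFC 2047 encoded-word: pure ASCII, transparent to legacy MTAs.
        "encoded_word": Header(text, charset="utf-8").encode(),
        # A-label: ASCII-compatible form of a domain label.
        # Assumption: stdlib "idna" codec (IDNA 2003), not IDNA2008.
        "a_label": text.encode("idna"),
    }


if __name__ == "__main__":
    forms = wire_forms("bücher")  # hypothetical sample label
    for name, value in forms.items():
        print(name, value)
    # Decoding the encoded-word recovers the same Unicode string.
    raw, charset = decode_header(forms["encoded_word"])[0]
    print(raw.decode(charset))
```

All three values carry the same Unicode data; only the first needs an 8-bit-clean path, which is why the terminology should name Unicode for the data and UTF-8 for that one encoding.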
It's clear that this core distinction really is understood by the authors of this and the Framework documents. However the vocabulary choices and their usage create a problem in the details of the specification. "Internationalization" should mean Unicode, not a particular binary representation of it within 8-bit chunks. The problem, here, is in using the term "internationalization" to refer to a subset of Unicode, that is, the subset that is not ASCII. I strongly suggest saying "Unicode" when intending to refer to the richer set of characters that are the goal of this work, and "UTF-8" when referring to the particular binary encoding of Unicode that is the focus of the SMTP extension work.

* Non-UTF-8 Unicode support -- Can a message support Unicode without UTF-8? The existence of <encoded-word> and <A-label> constructs makes clear that the answer is yes. Hence, it should be possible to support Unicode messages, without this SMTP extension. Perhaps this is out of scope for this SMTP extension and perhaps it is handled by the Email Header draft, but I think it worth having this document cite this alternative mode, if only to a) make clear that the alternative exists, and b) make more clear what the specific and strong benefit of this extension is.

* Partial enforcement -- Since ASCII is a subset of Unicode, having this extension be in force means that /everything/ is Unicode AND, apparently, is encoded in UTF-8. If the environment created by this extension supports UTF-8, then it supports UTF-8, meaning both ASCII and non-ASCII. Defining rules that depend on having this extension be in force but then still distinguish between ASCII and non-ASCII does not seem to make sense.

* Complexity and Heuristics -- In a number of places, the specification defines highly contingent action, where one side can use UTF-8 only if the other side has done so. This makes the enhancement much more complicated than necessary or appropriate.
The enhancement needs to work with an all-or-nothing model in which UTF-8 is in force or it is not. And, yes, this appears to be a major change in the model of this specification. My understanding is that these issues were discussed in the working group, but I do not understand why a nearly-heuristic approach was preferred. Instead I recommend that the client signal explicitly when UTF8-encoded addresses are present, such as with a <Mail-parameters> option (<esmtp-param>) to the MAIL command.

* 5321 vs. 5322 -- The specification seems to confuse -- or at least to mix -- some rules from RFC 5321 versus some from RFC 5322. <mailbox> is the major example.

* Since an RFC 5322 message can and often does exist outside of the SMTP environment, any changes to the RFC 5322 specification should be in a document that is separate from this extension specification. This specification can then cite it. I suggest moving all RFC 5322 changes to the Header document[2] and merely citing it here.

* UTF8SMTPbis -- The draft uses the string "UTF8SMTPbis" when referencing the SMTP option. The Framework document explains the choice, but IANA Considerations in this document needs to provide explicit handling instructions for it, since this is certain NOT to be the actual string that is used.

* Redundant Specification - in a number of places, normative language from other specifications is repeated. This invites divergent specification and is generally out of scope for the current work. To the extent that the current specification needs to refer to normative parts of other specifications, it should do only that: cite it; do not repeat it. For example to highlight an important normative item from another specification, the current specification might describe the "topic" the external language covers, without saying what it says.
In other cases, the key point is that the current specification is not changing requirements from another specification; that is what should be said, rather than repeating what that requirement is. In general, these references to external, normative behaviors should be reviewed for relevance to the current work. How are internationalized addresses relevant to the normative detail?

* Rule Meta-Naming -- The 'u' preface for revised rules is a reasonable idea, but appears to be problematic. These rules replace existing rules in other specifications. There needs to be an explicit decision about the handling of this, and it needs to be applied consistently, either directly replacing the rules or else re-naming them consistently, in a fashion that can be parsed (similar to the naming template that was done with RFC5322 obs-* rules.) If an initial string is to be used, I suggest UTF-8-* rather than u*, in order to make it possible to parse this meta-label more reliably.

DETAILED COMMENTS:

{ I have included text from the draft that provides context, but have skipped sequences of text from the draft that do not. }

> 1. Introduction > > The Simple Mail Transfer Protocol [RFC5321] provides a negotiation mechanism > about service extension with which clients can discover with which -> by which { or } through which { Either of these would be my stylistic preference. } > server capabilities and make decisions for further processing. This > document use this mechanism to support an internationalized email use -> uses > address. An extended overview of the extension model for { The option enables use of UTF-8 in the email header, beyond just addresses, therefore: } to support...address -> to support internationalized email addresses and internationalized characters for the email header. > internationalized addresses and headers appears in [RFC4952bis], headers -> the email header > referred to as "the framework document" or just as "framework" elsewhere in > this specification. 
This document specifies an SMTP extension to permit > internationalized email addresses in envelopes, envelopes -> the SMTP envelope { 1. Although 'envelope' is almost certainly unambiguous, its use over the years has been confusing, so it is worth being particularly clear in a specification. 2. The use of plurals in a specification can be confusing. I suggest using the singular form wherever it can work, so that remaining use of plurals becomes more precise. Also note the particular confusion with the word "headers" in email. Since a single email has only a single header, the use of plural means each header from a set of messages... } > and UNICODE characters (encoded in UTF-8) [RFC3629] in headers. headers -> the header > 1.1. Role of This Specification > > The framework document specifies the requirements for, and describes > components of, full internationalization of the electronic mail. A thorough > understanding of the information in that document and in the base Internet > email specifications [RFC5321] [RFC5322] is necessary to understand and > implement this specification. { This means that the Framework document is normative; it needs to be cited as such. Since understanding it is a pre-condition of reading the specification, it also requires that the Framework document be completed before this document. } > This document specifies an element of the email internationalization work, > specifically the definition of an SMTP extension for internationalized email > address transport delivery. { Since it also declares support for internationalization in the message header, it covers more than delivery. In effect, it is a transport-time flag for declaring internationalization of the entire email "environment". I'm not quite sure what exact language change to suggest, however. I also note that it does more than declare support for internationalized addresses: It declares support for encoding them into UTF-8. 
This is a significant, additional requirement and should be stated explicitly. } > 1.2. Terminology ... > This specification defines only those Augmented BNF (ABNF) [RFC5234] syntax > rules that are different from those of the base email specifications and, > where the earlier rules are upgraded or extended, gives them new names. > When the new rule is a small modification to { The wording in the first part of this paragraph seems awkward. I think the following is clearer: } This specification uses Augmented BNF (ABNF) rules [RFC5234], with some modifications. The modified rules are defined here and the rest are simply imported from [RFC5234]. New names are used, for rules that are upgraded or enhanced. When the ... > the older one, it is typically given a name starting with "u". Rules { The use of "typically" makes the description of this convention problematic. It needs to be always true or else not mentioned. Or perhaps alternative phrasing: } When a new...with "u". -> When a new rule has a name starting with "u", it is a small modification to an older rule. { This asserts what is true and ignores what is not true, such as rules that qualify for the "u" but did not get it... However I now believe that it is important to have a consistent meta-rule for naming the revised rules and that it be applied consistently to all rules that qualify for it. Further, the naming convention should be easily parseable, and therefore more like what is used in RFC5322, to cover "obsolete" rules. I suggest that new rules, here, be named UTF-8-*. To repeat: I believe that /all/ rules that resolve to UTF-8 must be renamed, so that implementers know what they need to change. } > that are undefined here may be found in the base email specifications { The specifications should be cited here, explicitly, even though they have been cited elsewhere. It is important to leave no ambiguity for the reader. } here may be -> here can be {"may" is a reserved word for normative specification. 
Normative meaning is not based on the use of capitalization. } > 3.2. The UTF8SMTPbis Extension > An SMTP server that announces this extension MUST be prepared to > accept a UTF-8 string [RFC3629] in any position in which RFC 5321 > specifies that a mailbox can appear. That string MUST be parsed only > as specified in [RFC5321], i.e., by separating the mailbox into { The 'i.e.' clause is an example of repeating normative text from another specification. The current document is not empowered to give directives about basic SMTP parsing, nor is there any internationalization requirement that it do so. At most, it should say that the changes specified in this document do not change any other aspect of SMTP processing. } > Once isolated by this parsing process, the local part MUST be > treated as opaque unless the SMTP server is the final delivery Mail > Transfer Agent (MTA). { The statement about handling of <local-part> is redundant with the base RFC 5321 specification. The statement, here, should be that the handling of <local-part> is unchanged from the base specification. Again, it should not repeat the normative language, unless it is changing it. If it is changing it, the change needs to be essential for support of internationalized addresses. } > Any domain names that are to be > compared to local strings SHOULD be checked for validity and then > MUST be compared as specified in section 3 of [RFC5891]. { Dictating use of RFC5891 is within scope. Dictating validation seems not to be. So...} -> Any domain name that is to be compared to a local string MUST use Section 3 of [RFC5891] as the basis for comparison. > An SMTP client that receives the UTF8SMTPbis extension keyword in response > to the EHLO command MAY transmit mailbox names within SMTP commands as > internationalized strings in UTF-8 form. It MAY send a UTF-8 header > [RFC5335bis] (which may also include mailbox names in UTF-8). 
It MAY > transmit the domain parts of mailbox names within SMTP commands or the > message header as A-labels or U-labels { I believe that the use of "MAY" is not correct. This would mean that the receiver needs a means of distinguishing whether the data are UTF-8 or not. This would border on requiring support of a heuristic, but it certainly adds a significant processing overhead and additional software complexity. The only alternative is to specify use of a MAIL command <Mail-parameters> option that declares that the message supports internationalized addresses. Given the approach in this specification, I believe the intent is also to have it mean that UTF-8 encoding is supported. The core issue here is specifying an option which declares a message to have an EAI context for all of the message. So the processing context is fully EAI/UTF-8 or it is legacy net-ASCII. This is considerably simpler to specify and to process, than would be requiring parsing the incoming string and looking for non-ASCII UTF-8. Hence: } An SMTP client...U-labels -> An SMTP client that receives the UTF8SMTPbis extension keyword, in response to the EHLO command, will transmit <local-part> within SMTP commands as internationalized strings in UTF-8 form. It will send the email header in UTF-8 [RFC5335bis] (which can also include <mailbox> names in UTF-8.) It also will transmit the domain parts of mailbox names within SMTP commands or the message header as A-labels or U-labels { Note - the term "mailbox names" is not defined here or in RFC 5321. In RFC 5321 it appears to be used to mean local-part; however because its precise meaning is unclear, I strongly urge NOT using it here at all. Instead I suggest using whatever ABNF rulename is appropriate. This guarantees clarity. Note: I changed 'may' to 'can'. Also, when an ABNF rule is being cited within prose text, such as for <mailbox>, it should be distinguished so that the reader knows it is a formal term. 
I have used <> to bracket the term mailbox. } > All labels in domain parts of mailbox names which are IDN > forms of A-labels or U-labels MUST be valid. { This is strange. Either it is repeating a normative requirement from SMTP or it is expanding SMTP to require special validation for IDN forms of domain names that is not present for ASCII forms. Neither interpretation seems like the right thing to be doing here. Such a modification to SMTP seems out of scope. Also, the term <mailbox> has different semantic definitions in RFC 5321 and RFC 5322. A strict reading of the differences could be a problem. A loose reading would note that both definitions reduce to include <local-part> and <domain> components that are common to both specifications. I encourage you to review this issue carefully and put a note at the beginning of the document stating explicitly how you have chosen to handle it. My best recommendation is that this document should only refer to RFC 5321 ABNF and that it should move /all/ RFC 5322 modifications or enhancements to the EAI Header document (draft-ietf-eai-rfc5335bis). } > When a Mail User > Agent(MUA) submits a message to a Message Submission Server > ("MSA")[RFC4409], it is the responsibility of the MSA to ensure that > all domain labels are valid. { Given that this specification creates broad systemic effects and given that it needs to refer to components other than an SMTP client or server, it should cite RFC 5598, to give the reader an integrated view of the email service. I'll note that citing 5598 has become common for email specifications; so this is not a controversial suggestion. } >The presence of the UTF8SMTPbis > extension does not change the requirement of RFC 5321 that servers > relaying mail MUST NOT attempt to parse, evaluate, or transform the > local part in any way. { This sentence replicates normative language from a different specification. 
This is a very bad thing to do, in case the original specification changes its language. I suggest: } The presence of...in any way -> The presence of the UTF8SMTPbis extension does not change RFC 5321 server relaying behaviors. { this retains some text as a flag to the reader, but does not provide the specific semantics, which might change in the original specification. } > If the UTF8SMTPbis SMTP extension is not offered by the server, the SMTP > client MUST NOT transmit an internationalized address and MUST NOT transmit { This appears to prohibit the sending of internationalized addresses that are encoded in ASCII, rather than in UTF-8. The purpose of ASCII encoding is to eliminate the need for infrastructure support for Unicode characters. However the language here appears to be imposing the barrier of infrastructure support. If the concern is sending UTF-8, then that's the language that needs to be used. Messages with ASCII-encoding of internationalized addresses need to be permitted to be sent, without first requiring infrastructure support. } > a mail message containing internationalized mail headers as described in > [RFC5335bis] at any level within its MIME structure [RFC2045] and [RFC2047]. { The prohibition of internationalized mail headers within a MIME structure -- "at any level within its MIME structure" -- is out of scope for the working group and is a /major/ change to SMTP. It is also a really terrible rule! In terms of protocol modeling, it would have been like saying that no one could use MIME until the infrastructure supported it! It is one thing to give the reader a reminder that UTF-8 is illegal in the email header of a message and quite another to attempt to prohibit it in attachments. Attachments already carry all sorts of data. UTF-8 is merely one more type. I believe that there are currently no SMTP constraints on the carriage of MIME; I also believe it essential that there not be, since such constraints impede adoption. 
(That is why MDN support is good and DSN support is poor.) Also note that MIME objects exist outside of email transport and that directives about legal or illegal MIME ought to be separated from SMTP... Also, perhaps mail with UTF-8 in the header needs a different MIME type, such as text/UTF-8-message?... } > 2. It may either reject the message during the SMTP transaction or accept may -> MAY { However, I note that this concerns a SMTP client, not a server. } > 3.3. Extended Mailbox Address Syntax > > RFC 5321, Section 4.1.2, defines the syntax of a mailbox entirely in terms > of ASCII characters, using the production for a mailbox and those productions mailbox -> <mailbox> > on which it depends. > > The key changes made by this specification are, informally, to { I don't understand what it means to change RFC 5321 "informally". I think it is intended to mean that the list is not guaranteed to be complete. If so, I suggest wording such as: } -> The key changes made by this specification include: > o Change the definition of "Domain" to permit either the RFC 5321 definition > above or a UTF-8 string representing a DNS label that is conformant with > IDNA definitions [RFC5890]. { "either"??? That sounds completely ambiguous. } > o Change the definition of "Local-part" to permit either the definition > above or a UTF-8 string. That string MUST NOT contain any of the ASCII > characters (either graphics or controls) that are not permitted in "atext"; > it is otherwise unrestricted. { same concern as above. } > According to the description above, the syntax of an internationalized email > mailbox name (address) is defined in ABNF [RFC5234] as follows. > > uMailbox = uLocal-part "@" ( uDomain / address-literal ) ; Replace Mailbox > in RFC 5321, Section 4.1.2 { This implies that the option applies only to RFC5321 and not to RFC5322, but the later rules make clear that 5322 is also supposed to be covered. 
To repeat: I recommend that all RFC 5322 enhancements and ABNF should be moved to the EAI Header draft and that the SMTP extension should merely cite that draft. Note that RFC5322 is for an object that can and does exist outside of SMTP. Enhancements to RFC5322 well might need to apply when there is no SMTP, or at least long after it is relevant. } > UTF-8-non-ASCII = UTF-8-2 / UTF-8-3 / UTF-8-4 > > UTF-8-2 = <See Section 4 of RFC 3629> > > UTF-8-3 = <See Section 4 of RFC 3629> > > UTF-8-4 = <See Section 4 of RFC 3629> { These rules are also defined in RFC5335 and draft-ietf-eai-rfc5335bis. Citing RFC 3629 is therefore confusing, and possibly can create specification divergence. I suspect that RFC 3629 is not the correct reference, here, since it would imply that RFC5335bis' definitions should be ignored... If RFC 3629 /is/ the correct reference, then I cannot guess why and I cannot guess why it will not create problems with divergent specification. } > The value of "uDomain" SHOULD be verified by IDNA definitions [RFC5890]. If { I do not understand what "SHOULD be verified by IDNA definitions" means. If it means that the uDomain "SHOULD be verified" then it is out of scope for this specification. There is no reason that SMTP rules concerning verification of Unicode-based domains needs to be different from ASCII-based domains. In particular, verification of a Domain name is specified elsewhere. If it means that the uDomain "should be IDNA compliant" then I do not understand having the exceptions that a "should" allows. For this EAI work, I would think that when UTF8SMTPbis is in force, then IDNA compliance needs to be a MUST. } > that verification fails, the email address with that uDomain MUST NOT be > regarded as a valid email address. MUST NOT be regarded as a valid -> MUST BE regarded as an invalid { Stating this as an affirmative is stronger. } > 3.4. 
UTF-8 addresses and Response Codes > > An internationalized message MUST NOT be sent to an SMTP server that does > not support UTF8SMTPbis. Such a message should be rejected by a server if > it lacks the support of UTF8SMTPbis. { The first sentence is restating a normative rule already existing in other specifications. The second sentence is simply confusing. If the first sentence is applied, then the server does not support UTF8SMTPbis. I suspect what is actually meant in the second sentence is something like: } If a server receives email encoded in UTF-8, when UTF8SMTPbis is not in force, then the server SHOULD reject the message. { However, this means that the current specification is trying to dictate behavior in the legacy environment, and it cannot do that. I believe the rule that is actually intended is: } An SMTP client MUST only send a message containing UTF-8 to a server that supports UTF8SMTPbis. If the server does not support this option, then the client MUST terminate the delivery attempt with a permanent error, or else find another path. > The three-digit reply codes used in this section are consistent with their > meanings as defined in RFC 5321. { "consistent with"? I hope you actually mean that they are "the same"! Either they are the same as in 5321 or they are changed. } > When messages are rejected because the RCPT command requires an ASCII > address, the response code 553 is used with the meaning "mailbox name not used -> returned, > allowed". When messages are rejected for other reasons, such as the MAIL > command requiring an ASCII address, the response code 550 is used with the { RFC 5321 uses the terms "completion code", "reply code" and "response code" interchangeably. That probably makes the use of "response code" here legal. However the formal ABNF in RFC 5321 defines the rules: <reply-line> <reply-code> For clarity and precision, I strongly recommend using only those terms when referring to replies/responses/completion. 
Since the current document is cross-referencing a formal construct from another document, it will help the reader to resolve the reference by using only the most formal term.}

> meaning "mailbox unavailable". When the server supports enhanced mail

{ "unavailable" sounds like a temporary error.

More generally, the text here says that this set of rejections is "for other reasons". However it does not really mean all other rejections. Since there are many, different "other reasons" and some of them are temporary errors, this text needs to be revised. It needs to make clear the difference in handling temporary versus permanent errors and in particular it needs to define both types in terms that are specific to this enhancement, in order to distinguish them from all other SMTP reply-codes.}

> system status codes [RFC3463], response code "X.6.7" [RFC5248] is used,
> meaning that "UTF-8 addresses not permitted for that sender/recipient".
>
> If the response code is issued after the final "." of the DATA command, the
> response code "554" is used with the meaning "Transaction failed". When the
> server supports enhanced mail system status codes [RFC3463], response code
> "X.6.9" [RFC5248] is used, meaning that "UTF-8 header message can not be
> transferred to one or more recipient so the message must be rejected".
>
> 3.5. Body Parts and SMTP Extensions
>
> There is no ESMTP parameter to assert that a message is an internationalized
> message. An SMTP server that requires accurate knowledge of whether a
> message is internationalized is required to parse all message header fields
> and MIME header fields [RFC2045] and [RFC2047] in the message body.

{ This is confusing. There is a rule against sending an internationalized message, unless UTF8SMTPbis is in force, but there is no requirement that the server be told explicitly when a message is internationalized???
As noted above, the simple and appropriate action to take is, instead, to have the MAIL command contain an <esmtp-param> that declares that the message supports internationalized addresses.

My understanding is that this was discussed by the working group. Why was this not the choice of the working group? }

> While this specification requires that servers support the 8BITMIME
> extension [RFC1652] to ensure that servers have adequate handling capability
> for 8-bit data and to avoid a number of complex encoding problems, the use
> of internationalized addresses obviously does not require non-ASCII body
> parts in the MIME message [RFC2045] and [RFC2047]. The UTF8SMTPbis extension
> MAY be used with the BODY=8BITMIME parameter if that is appropriate given
> the body content or, with the BODY=BINARYMIME parameter, if the server
> advertises BINARYMIME [RFC3030] and that is appropriate.

{ This last sentence is either saying too much or too little. In general, this option does not specify MIME or Body details. Nor does there appear to be any reason that it should. As a consequence, I believe that any text about MIME or the Body needs to be non-normative. It's fine to provide a small amount of pedagogy about the carriage of MIME, but including normative language about it here merely invites confusion and possibly even divergent specifications.

Also note that the reference to "BODY=" values does not explicitly cite RFC 1652. So the references require the reader to already know what is being referred to. This should be fixed.}

> Assuming that the server advertises UTF8SMTPbis and 8BITMIME, and
> receives at least one non-ASCII address, the precise interpretation of
> "BODY=8BITMIME", and "BODY=BINARYMIME" in the MAIL command is: 1. If a

{ My reading of the current specification is that it does not change the handling of email Body or MIME mechanisms. Hence, DO NOT REPEAT NORMATIVE LANGUAGE FROM OTHER SPECIFICATIONS. It invites divergent specification.
If the current specification is actually re-defining the semantics or syntax of RFC 1652, then it needs to say so. However my reading of the current specification is that it is quite independent of email Body issues. }

> BODY=8BITMIME parameter is present, the header contains UTF-8 characters,
> and some or all of the body parts contain 8-bit line-oriented data. 2. If a
> BODY=BINARYMIME parameter is present, the header contains UTF-8 characters,
> and some or all body parts contain binary data without restriction as to
> line lengths or delimiters.

{ Also note that the items numbered 1 and 2 are not complete sentences. They only contain the conditional clause. "If...." is missing the "then". Hence, no actual "interpretation" is provided, contrary to the promise that is given. }

> 3.6. Additional ESMTP Changes and Clarifications
>
> The information carried in the mail transport process involves addresses
> ("mailboxes") and domain names in various contexts in addition to the MAIL
> and RCPT commands and extended alternatives to them. In general, the rule
> is that, when RFC 5321 specifies a mailbox, this specification expects UTF-8

this specification  ->  this SMTP extension

{ "this specification" could be misread as meaning RFC 5321, since it is the most recent specification referenced... }

> to be used for the entire string; when RFC 5321 specifies a domain name, the
> name SHOULD be in the form of A-label if its raw form is non-ASCII.

{ The more serious problem is the continuing confusion about ASCII vs. non-ASCII and UTF-8 vs. non-UTF-8. Again, I believe that having this extension be in force means that these data are /always/ UTF-8. Hence, it is not a matter of "expects". It is a matter of "requires". }

> The following subsections list and discuss all of the relevant cases.
>
> 3.6.1. The Initial SMTP Exchange
>
> When an SMTP connection is opened, the server normally sends a "greeting"
> response consisting of the 220 response code and some information.
The

{ "normally"? What are the exceptions? }

> client then sends the EHLO command. Since the client cannot know whether
> the server supports UTF8SMTPbis until after it receives the response from
> EHLO, the client must send only ASCII (LDH label [RFC5890] or A-label)
> domains in the EHLO command and that, if the server provides domain names in
> the EHLO response, they must be in the form of LDH labels or A-labels.
>
> 3.6.2. Mail eXchangers
>
> Organizations often authorize multiple servers to accept mail addressed to
> them. For example, the organization may itself operate more than one

may  ->  might
or
may  ->  can

{ Normative vocabulary can only be used for normative statements. Please do a global search and replace wherever normative vocabulary is used in non-normative sentences. }

> server, and may also or instead have an agreement with other organizations to
> accept mail as a backup. Authorized servers are generally listed in MX
> records as described in RFC 5321. When more than one server accepts mail for
> the domain-part of a mailbox, it is strongly advised that either all or none
> of them support the UTF8SMTPbis extension.

{ This last sentence sounds normative and I believe it should be. So... }

it is strongly advised that either all or none of them support  ->  all or none of them SHOULD support

> Otherwise, surprising rejections
> can happen during temporary failures, which users might perceive as a serious

{ Probably not just temporary failures. Having services that are meant to be redundant with each other actually provide different semantic behavior is just plain dangerous. It is essentially guaranteed that they will cause problems. }

> 3.6.3. Trace Information
>
> When an SMTP server receives a message for delivery or further
> processing, RFC 5321 requires that it MUST insert trace ("time stamp"
> or "Received") information at the beginning of the message content.
> For the trace information, this memo updates the time stamp line and
> the return path line [RFC5321] formally defined as follows:

{ Again, don't repeat normative language. I suggest simply deleting the first sentence. }

> [RFC5321] formally defined as follows:
>
> uReturn-path-line = "Return-Path:" FWS uReverse-path <CRLF> ; Replaces
> Return-path-line in Section 4.4 of RFC 5321

{ On reflection, I think the "u" rule-naming convention is problematic. If the current specification is replacing a rule previously defined elsewhere, it needs to use the same rulename. This is simpler and clearer. Otherwise, all future references to the rule need to use the new name.

Rather, where the draft is replacing a rule from another specification, the rule definition includes a comment that cites where the original rule is from; that should be sufficient. }

> uReverse-path = uPath / "<>" ; Replace Reverse-path in RFC 5321, section
> 4.1.2
>
> uPath = "<" [ A-d-l ":" ] uMailbox ">" ; Replace Path in RFC 5321, section
> 4.1.2 ; uMailbox is defined in section 3.3 of this document
>
> A-d-l = <See section 4.1.2 of RFC 5321>

{ Small improvement: Where a rule is defined elsewhere, I suggest having the descriptive text in the rule say <Defined in...> rather than <See...>. }

> 3.6.4. UTF-8 Strings in Replies
>
> 3.6.4.1. RCPT Commands
>
> If an SMTP client follows this specification and sends any RCPT commands
> containing non-ASCII addresses, the SMTP server is permitted to use UTF-8
> characters in the email address associated with 251 and 551 response codes,
> and the client MUST be able to accept and process them.

{ I assume that "follows this specification" means that the UTF8SMTPbis option is in force. If so, then say that, because it is simpler and more precise. But, then, it does require declaring /use/ of the option...

However the meaning of the text here is odd.
It implies that the server is not permitted to use UTF-8 unless it has already received UTF-8, even though the option is in force. I suspect that that is not the specification that is intended. Or, at least, I hope it is not. If it actually /is/ what is intended, it means that the enhanced environment is going to have all sorts of /additional/ conditional code, to check on whether a specific context in the state machine is allowed to act in one way or another. The simple and more reasonable model is that when the extension is in force, UTF-8 is allowed. Period. So... }

If an SMTP session is using this extension, then the server is permitted to use UTF-8 characters in the email address associated with 251 and 551 reply-codes, and the client MUST be able to accept and process them.

{ I chose "SMTP session" to avoid the more complex discussion of the client/server 'negotiation' about using the extension. So, here it is either in force or it isn't and if it is in force, it is for client AND server. }

> If a given RCPT
> command does not include a non-ASCII envelope address, the server MUST NOT
> return a 251 or 551 response containing a non-ASCII mailbox. Instead, it
> MUST transform such responses into 250 or 550 responses that do not contain
> non-ASCII addresses.

{ See above. I very strongly disagree with having the complexity this kind of conditional requirement creates. Or else there is a basic issue involved here that the document does not discuss but needs to, to justify the complexity. }

> 3.6.4.2. VRFY and EXPN Commands and the UTF-8REPLY Parameter
>
> If the VRFY and EXPN commands are transmitted with the optional parameter
> "UTF-8REPLY", it indicates the client can accept UTF-8 strings in replies to
> those commands. This allows the server to use UTF-8 strings in mailbox

{ Is this extension trying to say that a client might support UTF8SMTPbis but not be able to access UTF-8 replies??? Under what circumstance is this reasonable?
}

> names and full names that occur in replies without concern that the client
> might be confused by them. An SMTP client that conforms to this

{ What does it mean to be "confused" by them? Because the extension is in force, we know that the client supports UTF-8. }

> specification MUST accept and correctly process replies from the VRFY and
> EXPN commands that contain UTF-8 strings. However, the SMTP server MUST NOT
> use UTF-8 strings in replies if the SMTP client does not specifically allow
> such replies by transmitting this parameter. Most replies do not require

{ Why is this constraint required? }

> that a mailbox name be included in the returned text, and therefore UTF-8 is
> not needed in them. Some replies, notably those resulting from successful
> execution of the VRFY and EXPN commands, do include the mailbox, making the
> provisions of this section important.
>
> VERIFY (VRFY) and EXPAND (EXPN) command syntaxes are changed to:
>
> vrfy = "VRFY" SP ( uLocal-part / uMailbox ) [ SP "UTF-8REPLY" ] CRLF ;
> uLocal-part and uMailbox are defined in ; Section 3.3 of this document.
>
> expn = "EXPN" SP ( uLocal-part / uMailbox ) [ SP "UTF-8REPLY" ] CRLF ;
> uLocal-part and uMailbox are defined in ; Section 3.3 of this document.

{ Note that these rules only specify UTF-8 support, without any contingency on use. That is, they do not cover the variability that is described in the prose, and that disparity from the prose specification is an example of the complexity created by having the use of UTF-8 in replies be so contingent on the actual data sent by the client. }

> If a normal success response (i.e., 250) is returned, the response MAY
> include the full name of the user and MUST include the mailbox of the user.
> It MUST be in either of the following forms:
>
> User Name <uMailbox> ; uMailbox is defined in Section 3.3 of this document.
> ; User Name can contain non-ASCII characters.
>
> uMailbox ; uMailbox is defined in Section 3.3 of this document.
{ "User Name" needs to be specified as an ABNF rule. It isn't.

This appears to be intended to invoke the RFC 5322 definition of <mailbox> rather than the RFC 5321 definition. In any event, it is creating a semantic change to the response, from RFC 5321, beyond merely allowing UTF-8 characters. }

> If the SMTP reply requires UTF-8 strings, but UTF-8 is not allowed in the
> reply, and the server supports enhanced mail system status codes [RFC3463],
> the enhanced response code is "X.6.8" [RFC5248], meaning "A reply containing
> a UTF-8 string is required to show the mailbox name, but that form of
> response is not permitted by the client".

{ "the mailbox name"? Which string is that? }

> If the SMTP client does not support the UTF8SMTPbisbis extension, but receives
> a UTF-8 string in a reply, it may not be able to properly report the reply

If a UTF-8 string is sent in a reply, when this extension is not in force, then it is a protocol violation. Period.

> [RFC4952bis] Klensin, J. and Y. Ko, "Overview and Framework for
> Internationalized Email", RFC 4952, July 2010.

RFC 4952  ->  draft-ietf-eai-frmwrk-4952bis

{ It is not now, and never will be, RFC 4952; and RFC4952bis is currently dated September 2010. When it is an RFC, it will have a new number. Also, Internet Drafts, including -bis documents, usually follow a different citation labeling convention, such as I-D.rfc4952bis, to make clear that they are an I-D and not an RFC. }

[1] draft-ietf-eai-frmwrk-4952bis-10
[2] draft-ietf-eai-rfc5335bis-03

d/
--
Dave Crocker
Brandenburg InternetWorking
bbiw.net
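
As an illustration only, the client-side rule proposed in the comments on Section 3.4 (send UTF-8 addresses only to a server that advertises the extension) can be sketched in Python. This is a minimal sketch under stated assumptions, not text from the draft: the function names needs_extension and may_send are hypothetical, the keyword spelling "UTF8SMTPbis" is taken from the draft under review, keywords are compared case-insensitively, and only envelope addresses are examined, not header fields.

```python
from typing import Iterable

# Keyword name as used in the draft under review; stored upper-case
# because the comparison below is case-insensitive (an assumption here).
EXTENSION = "UTF8SMTPBIS"


def needs_extension(addresses: Iterable[str]) -> bool:
    """Return True if any envelope address contains a non-ASCII character."""
    return any(not addr.isascii() for addr in addresses)


def may_send(ehlo_keywords: Iterable[str], addresses: Iterable[str]) -> bool:
    """Hypothetical client check: may this envelope go to this server?

    ehlo_keywords: extension keywords taken from the server's EHLO reply.
    addresses:     the MAIL and RCPT addresses of the transaction.
    """
    advertised = {kw.upper() for kw in ehlo_keywords}
    if not needs_extension(addresses):
        # Pure ASCII envelope: the extension is not required.
        return True
    # Non-ASCII envelope: only allowed when the server advertises the extension.
    return EXTENSION in advertised
```

When may_send returns False, the client would end the delivery attempt with a permanent error or look for another path, as suggested in the Section 3.4 comments.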