[EAI] Analysis of comments on 5336bis (was: Re: apps-team review of draft-ietf-eai-rfc5336bis)

John C Klensin <klensin@jck.com> Mon, 27 December 2010 19:37 UTC

Date: Mon, 27 Dec 2010 14:39:31 -0500
From: John C Klensin <klensin@jck.com>
To: ima@ietf.org
Message-ID: <68655A9F86D4BE7ED933F8A6@[192.168.1.128]>
In-Reply-To: <Pine.OSX.4.64.1012221602490.40683@mac-allocchio3.elettra.trieste.it>
References: <Pine.OSX.4.64.1012221602490.40683@mac-allocchio3.elettra.trieste.it>

Hi.

Long, careful, and thoughtful reviews deserve long --and I hope
careful and thoughtful-- responses.  This note attempts to
identify the main substantive points in the reviews and react
to them.  Some may consider it a special case of no good deed
going unpunished; if so, I'm sorry.

I'm sending the note separately to the EAI WG mailing list
(ima@ietf.org) and the Apps Discuss one.  I hope that we can
move discussion to the WG list.  Put differently, please do not
respond to this note on the apps-discuss list -- if anyone has
additional comments, please take them to the WG list.  My
apologies to those who will get it twice -- there seemed no
better way to avoid, or at least reduce, multiple threads of
discussions that were not visible to WG participants.

Writing only as a participant in the WG and co-author of the
"EAI framework" document, I have several comments on the
various Applications Area Review Team comments (my co-chair
helped a bit with part of this note but I'll take the blame
as needed).  To be certain we all start with the same
background assumptions, please remember that the original form
of the "EAI Framework" document, which supported the
Experimental versions of these protocols, was published in July
2007 as RFC
4952.  Its replacement, draft-ietf-eai-frmwrk-4952bis-10, is
more to the point today.  That draft is intended to be aligned
with the documents that are now under consideration (or to have
them aligned with it), has completed IETF Last Call, was
approved for publication by the IESG in late September, and has
been in the RFC Editor queue awaiting normative references to
the current set of documents and resolution of a naming issue
that the WG decided to leave until after the approval process
completed (more on that below).

Speaking very personally because I don't know whether the WG
participants will agree, I think these specifications are
important enough that I'd rather have them right than stand on
any sort of procedural argument.  However, I think the
community also needs to consider when, newly-discovered
showstoppers aside, it becomes too late to raise fundamental
design issues that were reviewed extensively by the WG prior to
forwarding the first-round documents to the IESG, went through
IETF Last Call in 2007 and 2008, have been implemented and
tested experimentally, and where the basic framework and
changes from the 2007-2008 specs were reviewed by the WG,
passed through IETF Last Call, and were approved by the IESG in
September of this year.

There are places in the comments below where I've tried to
summarize what I believe is WG opinion or WG consensus, but
even those remarks should be taken as personal opinion rather
than the formal opinion of the WG.  The WG has deliberately not
been given a preview of these notes and is seeing them for the
first time at the same time as reviewers and others.  I do hope
that some of the notes will act as a basis for discussion.

For convenience, I'm going to try to consider the review
comments by topic (there is some overlap among reviews) rather
than responding to each review comment separately.  When it
seems useful to identify particular comments, I've used the
first name of the author of the comments and the date of the
posting.  To avoid a hugely long note, this one is strictly
about 5336bis (the SMTP extensions); a separate note will be
posted about each of the other relevant documents.  Note that,
again speaking personally, I agree with some of the comments
even though WG consensus, as reflected in the current drafts,
does not.  

I have not commented on, or even listed, specific issues whose
consequences appear to be entirely editorial, even if the
editorial issues are quite broad.  I'll leave those comments to
the authors/editors (subject, as usual, to WG approval).  I
have also omitted most issues that seem to have been more than
adequately covered in mailing list discussions.

The balance of this note is about the reviews and on-list
comments about draft-ietf-eai-rfc5336bis.  It does not include
comments that appeared to be redundant unless listing different
versions seemed to add perspective or other information.  While
I've commented briefly on the VRFY and EXPN issues below, those
are being covered much better in the discussions between Dave
and Shawn, in Ned's analysis, and in the subsequent discussion
than I could do or have attempted here.  Separate notes about
the other reviews
will follow.


(1) Overview, scope, and summary:

Claudio Allocchio 2010-12-22 wrote: 

> The document specifies how to apply internationalization to
> some SMTP elements, and to some Header elements which
> depends from the SMTP elements.  However there seem to be a
> discrepancy from what the abstract states, and what is
> elaborated into the document details: the document also
> deals with quite a number of elements which involve
> bodyparts, and affect the whole email legacy system, and
> beyond, such as logs, traces, and wherever email addresses
> are used (certificates, etc.). This seems thus to go beyond
> what the summary states, even this fact is not explicitly
> discussed. It is unclear thus if this "update" of RFC5321
> and RFC5322 is only limited in scope or not. My reading of
> the document say it goes beyond, and implicitly affects also
> what is being transported into the email system.  This is
> thus a serious issue to be solved before the document can go
> to the Standard Track.

With the understanding that the abstract, and several other
editorial issues with the text, could certainly be improved,
please note that specifications for header fields that contain
trace-related information (and that are inserted by MTAs) have
appeared in all of RFCs 821, 2821, and 5321.  The trend over time,
starting with some explicit DRUMS decisions, has been to move
more of that information into the x21 member of the pair rather
than having it split between x21 and x22.  So the choice with
this document is between updating information that appears in
5321 and creating a split between envelope and header
information that is different from the base email documents.
The WG concluded that tracking the organization and information
content of 5321 would be less confusing than trying to create a
new model.

By contrast, the information about certificates, etc., was
introduced into the predecessor of this document at the
insistence of the community and the IESG during Last Call on
RFC 5336.  We could take it out of here, possibly by reopening
the Framework document and inserting additional "other
consequences for email and the Internet" sections there.  FWIW,
my personal opinion is that doing so would not add
significantly to either document comprehensibility or protocol
quality, but that is a matter for WG and community consensus.


Dave Crocker 2010-12-22 wrote:

> * Beyond <encoded-word> -- The work, here, appears to have
> two goals. One is to add support for Unicode in a
> <local-part>; that is, support for internationalized
> addresses.  However note that <encoded-word> and <A-label>
> already accomplish this in a way that is transparent to the
> existing email infrastructure; only the end-systems need to
> understand it. 

When the discussions that led to the establishment of the EAI
WG started, the goal was seen as expanding the syntax of
<local-part>, as you suggest.  It turns out that doing that is
not really feasible, at least without leading to horrible
kludges, so the WG was chartered (now four or five years ago)
to extend header fields as well.  As mentioned elsewhere, the
IETF has had repeated opportunities to review those decisions
and has formally approved them.  Even mapping back and forth
between A-labels and U-labels is, in the real world, not as
transparent as one would like (or as one would obtain if one knew that
everything that looks like a domain name is a domain name in
the public DNS -- see draft-iab-idn-encoding or review the
slides from the Hiroshima technical plenary for more details on
that issue).  

Curiously, if the WG had not been constrained by very old (and
necessary) rules about how the email environment works, its job
might have been much simpler and more in line with at least
one extrapolation from the comment above: if we had
standardized local-part formats sufficiently to avoid having to
prohibit interpreting or tampering with those fields enroute,
the WG could, in principle, have written extremely short
specifications allowing encoded-words (possibly restricted to
encoded UTF-8) in local-parts and essentially been done.  While
I doubt that solution would have been acceptable to many
implementers and user constituencies (in part because "only the
end-systems need to understand it" has never worked as well as
we often hope it does), it would have been much easier in terms
of specification simplicity.

In case it is useful for understanding the WG's choices in this
area, the WG in principle had a choice between adopting a
minimal change --relying as much as possible on trick encodings
and on dependencies on endpoint handling-- and moving
in the direction of relatively comprehensive use of
natively-encoded (see below) Unicode in email with some
transition provisions that would gradually atrophy.  In
practice, there was no choice at all: internal discussions made
it clear that a standard based on the former approach would
simply be ignored, with localized or proprietary solutions
adopted instead, and that only one based on the latter had a reasonable
chance of implementation and global deployment.  Extrapolation
from the independent-of-EAI work reflected in
draft-iab-idn-encoding -- work that originated in the concerns
of a couple of operating systems that dominate the network
today-- also suggests that more dependency on exotic,
IETF-specific encodings is not a direction we should be going
in unless it is absolutely necessary.

> The second goal is to support Unicode in the
> more "native" form of UTF-8. (The quotation marks are
> because UTF-8 is not native Unicode, either; it is a highly
> encoded form of Unicode...)

As others have commented on the mailing list, the above and
some of the comments that followed in the review demonstrate a
fairly profound misunderstanding of Unicode and Unicode
terminology.  The only encoding form more "native" than UTF-8
(or UTF-16 or UTF-32) is a set of abstract integers that have
no intrinsic computer encoding form.  The IETF has established
a number of "non-native" encoding forms, including UTF-7 and
the Punycode-derived ACE form, but they are avoided where
possible in these specifications.
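A short Python illustration of that terminology point, using only standard-library codecs: one code point, three standard encoding forms of it, and then the IETF-specific Punycode transformation that underlies the ACE form. The label "bücher" is simply an example chosen for this sketch.

```python
# A code point is an abstract integer; UTF-8, UTF-16 and UTF-32 are three
# encoding forms of that same integer, and none is more "native" than another.
ch = "\u00fc"  # LATIN SMALL LETTER U WITH DIAERESIS
code_point = ord(ch)

forms = {
    "utf-8": ch.encode("utf-8"),
    "utf-16-be": ch.encode("utf-16-be"),
    "utf-32-be": ch.encode("utf-32-be"),
}

print(f"U+{code_point:04X}")
for codec, data in forms.items():
    print(f"{codec:10} {data.hex()}")

# Each form decodes back to the same code point.
assert all(data.decode(codec) == ch for codec, data in forms.items())

# Punycode (the basis of the IDNA ACE form) is a further, IETF-specific
# transformation into ASCII; the "xn--" prefix marks the result.
label = "b\u00fccher"
ace_label = "xn--" + label.encode("punycode").decode("ascii")
print(ace_label)
```

Python's `punycode` codec implements only the RFC 3492 algorithm; it performs no IDNA validation, so this shows the encoding, not a complete A-label conversion.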

See the comments under #7 below for an additional aspect of the
ASCII-versus-Unicode distinctions.



(2) Explicitness versus discovery in MIME and related topics,
including a possible parameter on the MAIL command.

Claudio Allocchio 2010-12-22 wrote: 

> Another big issue is about blending the boundaries of
> servers in the transport system (MTAs), clients (UAs) and
> their roles, when the document specify that servers shall
> use guessing techniques from looking into bodyparts to
> understand if what they're transporting is internationalized
> according to this specification or not.  This is IMHO a
> basic violation of the role distinction in the legacy email
> system, and affects the MIME model as well.

Yes, except that this "blending" already occurs with MIME and
other SMTP extensions.  Over the years, we have made some
decisions to be very explicit about parameters and
pre-identification of content and other decisions to force
systems to inspect or parse content because we did not believe
that such parameters could be trusted (or that any sensible
implementer would trust them).  I hope we have gotten most of
those historical decisions right although I have some doubts,
at least in retrospect, about some of them.

Given that history, I don't think it is useful to ask whether
the EAI effort is leading us into a role-distinction sin: we are
already there.  The question is what the
right choice is for this particular situation.  The WG
considered that choice at great length (albeit in the 2006-2008
period rather than more recently, but that part of the
specification has not changed) and came to the conclusion
that forcing inspection was the better plan.  Testing with the
Experimental versions of the protocols did not expose serious
problems with that choice. 

I am sure that the WG would be happy to reconsider this topic
if appropriate, but some solid evidence that there is a problem
and that the current decision is wrong would be far more
helpful in that regard than assertions based on a philosophy of
email that IETF protocols have already violated.  That
assertion, and some evidence, have been made for a MAIL command
parameter:

Claudio Allocchio 2010-12-22 wrote: 

> 3.5 Body Parts and SMTP Extensions

> I have problems in understanding why an SMTP server should
> peek extensively into the message being transferred to
> understand that it is handling an internationalized message.
> "There is no ESMTP parameter to assert that a message is an
> internationalized message." in my opinion should be fixed in
> another way, by ADDING an appropriate ESMTP parameter to
>...

Dave Crocker 2010-12-22 wrote:

> 4.  The SMTP extension needs to have the client use an
> explicit signal when it is sending a message encoded in
> UTF-8. This is easiest as a parameter to the MAIL command.
> The current specification creates a more complex and almost
> heuristic model for distinguishing ASCII from UTF-8 use.

Ned Freed 2010-12-22 wrote:

> (0) I cannot emphasize strongly enough how important it is
> to have a UTF8SMTP sort of parameter on the MAIL FROM to
> indicate that a message employing this extension is being
> submitted/relayed/delivered.  
>...

Ned Freed 2010-12-23 wrote (small excerpt):

> Again, headers don't have the proper semantics. A MAIL FROM
> parameter, OTOH, does.

The working group did consider the tradeoffs between (i)
forcing a server (really, a relay) to do a complete MIME parse if
it received an extended message and had to decide what to do if
the next hop did not support the extensions and (ii) putting in
a parameter whose presence or absence might not be trustworthy
given a history of, e.g., text/plain body parts being sent with
distinctly non-ASCII content but without charset parameters,
misuse of "name extensions" in lieu of adherence to
Content-Type, etc.  The decision of the WG was reflected in RFC
5336 (after IETF Last Call, etc.) and I note that none of the
known implementers of the RFC 5336 spec are objecting to that
decision now.  The consensus of the reviewers now seems to be
that the WG got it wrong.  The WG needs to review this again;
the change is certainly easy to make should it be appropriate.
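The two alternatives the WG weighed can be sketched side by side. This is a simplified model, not the draft's text: a message is reduced to its envelope addresses plus the raw header bytes, "UTF8SMTPbis" is used as the EHLO keyword only because it is the working name in these drafts, and the MAIL parameter name `HYPOTHETICAL_EAI` is invented for illustration. A relay following option (i) would also have to walk nested MIME parts, which this sketch does not attempt.

```python
from __future__ import annotations

from dataclasses import dataclass, field

EXT_KEYWORD = "UTF8SMTPBIS"              # working EHLO keyword from the drafts
HYPOTHETICAL_PARAM = "HYPOTHETICAL_EAI"  # invented MAIL FROM parameter name


@dataclass
class Envelope:
    mail_from: str
    rcpt_to: list[str]
    mail_params: set[str] = field(default_factory=set)


def needs_extension_by_inspection(env: Envelope, raw_header: bytes) -> bool:
    """Option (i): decide by inspecting the envelope and header section."""
    addresses = [env.mail_from, *env.rcpt_to]
    if not all(a.isascii() for a in addresses):
        return True
    # Raw 8-bit bytes in the header section indicate UTF-8 header fields;
    # RFC 2047 encoded-words are plain ASCII and do not trigger this.
    return not raw_header.isascii()


def needs_extension_by_parameter(env: Envelope) -> bool:
    """Option (ii): trust whatever the sending client asserted."""
    return HYPOTHETICAL_PARAM in {p.upper() for p in env.mail_params}


def next_hop_supports(ehlo_lines: list[str]) -> bool:
    """Check EHLO response lines (greeting first) for the extension keyword."""
    keywords = {line.split()[0].upper() for line in ehlo_lines[1:] if line.strip()}
    return EXT_KEYWORD in keywords


env = Envelope("alice@example.com", ["bob@example.org"])
header = "Subject: Gr\u00fc\u00dfe\r\n".encode("utf-8")
print(needs_extension_by_inspection(env, header))  # True
print(needs_extension_by_parameter(env))           # False: client did not assert it
```

The last two lines show the disagreement in miniature: the parameter is cheap to check but only as trustworthy as the client that set it, while inspection catches this case but grows expensive once embedded messages must be examined too.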



(3) Fundamentals of handling syntax changes from other
specifications (especially 5321/5322):

Claudio Allocchio 2010-12-22 wrote: 

> The document quite often repeats rules from other
> specifications, AND re-describes these rules. This is
> dangerous (it makes very difficult to keep aligned
> documents, and will lead to different interpretations) and
> not needed.

The tradeoff, I hope obviously, is with the reader having to
flip back and forth repeatedly between documents to have any
hope of understanding the specification accurately.  I'm
personally sympathetic to your concern, but many WG
participants disagree. 

Specific suggestions that could be used as a model would be
helpful to the WG and the editors.


Claudio Allocchio 2010-12-22 wrote: 

> 3.6.1 "the server normally sends..." --"the server sends"
> (sorry... we do not like ABnormal servers!) :-)

This is another example of a narrative/informative comment.
We know about exceptions.   Perhaps the sentence would be
better stated as "an RFC 5321-conforming server sends...".



(4) Gateways and related boundary systems

Claudio Allocchio 2010-12-22 wrote: 
 
> This specification updates a number of SMTP and Headers
> fields from RFC5321 and RFC5322. However these fields are
> also used in Gatewaying to other email systems, which are
> still existing. Section 3.7 in RFC5321 describes specific
> cases. However this memo provides no clue on possible effect
> that internationalized fields can have in case gatewaying
> occurs.  Gatewaying functions strongly rely on these values,
> and the introduction of these modification should describe
> this scenario, too. I suggest to include a specific section
> about these issues, and suggest at least a number of
> possible actions which Gateways can take to survive this
> change.

I'm not sure I understand why Gateways are different from
anything else.  If a gateway doesn't advertise the extension,
then it should never see a message that contains either
non-ASCII envelope addresses or non-ASCII content in the
headers (other than as encoded-words, of course).   The WG does
have some placeholder benchmarks for advice of various sorts;
IMO it would be quite useful to supply some of that advice for
gateways once sufficient operational experience has
accumulated.

But the only really hard gateway problem I see is if a given
gateway MTA interfaces to two back-end systems, one of which is
ASCII-only and one of which is broadly UTF-8 capable.  It will
either need to be reconfigured into two virtual gateways, one
of which advertises the extensions and one of which does not,
or it will simply have to reject mail with UTF-8 header fields
that is addressed to a system that can't handle it.  I note,
however, that such a gateway already has a problem that it
presumably has some mechanism to deal with: if it receives mail
with UTF-8 (or other non-ASCII) headers from one of its
i18n-capable systems with an Internet destination, it already
has to reject the message or perform some sort of conversions
that are, necessarily, not described in IETF specifications.
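The two ways out for such a gateway can be sketched as follows. The back-end names and the capability table are hypothetical, and "UTF8SMTPbis" is the drafts' working keyword; the point is only that the decision depends on which back end a message is headed for, and that no conversion is attempted.

```python
from __future__ import annotations

# Hypothetical table: which back-end systems can handle UTF-8 header fields.
BACKEND_UTF8_CAPABLE = {
    "legacy.example": False,
    "intl.example": True,
}


def ehlo_keywords(backend: str) -> list[str]:
    """Approach 1: one 'virtual gateway' per back end, advertising the
    extension only where it can actually be honored."""
    return ["UTF8SMTPbis"] if BACKEND_UTF8_CAPABLE[backend] else []


def accept_for_backend(backend: str, has_utf8_headers: bool) -> bool:
    """Approach 2: a single gateway that rejects UTF-8 mail bound for a back
    end that cannot handle it, rather than converting it. Unknown back ends
    are treated as ASCII-only."""
    return BACKEND_UTF8_CAPABLE.get(backend, False) or not has_utf8_headers


print(ehlo_keywords("legacy.example"))             # []
print(accept_for_backend("legacy.example", True))  # False: reject
print(accept_for_backend("intl.example", True))    # True
```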


(5) The legacy of the Experimental protocols

Claudio Allocchio 2010-12-22 wrote: 

> The EAI WG charter states that in handling email message we
> shall be "consistent with the long-term view that email with
> invalid addresses or syntax should be rejected, rather than
> fixed up in transit between submission servers and delivery
> servers.". However a number of sections in the specification
> seem to contradict this, where they suggest actions, by
> examining message elements (in the SMTP and in the Headers
> and in the Bodyparts) in order to try to deliver anyhow.
> This is a subtle line between maximising the service chance
> to deliver, and "fixing" things during the transaction, and
> it can encourage implementors to do many of these actions,
> instead of sticking to the idea "not supported --non
> delivery".

The experimental protocols described in RFC 5336 and elsewhere
included a rather complex mechanism for in-transit downgrading.
I'd encourage you to go look it up if you are interested, but
the general design was based on the 8BITMIME model -- a sender
discovering that the next-hop server wouldn't accept the option
could either reject the message or engage in a downgrading
process that was distinctly not lightweight.  The conclusion
from implementations and experiments was that it was too
heavyweight, too hard to get right, and that the number of
cases in which the downgrade procedure could not be applied
would be very confusing to users.  All vestiges of that
machinery should have been removed from the current versions of
the documents.  But I found some in an earlier review and it
may well be that some of the other review comments have
identified others.  If so, they clearly need to come out; the
editors may need help in identifying specific examples.


(6) Server actions when UTF8SMTP[bis] is not supported (and
Section 3.6.2).

Claudio Allocchio 2010-12-22 wrote: 

> The specification does NOT suggest a preferred order among
> the 3 possible options when the SMTP server state it does
> not support UTF8SMTPbis. Having 3 "MAY" options at the same
> level can create a very uncertain email service behaviour,
> and a very bad user's experience! This is consistent with
> what happens in other cases for other "options" in the email
> system (like DSNs and MDNs), but there should be at least an
> informative document describing the various scenarios, and
> giving some guidance to implementors in order to avoid
> totally inconsistent behaviours.

Small bit of calibration about this: remember that the EAI
specs cannot do anything to dictate the behavior of an SMTP
server that does not support the EAI extensions and may have
never heard of them.  There is also a general design
assumption, shared with other [E]SMTP extensions, that the
specification should not make any changes to general email
behavior that are not necessary consequences of what the
extension is trying to accomplish.

On rereading the document after going through these comments,
it is clear to me that the document would be improved by being
much clearer about what text is normative and changed by EAI,
what is a necessary consequence of provisions in RFC
5321 or other applicable standards, and what is just
informative narrative.  I don't know whether the WG shares that
point of view, nor whether it is desirable to try to rewrite
the document along those lines at this stage (in general, these
are not "known technical defects" so, if we could be assured
that Draft Standard versions of the documents would appear on a
timely basis, it might make sense to let them go for the
present).

These three options are simple extrapolations of rules that
already exist in RFC 5321: SMTP-senders must not send invalid
messages into the public Internet and must reject or bounce
such messages if they cannot be fixed up; only Submission
servers (and gateways) are permitted to make significant
changes to fix things up; and, implicitly, if mail is not
deliverable along the normal path and one knows a legitimate
path that would work, one is permitted to use it.  

Note that the first (MSA) option is useful for non-ASCII
address local-parts iff the MSA has an external source of
information that permits substituting ASCII-only addresses for
non-ASCII ones.  For address domain-parts and miscellaneous
header fields, there are typically encoded-word (or A-label)
options available.  Perhaps the document should be more
explicit about that.

If that flexibility "just in practice break[s] the concept of
universal email common service, creating islands which do not
talk to each other", then we would already have seen that
problem.  For example, the specification of 8BITMIME can lead
to the same situation without EAI: if a relay receives a
message with C-T-E 8bit or binary, and the first-preference
reachable MX doesn't support that option, it can downgrade the
message, reject it, or, IIRC, try other servers at the same
preference level to see if one of them will provide support for
the extension.
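A minimal sketch of the "try other servers at the same preference level" behavior, as it might apply to a message that needs the EAI extension. The `MX` record shape is invented for the example; the sketch moves to a less-preferred level only when nothing at the current level is reachable, does not downgrade, and returns `None` to stand for rejecting or bouncing the message.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import groupby


@dataclass
class MX:
    host: str
    preference: int
    reachable: bool
    advertises_extension: bool


def choose_next_hop(mx_records: list[MX]) -> str | None:
    """Return a host that can take the message, or None (reject/bounce)."""
    ordered = sorted(mx_records, key=lambda r: r.preference)
    for _pref, group in groupby(ordered, key=lambda r: r.preference):
        reachable = [r for r in group if r.reachable]
        if not reachable:
            continue  # nothing reachable at this level: try the next one
        for r in reachable:
            if r.advertises_extension:
                return r.host  # another server at the same level supports it
        return None  # reachable servers exist here, but none supports it
    return None


records = [
    MX("a.example", 10, True, False),
    MX("b.example", 10, True, True),
    MX("c.example", 20, True, True),
]
print(choose_next_hop(records))  # b.example
```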

Ned Freed 2010-12-22 wrote:

>> 1.  If and only if the SMTP client (sender) is a Message
>> Submission Server ("MSA") [RFC4409], it MAY, consistent
>> with the general provisions for changes by such servers,
>> rewrite the envelope, headers, or message material to
>> make them entirely ASCII and consistent with the
>> provisions of RFC 5321 [RFC5321] and RFC 5322 [RFC5322].

> This is a bad idea no matter how you slice it. Such
> transformations depend on the availability of address
> mapping (from a UTF-8 address to an equivalent ASCII
> address) information; an agent being an MSA is neither
> necessary nor sufficient reason to have this information.

However, RFC 5321 already requires that an MSA not inject a
message into the public Internet until/unless it is valid.  And
it (and RFC 4409) effectively give the MSA permission to do
anything an MUA can do in order to accomplish that.  There is
no (intentional at least) assertion in 5336bis that the MSA (or
the MUA, for that matter) must have access to address mapping
information, just no requirement that it not use that
information if it has access to it.  If it doesn't have the
needed information, then there are clearly a lot of things it
cannot do; the quoted statement above is then just a
reiteration of the 5321 prohibition on passing trash into the
network (whether it is worth reiterating is a separate matter).

Of course, if a message is transferred from an MUA to the MSA
that contains non-ASCII header information in UTF-8 form but no
non-ASCII addresses, the above provisions don't imply the MSA
doing anything more than mapping that header information into
encoded words.  Again, there is no requirement that the MSA do
that mapping; the statement simply reiterates, in an i18n
context, the general permission for the MSA to take something
that works in a user->MUA->MSA environment and fix it to
conform to public Internet specifications.
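What that permission amounts to can be sketched in Python. Non-ASCII header text is mapped to RFC 2047 encoded-words with the standard `email.header.Header` class. A non-ASCII domain is converted with Python's built-in `idna` codec, which implements IDNA2003 rather than IDNA2008, so a real MSA would want an IDNA2008 library. A non-ASCII local-part can only be replaced if the MSA happens to hold a mapping table; `ADDRESS_MAP` and `CannotDowngrade` are hypothetical, and nothing in the draft requires such a table to exist.

```python
from __future__ import annotations

from email.header import Header

# Hypothetical site-local mapping an MSA *might* have; not required by 5336bis.
ADDRESS_MAP = {"\u7528\u6237@example.com": "user@example.com"}


class CannotDowngrade(Exception):
    """No legitimate ASCII form is available; the MSA must reject the message."""


def header_to_ascii(value: str) -> str:
    """Map non-ASCII header text to RFC 2047 encoded-words."""
    return value if value.isascii() else Header(value, "utf-8").encode()


def address_to_ascii(addr: str, address_map: dict[str, str]) -> str:
    if addr.isascii():
        return addr
    local, _, domain = addr.rpartition("@")
    if local.isascii():
        # Only the domain is non-ASCII, so an A-label form exists.
        # Note: Python's "idna" codec is IDNA2003, not IDNA2008.
        return f"{local}@{domain.encode('idna').decode('ascii')}"
    if addr in address_map:
        return address_map[addr]  # the "external source of information"
    raise CannotDowngrade(addr)


print(header_to_ascii("Gr\u00fc\u00dfe"))                          # =?utf-8?b?R3LDvMOfZQ==?=
print(address_to_ascii("bob@b\u00fccher.example", {}))             # bob@xn--bcher-kva.example
print(address_to_ascii("\u7528\u6237@example.com", ADDRESS_MAP))   # user@example.com
```

Without a table entry, a non-ASCII local-part raises `CannotDowngrade`, which corresponds to the 5321 rule that the MSA must not pass the message on.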

It is worth noting that, during WG discussions, several strong
claims were made about intentions to implement both UTF-8 to
encoded word mappings and tables to support precisely the type
of address mapping suggested above in selected MSAs.  Assuming
people were telling the truth (and I certainly assume they
were), the conditions specified above, including address
mapping, are going to be exercised.

> But the bigger problem by far is defining what
> transformations are permissible. The present framework
> document is basically silent on what's allowed, so I for one
> have no idea what it means to be "consistent with the
> general provisions for changes by such servers" actually
> means. And unless you're willing to spell this out, I see
> this as practically guaranteeing a slew of interop problems.

With the understanding that my comments above go much further
than the existing text, that the WG hasn't agreed to any
expansion, and that Dave seems to be arguing for less text that
is just repetition of existing rules in 5321/5322, how much
further would you suggest going in spelling things out?


(7) Suboptimal use of ABNF and ABNF constructions

Dave Crocker 2010-12-22 wrote:

> * Rule Meta-Naming -- The 'u' preface for revised rules is a
> reasonable idea, but appears to be problematic.  These rules
> replace existing rules in other specifications. There needs
> to be an explicit and decision about the handling of this,
> and it needs to be applied consistently, either directly
> replacing the rules or else re-naming them consistently,
>...
> { On reflection, I think the "u" rule-naming convention is
> problematic.  If the current specification is replacing a
> rule previously defined elsewhere, it needs to use the same
> rulename.  This is simpler and clearer.

(and several comments by others, noting that Dave's second
recommendation appears to contradict Claudio's)

As I commented at more length in a note to Dave, Claudio, and the
EAI list on 2010-12-24 14:20 -0500, we've had a lot of trouble
getting ABNF for these documents right, much less optimal.  A
significant part of the optimality problem is that there
seems to be no established good way to update one specification
by extending its syntax -- a problem that the IETF community
will almost certainly have to deal with again as we
internationalize more protocols.

In the particular case of ASCII and UTF-8 constructions, even
though ASCII is a compatible proper subset of UTF-8, there
is often a need to distinguish the two because plain-ASCII
works in legacy protocols (and does not require special
normalization or comparison rules that might, themselves, be
controversial) while UTF-8 does not work in the legacy cases
and raises more complex normalization, comparison, and
collation issues.   FWIW, we had the same problem in the
IDNA2008 specs because it was necessary to distinguish between
a traditional DNS "preferred syntax" name (aka "LDH") that did
not start with an ACE prefix and one that did use that prefixed
form, in spite of the fact that the latter, by design, would be
treated as a traditional "LDH" name by any application that was
not aware of the special IDNA constructions.
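The distinctions above can be made concrete with a small classifier. It is a sketch under simplifying assumptions: it checks only LDH syntax and the "xn--" prefix (what RFC 5890 calls an XN-label), not whether an XN-label is actually a valid A-label, and it treats any non-ASCII local-part as requiring the extension.

```python
import re

# Traditional DNS "preferred syntax" (LDH) label: letters, digits, hyphens,
# not starting or ending with a hyphen, at most 63 characters.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def classify_label(label: str) -> str:
    if not label.isascii():
        return "non-ASCII"   # candidate U-label; legacy software can't use it
    if not LDH_LABEL.fullmatch(label):
        return "invalid"
    if label.lower().startswith("xn--"):
        return "XN-label"    # legacy software sees an ordinary LDH label
    return "LDH"


def local_part_needs_extension(local_part: str) -> bool:
    # Plain ASCII works with legacy SMTP; anything else needs UTF8SMTP[bis].
    return not local_part.isascii()


for lbl in ("example", "xn--bcher-kva", "b\u00fccher", "-bad-"):
    print(f"{lbl!r:18} {classify_label(lbl)}")
```

Note that "xn--bcher-kva" passes the plain LDH check; only the prefix test separates it from a traditional name, which is exactly why the IDNA2008 specs needed distinct terms for the two.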

For the particular case of EAI, the WG decided to go with the
"u" prefix on new rules that superseded 5321 or 5322 rules.
The WG tried to be consistent with that prefix on new rules,
and to preserve most of the original rules.  All rules were
extended rather than overwritten.  The WG had discussed,
somewhat earlier, the possibility of simply copying the entire
ABNF collection from RFC5321/5322 and then extending or
modifying rules as needed.  That would have eliminated the need
to reference back to those documents for ABNF but the idea was
rejected, largely to avoid exactly the problem with updating
that several reviewers commented on.

Again, this is a general problem: if the community has a clear
preference as to how it should be dealt with, I'm nearly sure
that the EAI WG will conform.  In the absence of a general
solution, my personal guess is that the "u" convention (applied
accurately and consistently) may turn out to be the best we can
do.


(8) MIME restrictions and nested encodings

Dave Crocker 2010-12-22 wrote:

> 5.  The SMTP extension needs to remove all restrictions it
> imposes on MIME content-type.  A major reason that MIME was
> successful was that it was transparent to the transfer
> infrastructure.  The current SMTP extension specification
> changes this model, which actually increases the barrier to
> adoption of Unicode in email.  It needs to be easy for two
> MUAs that support Unicode in the email header to exchange
> mail even when the infrastructure does not support UTF-8.

Right.  Unfortunately, there is already a restriction in MIME
that links Content-Type to Content-Transfer-Encoding, the "no
nested encodings" rule.  That rule is imposed primarily as a
restriction on message/ subtypes.  And it works because MIME
assumed that all header information was going to be ASCII and
hence able to be transmitted in C-T-E: 7bit.

So now we have a little problem: MIME is designed to permit
forwarding messages in encapsulated form using message/rfc822.
But message/rfc822 uses ASCII headers only and has to obey the
"no nested encodings" rule.  One would like to be able to
encapsulate an EAI-conformant message and even forward it into
a legacy environment.  There are actually a number of options
for doing that, including deciding that a message with
non-ASCII headers cannot be forwarded into a 5321/5322
environment (pretty obviously a bad idea to prohibit that),
creating a new top-level MIME type with different rules (the
transition issues and implications for legacy MUAs are pretty
astonishing), or creating a new "message/" subtype with
different rules and what may be seen as an additional
Content-Type/Content-Transfer-Encoding entanglement.  After
considerable discussion, the WG selected the latter (see #13
below) and got IETF signoff in RFC 5336.
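As a rough illustration of that entanglement, a forwarding agent might choose the encapsulating type as in the following Python sketch. It makes simplifying assumptions: the raw message is available as bytes, the header block ends at the first empty line, and the encoding rules for message/global in draft-ietf-eai-rfc5335bis are deliberately not modeled. The 7bit/8bit/binary restriction on message/rfc822 is the RFC 2046 "no nested encodings" rule.

```python
import re

# RFC 2046, section 5.2.1: the body of a message/rfc822 entity may only be
# labeled 7bit, 8bit, or binary ("no nested encodings").
RFC822_ALLOWED_CTE = {"7bit", "8bit", "binary"}


def header_block(raw_message: bytes) -> bytes:
    """Return everything before the first empty line (CRLF or bare LF)."""
    return re.split(rb"\r?\n\r?\n", raw_message, maxsplit=1)[0]


def choose_encapsulation(raw_message: bytes, cte: str) -> str:
    """Pick the Content-Type used to forward raw_message as an attachment."""
    if not header_block(raw_message).isascii():
        # Non-ASCII headers cannot go into message/rfc822; the EAI drafts
        # define message/global for this case (its C-T-E rules are not modeled).
        return "message/global"
    if cte.lower() not in RFC822_ALLOWED_CTE:
        raise ValueError(f"message/rfc822 may not use C-T-E {cte!r}")
    return "message/rfc822"
```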


(9) Strong conformance language

Claudio Allocchio 2010-12-22 wrote: 

> 3.1 Framework for the Internationalization Extension

> - bullet point 3. I do not see any need to state that
> clients MUST reject...  and MUST be fully compliant... etc.
> This is implicit in any specification.  If there is a
> specific need to add this detailed behaviour (to prevent
>...

Our general practice with email specifications has been to
specify conforming behavior and simply leave non-conforming
behavior "outside the scope of the specification".  That
doesn't work in this case because, according to the WG's
analysis, a partial implementation --whether outside the spec
or otherwise-- would be bad news for interoperability.  So it
seemed necessary to be very clear about what clients were
expected to do and what servers could reasonably expect.
Whether that was better done with this text, with an
"Implementer's Note" (which has no normative status), or with
an extended explanation of the problems and a threat or two
would seem to me to be largely matters of editorial taste,
especially for a Proposed Standard.


(10) Security-related issues

Claudio Allocchio 2010-12-22 wrote: 

> 5.  Security Considerations

> there is one more issue: as internationalized items will
> also result into logs and traces, their presence may affects
> trouble ticket systems used by security operations teams
> (CERTs/CSIRTs). And they also can affect quick handling of
> incidents, has it may require more time to correctly read
>....

Agreed.   I assume the editors will consider your proposed text.


(11) Editorial issues with significant substantive
implications.

Dave Crocker 2010-12-22 wrote:

> { Note - the term "mailbox names" is not defined here or in
> RFC 5321.  In RFC 5321 it appears to be used to mean
> local-part; however because its precise meaning is unclear,
> I strongly urge NOT using it here at all. Instead I suggest
> using whatever ABNF rulename is appropriate.  This
> guarantees clarity.

Well, while getting rid of "mailbox name", or defining it very
precisely, would probably be an improvement, there is no ABNF
rulename that would really be appropriate, at least without
additional decoration.  As you and others have pointed out
recently, rules for <mailbox> appear in both 5321 (where it
refers to the mailbox name (sic) itself, e.g.,
local-part@domain-part) and in 5322 (where it refers to
something that might include a display name and assorted
spacing).   I have no doubt that this document should be as
clear and precise as possible, but its authors should not be
blamed for trying to work around the errors of the authors of
5321, 5322, and their predecessors. 

Dave Crocker 2010-12-22 wrote:

>> The value of "uDomain" SHOULD be verified by IDNA
>> definitions [RFC5890].  If

> { I do not understand what "SHOULD be verified by IDNA
> definitions" means.

> If it means that the uDomain "SHOULDbe verified" then it is
> out of scope for this specification. There is no reason that
> SMTP rules concerning verification of Unicode-based domains
> needs to be different from ASCII-based domains.  In
> particular, verification of a Domain name is specified
> elsewhere.
>...

Editorial issues aside, 5321 contains some very
carefully-written language whose intent is to require that
domain-parts be validated lexically but to avoid requiring
that they be looked up in the DNS until/unless that is
necessary for some other reason.  The "lookup" side of
IDNA2008 (in RFC 5891) makes more or less the same circle
around this issue but with the understanding that lexical
validation of U-labels is a somewhat more complex process than
checking against a short list of characters, positions, and
lengths.   I believe that is what the author was trying to
capture.  Certainly it could have been expressed differently.
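A partial Python sketch of what lexical validation of a U-label involves beyond the LDH checks, with no DNS lookup. It checks only NFC normalization, hyphen placement, and the 63-octet limit on the corresponding A-label, computed with Python's built-in RFC 3492 "punycode" codec. The IDNA2008 code point tables (RFC 5892), contextual rules, leading-combining-mark rule, and Bidi rule (RFC 5893) are not implemented, which is precisely why real validation is more complex than this. The function name is hypothetical.

```python
import unicodedata


def ulabel_lexical_problems(label: str) -> list:
    """Return some lexical problems with a candidate U-label (partial check).

    NOT implemented: the RFC 5892 code point tables, contextual rules,
    leading combining marks, and the RFC 5893 Bidi rule.
    """
    if label.isascii():
        return ["all-ASCII: validate as an LDH label instead"]
    problems = []
    if not unicodedata.is_normalized("NFC", label):
        problems.append("not in Unicode Normalization Form C")
    if label.startswith("-") or label.endswith("-"):
        problems.append("leading or trailing hyphen")
    if label[2:4] == "--":
        problems.append("hyphens in the third and fourth positions")
    # The A-label is "xn--" plus the RFC 3492 Punycode encoding of the label.
    a_label = "xn--" + label.encode("punycode").decode("ascii")
    if len(a_label) > 63:
        problems.append("A-label form exceeds 63 octets")
    return problems


print(ulabel_lexical_problems("b\u00fccher"))  # []
```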

Dave Crocker 2010-12-22 wrote:

>> Otherwise, surprising rejections
>> can happen during temporary failures, which users might
>> perceive as a serious 

> { Probably not just temporary failures.  Having services
> that are meant to be redundant with each other actually
> provide different semantic behavior is just plain dangerous.
> It is essentially guaranteed that they will cause problems.}

Sure.  But you argued extensively, in other comments, for not
duplicating text that appears in other documents and, IMO more
importantly, for not making statements that go beyond a rather
narrow reading of the requirements for internationalized email.

Ned Freed 20101222 wrote:

> So you can use alternate MXes, but not alternate A records?
> I actually agree with this - an alternate A is presumably
> another address for the same host, so no point in trying it.
> But this needs to be spelled out.
 
> At an absolute minimum this needs to be accompanied by some
> security considerations text. The issue, I hope, is obvious:
> There are sites aplenty with large numbers of MX records and
> which won't support this extension any time soon. And all
> too often those MXes are *slow* to respond. So sending a
> message that causes them all to be tried is ... you get the
> idea.
 
> In fact I'm tempted to go one further and say that if an MX
> at a given priority is found not to support this extension,
> you SHOULD assume that MXes at lower priority don't either.

And this is where we descend into a situation that extends far
beyond EAI.  The problem, as Ned's note almost suggests, is
actually quite general.  For example, if we had

   foo.example.com. MX 0  foomail0.example.com.
   foo.example.com. MX 10 foomail1.example.com.
   foo.example.com. MX 10 foomail2.example.com.

and "foomail1" supported 8BITMIME with downgrading and
"foomail2" did not, a system sending a message containing
C-T-E: 8bit that reached foomail1 would have the message go
through with downgrading if needed, but would get a rejection or
an "option not supported" condition if it happened to get to
foomail2.  Or it might get lucky and hit foomail0 and find out
what it actually supported.  Similar situations could apply
--at least wrt rejection versus bounces-- if foomail1 supported
a different set of antispam provisions than foomail2, or, for
that matter, if both deferred antispam processing to foomail0.
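The ordering ambiguity in that example can be sketched in Python. Records are tried in ascending preference value, and hosts that share a preference are shuffled to spread load, as RFC 5321 (section 5.1) asks of senders; so whether foomail1 or foomail2 is tried second is a matter of chance. Representing MX records as (preference, host) tuples is an assumption of the sketch.

```python
import random
from itertools import groupby


def mx_try_order(mx_records, rng=random):
    """Order (preference, host) pairs the way a sending SMTP client tries them.

    Lower preference values come first; hosts sharing a preference value are
    shuffled so that load is spread across them.
    """
    ordered = []
    by_pref = sorted(mx_records, key=lambda rec: rec[0])
    for _pref, group in groupby(by_pref, key=lambda rec: rec[0]):
        hosts = [host for _, host in group]
        rng.shuffle(hosts)
        ordered.extend(hosts)
    return ordered


records = [
    (0, "foomail0.example.com."),
    (10, "foomail1.example.com."),
    (10, "foomail2.example.com."),
]
print(mx_try_order(records))  # foomail0 first; foomail1/foomail2 in either order
```

If foomail0 is unreachable, whichever of the other two happens to come next determines whether the 8bit message is downgraded or rejected.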

As far as I can tell after a very brief scan, there is no
current normative IETF specification that discusses this
situation and gives guidance about it.  In retrospect, we
probably should have at least described the issues with
non-identity of services in RFC 1425, but we didn't do that,
much less make specific recommendations.  

I agree with Ned that it would make a lot of sense to
explicitly say "no alternate A records because, if they don't
represent the same host, you should be using MXs" and "if the
first MX you try doesn't support the extension, assume the
others don't either and stop".  But, because of the generality
of the problem, this document doesn't seem to me to be the
right place to say that.  5321bis would probably be a
reasonable place if we think it is a clarification, but especially
with YAM in a coma and no mention of this topic in the
pre-approval document, the odds of my getting to 5321bis before
this EAI work is closed out are in the "low to none" range.
Worse, while someone could quickly cons up an I-D to specify
this behavior and try to push it through as a Proposed Standard
that 5336bis could reference, processing of such a document
would essentially constitute a claim that the spec is new and
not a clarification to 5321, precluding folding it into 5321bis
later, at least without marching that document along the
standards track to Draft Standard first.  Suggestions about how
to get out of that mess are welcome, but I don't think it is an
EAI problem (even though 5336bis might plausibly be modified
--or the Framework document reopened and modified-- to explain
just how ambiguous the whole situation is).
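For discussion purposes only, the "stop at the first MX that lacks the extension" behavior might look like the following Python sketch. It is not specified anywhere today (which is the point above), the EHLO results are simulated with a hypothetical host-to-keywords mapping rather than real connections, and the handling of alternate A records is not modeled.

```python
def pick_relay(ordered_hosts, ehlo_keywords, needed):
    """Return the first MX host offering `needed`, or None.

    ehlo_keywords is a hypothetical stand-in for real connections: it maps
    each reachable host to the set of EHLO keywords it advertised; hosts
    missing from it are treated as unreachable.
    """
    wanted = needed.upper()
    for host in ordered_hosts:
        keywords = ehlo_keywords.get(host)
        if keywords is None:
            continue  # unreachable: trying the next MX is ordinary practice
        if wanted in {k.upper() for k in keywords}:
            return host
        # Proposed rule: a reachable MX without the extension means the
        # others are assumed not to have it either, so stop here.
        return None
    return None
```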

For RFC 5336, it was important to call the issue out because
the "downgrade or not" behavior is even more complicated than
the potential behavior variations with 8BITMIME (remembering
that one can offer the option but, if the next hop doesn't
support 8bit transport, one can either downgrade or reject) and
other extensions.  Whether it is necessary to say anything at
all in 5336bis, given how little we have said in places that
perhaps establish precedents, is debatable.  But please let's
not suggest expanding on this case in this document to cover
other email environments, especially in the context of urging
that the spec keep its scope and assertions narrow.
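The 8BITMIME case mentioned above reduces to a three-way choice at each relay, sketched here in Python with hypothetical names. The EAI "downgrade or not" decision adds further branches that this sketch does not attempt to capture.

```python
def next_hop_action(body_is_8bit: bool, next_hop_keywords, can_downgrade: bool) -> str:
    """Decide how a relay handles a message for a given next hop (8BITMIME only).

    Returns "send" (pass as-is), "downgrade" (e.g. re-encode 8bit parts as
    quoted-printable or base64), or "reject" (refuse or bounce).
    """
    offered = {k.upper() for k in next_hop_keywords}
    if not body_is_8bit or "8BITMIME" in offered:
        return "send"
    return "downgrade" if can_downgrade else "reject"
```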


Dave Crocker 2010-12-22 wrote:

>> 3.6.4.2.  VRFY and EXPN Commands and the UTF-8REPLY
>>Parameter 

>> If the VRFY and EXPN commands are transmitted with the
>> optional parameter "UTF-8REPLY", it indicates the client
>> can accept UTF-8 strings in replies to those commands.
>> This allows the server to use UTF-8 strings in mailbox

> { Is this extension trying to to say that a client might
> support UTF8SMTPbis but not be able to access UTF-8
> replies???  Under what circumstance is this reasonable? }
>...

Not what it is intending to say (could have been expressed
better).  Consider the following scenario: Client sends EHLO,
server offers UTF8SMTPbis.  Client sends VRFY as its next
command, using an ASCII-only argument and no parameter.  The
server has no clue as to whether or not the client is
EAI-compliant or otherwise has any idea what to do with a
reply containing a non-ASCII address.  For the server to return
a non-ASCII string under those circumstances would be in
violation of 5321.
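A minimal Python sketch of the server-side decision in that scenario. The mailbox argument stands for whatever the server resolved the VRFY argument to, and the choice of a 553 reply when the client has not asked for UTF-8 replies is an assumption made for illustration, not a quotation from the draft.

```python
def vrfy_reply(mailbox: str, utf8reply_requested: bool) -> str:
    """Sketch of a server building a VRFY reply for an already-resolved mailbox.

    mailbox is whatever the server resolved the VRFY argument to; it may be
    non-ASCII even when the argument the client sent was ASCII.
    """
    if mailbox.isascii() or utf8reply_requested:
        return f"250 <{mailbox}>"
    # The client gave no sign that it can handle UTF-8 in replies, so returning
    # the non-ASCII string would violate RFC 5321. Using 553 here is an
    # assumption made for this sketch.
    return "553 Requested action not taken: mailbox name not allowed"
```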



(12) The keyword denoted as "UTF8SMTPbis" in the drafts.

While the changes between the Experimental protocols and the
present draft are not large, they are also not trivial.  After
fairly extensive discussion, the WG concluded that these new
specifications should be denoted by a different SMTP Extension
keyword than the "UTF8SMTP" that was used by the Experimental
version (RFC 5336).  In large part because of concerns that
some implementations would be deployed in advance of the
specification becoming final, the WG concluded that the new,
final keyword should be assigned only as part of the IESG
approval process.  "UTF8SMTPbis" is only a placeholder and will
not be that final keyword.  It may be worth noting that, if the
protocol is changed to require a MAIL command parameter, that
WG decision will have turned out to be quite important.

Anyone who strongly believes that some variation on the
"UTF8SMTP" theme misrepresents what is actually happening in
these specifications is welcome to suggest other keywords to
the IESG.
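Because a client decides whether to use the extension solely by spotting its keyword in the EHLO reply, a keyword comparison like the one in this Python sketch would not match between an implementation coded to the placeholder and one coded to the final keyword. The parsing is simplified (it assumes well-formed "250-" and "250 " lines), and the function name is illustrative. RFC 5321 requires EHLO keywords to be compared case-insensitively, hence the upper-casing.

```python
def ehlo_keywords(reply_lines):
    """Map EHLO extension keywords (upper-cased) to their parameter strings.

    The first line names the server rather than an extension; each later
    line looks like "250-KEYWORD params" or, for the last one, "250 KEYWORD".
    """
    keywords = {}
    for line in reply_lines[1:]:
        keyword, _, params = line[4:].partition(" ")
        keywords[keyword.upper()] = params
    return keywords


reply = [
    "250-mail.example.com greets client.example.net",
    "250-8BITMIME",
    "250-SIZE 10240000",
    "250 UTF8SMTPbis",
]
kw = ehlo_keywords(reply)
print("UTF8SMTPBIS" in kw, "UTF8SMTP" in kw)  # True False
```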


(13) New content types for embedded messages.

Dave Crocker 2010-12-22 wrote:

> 3.  There probably needs to be definition of MIME
> message/uni-rfc822, specifying Unicode support within a
> message contained in MIME.  I'm less clear whether there
> needs to be a MIME Content-Transfer-Encoding form specific
> to UTF-8.

See message/global in draft-ietf-eai-rfc5335bis.  This is an
area where the WG's decisions about what material belonged in
which document followed the advice to keep header and body part
discussions in 5335bis rather than 5336bis.


(14) Procedural and quasi-procedural issues

Dave Crocker 2010-12-22 wrote:

> 1.  The Framework document is normative and needs to be
> completed along with the other two specifications, since
> they quite reasonably state a normative dependence on the
> Framework document.
>...
> * Framework -- The Framework document[1] is (correctly)
> referenced as required reading.  It supplies essential
> terminology and architecture for this specification.  In
> fact it is a specification, complete with formally normative
> vocabulary.  This means that it must be a normative
> reference by rfc5336bis.  It therefore also means that the
> Framework document needs to be completed before the current
> specification can be standardized.

Note that the Framework document went through IETF Last Call
and was approved by the IESG last September for publication as
Informational.  The WG was agnostic on whether it should be
Informational or Standards Track.   Since it is still in the
RFC Editor queue awaiting references to the documents now under
review, I assume that, if there were sufficient consensus that
it is normative and should be on the Standards Track, the IESG
could be persuaded to revisit that decision.  In addition,
while I don't believe that anything in these reviews requires
doing so, the document could, in principle, be recalled by the
IESG and material in it changed.

Dave Crocker 2010-12-22 wrote:

> * Infrastructure Requirement -- Unless this option is in
> force, carrying internationalized email in a MIME part is
> prohibited.  This is out of scope for the working group and
> it is a counter-productive rule.  Imagine if the same type
>...

There may well be an editorial problem with how this is written
up or even with how "internationalized mail" is defined.  But I
see nothing that supports the apparent hyperbole in the
comments above.  Remember that 5322 and the base MIME specs
prohibit non-ASCII headers, so messages containing such headers
(as the message header or MIME headers) are prohibited by those
specs without some new extension(s).   There is no
(intentional) prohibition in these documents against carrying
non-ASCII text in mail messages.  And, if one wanted to
pseudo-encapsulate a message and its headers by treating it as,
e.g., text/plain with charset=utf-8, rather than as
message/rfc822 (or message/global), these specifications don't
prohibit that either (might not be a good idea because of the
loss of recognition and handling of boundary markers and internal
Content-Types and Content-Transfer-Encodings, but that is not
a problem for this specification).

So, unless the comment above is just the result of confusion
brought on by an editorial problem, I don't see where you are
going with it.
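For what it is worth, the text/plain pseudo-encapsulation mentioned above is easy to produce with Python's standard email package, and the sketch also shows its cost: the wrapped message becomes a single opaque text/plain part, so its own boundaries and Content-Types are no longer visible to MIME processing. The function name and sample message are hypothetical.

```python
from email.mime.text import MIMEText


def wrap_as_text(raw_message: str) -> MIMEText:
    """Carry an entire message, headers included, as one UTF-8 text/plain part."""
    return MIMEText(raw_message, "plain", "utf-8")


inner = (
    "From: user@example.com\n"
    "Subject: caf\u00e9\n"
    "Content-Type: multipart/mixed; boundary=XYZ\n"
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "hello\n"
    "--XYZ--\n"
)
part = wrap_as_text(inner)
# Only the outer text/plain part is visible; the inner multipart is just text.
print([p.get_content_type() for p in part.walk()])  # ['text/plain']
```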


(15) Future of rfc5336bis.

Ned Freed, 20101222 wrote: 

> One final comment. Several of the issues that have been raised
> are so fundamental that once they are addressed (point (0)
> especially), this document is almost certainly going to
> require another full review cycle.

Sadly, I agree.  I even believe that, were none of these
fundamental issues present, the number of editorial corrections
(including ABNF matters) that are required would be sufficiently
extensive that the risk of introducing new problems would
require such a review cycle.

     john