Re: [EAI] Random thought #1 - UTF8SMTP handshaking

Shawn Steele <Shawn.Steele@microsoft.com> Thu, 08 July 2010 18:00 UTC


For the record, I can live with the draft the way it is; it sounds like too much work to do anything else.  I'm still concerned that we effectively have partial handshaking, but I don't see any reason why it can't work as drafted.

-Shawn


-----Original Message-----
From: John C Klensin [mailto:klensin@jck.com] 
Sent: Thursday, July 08, 2010 10:55 AM
To: Shawn Steele; Jiankang YAO; ima@ietf.org
Subject: RE: [EAI] Random thought #1 - UTF8SMTP handshaking



--On Thursday, July 08, 2010 16:38 +0000 Shawn Steele <Shawn.Steele@microsoft.com> wrote:

>...
>> There is no ISO8859SMTP extension.  If someone wanted to make one up, 
>> they would be obligated to resolve the problem you are concerned 
>> about.
> 
> Can you guarantee there's no private XISO8859SMTP extension?

Of course not.  I can't even guarantee that there is no private BAUDOT or packed ASCII extension, nor an extension that specifies that Content-types are to be interpreted according to rules different from those in the IANA registry, nor an extension that specifies that all DNS name components of email addresses are to be interpreted with the root to the left.  

The right model for dealing with _any_ private extension is that it is up to those who are using it to sort out interoperability with whatever is standardized.  One might reasonably discuss an exception if an extension were demonstrated to be widely deployed and important, but I haven't seen anyone come forward and say "we have an ISO8859SMTP extension, it has been shipping in products for a decade, and nine million people are depending on it", only your wild speculations about some extension that might exist out there somewhere.  Even if such an extension did exist, the "one encoding on the wire" argument would constitute a significant tradeoff factor against dealing with it, but we'd at least have a basis for a meaningful discussion.

What I can guarantee is that trying to design an extension to be bulletproof against any bad idea that someone might have devised privately would lead to madness as well as to huge delays as we try to sort through all of the alternatives, attacks, and real or imaginary edge cases.  And I can suggest that adopting the "someone might have done this, therefore we have to support it forever" mentality is a large part of what got HTML --and the URI and IRI specs-- into trouble.

>> if one enables bad practices in the name of more interoperability, 
>> what one mostly gets is more bad practices...
> 
> I'm not suggesting this to "enable bad practices", I'm suggesting it 
> to ensure good practices.  Right now the draft has half-way 
> handshaking.  The server says "I'm EAI", and ASSUMES that the client 
> may be EAI if it sends data with the high bit set.  Previously the 
> server should've rejected that data, but now it'll start allowing it 
> because it's making that assumption, not because it actually "knows" 
> that the client supports EAI.

See procedural note at end.

> To be very clear about what you're alluding, I've been told that 
> Exchange is very strict about what it does and does not allow, so 
> apparently this isn't something Exchange needs for compatibility.  
> I've tested that myself with the FROM: and RCPT:, which fail.  I've 
> also had requests from a Korean customer wanting to know how to get 
> local code page email addresses to interoperate outside of their 
> intranet.

I'd suggest that the correct answer is "things leak, you really want to convert to and be using UTF-8 addresses even in the intranet, and you should not be trying this".  Think about the design of an MUA that picks up an address from where it is embedded in a text file.  If the standard is "UTF-8 and UTF-8 only on the Internet" then the conversion is clear.  If it is "well, you can use UTF-8, or your local code page, or maybe 2022-based table switching", then you have gotten into the worst parts of the "alternate address" theme because you don't know which one to try, which order to try them in, etc.  And, worse, with multiple alternatives, the odds of getting things configured correctly so that a submission server could assume that all relays in the chain would support the same options would go down considerably.
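
To illustrate the "you don't know which one to try" problem, here is a minimal Python sketch.  The helper name candidate_decodings and the sample local part are assumptions made up for this illustration; the point is only that the same octets decode cleanly under more than one code page, so a receiver has no way to pick the right reading without outside knowledge.

```python
from __future__ import annotations


def candidate_decodings(raw: bytes, encodings: list[str]) -> dict[str, str]:
    """Return every decoding of `raw` that succeeds without error.

    Hypothetical helper for illustration; a real MUA would have no
    reliable way to choose among the surviving candidates.
    """
    results = {}
    for name in encodings:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            # These octets are not valid in this encoding; drop the candidate.
            pass
    return results


# A local part written in a Korean local code page (EUC-KR), as an
# assumed example of "local code page" addresses leaking onto the wire.
local_part = "한글".encode("euc-kr")

guesses = candidate_decodings(local_part, ["utf-8", "euc-kr", "latin-1"])
for encoding, text in guesses.items():
    print(encoding, repr(text))
```

The octets are not valid UTF-8, but they are valid in both EUC-KR and Latin-1, with entirely different results.  With "UTF-8 only on the wire" the first check is the only one needed.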

> Obviously they weren't using Exchange SMTP, since we don't support 
> that, even on the intranet.  (I don't know what they were using).
 
> In any event, I don't think it's interesting enough to make a huge 
> deal out of, however it did seem, to me, that we could have a chance 
> to confirm that the client was indeed intending EAI behavior, for all 
> commands, without assumptions and without needing a bunch of UTF8REPLY 
> tokens.  You've been around mail longer than me :), so if you think 
> the assumption is good enough for interoperability with the huge 
> variety of clients out there, then you're likely correct.
> 
> Why is it evil for the server to know what the client supports?

It isn't.  And, if the above is your main point, then let's not waste time and get distracted by arguments about, e.g., private ISO8859SMTP extensions or non-conforming DATA content.

I've got two sets of reasons to be hesitant about your proposed command, one about protocol design (and, if you will, religion) and the other about procedures.

_Religion_:  I'm extremely reluctant to see another state introduced into SMTP.  It isn't that we couldn't, but it would be hard to get right, it would turn developing the text for 5336bis into a task that would require deep understanding of 5321 and the design decisions that have accumulated since 821, it would require that we sort out the order in which commands could occur, and it would require that we spend a lot of time ensuring that such a new state didn't have bad interactions with extensions already developed and on the standards track.  We'd have to analyze the implications of additional turnarounds, specify whether RSET cleared the "UTF8" state, and review pipelining and figure out whether we needed to update it.  None of that is insurmountable, but it wouldn't be quick and I don't think the case has been made yet that it would be worth the trouble.
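
A rough Python sketch of what the extra state implies, under stated assumptions: the "UTF8" command is hypothetical, the Session class and its field names are invented for this illustration, and the choices that EHLO clears the flag and that the command is refused inside a mail transaction are guesses, not anything specified in a draft.  Whether RSET clears the flag is left as an explicit switch, because that is exactly one of the questions that would need an answer.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Toy SMTP session tracking a hypothetical per-session UTF8 flag."""

    rset_clears_utf8: bool         # open design question, made a switch here
    greeted: bool = False
    utf8_mode: bool = False        # the additional state under discussion
    in_transaction: bool = False

    def handle(self, verb: str) -> str:
        verb = verb.upper()
        if verb == "EHLO":
            # Assumption: a fresh EHLO resets everything, including the flag.
            self.greeted = True
            self.utf8_mode = False
            self.in_transaction = False
            return "250 OK"
        if not self.greeted:
            return "503 Bad sequence of commands"
        if verb == "UTF8":  # hypothetical command, not in any spec
            if self.in_transaction:
                # Assumption: not allowed once MAIL has been issued.
                return "503 Bad sequence of commands"
            self.utf8_mode = True
            return "250 OK"
        if verb == "MAIL":
            self.in_transaction = True
            return "250 OK"
        if verb == "RSET":
            self.in_transaction = False
            if self.rset_clears_utf8:
                self.utf8_mode = False
            return "250 OK"
        return "500 Command not recognized"


for policy in (True, False):
    s = Session(rset_clears_utf8=policy)
    for verb in ("EHLO", "UTF8", "MAIL", "RSET"):
        s.handle(verb)
    print("RSET clears flag:", policy, "-> utf8_mode after RSET:", s.utf8_mode)
```

Even this toy version has to pick answers for ordering, RSET, and re-greeting, and it says nothing yet about pipelining or other extensions.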

_Procedures_:  If we are going to introduce this sort of facility and remain consistent with the WG's roots, we should specify this facility, publish an Experimental spec, develop implementations, and test them against each other and the installed base to be sure that there are no ill effects.  Perhaps we could bypass those steps, but it would be risky.  My experience with the IETF suggests that even doing the level of examination needed to ensure consistency with 5321 and the Pipelining extension if a new state were added would stretch things out by at least six months and that going through the Experimental spec and testing arrangements would stretch that out to 12-18 months.  If I adjust for the level of activity in this WG (count the number of comments about anything but editorial issues in 4952bis since it was posted), you can probably double those numbers.

There is also the small issue of deployment speed: as with 8BITMIME, the current model requires that a server advertise the extension (a few lines of code) and disable some tests.  Implementing a new command, advertising its availability, and maintaining extra state information turns implementation of email i18n back into a big deal.  To the extent to which we are depending on rapid deployment (at least within language communities) as part of our defense against needing much more complex transition models, making implementation unnecessarily harder is not in our interest.
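
As a rough sense of scale for "a few lines of code", here is a Python sketch of the two pieces: adding a keyword to the EHLO reply and relaxing an ASCII-only address test.  The function names, the hostname, and the shape of the address check are assumptions for illustration only; a real server's validation is considerably more involved.

```python
from __future__ import annotations


def ehlo_reply(hostname: str, extensions: list[str]) -> str:
    """Build a multi-line 250 reply: "250-" on every line but the last."""
    lines = [hostname] + extensions
    out = []
    for i, text in enumerate(lines):
        sep = " " if i == len(lines) - 1 else "-"
        out.append(f"250{sep}{text}")
    return "\r\n".join(out) + "\r\n"


def address_allowed(address: str, utf8smtp_enabled: bool) -> bool:
    """Simplified stand-in for the test that gets disabled.

    Without the extension, non-ASCII addresses are refused; with it,
    this particular check no longer rejects them.
    """
    return utf8smtp_enabled or address.isascii()


# Advertising the extension is one more entry in the keyword list.
print(ehlo_reply("mx.example.net", ["8BITMIME", "UTF8SMTP"]), end="")
print(address_allowed("用户@example.net", utf8smtp_enabled=False))
print(address_allowed("用户@example.net", utf8smtp_enabled=True))
```

Nothing here needs per-session bookkeeping, which is the contrast with a new command.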

If there were compelling need here, I'd think that was worthwhile.  But, so far, all we've heard about are theoretical cases that would constitute encouraging bad ideas (e.g., anything on the wire other than UTF-8) if they were real and we recognized them.

I note, although you find it distasteful (and, actually, I do too), that requiring "UTF8 stuff here" parameters on the MAIL, RCPT, and DATA commands is much less problematic than introducing a "UTF-8" command, precisely because it doesn't introduce new state.  Of course, it means we would have to look at extensions such as BDAT too, but...
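
A small Python sketch of the per-command alternative, assuming a hypothetical parameter spelled "UTF8" on MAIL (the spelling, the function name, and the deliberately loose parsing are all inventions for this illustration).  The client's intent travels with the command itself, so the server has nothing to remember between commands.

```python
def parse_mail_command(line: str):
    """Split "MAIL FROM:<addr> KEY=VALUE FLAG ..." into (addr, params).

    Simplified parser for illustration: it does not handle quoted
    local parts or other corner cases of the real grammar.
    """
    verb, _, rest = line.partition(":")
    if verb.strip().upper() != "MAIL FROM":
        raise ValueError("not a MAIL command")
    rest = rest.strip()
    if not rest.startswith("<") or ">" not in rest:
        raise ValueError("missing <reverse-path>")
    end = rest.index(">")
    address = rest[1:end]
    params = {}
    for token in rest[end + 1:].split():
        key, _, value = token.partition("=")
        params[key.upper()] = value  # bare flags get an empty value
    return address, params


# "UTF8" is the hypothetical "UTF8 stuff here" marker; BODY=8BITMIME
# is the existing 8BITMIME parameter.
address, params = parse_mail_command(
    "MAIL FROM:<用户@example.net> BODY=8BITMIME UTF8"
)
client_declared_utf8 = "UTF8" in params
print(address, params, client_declared_utf8)
```

The cost is that every command carrying non-ASCII data needs the marker, which is the distasteful part.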

     john