Re: [ietf-822] utf8 messages

Daniel Vargha <dvargha@mimecast.com> Fri, 15 August 2014 15:15 UTC

Return-Path: <dvargha@mimecast.com>
X-Original-To: ietf-822@ietfa.amsl.com
Delivered-To: ietf-822@ietfa.amsl.com
Received: from localhost (ietfa.amsl.com [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 4783D1A0B0A for <ietf-822@ietfa.amsl.com>; Fri, 15 Aug 2014 08:15:58 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -4.969
X-Spam-Level:
X-Spam-Status: No, score=-4.969 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, RCVD_IN_DNSWL_MED=-2.3, RP_MATCHES_RCVD=-0.668, SPF_PASS=-0.001] autolearn=ham
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id sqsHJ_VjwVgU for <ietf-822@ietfa.amsl.com>; Fri, 15 Aug 2014 08:15:56 -0700 (PDT)
Received: from service-alpha-uk.mimecast.com (service-alpha-outbound1.mimecast.com [91.220.42.229]) (using TLSv1 with cipher RC4-MD5 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 204BD1A0B08 for <ietf-822@ietf.org>; Fri, 15 Aug 2014 08:15:55 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mimecast.com; s=20130419; t=1408115753; bh=K6O5tqZ1bwBsERTC1UcbM6OEEJ3/0Lu7vtXt3wFrQ2w=; h=From:To:CC:Subject:Date:Message-ID:References:In-Reply-To:Content-ID:MIME-Version:Content-Type:Content-Transfer-Encoding; b=Q+wLUsdvWL6uzxmbj8pRqmrvAbn42tZVR1WFIEiG729Nsf7PiSbr2nHj4theqQf+4yh5VNwm/im25tXE07YVtrmUZgkdvR2N4BIOFwfTfUXllAN6Z5TLcAmtl2T7G8aVAndY5QHoGgSIalIgsiHIvvZm5qbCZW2iuH4nEidN3aI=
Received: from remote.mimecast.com (146.101.202.133 [146.101.202.133]) (Using TLS) by uk-sl-b.uk.mimecast.lan; Fri, 15 Aug 2014 16:15:44 +0100
Received: from MC-LON-EXCH06.mcsltd.internal (192.168.40.206) by MC-LON-EXCH03.mcsltd.internal (192.168.40.12) with Microsoft SMTP Server (TLS) id 14.3.195.1; Fri, 15 Aug 2014 16:15:44 +0100
Received: from MC-LON-EXCH03.mcsltd.internal ([fe80::3879:e7a7:5e3d:3699]) by MC-LON-EXCH06.mcsltd.internal ([fe80::fc47:f11e:e9aa:b670%13]) with mapi id 14.03.0195.001; Fri, 15 Aug 2014 16:15:42 +0100
From: Daniel Vargha <dvargha@mimecast.com>
To: Ned Freed <ned.freed@mrochek.com>
Thread-Topic: [ietf-822] utf8 messages
Thread-Index: AQHPtaaAWFoietPkYkCGwOkQxU65z5vMFRA1gAB7aoCAAIf8hoABBMgAgABpbQCAAMX5yIACU9GAgAAhYRWAAAprAA==
Date: Fri, 15 Aug 2014 15:15:43 +0000
Message-ID: <D013DB6D.1977B%dvargha@mimecast.com>
References: <CABa8R6tWEhjjZSvq6NbM7EimokOms3suZufn0-6N1SB_fzGM8Q@mail.gmail.com> <01PB9FABWA4E0000SM@mauve.mrochek.com> <CABa8R6tns-idiZTj=+vb9fVNyH-nNYT+w9oNMb80XbCs5osvFw@mail.gmail.com> <01PBABOOL4QO0000SM@mauve.mrochek.com> <CABa8R6vBqS1ewmTtHh8tTOdzobsWpvSEokRxOqpj1Oq3hA+vsw@mail.gmail.com> <D0111ECB.195FD%dvargha@mimecast.com> <01PBCA98IPI00000SM@mauve.mrochek.com> <D013B9C1.1972E%dvargha@mimecast.com> <01PBEGWAGVDG0000SM@mauve.mrochek.com>
In-Reply-To: <01PBEGWAGVDG0000SM@mauve.mrochek.com>
Accept-Language: en-GB, en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
user-agent: Microsoft-MacOutlook/14.4.3.140616
x-originating-ip: [205.217.25.189]
Content-ID: <1FAF110F95D2DE488D81939466CFBA93@mimecast.com>
MIME-Version: 1.0
X-MC-Unique: AA-rxJRcR5C0QPYq4tLfXg-1
Content-Type: text/plain; charset="KSC5601"
Content-Transfer-Encoding: base64
Archived-At: http://mailarchive.ietf.org/arch/msg/ietf-822/O9HvPC-g2DXSeby8bvQOcEBvca8
Cc: "ietf-822@ietf.org" <ietf-822@ietf.org>
Subject: Re: [ietf-822] utf8 messages
X-BeenThere: ietf-822@ietf.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: "Discussion of issues related to Internet Message Format \[RFC 822, RFC 2822, RFC 5322\]" <ietf-822.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/ietf-822>, <mailto:ietf-822-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/ietf-822/>
List-Post: <mailto:ietf-822@ietf.org>
List-Help: <mailto:ietf-822-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/ietf-822>, <mailto:ietf-822-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 15 Aug 2014 15:15:58 -0000

On 15/08/2014 15:13, "Ned Freed" <ned.freed@mrochek.com> wrote:

>> On 14/08/2014 01:56, "Ned Freed" <ned.freed@mrochek.com> wrote:
>
>> >> I fully agree with Brandon, the standard SHOULD consider the use case when a
>> >> message is transferred from one system to another as a blob (e.g. flat file)
>> >> and the only available "metadata" is that the message is in MIME format.
>> >> Having some sort of well defined UTF8 indicator in the header section of the
>> >> message would make it much simpler to adopt the new standard as it would
>> >> require substantially less development effort in most cases.
>> >
>> >I'm skeptical of the claim, but if you absolutely have to have something, why
>> >not add a Received: field containing a "with smtputf8" clause, assuming one
>> >isn't there already?
>
>> Received: headers are not very reliable, and the syntax is not well
>> defined.
>
>On the contrary, it's quite well defined. See RFC 5321. The issue isn't that
>it's poorly defined, but rather that there are a lot of agents that don't
>create it properly.

Maybe because it was defined too late, and not in the right place? (RFC 5321
is about SMTP, not about MIME.) From the parser's point of view the reason is
irrelevant; the reality is that it is better not to rely on it. Even RFC 5321
says:

"...receiving systems MUST NOT reject mail based on the format of a trace
header field and SHOULD be extremely robust in the light of unexpected
information or formats in those header fields."
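To make Ned's suggestion concrete: adding the trace field amounts to prepending one Received: line whose WITH clause names a UTF-8 protocol. The sketch below is illustrative only. It assumes the keyword UTF8SMTP (one of the WITH protocol types registered by RFC 6531, which "with smtputf8" above presumably refers to); the helper names and domains are hypothetical. The field follows the RFC 5321 Time-stamp-line shape: from, by, with, then a semicolon and the date.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Optional


def make_received(from_domain: str, by_domain: str,
                  protocol: str = "UTF8SMTP",
                  when: Optional[datetime] = None) -> str:
    """Build a minimal Received: trace field (hypothetical helper).

    Shape: Received: from <domain> by <domain> with <protocol>; <date-time>
    """
    if when is None:
        when = datetime.now(timezone.utc)
    return (f"Received: from {from_domain} by {by_domain} "
            f"with {protocol}; {format_datetime(when)}")


def prepend_received(raw: bytes, field: str) -> bytes:
    """Prepend a trace field; newer trace fields sit above older ones."""
    return field.encode("utf-8") + b"\r\n" + raw


# Usage with hypothetical domains and the date of this message.
stamp = make_received("mail.example.org", "archive.example.net",
                      when=datetime(2014, 8, 15, 15, 15, 43, tzinfo=timezone.utc))
print(stamp)
# Received: from mail.example.org by archive.example.net with UTF8SMTP; Fri, 15 Aug 2014 15:15:43 +0000
```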

>
>> Successfully parsing a Received: header itself requires a lot of heuristics.
>
>A full parse does, and so does looking for IP address information (which
>doesn't appear directly as a clause value and whose position was only
>standardized late in the game). Looking for a with clause with a particular
>value does not.

Looks like we have quite different ideas about reliability and parsing.
I certainly would not consider the partial parsing approach you suggested
as reliable.
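Both positions are easy to try out. The sketch below is one possible partial parse under stated assumptions, with hypothetical helper names. It strips RFC 5322 comments (which may nest and contain quoted-pairs) before searching, because a field such as the one ietfa.amsl.com added to this very message, "(using TLSv1 with cipher RC4-MD5 (128/128 bits)) ... with ESMTPS", would otherwise yield "cipher". Treating any WITH value that starts with UTF8 as an SMTPUTF8 marker is an assumption; fields without a recognisable WITH clause are simply skipped.

```python
import re
from email import message_from_bytes
from email.policy import compat32
from typing import List


def strip_comments(value: str) -> str:
    """Drop RFC 5322 comments, which may nest and contain quoted-pairs."""
    out = []
    depth = 0
    escaped = False
    for ch in value:
        if depth > 0:
            if escaped:
                escaped = False          # second char of a quoted-pair
            elif ch == "\\":
                escaped = True
            elif ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:
                    out.append(" ")      # keep surrounding tokens apart
        elif ch == "(":
            depth = 1
        else:
            out.append(ch)
    return "".join(out)


# First "with <protocol>" clause left once comments are gone.
WITH_RE = re.compile(r"\bwith\s+([A-Za-z0-9-]+)", re.IGNORECASE)


def with_protocols(raw: bytes) -> List[str]:
    """Return the WITH value of each Received: field, newest first."""
    msg = message_from_bytes(raw, policy=compat32)
    protocols = []
    for field in msg.get_all("Received", []):
        match = WITH_RE.search(strip_comments(str(field)))
        if match:
            protocols.append(match.group(1).upper())
    return protocols


def looks_smtputf8(raw: bytes) -> bool:
    """Assumption: any WITH value starting with UTF8 signals SMTPUTF8."""
    return any(p.startswith("UTF8") for p in with_protocols(raw))


sample = (b"Received: from a.example (b.example [192.0.2.1])"
          b" (using TLSv1 with cipher RC4-MD5 (128/128 bits))\r\n"
          b" by c.example (Postfix) with ESMTPS id 1;"
          b" Fri, 15 Aug 2014 08:15:55 -0700\r\n"
          b"Subject: test\r\n\r\nbody\r\n")
print(with_protocols(sample))  # ['ESMTPS'], not ['CIPHER']
```

Whether this counts as "reliable" is exactly the disagreement above: it handles comments, but it still trusts that whoever wrote the field put the WITH clause in the standard form.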

> 
>
>> To be honest I would not be happy to rely on them. Also, when a message
>> is transferred between archive stores no new Received: header is normally
>> added.
>
>Uh huh. And neither is whatever new header is being proposed here. Why
>is one preferable to the other?

Because:
1) the Received: header is already used and abused in many ways
2) as you admitted, there are a lot of agents that don't create it properly
3) semantically it does not make sense to put the charset information in
   the Received: header (it is meant to be a trace field)
4) if we define a new field, we don't need to worry about finding the newly
   defined field with bogus syntax in historic emails sent before the standard
   was published

>
>> >
>> >> Regarding Ned's concern about inconsistent states I think it would be a
>> >> workable solution to only honour the UTF8 indicator in the headers when
>> >> the UTF8 flag is not available from metadata. In a well known UTF8 context
>> >> where the SMTP protocol or the message store already "knows" that the
>> >> message is UTF8 the indicator in the headers can be ignored.
>> >
>> >That assumes people will read the standard. It's far more likely that,
>> >given an obvious indicator, they will simply use it.
>
>> Is this a serious argument? Why would you bother writing a standard if you
>> don't expect people to read it?
>
>It's deadly serious. There's quite a lot of monkey-see-monkey-do
>out there.
>
>Your "why bother" argument is bogus though. Other people do read the
>standards; enough to make it worthwhile to develop them. And some finally read
>them when they find their hack didn't work.

Is that reason enough to design the standard for the monkeys?
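The precedence rule proposed earlier in the thread (trust transport or store metadata when it exists, and consult a header indicator only when it does not) can be sketched as follows. The field name UTF8-Indicator and its "yes" value are placeholders invented for this illustration; the thread never names a field. Returning None when neither source says anything leaves the decision to the caller, which is where heuristics come back in. When metadata and header disagree, the header is silently ignored; that is the inconsistent state Ned is concerned about.

```python
from email import message_from_bytes
from email.policy import compat32
from typing import Optional

# Hypothetical field name: the thread asks for "some sort of well defined
# UTF8 indicator" but never names one.
INDICATOR_FIELD = "UTF8-Indicator"


def header_indicator(raw: bytes) -> Optional[bool]:
    """True/False if the hypothetical indicator is present, else None."""
    msg = message_from_bytes(raw, policy=compat32)
    value = msg.get(INDICATOR_FIELD)
    if value is None:
        return None
    return str(value).strip().lower() == "yes"


def is_utf8_message(raw: bytes, metadata_flag: Optional[bool]) -> Optional[bool]:
    """Metadata wins; the header is consulted only when metadata is silent.

    metadata_flag stands for whatever the transport (e.g. the SMTPUTF8
    parameter) or the message store already knows; None means "no metadata".
    """
    if metadata_flag is not None:
        return metadata_flag          # header ignored, even if it disagrees
    return header_indicator(raw)      # None: caller must fall back to heuristics
```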

>
>> >
>> >> I think it is generally desirable to reduce (or at least not increase) the
>> >> amount of heuristics required to successfully parse a MIME message. We
>> >> should try to learn from previous mistakes instead of repeating them.
>> >
>> >That's the absolute worst example you could have picked, because the most
>> >serious design error in MIME is the MIME-Version: field. You know, the field
>> >that tells you whether or not a given message is a MIME message. Sound
>> >familiar?
>
>> I don't understand this comment. What example are you referring to? (Of
>> course I am familiar with the MIME-Version: header, I have read the
>> corresponding RFC many times)
>
>The MIME-Version field has turned into a wart on the protocol. There's no way
>to bump the version since too many things are hard-coded to look for the 1.0,
>so its primary purpose of providing a version indicator is gone. We're stuck
>at 1.0. You can't even put a comment on the field since some agents are known
>to hate that.
>
>And on the other side, a lot of agents will assume MIME even if the field
>isn't present. It also gets attached willy-nilly to a lot of non-MIME messages
>because that's easier than checking.
>
>As a result the information value of the field is essentially nonexistent:
>you have to attach it to MIME messages but you cannot count on it to tell you
>anything.
>
>It's a fully worked example of how redundant indicators turn into warts. And
>in the case of MIME-Version, despite being in the standard from day one, this
>process was more or less complete in a couple of years.

Thanks for explaining. As I said before, I am familiar with the MIME-Version:
header, but I did not use it as an example. (You said it was the absolute worst
example I could have picked.)
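Ned's remark that some agents object to a comment on MIME-Version can be illustrated. RFC 2045 allows comments in the field and gives "MIME-Version: 1.0 (produced by MetaSend Vx.x)" as an example, so a reader that compares the raw value with the literal "1.0" rejects a conforming field, while one that drops comments first accepts it. Both functions are hypothetical sketches and are not taken from any particular agent.

```python
import re
from typing import Optional, Tuple


def strip_comments(value: str) -> str:
    """Drop RFC 5322 comments, which may nest and contain quoted-pairs."""
    out = []
    depth = 0
    escaped = False
    for ch in value:
        if depth > 0:
            if escaped:
                escaped = False          # second char of a quoted-pair
            elif ch == "\\":
                escaped = True
            elif ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:
                    out.append(" ")      # keep surrounding tokens apart
        elif ch == "(":
            depth = 1
        else:
            out.append(ch)
    return "".join(out)


def mime_version_strict(value: str) -> bool:
    """The brittle check: only the literal string 1.0 is accepted."""
    return value.strip() == "1.0"


def mime_version_tolerant(value: str) -> Optional[Tuple[int, int]]:
    """Ignore comments and whitespace, then read major.minor."""
    cleaned = "".join(strip_comments(value).split())
    match = re.fullmatch(r"(\d+)\.(\d+)", cleaned)
    return (int(match.group(1)), int(match.group(2))) if match else None


value = "1.0 (produced by MetaSend Vx.x)"
print(mime_version_strict(value), mime_version_tolerant(value))  # False (1, 0)
```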

Daniel