[ietf-822] utf8 messages
Brandon Long <blong@google.com> Mon, 11 August 2014 20:45 UTC
Date: Mon, 11 Aug 2014 13:45:48 -0700
Message-ID: <CABa8R6tWEhjjZSvq6NbM7EimokOms3suZufn0-6N1SB_fzGM8Q@mail.gmail.com>
From: Brandon Long <blong@google.com>
To: ietf-822@ietf.org
Content-Type: multipart/alternative; boundary="089e01493922c988e0050060a3e5"
Archived-At: http://mailarchive.ietf.org/arch/msg/ietf-822/z3apPe_6hgR51uIHfDQiIY2An3k
In our recent launch of support for EAI, we noticed an issue with 6532 "utf8" messages.

As near as I can tell, there is nothing about a 6532 message which tells you it is such a message... except the existence of 8bit characters in the headers. I.e., 7bit -> 5322, 8bit -> 6532.

Our problem is that this isn't actually true in practice. Prior to launching support for 6532 messages, we already had to support widespread use of 8bit messages that were not always in utf8. Since these typically didn't specify which charset they were in, we used a variety of techniques, including direct charset detection on such messages.

The problem we're having with 6532 messages is that we moved from explicitly identified charsets via 2047/etc. mechanisms to "it's just utf8"... and sometimes we mis-detect the utf8 as cp1250 or other encodings.

Now, we can work on improving our detection, and maybe start biasing it to utf8, or even just assuming utf8 for any 8bit message which is interchange-valid utf8. Anything we do there will result in some potential for mistakes, of course.

This would all be solved if 6532 messages were actually denoted as such, and I recall seeing at least one such X header used by another service we've been interoperability testing with:

X-CM-HeaderCharset: UTF-8

CM no doubt standing for CoreMail, which is the software used:

X-Mailer: Coremail Webmail Server Version XT3.0.4 build 20140526(27182.6409.6185) Copyright (c) 2002-2014 www.mailtech.cn coremail

Thoughts?

It looks like there was an i-Email/Header-Type originally, but it was removed early in the utf8smtp timeframe:
http://www.ietf.org/mail-archive/web/ima/current/msg01358.html

The general consensus for removal seemed to be "you'll know because it was specified at SMTP time", "just look for 8bit" and "it's bad to duplicate data between the envelope and the headers".

Looks like it goes nearly to the beginning of the utf8smtp time frame:
http://www.ietf.org/mail-archive/web/ima/current/msg00079.html

It seems that the pre-existence of 8bit messages was not considered by those who felt it wasn't necessary, at least as far as I've read in the discussions (wow, do I wish the mhonarc had been updated with an easier to explore/read model).

Now, as hinted at in the consensus to remove such a marker from the draft, we can certainly add such a header when composing 6532 messages or when we receive any message via SMTPUTF8 for our own utility, but I would think there would be some utility in such a marker being mutually understood and shared.

Brandon
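
The "bias to utf8" idea in the message above could look roughly like the following. This is only a minimal sketch, not the detection Google actually runs: the function name `classify_header_bytes` and the three labels are made up for illustration, and Python's strict UTF-8 decoder is used as a stand-in for "interchange-valid utf8" (it rejects overlong forms and surrogates, but does not filter out noncharacters).

```python
def classify_header_bytes(raw: bytes) -> str:
    """Guess how a raw header block is encoded.

    Returns one of three hypothetical labels:
      "ascii"       - 7bit only, i.e. a plain 5322-style header
      "utf8"        - 8bit and strictly valid UTF-8, so assume 6532
      "legacy-8bit" - 8bit but not valid UTF-8; needs charset detection
    """
    # 7bit -> 5322: no byte has the high bit set.
    if all(b < 0x80 for b in raw):
        return "ascii"
    # 8bit: prefer UTF-8 whenever the bytes decode strictly.
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # Not UTF-8, so this is one of the older unlabeled 8bit messages.
        return "legacy-8bit"
    return "utf8"


if __name__ == "__main__":
    subject = "Subject: žluťoučký kůň"
    print(classify_header_bytes(b"Subject: hello"))
    print(classify_header_bytes(subject.encode("utf-8")))
    print(classify_header_bytes(subject.encode("cp1250")))
```

As the message notes, any such rule can still be wrong: a short legacy-charset header can occasionally happen to be a valid UTF-8 byte sequence, which is the case an explicit marker header would remove.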