Re: [EAI] UTF32
John C Klensin <klensin@jck.com> Tue, 21 April 2015 14:38 UTC
Return-Path: <klensin@jck.com>
X-Original-To: ima@ietfa.amsl.com
Delivered-To: ima@ietfa.amsl.com
Received: from localhost (ietfa.amsl.com [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id B91EF1AC3D3 for <ima@ietfa.amsl.com>; Tue, 21 Apr 2015 07:38:22 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -0.711
X-Spam-Level:
X-Spam-Status: No, score=-0.711 tagged_above=-999 required=5 tests=[BAYES_40=-0.001, RCVD_IN_DNSWL_LOW=-0.7, T_RP_MATCHES_RCVD=-0.01] autolearn=ham
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id aJBzX5HxXGpj for <ima@ietfa.amsl.com>; Tue, 21 Apr 2015 07:38:21 -0700 (PDT)
Received: from bsa2.jck.com (bsa2.jck.com [70.88.254.51]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 252761AC3C3 for <ima@ietf.org>; Tue, 21 Apr 2015 07:38:21 -0700 (PDT)
Received: from [198.252.137.35] (helo=JcK-HP8200.jck.com) by bsa2.jck.com with esmtp (Exim 4.82 (FreeBSD)) (envelope-from <klensin@jck.com>) id 1YkZJa-0001aD-9B; Tue, 21 Apr 2015 10:38:18 -0400
Date: Tue, 21 Apr 2015 10:38:13 -0400
From: John C Klensin <klensin@jck.com>
To: Oleksandr Tsaruk <tsaruk@i.ua>, ima@ietf.org
Message-ID: <ED0FFB5B08EDBB19172476F4@JcK-HP8200.jck.com>
In-Reply-To: <E1YkXtF-0002DH-0s@st06.mi6.kiev.ua>
References: <3D9223A5-135E-4F43-B814-EB7BE51D207C@linkedin.com> <01PKTYIGGNDC0000AQ@mauve.mrochek.com> <E1YkXtF-0002DH-0s@st06.mi6.kiev.ua>
X-Mailer: Mulberry/4.0.8 (Win32)
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
X-SA-Exim-Connect-IP: 198.252.137.35
X-SA-Exim-Mail-From: klensin@jck.com
X-SA-Exim-Scanned: No (on bsa2.jck.com); SAEximRunCond expanded to false
Archived-At: <http://mailarchive.ietf.org/arch/msg/ima/u_feDyxxhvECKwxe7FhrcY9-1Ig>
Cc: cyrillicgp@icann.org
Subject: Re: [EAI] UTF32
X-BeenThere: ima@ietf.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: "EAI \(Email Address Internationalization\)" <ima.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/ima>, <mailto:ima-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/ima/>
List-Post: <mailto:ima@ietf.org>
List-Help: <mailto:ima-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/ima>, <mailto:ima-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 21 Apr 2015 14:38:22 -0000
--On Tuesday, April 21, 2015 16:07 +0300 Oleksandr Tsaruk <tsaruk@i.ua> wrote:

> Is it possible to reconsider (in a very long run) EAI WG
> general approach to:
>
> "This working group's previous experimental efforts
> investigated the use of UTF-32 as a general approach to email
> internationalization." In such case email/domain
> internationalization problem could be solved in original
> scripts?

In principle, yes. In practice, given the increasing (and increasingly universal) use of UTF-8 on the wire, probably not.

The more important question is what you think going to UTF-32 would accomplish. It is fully isomorphic with UTF-8 -- there is no information that can be represented one way and not the other. It is much less compact than UTF-8 for "western" alphabetic scripts (including Cyrillic), less compact for any BMP code point, and never more compact (UTF-8 never needs more bytes per code point than UTF-32 does). UTF-32 does not get involved with the "surrogate" mess, but neither does UTF-8. Neither helps at all with the various normalization or comparison problems.

The only advantage I can think of at the moment is that UTF-32 permits getting a count of the number of code points present by counting octets and dividing by four, while UTF-8 (and UTF-16) require some calculation. However, one rarely cares about the number of code points as compared to, e.g., the number of "print positions" or "characters", and, given combining sequences and non-spacing characters and marks, getting from a code point count to print position information cannot be done without considerable knowledge of the code points involved (and, for some scripts, rendering procedures).

So, can you explain what you think a move to UTF-32, even if it were possible, would accomplish?

john
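A minimal Python sketch of the size and counting points made above, assuming well-formed input. The helpers `code_points_from_utf8` and `code_points_from_utf32` and the sample strings are hypothetical, chosen only for illustration. It encodes a few strings both ways, counts code points directly from the raw octets, and then shows that two canonically equivalent strings differ in code point count and in their UTF-32 octets, a difference that only normalization resolves.

```python
import unicodedata


def code_points_from_utf32(data: bytes) -> int:
    """Hypothetical helper. UTF-32: every code point is exactly 4 octets."""
    if len(data) % 4:
        raise ValueError("UTF-32 data must be a multiple of 4 octets")
    return len(data) // 4


def code_points_from_utf8(data: bytes) -> int:
    """Hypothetical helper. UTF-8: count octets that are NOT continuation
    bytes (bit pattern 10xxxxxx). Assumes well-formed UTF-8."""
    return sum(1 for b in data if b & 0xC0 != 0x80)


# Latin, Cyrillic, BMP CJK, and one supplementary-plane (non-BMP) character.
samples = ["mail", "пошта", "邮件", "\U0001F4E7"]

for s in samples:
    u8 = s.encode("utf-8")
    u32 = s.encode("utf-32-be")  # big-endian, so no byte order mark is added
    # Both encodings carry the same information: identical code point counts.
    assert code_points_from_utf8(u8) == code_points_from_utf32(u32) == len(s)
    print(f"{s!r}: {len(s)} code points, UTF-8 {len(u8)} octets, "
          f"UTF-32 {len(u32)} octets")

# Code point counts are not "print positions": both strings below display
# as a single e-acute, one precomposed and one with a combining mark.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # e + COMBINING ACUTE ACCENT
print(len(precomposed), len(decomposed))  # 1 2

# UTF-32 does not make the comparison problem go away; normalization does.
print(precomposed.encode("utf-32-be") == decomposed.encode("utf-32-be"))  # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

For these samples UTF-32 uses 4 octets per code point throughout, while UTF-8 uses 1 for ASCII, 2 for Cyrillic, 3 for the CJK characters, and 4 only for the supplementary-plane character.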