Re: [EAI] UTF32

Oleksandr Tsaruk <tsaruk@i.ua> Fri, 24 April 2015 12:06 UTC

To: duerst@it.aoyama.ac.jp
From: Oleksandr Tsaruk <tsaruk@i.ua>
In-Reply-To: <553A17B6.5090803@it.aoyama.ac.jp>
Date: Fri, 24 Apr 2015 15:06:31 +0300

Dear colleagues,

I am very glad to have drawn your attention. I am new to the IETF and have attended only the Dallas meeting, but I have some experience on an IDN board and in the ICANN GAC.
We saw that Punycode solves the basic IDN problems but has no long-term prospects.

That is why I posted this provocative question about UTF-32, and I am glad to see how the idea developed in ISO/IEC JTC1/SC2/WG2.

The basic purpose is to find a solution that lets IDN domains and email addresses be presented in their original scripts.

Best regards,
Oleksandr Tsaruk


24.04.2015 13:15, "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
>On 2015/04/22 22:21, John C Klensin wrote:
> 
> > Now you may be thinking about something else. The Unicode code
> > space ranges only from 0 to 0x10FFFF. A 32bit code space would
> > be 0 to 0xFFFFFFFF. If one believed that the Unicode code space
> > were too small and that a full 32 bit space were needed, that
> > would be a different matter entirely. The Unicode folks are
> > convinced that more than the current space will never be needed
> > to represent all symbols of interest but, at one stage, they
> > believed that a 16bit code space would be enough. I haven't
> > studied the mechanisms in some years, but, if a larger code
> > space were needed, extensions would be needed to both the UTF-8
> > and UTF-16 encoding models to accommodate it while anything that
> > used a 32 bit space directly would presumably be fairly
> > transparent, at least until one got into various Unicode tables
> > and algorithms that assume the smaller code space. But, again,
> > that has little to do with the difference between UTF-16 and
> > UTF-32 except as a side effect.
> 
> More specifically, at one point, Unicode thought that a 16-bit space 
> might be enough (for that time being, at least), while on the other hand 
> ISO/IEC JTC1/SC2/WG2, the ones responsible for ISO 10646, thought that 
> an architecture with a full 31 bits would be better (the 32nd bit was 
> always reserved because nobody wanted to repeat the 8-bit "signed char" 
> vs. "unsigned char" mess). UCS-4 and UTF-8 were both designed to 
> encompass this 31-bit code space. UTF-8 needed up to 6 bytes for a 
> character, in a very straightforward way, according to its original 
> design (there is still code out there with traces from this period).
> UCS-2 was of course limited to 16 bits.
> 
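
To make the six-byte point concrete, here is a minimal Python sketch of the byte counts under the original 31-bit UTF-8 design. The function name utf8_len_original is hypothetical, and the range boundaries are my reading of the historical layout (as published in RFC 2279), not something stated in this thread.

```python
def utf8_len_original(cp: int) -> int:
    """Bytes needed for code point cp under the original (31-bit) UTF-8 layout.

    Hypothetical helper for illustration only.
    """
    if cp < 0 or cp > 0x7FFFFFFF:
        raise ValueError("outside the 31-bit code space")
    # Largest value representable with 1, 2, ..., 6 bytes respectively.
    limits = [0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF]
    for nbytes, limit in enumerate(limits, start=1):
        if cp <= limit:
            return nbytes
    raise AssertionError("unreachable: range was checked above")


# For code points that are still valid today, the count agrees with
# Python's own UTF-8 encoder.
for cp in (0x41, 0x20AC, 0x1F600, 0x10FFFF):
    assert utf8_len_original(cp) == len(chr(cp).encode("utf-8"))

# Values above 0x1FFFFF are where the fifth and sixth bytes were needed.
print(utf8_len_original(0x200000), utf8_len_original(0x7FFFFFFF))
```

The last line exercises only the historical ranges; a current encoder would reject those values.
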
> With time passing, it became clearer on both sides that 16 bits wasn't 
> enough but 31 bits was overkill. The introduction of UTF-16 created an 
> upper limit of 0x10FFFF in one of the Unicode encoding forms. Therefore 
> both Unicode and SC2/WG2 agreed on this overall limit. UTF-32 was 
> introduced as a version of UCS-4 with an explicit upper codepoint limit 
> of 0x10FFFF, and the definition of UTF-8 was changed to only go up to 
> four bytes.
> 
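
The 0x10FFFF limit follows directly from the surrogate-pair arithmetic of UTF-16. A small Python sketch, assuming the standard surrogate ranges (the constant and function names are hypothetical, chosen only for this illustration):

```python
# Standard UTF-16 surrogate ranges; each half contributes 10 bits.
HIGH_START, HIGH_END = 0xD800, 0xDBFF
LOW_START, LOW_END = 0xDC00, 0xDFFF


def decode_surrogate_pair(high: int, low: int) -> int:
    """Combine a high and a low surrogate into one code point."""
    if not (HIGH_START <= high <= HIGH_END and LOW_START <= low <= LOW_END):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)


# The largest possible pair gives the largest code point UTF-16 can express.
max_cp = decode_surrogate_pair(HIGH_END, LOW_END)
print(hex(max_cp))  # 0x10ffff

# Cross-check against Python's own UTF-16 encoder.
assert chr(max_cp).encode("utf-16-be") == bytes([0xDB, 0xFF, 0xDF, 0xFF])
```

Two 10-bit halves give 2^20 supplementary code points on top of the first 0x10000, which is why nothing above 0x10FFFF fits without a further extension.
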
> In the case of an extraterrestrial invasion by a culture with millions 
> of characters or a sudden excessive emoji binge, the limits for UTF-32 
> and UTF-8 could be changed again, and some further kludge could be 
> introduced in UTF-16. But the chance that we'll get there is very low, 
> at least for the moment.
> 
> Regards, Martin.


Sincerely yours, 
Oleksandr Tsaruk, Ph.D.