Re: [iucg] Unicode 7.0.0, (combining) Hamza Above, and normalization for comparison

JFC Morfin <jefsey@jefsey.com> Wed, 06 August 2014 12:01 UTC

Date: Wed, 06 Aug 2014 14:01:29 +0200
To: Patrik Fältström <paf@frobbit.se>, Mark Davis <mark@macchiato.com>
From: JFC Morfin <jefsey@jefsey.com>
In-Reply-To: <219A83FB-B0C4-4B58-93A9-84A976B9147E@frobbit.se>
References: <C0D401D76B8D1BA472604BB4@JCK-EEE10> <CAJ2xs_F9+6_+Fz-xFdSGBUV82qmMa33Y8+F9mjinMKx9=YoKcA@mail.gmail.com> <CAJ2xs_H_Gy9b_A5LZj0o9rFffbvbnVGLv+22CD7NhmZhLXE6Rg@mail.gmail.com> <219A83FB-B0C4-4B58-93A9-84A976B9147E@frobbit.se>
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"; format=flowed
Content-Transfer-Encoding: 8bit
Archived-At: http://mailarchive.ietf.org/arch/msg/iucg/jA-l_zBniPoSLumyDJo_lbg76OE
Cc: Marc Blanchet <Marc.Blanchet@viagenie.ca>, IDNA update work <idna-update@alvestrand.no>, "iucg@ietf.org" <iucg@ietf.org>, gerard lang <gerard_lang@orange.fr>
Subject: Re: [iucg] Unicode 7.0.0, (combining) Hamza Above, and normalization for comparison
Message-ID: <20140806120149.3722.68947.ARCHIVE@ietfa.amsl.com>

At 07:03 06/08/2014, Patrik Fältström wrote:
>To be honest, I do not think it matters where it is discussed.

I suggest we keep the discussion here. The reason is ICANN's
response to the plaintiffs in the .ir, etc. case: "the DNS provides a
human interface to the internet protocol addressing system". This
seems a good definition to uphold in common, as it is technically
true, easy to understand, and draws a clear distinction between the
human and the non-human issues.

The most complex issue, the human confusability of ISO 10646 code
points, calls for a visual-to-binary anti-phishing algorithm. Such an
algorithm should be added to the IDNA tables, allowing registries to
accept or reject xn-- registrations based on the domain names already
registered.

To start the debate on this issue, I would suggest one possibility
for such an algorithm: a mathematical proximity measure that
discriminates confusable characters by comparing their 32x32
rasterizations (i.e. 1024-bit structured strings). I note that this
also implies a common reference font. I do not think this is a
problem, as it is on the human side and conflicts will be settled in
court: what counts is the font that local law will consider. It is up
to each ccTLD to provide that information and to have it added to
ISO 3166, which already includes the administrative languages; these
should in any case be renamed standardization languages and coupled
with the accepted script(s).
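
To make the proposal concrete, here is a minimal sketch in Python of
one way such a proximity test could work, assuming the measure is a
plain Hamming distance between two 1024-bit rasters. The glyphs are
hand-drawn toy shapes, not the output of any reference font, and the
names (to_bits, hamming, confusable) and the threshold of 16 bits are
illustrative assumptions only, not part of any IDNA table.

```python
# Sketch only: toy 32x32 glyphs, not rendered from a real reference font.
SIZE = 32  # 32 x 32 = 1024 bits per glyph


def blank():
    """Return an empty SIZE x SIZE grid ('.' = no ink)."""
    return [["."] * SIZE for _ in range(SIZE)]


def to_bits(grid):
    """Pack a grid into one 1024-bit integer, row by row ('#' = ink)."""
    assert len(grid) == SIZE and all(len(row) == SIZE for row in grid)
    bits = 0
    for row in grid:
        for cell in row:
            bits = (bits << 1) | (cell == "#")
    return bits


def hamming(a, b):
    """Number of pixels in which two packed rasters differ."""
    return bin(a ^ b).count("1")


def confusable(a, b, threshold=16):
    """Assumed rule: glyphs differing in at most `threshold` pixels clash."""
    return hamming(a, b) <= threshold


def bar():
    """Toy glyph: vertical stroke in column 15, rows 4 to 27."""
    g = blank()
    for y in range(4, 28):
        g[y][15] = "#"
    return g


def dotted_bar():
    """Toy glyph: the same stroke with a 3-pixel gap under the top pixel."""
    g = bar()
    for y in (5, 6, 7):
        g[y][15] = "."
    return g


def box():
    """Toy glyph: outline of a 16x16 square."""
    g = blank()
    for k in range(8, 24):
        g[8][k] = g[23][k] = g[k][8] = g[k][23] = "#"
    return g


GLYPH_L = to_bits(bar())
GLYPH_I = to_bits(dotted_bar())
GLYPH_O = to_bits(box())

if __name__ == "__main__":
    print(hamming(GLYPH_L, GLYPH_I), confusable(GLYPH_L, GLYPH_I))
    print(hamming(GLYPH_L, GLYPH_O), confusable(GLYPH_L, GLYPH_O))
```

A registry could run such a test between the characters of a requested
label and those of the labels it has already registered. The threshold
and the font would still have to be agreed upon, which is the point
made above.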

Initial questions:
1. What is the URL of the complete Unicode code point table (value/description)?
2. I found rasterisations made for some scripts, but not for all.

jfc