[idn] a way toward homograph resolution ? (was "improving WG operation")
"JFC (Jefsey) Morfin" <jefsey@jefsey.com> Wed, 11 May 2005 05:44 UTC
Message-Id: <6.2.1.2.2.20050511050500.045cf140@mail.jefsey.com>
X-Mailer: QUALCOMM Windows Eudora Version 6.2.1.2
Date: Wed, 11 May 2005 06:08:18 +0200
To: ietf@ietf.org
From: "JFC (Jefsey) Morfin" <jefsey@jefsey.com>
Subject: [idn] a way toward homograph resolution ? (was "improving WG operation")
Cc: idn@ops.ietf.org, "Hallam-Baker, Phillip" <pbaker@verisign.com>
In-Reply-To: <001601c555d3$453fd9c0$7f1afea9@oemcomputer>
References: <198A730C2044DE4A96749D13E167AD37250259@MOU1WNEXMB04.vcorp.ad.vrsn.com> <6.2.1.2.2.20050511021431.048f8060@mail.jefsey.com> <001601c555d3$453fd9c0$7f1afea9@oemcomputer>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format="flowed"
Sender: owner-idn@ops.ietf.org
Precedence: bulk
On 04:43 11/05/2005, Randy Presuhn said:
>From: "JFC (Jefsey) Morfin" <jefsey@jefsey.com>
> > To: "Hallam-Baker, Phillip" <pbaker@verisign.com>
> > Cc: <ietf@ietf.org>
> > Sent: Tuesday, May 10, 2005 5:29 PM
> > Subject: RE: improving WG operation
>...
> > They do not not only delete. I suggest you just come to the WG-ltru where
> > they have decided to document RFC 2277 charsets into RFC 3066 langtags. So
> > you can enjoy charset conflicts, something you never though about, I
> > presume. You cannot stop progress.
>...
>
>I guess Jefsey is upset because the WG rejected his proposal
>to expand our scope to include charsets. The ltru WG is most
>emphatically *not* confusing charsets with language tags.

I am not upset :-). On the contrary, I find it extremely interesting that some people were able to rename charsets "scripts" in order to insert charsets into language descriptions while claiming they don't (cf. above). Obviously they are unhappy when I expose the trick. Anyway, the result is great fun: people will be prevented from accessing a page they know how to read if they do not know the language.

This cacologic, however, might be a good way to solve the IDN homograph issue and the phishing problem. If we revert from those famous "scripts" to what they are, i.e. Unicode partitions, hence stable and well documented charsets (http://www.unicode.org/Public/4.1.0/ucd/Scripts.txt), browsers can use them to expose the homographs in IDNs that are not related to the page charset, and kill the risk of phishing.

This only calls for the browser to extract the charset, I mean the script name, from the langtag, fetch this file, read the list of code points in the charset/associated with the script, and display the URL accordingly, indicating the characters which are not part of the script/charset. This relieves the ccTLD/TLD Manager of responsibilities he cannot fulfil at the third level and beyond.
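The browser-side check described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a proposal for how any browser does or should implement it: the sample data is hand-written in the line format of Scripts.txt rather than copied from the file, the subtag-to-script table and all function names are hypothetical, and characters that the file lists as "Common" (digits, hyphen, full stop) are assumed to be acceptable in any script.

```python
import re

# Hand-written sample in the Scripts.txt line format (not a verbatim copy).
SAMPLE_SCRIPTS_TXT = """\
002D          ; Common # HYPHEN-MINUS
002E          ; Common # FULL STOP
0030..0039    ; Common # DIGIT ZERO..DIGIT NINE
0041..005A    ; Latin # LATIN CAPITAL LETTER A..Z
0061..007A    ; Latin # LATIN SMALL LETTER A..Z
0400..0481    ; Cyrillic # CYRILLIC letters
"""

# Assumed mapping from langtag script subtags to Scripts.txt names.
SUBTAG_TO_SCRIPT = {"Latn": "Latin", "Cyrl": "Cyrillic"}

# "XXXX ; Name" or "XXXX..YYYY ; Name"; comment lines do not match.
LINE_RE = re.compile(r"^([0-9A-Fa-f]+)(?:\.\.([0-9A-Fa-f]+))?\s*;\s*(\w+)")


def parse_scripts(text):
    """Return a list of (first_code_point, last_code_point, script_name)."""
    ranges = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            start = int(m.group(1), 16)
            end = int(m.group(2) or m.group(1), 16)
            ranges.append((start, end, m.group(3)))
    return ranges


def script_of(ch, ranges):
    """Look up the script of one character; "Unknown" if not listed."""
    cp = ord(ch)
    for start, end, name in ranges:
        if start <= cp <= end:
            return name
    return "Unknown"


def script_from_langtag(langtag):
    """Simplified: treat the first four-letter subtag after the language
    subtag as the script, and map it through the assumed table."""
    for subtag in langtag.split("-")[1:]:
        if len(subtag) == 4 and subtag.isalpha():
            return SUBTAG_TO_SCRIPT.get(subtag.title())
    return None


def foreign_characters(name, langtag, ranges):
    """List (position, character, script) for every character of `name`
    that belongs neither to the langtag's script nor to "Common"."""
    expected = script_from_langtag(langtag)
    if expected is None:
        raise ValueError("no known script subtag in " + repr(langtag))
    flagged = []
    for i, ch in enumerate(name):
        found = script_of(ch, ranges)
        if found not in (expected, "Common"):
            flagged.append((i, ch, found))
    return flagged
```

For instance, with a page tagged "en-Latn", calling foreign_characters("p\u0430ypal.com", "en-Latn", parse_scripts(SAMPLE_SCRIPTS_TXT)) reports the Cyrillic letter at position 1, which a browser could then highlight when displaying the URL. A langtag without a script subtag gives no answer here, which is one of the gaps such a scheme would have to fill.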
There are, however, still (minor) points to address:

- there are some minor disparities between the "script" name in the langtag and the script name in the Scripts.txt file; they should be reduced over time. I suppose that if this is a major issue, there will be help.
- the Scripts.txt file is currently hosted on the Unicode site. Even with caching (92 K), it will be fetched every time people start their browser. This may therefore represent several billion accesses a day.
- the WG-ltru only really wants to address XML issues related to old XML libraries. Some coordination with other WGs or interests could be fruitful. They plan to extend the language tags registry to scripts and to register them. I suppose other WGs could benefit from this (all those involved in one way or another with internationalisation and languages).

jfc