Re: [idn] Re: character tables

John C Klensin <klensin@jck.com> Wed, 02 March 2005 15:50 UTC

Date: Wed, 02 Mar 2005 10:43:27 -0500
From: John C Klensin <klensin@jck.com>
To: Erik van der Poel <erik@vanderpoel.org>, Gervase Markham <gerv@mozilla.org>
cc: Paul Hoffman <phoffman@imc.org>, idn@ops.ietf.org
Subject: Re: [idn] Re: character tables

Erik,

A few observations...

	(1) First, a registry does have the right to require
	that registrants observe particular rules and conditions
	in subdomains they delegate and to pass those rules down
	the tree.  Whether that is wise or sensible is another
	issue, and enforceability is yet another question.
	But, unless national law prevents it, RFC 1591, to which
	all TLD registries more or less agreed, rather
	explicitly provides for passing the responsibilities to
	the community down the tree.  Even ignoring troublesome
	concepts like "require" and "enforce", certainly nothing
	prevents registries from educating and persuading
	registrants about how they should behave.
	
	(2) In my regular role as a luser, I really like fast,
	easily-used, small-footprint browsers.  I'm more
	security-conscious and suspicious than the average user,
	and therefore also like handy tools to help me dissect
	and verify things that might look suspicious.  Tying up
	a browser with heuristics, such as mixed-script
	detectors, that may not work well and have a large
	footprint, doesn't impress me as a good tradeoff.  For
	better or worse, the assumption of a decade ago that
	most criminals, especially most electronic criminals,
	were stupid is no longer applicable, if ever it was.
	That implies, I think, that if we design a simple test
	that blocks some look-alike cases but permits other,
	more subtle, ones, we will simply drive the phishers to
	better understand and use the subtle stuff: not a good
	tradeoff.

	(3) As far as surfing around the world is concerned,
	we've got a situation today in which the domain name
	associated with a particular URL does not really predict
	the content to be found on that page.  That will
	undoubtedly get worse, as more folks discover that the
	intersection of domain and host administration with web
	site organization often makes it much easier to maintain
	versions of pages in multiple languages in the same,
	rather than different, DNS trees.  So, since I don't
	read Chinese, I'm unlikely to frequently seek out pages
	whose content is in Chinese.  But I frequently find
	pages I can read via URLs that contain elements written
	in pinyin.  I fully expect those elements, and some of
	the subdomain names, will shift to Chinese characters as
	IDNs and IRIs are more widely available.   I also expect
	that transition will make things more comfortable for
	someone who reads Chinese and would prefer to not deal
	with Latin characters and harder for me, but that is a
	reasonable tradeoff over which none of us will have much
	influence.
	
	(4) We need to get unstuck from thinking about this
	purely as a browser problem.   The usual phishing attack
	involves an email message containing a link.  Email
	clients that don't immediately invoke a full browser as
	soon as a link appears (and many of those links occur in
	plain-text, not HTML, email) invoke the browser only
	when the link is clicked.  The
	situation in the browser is then different, since none
	of the "hover over link", "look at status bar", etc.,
	tools are going to apply, or, at least, are not going to
	work in the ways that some of these discussions suggest
	for links that appear on web pages that are already open
	in the browser.  Now, we have given MUA writers no
	advice about what they should pass to the browser if
	they see an IRI or otherwise-encoded string that
	contains an IDN.  If they pass the IRI/
	native-script-form IDN, they risk passing it to a
	browser version that doesn't have a clue.  So maybe they
	force the thing into URI/punycode form and pass that.
	Now, do you really want the browser to look at the
	thing, perform ToUnicode on the name (which, of course,
	may yield something other than what the user saw),
	perform some tests, and then pop up a "you just passed
	me an IDN that looks suspicious, do you really want to
	open that page?" box?  I think probably not.  Moreover,
	I think that, if you do, there would quickly be a
	sufficient number of false positives (positive for bad
	stuff) to get users really used to clicking "yes"
	without thinking... and cursing the browser implementer
	for bothering them with a pointless warning.
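
The round trip described in (4) can be sketched as follows.  This is
only an illustration, assuming Python's built-in "idna" codec, which
implements the IDNA 2003 ToASCII and ToUnicode operations (RFC 3490).
The function name round_trip and the example host names are
hypothetical and do not come from this thread.

```python
# Sketch only: round-trip a host name through ACE ("punycode") form using
# Python's built-in "idna" codec (IDNA 2003, RFC 3490).
# round_trip() and the example names are hypothetical, not from the thread.


def round_trip(displayed):
    """Return (ace, recovered): the ToASCII form and its ToUnicode result."""
    ace = displayed.encode("idna").decode("ascii")   # per-label ToASCII
    recovered = ace.encode("ascii").decode("idna")   # per-label ToUnicode
    return ace, recovered


if __name__ == "__main__":
    # "\u00dc" is capital U with diaeresis; "\u00df" is German sharp s.
    for name in ("B\u00dcCHER.example", "stra\u00dfe.example"):
        ace, recovered = round_trip(name)
        status = "same as displayed" if recovered == name else "differs from displayed"
        print(f"{name!r} -> {ace!r} -> {recovered!r} ({status})")
```

In both examples the string recovered by ToUnicode differs from the
string that was originally displayed, because nameprep case-folds and
maps characters before encoding.  A browser that is handed only the
punycode form therefore cannot reliably reconstruct what the user saw
in the mail message.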

So my conclusion is that we need a mixed
protocol-registry-browser strategy.  That strategy, IMO, should
shift the processing burdens as much as possible to the first
two.  And I think that notions that the problem can or should be
solved in any of those three places alone are probably misguided.

     john

--On Tuesday, 01 March, 2005 20:47 -0800 Erik van der Poel
<erik@vanderpoel.org> wrote:

>> However, I note that this particular conversation is between
>> a browser developer (Gervase) and one of the IDNA authors
>> (Paul), neither of whom is a registry representative, so
>> why exactly are you two having this conversation? :-)
>> 
>> Sorry, I'm half joking. Half, because you two have every
>> right to discuss whatever you wish. The other half because I
>> believe browser developers can afford to focus more on their
>> end of things.
> 
> Sorry, I've been told that this half-joking thing was
> confusing, and I now believe I shouldn't have tried to be so
> cute.
> 
> All I'm trying to say to *Gervase* is that it doesn't really
> matter *what* characters are allowed to be registered in a
> registry, as long as the browser takes steps to warn the user
> when something phishy might be going on, e.g. a slash
> homograph, or a Cyrillic small 'a' when the user was probably
> expecting a Latin small 'a'. As I have pointed out, the
> registry does *not* have control over higher-numbered level
> domains. E.g. .de controls the 2nd level domain (2LD), but not
> the 3LD, 4LD and so on. That is where the slash homograph
> problem *really* matters.
> 
>> Instead, I wish the browser developers would
>> focus more on the *user*, who may be "surfing" from one site
>> to the next, spanning the globe, and crossing language
>> boundaries.
> 
> Sorry, this may not have been the best logic to use in my
> argument. It would have been better to talk about phishers,
> who often spam users with email containing URIs that *could*
> contain IDN labels with dangerous homographs at any level of
> the name, 2LD, 3LD, or whatever.
> 
> (Most users *don't* surf around the world, since many are
> monolingual or maybe bilingual.)
> 
> Anyway, help me out, guys and gals. Pull my logic through the
> wringer, and comb it with the finest comb you have at your
> disposal. This way, we can collectively improve our
> understanding of the IDN phishing problem and ways to address
> it.
> 
> Erik
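
The kind of per-label check discussed above (a Cyrillic small 'a'
inside an otherwise Latin label, at any level of the name) might be
sketched as below.  This is a rough heuristic, not a real
mixed-script detector: the Python standard library does not expose
the Unicode Script property, so the sketch assumes that the first
word of each character's Unicode name approximates its script.  The
function names and example host names are hypothetical.

```python
# Sketch only: flag labels that mix scripts, at every level of a host name.
# Assumption: the first word of a character's Unicode name ("LATIN",
# "CYRILLIC", ...) stands in for its script.  This is an approximation.
import unicodedata


def scripts_in_label(label):
    """Return the approximate set of scripts used by letters in a label."""
    scripts = set()
    for ch in label:
        if not ch.isalpha():
            continue  # digits and hyphens are shared across scripts
        name = unicodedata.name(ch, "")
        if name:
            scripts.add(name.split()[0])
    return scripts


def mixed_script_labels(hostname):
    """Return the labels (2LD, 3LD, ...) that contain more than one script."""
    return [label for label in hostname.split(".")
            if len(scripts_in_label(label)) > 1]


if __name__ == "__main__":
    # "\u0430" is CYRILLIC SMALL LETTER A, a look-alike of Latin "a".
    print(mixed_script_labels("login.p\u0430ypal.example"))
    # A label written entirely in Cyrillic look-alikes is not flagged.
    print(mixed_script_labels("\u0430\u0441\u0435.example"))
```

The second call shows the limitation raised in point (2): a label
built entirely from look-alike characters of one script passes this
test, and labels that legitimately mix scripts (for example kana with
Han characters) would be flagged as false positives.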