Re: [I18nrp] Conservatism principle doesn't go far enough

Larry Masinter <LMM@acm.org> Mon, 04 February 2019 20:10 UTC

Sender: Larry Masinter <masinter@gmail.com>
From: Larry Masinter <LMM@acm.org>
X-Google-Original-From: "Larry Masinter" <lmm@acm.org>
To: "'Asmus Freytag (c)'" <asmusf@ix.netcom.com>, 'Nico Williams' <nico@cryptonector.com>
Cc: i18nrp@ietf.org
References: <20190201021802.A5160200D93BBA@ary.qy> <4C0F3C8D65FB57C697E72F8D@PSB> <016001d4bb75$15350130$3f9f0390$@acm.org> <a956b63b-cff0-5df3-b7fc-511274542349@ix.netcom.com> <20190203234846.GA4108@localhost> <1c176e53-2f27-ca83-7e59-52099021ddcd@ix.netcom.com>
In-Reply-To: <1c176e53-2f27-ca83-7e59-52099021ddcd@ix.netcom.com>
Date: Mon, 04 Feb 2019 12:10:15 -0800
Message-ID: <021d01d4bcc5$a4d7c3d0$ee874b70$@acm.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_NextPart_000_0236_01D4BC82.96B80640"
X-Mailer: Microsoft Outlook 16.0
Thread-Index: AQFzM9kcOq2FuT80+GKNalknoqnKGAGC9QFjAlacPTcCqnlCwgFLP+mGAmAaLcKmQX94kA==
Content-Language: en-us
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18nrp/C05Pjhid8vq9tdPqzZMWi0rBdSM>
Subject: Re: [I18nrp] Conservatism principle doesn't go far enough
List-Id: Internationalization Review Procedures <i18nrp.ietf.org>

Let me be a little more careful. The document 

https://chromium.googlesource.com/chromium/src/+/master/docs/security/url_display_guidelines/url_display_guidelines.md

 

seems to be primarily about one use case:  The user has somehow navigated to a site (through links, typing in, search, whatever), and is now looking at a web page; what should the UA show (in the address bar or any other security context) to validate “who is this coming from? Which site sent me this data? It says it’s my bank, did it come from them?”

 

The spec is only about the visual display and doesn’t cover copy/paste or save-as-bookmark or other operations which work with a different form.

 

Insofar as

*	normalization doesn’t change the visual display (?)
*	The URL that was actually used for DNS was normalized before use anyway

 

then at worst the advice to normalize is harmless. So I take back my critique.
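A small illustration of the first bullet, as a sketch only: the host name below is made up and does not come from the Chromium document. Two strings that would normally render identically can differ in code points until they are normalized (NFC is used here as an example form).

```python
import unicodedata

# Hypothetical example strings: the same visible name in two encodings.
composed = "caf\u00e9.example"     # "é" as a single code point (U+00E9)
decomposed = "cafe\u0301.example"  # "e" followed by combining acute (U+0301)


def same_after_nfc(a: str, b: str) -> bool:
    """Return True if the two strings are equal once both are NFC-normalized."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The raw strings differ, but they compare equal after normalization.
print(composed == decomposed, same_after_nfc(composed, decomposed))
```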

 

 

https://www.m3aawg.org/sites/default/files/m3aawg-unicode-best-practices-2016-02.pdf

 

is about different use cases, with similar but not identical advice and requirements.

The original point, though, stands: the “Conservatism” principle doesn’t go far enough in warning those registering domains to avoid those labels which downstream processors will reject or display poorly.

 

 

Is it possible to get any statistics on DNS requests that include unnormalized strings? And some hints about where they come from?

I know at one time Scott Hollenbeck gathered some data on use of %xx encoding in DNS which I will forward.

 

--- Begin Message ---
I haven’t been following the discussion. You can post this info if you wish and identify me and Verisign as the source.

 

Scott

 

From: Larry Masinter [mailto:masinter@adobe.com] 
Sent: Friday, March 14, 2014 12:21 PM
To: Hollenbeck, Scott
Subject: RE: Universal Acceptance of IDN TLDs

 

maybe send this stuff out to the URI list?

 

 

 

From: Hollenbeck, Scott [mailto:shollenbeck@verisign.com] 
Sent: Friday, March 14, 2014 4:34 AM
To: Larry Masinter
Subject: RE: Universal Acceptance of IDN TLDs

 

Larry, my researcher also studied the geolocation attributes of the source IP addresses responsible for queries containing “%xx”, the networks from which those queries originated, and the most commonly encoded characters.

 

Geolocation: 33% from China, 21% from the USA, long tail distribution on the rest.

 

ASNs: More than 6,000 unique ASN sources.

 

Top 10 most common characters (in descending order): %20, %2A, %2C, %2F, %40, %7C, %5c, %24, %25, %29
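For readers who do not have the ASCII table memorized, the ten codes above can be decoded with a few lines of Python. This is only a reading aid and was not part of the Verisign study; the variable names are arbitrary.

```python
from urllib.parse import unquote

# The ten codes exactly as listed in the message (note the lowercase "%5c").
TOP_TEN = ["%20", "%2A", "%2C", "%2F", "%40", "%7C", "%5c", "%24", "%25", "%29"]

# Percent-decode each code into the character it stands for.
decoded = [unquote(code) for code in TOP_TEN]

for code, char in zip(TOP_TEN, decoded):
    print(code, repr(char))

# True when every decoded character is in the ASCII range.
all_ascii = all(ord(char) < 0x80 for char in decoded)
```

All ten decode to a space or ASCII punctuation, so none of them is a byte from a multi-byte UTF-8 sequence.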

 

Scott

 

From: Larry Masinter [mailto:masinter@adobe.com] 
Sent: Wednesday, March 05, 2014 1:06 PM
To: Hollenbeck, Scott
Subject: FW: Universal Acceptance of IDN TLDs

 

What happens when DNS gets queries for %xx-encoded names?

How difficult would it be to deploy a mapping from %xx-utf8 queries to the equivalent punycode?

Of the IDNs in use, how many would exceed the DNS length limit when encoded as %xx (3 bytes / UTF-8 byte)?

 

 

 

 

From: Larry Masinter 
Sent: Wednesday, March 05, 2014 2:40 PM
To: 'Mark Davis ☕'
Cc: Andre Schappo; www International; Don Hollander; public-iri@w3.org
Subject: RE: Universal Acceptance of IDN TLDs

 

The handling of %xx-encoded domain names in DNS servers would be a fallback for use in legacy systems that are not IDN-aware.

 

So the length limit argument doesn’t carry a lot of weight – it is strictly a transitional deployment enhancement for working around legacy components which extract domain names from URIs but can only process 7-bit URIs and not 8-bit IRIs.

 

You can deploy IDNs when all of the applications you care about will work for the users you care about for the DNS names you want to use.

 

Components that handle IRIs directly and pull out domain names for future processing shouldn’t ever need the %xx encoding, although decoding it is also a good idea.

 

 

 

From: mark.edward.davis@gmail.com [mailto:mark.edward.davis@gmail.com] On Behalf Of Mark Davis ☕
Sent: Wednesday, March 05, 2014 2:16 PM
To: Larry Masinter
Cc: Andre Schappo; www International; Don Hollander; public-iri@w3.org
Subject: Re: Universal Acceptance of IDN TLDs

 

If you mean having the DNS system natively accept %xx for domain labels as well as Punycode, I suspect that that ship has long since sailed. (That was one of the options discussed, but was turned down because of the length limitations.)

 

If on the other hand, you mean that client software should accept %xx notation as well as straight Unicode and punycode, that is another story. That can be handled by a client-side mapping, permitted by either IDNA2008 or UTS46. (And I agree that it's a good idea.)

 

With that, I could type in my address bar any of:

1.	xn--idna--x-l6c.blogspot.com
2.	IDNA-ȿ-x.blogspot.com
3.	IDNA-%C8%BF-x.blogspot.com
4.	IDNA-Ȿ-X.BLOGSPOT.COM
5.	IDNA-%E2%B1%BE-X.BLOGSPOT.COM

And they'd all resolve to xn--idna--x-l6c.blogspot.com.

1.	I just checked on Chrome, and all of these work.
2.	Firefox is a bit odd: if I type in #3, it fails; *but* it converts it in the address bar, so a subsequent enter goes to the right place. #4 and #5 just fail.
3.	Don't know about other browsers.
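A rough sketch of the client-side mapping described above, in Python. It is not the actual UTS46 or IDNA2008 algorithm: `str.lower()` plus NFC stands in for the real mapping tables, and the function names are hypothetical. It only shows that percent-decoding, case mapping, and Punycode encoding are enough to bring the five inputs in the list to the same A-label form.

```python
import unicodedata
from urllib.parse import unquote


def to_a_label(label: str) -> str:
    """Punycode-encode one label; pure-ASCII labels pass through unchanged."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def normalize_host(host: str) -> str:
    """Map a typed host name (Unicode, %xx, or Punycode) to its A-label form.

    Assumption: lowercasing + NFC is a crude stand-in for the UTS46 mapping.
    """
    text = unquote(host)                               # %xx -> UTF-8 -> Unicode
    text = unicodedata.normalize("NFC", text.lower())  # simplified mapping step
    return ".".join(to_a_label(label) for label in text.split("."))


typed = [
    "xn--idna--x-l6c.blogspot.com",
    "IDNA-ȿ-x.blogspot.com",
    "IDNA-%C8%BF-x.blogspot.com",
    "IDNA-Ȿ-X.BLOGSPOT.COM",
    "IDNA-%E2%B1%BE-X.BLOGSPOT.COM",
]
for host in typed:
    print(host, "->", normalize_host(host))
```

The percent-decoding happens before any IDNA step, so a resolver behind this mapping would only ever see the Punycode form.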




 

Mark <https://google.com/+MarkDavis> 

 

— Il meglio è l’inimico del bene —

 

On Wed, Mar 5, 2014 at 2:32 PM, Larry Masinter <masinter@adobe.com> wrote:

there’s a gap between IDN and URI in that IRI -> URI would prefer to use the %xx percent-hex URL encoding in general.

 

What would be preferable would be to ensure that DNS requests for %xx encoded names are an acceptable alternative to punycode.

 

 

From: Andre Schappo [mailto:A.Schappo@lboro.ac.uk]
Sent: Tuesday, March 04, 2014 3:51 PM
To: www International
Cc: Don Hollander; public-iri@w3.org
Subject: Re: Universal Acceptance of IDN TLDs

 

① Is this document available online? I have looked round http://aptld.org but cannot find it.

② There are indeed barriers to the effective, real world use of IDNs. A fundamental problem is that IDNs, in general, are not properly catered for and not properly integrated into systems. One reason often quoted for treating IDNs differently is "Security". Well, I posit that any IDN security issues pale in comparison to the ubiquitous "… for further information please click here."

Here are some examples from Social Media:

Twitter

If the Unicode form is entered —

#test  http://北大.中国

It is not recognised as a Domain Name & not displayed as a clickable link

If the punycode form is entered —

#test http://xn--djry4l.xn--fiqs8s

It is now recognised as a Domain Name and displayed as a clickable link but displayed as punycode instead of Unicode

Sina Weibo

Same results —
#test# http://北大.中国
#test# http://xn--djry4l.xn--fiqs8s
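The two forms in these examples are the same name, and converting between them is mechanical. A minimal sketch, assuming Python's built-in `idna` codec (which implements the older IDNA2003 rules, not IDNA2008 or UTS46) is good enough for this particular name; the helper names are hypothetical and this is not how either site actually processes links.

```python
def to_punycode_form(host: str) -> str:
    """Unicode host name -> ASCII (Punycode) form, via the built-in idna codec."""
    return host.encode("idna").decode("ascii")


def to_unicode_form(host: str) -> str:
    """ASCII (Punycode) host name -> Unicode form, via the built-in idna codec."""
    return host.encode("ascii").decode("idna")


print(to_punycode_form("北大.中国"))
print(to_unicode_form("xn--djry4l.xn--fiqs8s"))
```

A linkifier could accept either form when detecting the link and then choose the Unicode form for display.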

There is also the related issue of having to Percent Encode the Unicode pathname components of a URL.

③ In my experience, another fundamental problem is the lack of IT Internationalization teaching in Schools and Universities. Certainly in England, IT Internationalisation has not yet become an accepted part of the curriculum. We need to produce students that have an appreciation/understanding of IT Internationalisation in order to, amongst other goals, properly integrate IDNs into systems/apps/websites …etc…

For several years I have been teaching a module entitled "International Computing" which covers several aspects of IT i18n. One of the topics I cover is IDNs :) And I am keeping my students up to date with the new IDN gTLDs as they are delegated to the DNS Root :)

During my years teaching this module I have found few students (regardless of which country they come from) with even a basic appreciation of IT Internationalization because it is a topic that was never discussed/raised in their prior studies.

So, any initiative "to improve the use of IDN TLDs in the real world" should get Universities onboard and encourage Universities/Schools to teach "IT Internationalization".

André
http://schappo.blogspot.co.uk

 

On 4 Mar 2014, at 12:14, Richard Ishida wrote:

 

I was contacted last week by Don Hollander, General Manager of the Asia Pacific Top Level Domain Association, who is trying to improve the use of IDN TLDs in the real world, and looking for support.

See the attached PDF (from him) outlining what are the barriers to the effective use of IDN TLDs and who can help address these issues.

He's hoping to create a community of interested stakeholders. He expects this community to include ICANN, many ccTLDs, ISOC, and hopefully commercial developers. He is also looking to set up some opportunities to meet and discuss how to move things forward.

If you are interested in getting involved, please raise your voice.

Don says "There is a HUGE population with interest in this - but it is not really the current 2 Billion, but the next 2 Billion - those who aren’t yet connected."

RI
<Addressing the issue of Universal Acceptance of IDN TLDs-1.pdf>

 

马馬骉驫马馬骉驫马馬骉驫马馬骉驫
http://twitter.com/andreschappo
http://schappo.blogspot.co.uk
http://weibo.com/andreschappo
http://blog.sina.com.cn/andreschappo

 

 

--- End Message ---
--- Begin Message ---
OK, more research done: I just did some testing with all of the xn-- domain names registered in .com. Unfortunately I can’t provide absolute numbers without getting clearance from my Legal department. What I can say is that approximately 14.5% of the registered names would produce “%xx” labels longer than 63 characters. Here’s what I did using PHP:

 

Started with a list of all xn-- names.

Stripped the .com characters from the end of the name.

Decoded the resulting a-labels using a punycode decoder, producing u-labels.

Used the PHP urlencode() function to convert Unicode characters to %xx characters.

Measured the length of the resulting character string and incremented counters appropriately.
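The steps above can be sketched in Python as follows. This is not Scott's PHP script: `urllib.parse.quote` stands in for PHP's `urlencode()` (the two may differ on a few ASCII characters), the function names are hypothetical, and no real registry data is involved.

```python
from urllib.parse import quote

MAX_LABEL_OCTETS = 63  # DNS limit on the length of a single label


def percent_encoded_label(domain: str, tld: str = ".com") -> str:
    """Strip the TLD, decode the a-label to a u-label, and %xx-encode it."""
    a_label = domain[: -len(tld)] if domain.endswith(tld) else domain
    if a_label.lower().startswith("xn--"):
        # Raw Punycode decoding of everything after the "xn--" prefix.
        u_label = a_label[4:].encode("ascii").decode("punycode")
    else:
        u_label = a_label
    # Each non-ASCII UTF-8 byte becomes three characters ("%xx").
    return quote(u_label, safe="")


def count_overlong(domains) -> int:
    """Count names whose %xx-encoded label exceeds the 63-octet limit."""
    return sum(len(percent_encoded_label(d)) > MAX_LABEL_OCTETS for d in domains)


print(percent_encoded_label("xn--idna--x-l6c.com"))
```

The label from Mark's example grows from 15 characters as an a-label to 13 as a %xx string, because it has only one non-ASCII character. A label made mostly of CJK characters grows by roughly nine characters per code point, which is where the overruns come from.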

 

Scott

 

From: Larry Masinter [mailto:masinter@adobe.com] 
Sent: Wednesday, March 05, 2014 1:06 PM
To: Hollenbeck, Scott
Subject: FW: Universal Acceptance of IDN TLDs

 

What happens when DNS gets queries for %xx-encoded names?

How difficult would it be to deploy %xx-utf8 queries to the equivalent punycode?

Of the IDNs in use, how many would exceed the DNS length limit when encoded as %xx (3 bytes / UTF-8 byte)?

 

 

 

 

From: Larry Masinter 
Sent: Wednesday, March 05, 2014 2:40 PM
To: 'Mark Davis ☕'
Cc: Andre Schappo; www International; Don Hollander;  <mailto:public-iri@w3.org> public-iri@w3.org
Subject: RE: Universal Acceptance of IDN TLDs

 

The handling of %xx-encoded domain names in DNS servers would be a fallback for use in legacy systems that are not IDN-aware.

 

So the length limit argument doesn’t carry a lot of weight – it is strictly a transitional deployment enhancement for working around legacy components which extract domain names from URIs but rcan only process 7-bit URIs and not 8-bit IRIs.

 

You can deploy IDNs when all of  the applications you care about will work for the users you care about for the DNS names you want to use.

 

Components that handle IRIs directly and pull out domain names for future processing shouldn’t ever need the %xx encoding, although decoding it is also a good idea.

 

 

 

From: mark.edward.davis@gmail.com <mailto:mark.edward.davis@gmail.com>  [mailto:mark.edward.davis@gmail.com] On Behalf Of Mark Davis ?
Sent: Wednesday, March 05, 2014 2:16 PM
To: Larry Masinter
Cc: Andre Schappo; www International; Don Hollander;  <mailto:public-iri@w3.org> public-iri@w3.org
Subject: Re: Universal Acceptance of IDN TLDs

 

If you mean having the DNS system natively accept %xx for domain labels as well as Punycode, I suspect that that ship has long since sailed. (That was one of the options discussed, but was turned down because of the length limitations.)

 

If on the other hand, you mean that client software should accept %xx notation as well as straight Unicode and punycode, that is another story. That can be handled by a client-side mapping, permitted by either IDNA2008 or UTS46. (And I agree that it's a good idea.)

 

With that, I could type in my address bar any of:

1.	xn--idna--x-l6c.blogspot.com <http://xn--idna--x-l6c.blogspot.com> 
2.	IDNA-ȿ-x.blogspot.com <http://xn--idna--x-l6c.blogspot.com> 
3.	IDNA-%C8%BF-x.blogspot.com
4.	IDNA-Ȿ-X.BLOGSPOT.COM <http://xn--idna--x-lt7e.BLOGSPOT.COM> 
5.	IDNA-%E2%B1%BE-X.BLOGSPOT.COM

And they'd all resolve to xn--idna--x-l6c.blogspot.com <http://xn--idna--x-l6c.blogspot.com>.

1.	I just checked on Chrome, and all of these work.
2.	Firefox is a bit odd: if I type in #3, it fails; *but* it converts it in the address bar, so a subsequent Enter goes to the right place. #4/#5 just fail.
3.	Don't know about other browsers.
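
A client-side mapping of the kind described above might look roughly like the following sketch. It is an illustration under stated assumptions, not the algorithm of IDNA2008, UTS46, or any browser: the function name is hypothetical, and a plain str.lower() stands in for the real mapping and validation steps those specifications define.

```python
from urllib.parse import unquote


def host_to_a_labels(host: str) -> str:
    """Map a typed host name (Unicode, %xx, or Punycode) to ASCII A-labels.

    Assumption: str.lower() is a crude stand-in for the UTS46/IDNA mapping
    step; no normalization or validity checks are performed here.
    """
    decoded = unquote(host)  # %xx sequences are read as UTF-8 bytes
    labels = []
    for label in decoded.split("."):
        label = label.lower()
        if label.isascii():
            labels.append(label)  # already ASCII, including existing xn-- labels
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)
```

With this sketch, all five forms in the list above map to xn--idna--x-l6c.blogspot.com, because %C8%BF decodes to ȿ (U+023F) and %E2%B1%BE decodes to Ȿ (U+2C7E), which lowercases to ȿ.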




 

Mark <https://google.com/+MarkDavis> 

 

— Il meglio è l’inimico del bene —

 

On Wed, Mar 5, 2014 at 2:32 PM, Larry Masinter <masinter@adobe.com> wrote:

There’s a gap between IDN and URI in that IRI -> URI conversion would prefer to use the %xx percent-hex URL encoding in general.

 

What would be preferable would be to ensure that DNS requests for %xx-encoded names are an acceptable alternative to punycode.

 

 

From: Andre Schappo [mailto:A.Schappo@lboro.ac.uk]
Sent: Tuesday, March 04, 2014 3:51 PM
To: www International
Cc: Don Hollander; public-iri@w3.org
Subject: Re: Universal Acceptance of IDN TLDs

 

① Is this document available online? I have looked round http://aptld.org but cannot find it.

② There are indeed barriers to the effective, real world use of IDNs. A fundamental problem is that IDNs, in general, are not properly catered for and not properly integrated into systems. One reason often quoted for treating IDNs differently is "Security". Well, I posit that any IDN security issues pale in comparison to the ubiquitous "… for further information please click here."

Here are some examples from Social Media:

Twitter

If the Unicode form is entered —

#test http://北大.中国

It is not recognised as a Domain Name and not displayed as a clickable link

If the punycode form is entered —

#test http://xn--djry4l.xn--fiqs8s

It is now recognised as a Domain Name and displayed as a clickable link but displayed as punycode instead of Unicode

Sina Weibo

Same results —
#test# http://北大.中国
#test# http://xn--djry4l.xn--fiqs8s

There is also the related issue of having to Percent Encode the Unicode pathname components of a URL.

③ In my experience, another fundamental problem is the lack of IT Internationalization teaching in Schools and Universities. Certainly in England, IT Internationalisation has not yet become an accepted part of the curriculum. We need to produce students that have an appreciation/understanding of IT Internationalisation in order to, amongst other goals, properly integrate IDNs into systems/apps/websites …etc…

For several years I have been teaching a module entitled "International Computing" which covers several aspects of IT i18n. One of the topics I cover is IDNs :) And I am keeping my students up to date with the new IDN gTLDs as they are delegated to the DNS root :)

During my years teaching this module I have found few students (regardless of which country they come from) with even a basic appreciation of IT Internationalization because it is a topic that was never discussed/raised in their prior studies.

So, any initiative "to improve the use of IDN TLDs in the real world" should get Universities onboard and encourage Universities/Schools to teach "IT Internationalization".

André
http://schappo.blogspot.co.uk

 

On 4 Mar 2014, at 12:14, Richard Ishida wrote:

 

I was contacted last week by Don Hollander, General Manager of the Asia Pacific Top Level Domain Association, who is trying to improve the use of IDN TLDs in the real world, and looking for support.

See the attached PDF (from him) outlining the barriers to the effective use of IDN TLDs and who can help address these issues.

He's hoping to create a community of interested stakeholders. He expects this community to include ICANN, many ccTLDs, ISOC, and hopefully commercial developers. He is also looking to set up some opportunities to meet and discuss how to move things forward.

If you are interested in getting involved, please raise your voice.

Don says "There is a HUGE population with interest in this - but it is not really the current 2Billion, but the next 2 Billion - those who aren’t yet connected."

RI
<Addressing the issue of Universal Acceptance of IDN TLDs-1.pdf>

 

马馬骉驫马馬骉驫马馬骉驫马馬骉驫
http://twitter.com/andreschappo
http://schappo.blogspot.co.uk
http://weibo.com/andreschappo
http://blog.sina.com.cn/andreschappo

 

 

--- End Message ---
--- Begin Message ---
Larry,

 

Right now we return rcode 3 (NXDOMAIN) when we receive a query for %xx-encoded names. I had one of my engineers do a little research into the DNS queries we received for com/net domain names that contain a “%” character in the qname. He looked at queries received from January 1, 2014 through March 4, 2014. The average number of unique qnames was 279,566 with a high of 337,498 and a low of 221,130. With repeated queries included we saw an average of 1,574,793 queries per day with a high of 1,817,813 and a low of 1,199,300.
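
For readers who want to run a similar count on their own query logs, a tally along these lines would do. This is a hypothetical sketch and not the tooling used for the figures above: the (day, qname) record shape and the function name are assumptions, and qnames are lowercased before comparison on the assumption that DNS names should be treated case-insensitively.

```python
from collections import defaultdict


def percent_query_stats(records):
    """Return {day: (unique_qnames, total_queries)} for qnames containing '%'.

    records: iterable of (day, qname) pairs; this shape is hypothetical.
    """
    unique = defaultdict(set)
    total = defaultdict(int)
    for day, qname in records:
        if "%" in qname:
            unique[day].add(qname.lower())  # count names case-insensitively
            total[day] += 1
    return {day: (len(unique[day]), total[day]) for day in total}
```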

 

I’m still looking into question #3. The answer to question #2 is probably academic because it would be impossible for us to deploy anything that isn’t part of the standards identified in our contracts with ICANN.

 

Scott

 

--- End Message ---
--- Begin Message ---
What happens when DNS gets queries for %xx-encoded names?
How difficult would it be to deploy a mapping from %xx-utf8 queries to the equivalent punycode?
Of the IDNs in use, how many would exceed the DNS length limit when encoded as %xx (3 bytes / UTF-8 byte)?
 
 
 
 
--- End Message ---