Re: internationalization of URIs

Martin Duerst <duerst@it.aoyama.ac.jp> Tue, 16 October 2007 00:20 UTC

Message-Id: <6.0.0.20.2.20071016083218.06e24710@localhost>
Date: Tue, 16 Oct 2007 09:18:41 +0900
To: Thomas Narten <narten@us.ibm.com>, discuss@apps.ietf.org
From: Martin Duerst <duerst@it.aoyama.ac.jp>
Subject: Re: internationalization of URIs
In-Reply-To: <200710151939.l9FJdIkM003350@localhost.localdomain>
References: <200710151939.l9FJdIkM003350@localhost.localdomain>

Hello Thomas,

I have some good news for you:

For 'the rest of the URI', please have a look at RFC 3987,
Internationalized Resource Identifiers (IRIs)
(http://www.ietf.org/rfc/rfc3987).

Work is currently underway to move this to Draft Standard; please see:
http://www.ietf.org/internet-drafts/draft-duerst-iri-bis-01.txt

Please review and send your comments to public-iri@w3.org.
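For readers wondering how an IRI relates to an ordinary (ASCII-only) URI: RFC 3987 maps an IRI to a URI by encoding each non-ASCII character as UTF-8 and percent-encoding every resulting byte. The following is a minimal Python sketch of just that core step, not a complete implementation of the spec. The function name iri_to_uri is hypothetical; the sketch assumes the input is already a valid IRI, and it applies the same percent-encoding to every component, including the host, where IDNA (Punycode) conversion is usually preferable for DNS lookup.

```python
def iri_to_uri(iri: str) -> str:
    """Map an IRI to a URI by %-encoding each non-ASCII character as UTF-8.

    Sketch of the core step of RFC 3987, section 3.1. ASCII characters,
    including existing %-escapes and delimiters such as '/', '?' and '#',
    are copied through unchanged.
    """
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            # One non-ASCII character becomes 2-4 UTF-8 octets, each written as %XX.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


print(iri_to_uri("http://example.org/日本"))
# prints http://example.org/%E6%97%A5%E6%9C%AC
```

Because ASCII passes through untouched, an IRI that is already a plain URI comes out unchanged, which is what lets IRIs be introduced gradually alongside existing URIs.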


As for the "http:", that's indeed a piece that some might say is missing.
Even the IRI spec still only allows ASCII scheme names.

My current take on this is as follows:
1) Only a very few URI schemes are actually in wide use; mostly just "http:"
2) In most browsers, "http:" can be omitted
3) Things like "http:" look like secret incantations or garbage
   even to most end users who are familiar with the Latin script
   (though that doesn't make them any easier to type for people
    not familiar with the Latin script)
4) I think an easy way to help would be to provide a drop-down
   menu at the start of the address/location field e.g. in a browser
   or a Web page editor, with some of the following selections:

   http (Web Page Address)
   ftp (File Exchange)
   mailto (Email Address)
   ...
   [of course the explanatory text being in the user interface language]

5) Some browsers might automatically transcribe character sequences
   in non-Latin scripts corresponding to things such as "http://" or
   "ftp://" into Latin when they appear at the start of the browser
   address/location bar. So, e.g., somebody in Greece would type
   something like "φτπ://", and it would automatically be converted
   to "ftp://".

   [as far as I know, neither 4) nor 5) is currently implemented,
    but they wouldn't be rocket science, and they would be fairly
    local, low-key solutions; other similar ideas may also
    come up]

6) The next step would be to come up with some matching mechanism
   (the easy part :-) and some political framework for creating parallel
   names for the schemes (the tough part :-( ). I haven't looked at
   the details, but it might be possible to reuse NAPTR RRs or
   something similar for the technical part. (That idea is due to
   James Seng, in personal communication at the Kuala Lumpur ICANN
   meeting in 2004.)
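Idea 5), and the "easy part" of idea 6), amounts to a lookup on whatever the user typed before "://". The following is a minimal Python sketch under stated assumptions: LOCALIZED_SCHEMES and normalize_scheme_prefix are hypothetical names, the table holds only the Greek example from item 5), and deciding what belongs in such a table per locale is exactly the "political framework" question, which code cannot settle.

```python
# Hypothetical per-locale table: localized spelling of a scheme name -> ASCII
# scheme name. Only the Greek example from the message is filled in.
LOCALIZED_SCHEMES = {
    "\u03c6\u03c4\u03c0": "ftp",  # "φτπ" (Greek phi, tau, pi)
}


def normalize_scheme_prefix(address: str) -> str:
    """Rewrite a localized scheme name typed before '://' to its ASCII form."""
    scheme, sep, rest = address.partition("://")
    if not sep:
        return address  # no "://" in the input: nothing to rewrite
    # ASCII scheme names are case-insensitive (RFC 3986); this sketch
    # assumes the same holds for the localized spellings.
    canonical = LOCALIZED_SCHEMES.get(scheme.lower())
    if canonical is None:
        return address  # not a known localized scheme name: leave unchanged
    return canonical + "://" + rest


print(normalize_scheme_prefix("φτπ://example.org/pub/"))
```

Anything not found in the table, including ordinary ASCII schemes such as "http", is returned unchanged, so a browser could add this rewrite without affecting existing behavior.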


I'm sorry this is a bit browser-centric, but that's where I see
most users using most URIs/IRIs.

[Private rant: 1) and 2) together mean that non-ASCII TLDs are far
more important than internationalized scheme names, despite what
some shy ICANN people might say ("we can't possibly move ahead
with non-ASCII TLDs unless we have parallel versions of scheme names")
or what some internationalization zealots might say ("everything
or nothing")]


Hope this helps.

Regards,    Martin.

At 04:39 07/10/16, Thomas Narten wrote:
>As some of you may know, as part of testing the readiness of IDNs,
>ICANN has inserted a set of internationalized versions of ".test" into
>the root zone of the DNS. See
>http://www.icann.org/announcements/announcement-15oct07.htm for
>details.
>
>One of the questions that this has prompted (again) is what about that
>pesky "http:", which still needs to be typed in ASCII. And what about the
>rest of the URL, for that matter?
>
>I know this is not a new issue, but could someone summarize the
>landscape here on what "needs to be done" to internationalize the rest
>of the URI (other than the DNS name)? Is this considered to be
>completely an application issue? (I note that URIs are ASCII, but
>there are escaping mechanisms to handle other characters.)
>
>Is there additional IETF work that needs to be done here? Or does this
>all fall under application-specific enablement?
>
>Or is it even worse, in that this largely falls outside of applications
>and more into what is typically done by OS libraries and on the type
>of internationalization support the OS provides indirectly?
>
>Thomas


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp