Re: internationalization of URIs
Ted Hardie <hardie@qualcomm.com> Tue, 16 October 2007 05:01 UTC
Date: Mon, 15 Oct 2007 22:01:02 -0700
To: Thomas Narten <narten@us.ibm.com>, discuss@apps.ietf.org
From: Ted Hardie <hardie@qualcomm.com>
Subject: Re: internationalization of URIs

At 3:39 PM -0400 10/15/07, Thomas Narten wrote:
>As some of you may know, as part of testing the readiness of IDNs,
>ICANN has inserted a set of internationalized versions of ".test" into
>the root zone of the DNS. See
>http://www.icann.org/announcements/announcement-15oct07.htm for
>details.
>
>One of the questions that this has prompted (again) is what about that
>pesky "http:", that still needs to be typed in ascii. And what about the
>rest of the URL for that matter.

So, I've read Martin's answer, but I'd like to take a shot at this
from a slightly different angle.

Inside the IETF, we commonly treat IRIs as a presentation layer for
URIs. There is a URI form for any IRI (and all URIs are also IRIs), so
it is always possible to "stick to" the URI as the protocol element
and use IRIs as presentation elements. (The big exception to this is
inside XML, where the "anyURI" datatype got deployed with a syntax that
didn't really match URIs at all; the result is that those strings,
which appear to be IRIs to the casual observer, are really protocol
elements using different rules than those normally used by URIs.)

When I read Martin's comments about drop-downs, elided scheme names,
and similar tricks, my protocol-geek hat tightened on my head and gave
me a pretty severe headache. Taking it off for a moment, though,
showed me things are still okay. As presentation elements, things like
drop-downs, inference of the scheme from an initial "www", and similar
tricks are more reasonable.

A big question, then, is whether we have all the bits we need to map
between a presentation element and a protocol element, and whether all
of those mechanisms need to be standardized. The answer to the first
is almost certainly no. There are some contexts where the UI aspects
of a decent presentation element are just beyond the IETF's expertise.

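As a rough illustration of the presentation/protocol split described
above, here is a minimal Python sketch of mapping a presentation
string to an IRI and then to its URI form. It is not part of the
original message. The function names (presentation_to_iri, iri_to_uri)
are hypothetical, the "assume http when no scheme is given" rule is
just one possible UI convention, and the host mapping relies on
Python's built-in "idna" codec (IDNA 2003), which may differ from what
a given deployment uses. Userinfo and IPv6 literals are deliberately
not handled.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left as-is when escaping; "%" is kept so that existing
# percent-escapes are not encoded a second time.
PATH_SAFE = "/%:@!$&'()*+,;="
QUERY_SAFE = PATH_SAFE + "?"


def presentation_to_iri(text: str) -> str:
    """Hypothetical UI rule: if the user typed no scheme, assume http."""
    text = text.strip()
    if "://" not in text:
        text = "http://" + text
    return text


def iri_to_uri(iri: str) -> str:
    """Map an IRI to a URI: IDNA for the host, %-escaped UTF-8 elsewhere."""
    parts = urlsplit(iri)
    if parts.username is not None:
        raise ValueError("userinfo is not handled in this sketch")

    host = parts.hostname or ""
    if host:
        # Assumption: the stdlib "idna" codec (IDNA 2003) is acceptable.
        host = host.encode("idna").decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"

    # quote() encodes non-ASCII characters as UTF-8 and percent-escapes them.
    path = quote(parts.path, safe=PATH_SAFE)
    query = quote(parts.query, safe=QUERY_SAFE)
    fragment = quote(parts.fragment, safe=QUERY_SAFE)
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


if __name__ == "__main__":
    print(iri_to_uri(presentation_to_iri("www.example.com/résumé")))
```

An all-ASCII URI passes through iri_to_uri unchanged, which matches the
observation that every URI is also an IRI. The reverse direction, and
the choice of how to display a scheme in a given script, are exactly
the presentation questions this sketch leaves open.
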
Taking even a simple protocol element like the scheme portion of an
HTTP URI and determining how best to represent that in, say, modern
Mongolian as used by the Oirat is no easy task. The monk who developed
Clear Script didn't have URIs in mind. Should we recommend they use
the Latin letters in consequence? Or the Cyrillic alphabet (as many
other Mongolian speakers do)? Is either really the right choice?
Especially, is it the right choice for the IETF to take on?

In case it is not clear, I think the answer to the question of whether
all presentation elements need to be standardized is "not in the IETF,
anyway". I think the IETF does need to make sure that presentation
elements can use the UCS in useful and reasonable ways. We have worked
on that, and there continues to be work on that, largely through the
efforts of dedicated individuals at this point, rather than working
groups.

We also have agreed, as a community, to take on some work that does
not rely on a presentation layer separation from the protocol. We have
agreed to work on email addresses, as one example, and that working
group decided not to use a pure presentation layer approach:

    This working group will address one basic approach to email
    internationalization. That approach is based on the use of an SMTP
    extension to enable both the use of UTF-8 in envelope address
    local-parts and optionally in domain-parts and the use of UTF-8 in
    mail headers -- both in address contexts and wherever encoded-words
    are permitted today. Its initial target will be a set of
    experimental RFCs that specify the details of this approach and
    provide the basis for generating and testing interoperable
    implementations. Its work will include examining whether
    "downgrading" -- transforming an internationalized message to one
    that is compatible with unextended SMTP clients and servers and
    unextended MUAs -- is feasible and appropriate and, if it is,
    specifying a way to do so.
    If it is not, the WG will evaluate whether the effort is worth
    taking forward. Other approaches may be considered by the
    formation of other working groups.

(see http://www.ietf.org/html.charters/eai-charter.html for the full
context).

There will be consequences for lots of other protocol slots if this
experiment succeeds, as there are lots of places for which there is a
tacit assumption that the identifier can "look like" an email
identifier (think SIP AoRs and certs, to take two examples). But the
changes needed to those slots and the changes needed to the URIs which
refer to those (or the IRI representation of those URIs) may not be
quite the same.

I doubt this has helped much, honestly, but hopefully the urge to
correct my mistakes will prompt others to step in and say something
more useful.

			regards,
				Ted