[Uri-review] IDNA, IRI, HTML5 coordination

Larry Masinter <masinter@adobe.com> Wed, 16 September 2009 15:44 UTC

From: Larry Masinter <masinter@adobe.com>
To: "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>
Date: Wed, 16 Sep 2009 08:43:17 -0700

Goal: bring together and coordinate the definitions
of what is used for resource identification in the web and elsewhere
(IRIs as the evolution of URL, URI, IRI, HREF, Web Address, etc.)
within W3C, IETF and their specifications. See "design goals" below.

Goal of this message: lay out the concerned groups, start discussion
of process for coordination.

I've bcc'd everyone except the public-iri@w3.org mailing list
(archive: http://lists.w3.org/Archives/Public/public-iri/), which is
the list proposed for discussion.


My suggestion for how to get all of these groups to coordinate
is to start an IETF working group with a charter to bring these
specifications into alignment. I can't think of any other process
which can accomplish the goal.

PLEASE, PLEASE: if you're going to post an opinion, please at least
cc public-iri@w3.org and try to keep discussion there.

PLEASE: Separate 'process' issues (should there be a working group?
Who else needs to be involved? What's the timing and when?) from
technical issues.

Thanks,

Larry

=================
(Incomplete) list of specifications, groups, chairs, editors: 

HTTP:

[HTTPBIS-URI] HTTP URI scheme def in HTTPBIS draft:
      http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-07#section-9.2
[HTTP-RFC] current HTTP URI scheme definition
      in RFC 2616 http://tools.ietf.org/html/rfc2616#section-3.2.2
[HTTPBIS-WG] IETF HTTPBIS working group
    charter: http://tools.ietf.org/wg/httpbis/charters
             http://www.ietf.org/dyn/wg/charter/httpbis-charter.html
    mailing list: ietf-http-wg@w3.org,
          archives http://lists.w3.org/Archives/Public/ietf-http-wg/ 
    chair:   Mark Nottingham <mnot@mnot.net>
    editors: Roy Fielding <fielding@gbiv.com>,
             Julian Reschke <julian.reschke@greenbytes.de>, (others)
     
IDNA:
[IDNABIS-*] definitions, policies, standards for how Internationalized 
      Domain Names should be handled in Internet applications
      http://tools.ietf.org/html/draft-ietf-idnabis-defs/ 
[IDNABIS-WG]  IETF IDNABIS working group
      charter: http://www.ietf.org/dyn/wg/charter/idnabis-charter.html
      chair: Vint Cerf <vint@google.com>
      editor: John C Klensin <klensin@jck.com>

IRI:
[IRIBIS-6] Revision under preparation:
      http://tools.ietf.org/html/draft-duerst-iri-bis-06
[IRIBIS-LMM] ("Experimental" draft attempting to satisfy IDNABIS and HTML requirements)
      http://larry.masinter.net/iribis-hack.html
      (http://tools.ietf.org/rfcdiff?url1=draft-duerst-iri-bis.txt&url2=http://larry.masinter.net/iribis-hack.txt)
     discussion on: public-iri@w3.org (among others)
     (other) editors: Martin Dürst <duerst@it.aoyama.ac.jp>
                     Michel SUIGNARD <Michel@suignard.com>

Mailto URI:

[MAILTO-RFC] Mailto: URI scheme
      Current: http://tools.ietf.org/html/rfc2368
[MAILTO-BIS] In preparation
     http://tools.ietf.org/html/draft-duerst-mailto-bis-06
   (other) editors include: Martin Dürst <duerst@it.aoyama.ac.jp>
   discussion on: uri@w3.org

URI:

[URI-RFC] URI spec
      http://tools.ietf.org/html/rfc3986
   mailing list: uri@w3.org 
   (other) editors: Roy Fielding <fielding@gbiv.com>, Tim Berners-Lee <timbl@w3.org>
[URIREG-RFC] 
      URI guidelines: policies and procedures for registering new URI schemes 
      http://tools.ietf.org/html/rfc4395
     editors: Tony Hansen <tony@att.com>
     mailing list for URI review: uri-review@ietf.org 

HTML5:
[HTML5-CURRENT]   HTML5 definition of "URLs"
     http://dev.w3.org/html5/spec/Overview.html#urls
[WEBADDRESS] Attempt to split out "Web Address" component:
     http://www.w3.org/html/wg/href/draft
[HTML-WG] W3C Working Group
     charter: http://www.w3.org/html/wg/
     URL/IRI issue: http://www.w3.org/html/wg/tracker/issues/56
     chairs: Paul Cotton <paul.cotton@microsoft.com>
             Maciej Stachowiak <mjs@apple.com>
             Sam Ruby <rubys@intertwingly.net>
     editor: Ian Hickson <ian@hixie.ch>


Other interested groups:

IETF Applications area
      mailing list: apps-discuss@ietf.org
      area directors: Lisa Dusseault <lisa.dusseault@gmail.com>; 
          Alexey Melnikov <alexey.melnikov@isode.com>

W3C TAG (architectural issue around URIs in W3C specs)
     mailing list: www-tag@w3.org
     archive http://lists.w3.org/Archives/Public/www-tag/
     issue: http://www.w3.org/2001/tag/group/track/issues/27
     chair: Noah Mendelsohn <noah_mendelsohn@us.ibm.com>

[WHATWG]  http://www.whatwg.org/


(Have I missed any groups or specs? I'll update this list
and post it somewhere.)

==============================================
Some design goals:

I’ve tried to write down some of the design goals which I think are important; these may be in conflict, but I've tried to propose priorities which make sense to me. Does anyone disagree with any of these? Think some are missing?


Consistent Terminology: Multiple definitions of the same terms in different documents are to be avoided; even consistent definitions are problematic. Where possible, newer documents should reference older specs.

Security: Avoiding security problems (e.g., difficulties due to spoofing, renaming, misuse of DNS) is a high priority; avoiding security problems is a higher priority than being consistent with existing applications.

Uniform behavior: Optional interpretation rules for resource identifiers which give different results depending on the processing model chosen are to be avoided.

Consistency of web and other Internet applications: Interoperability between web applications (browsers, proxies, spiders, etc.) and other Internet applications which use resource identifiers (email, directory services) is important, and should be given equal (or nearly equal) priority to interoperability between web browsers. Recommended practice for web applications and other Internet applications should be the same: those creating web content should not be encouraged to create Resource Identifiers (whether called URLs, URIs, IRIs, or Web Addresses) which would not function in other applications.

Consistency of specifications with implementations:  When existing specifications do not match the common practice of existing applications, it is appropriate to update the existing specification, even if long standing.

Improve interoperability: When existing implementations disagree, document existing practice, but recommend (normatively) the behavior that will best lead to improved interoperability.

Separate “specification of what a conservative producer should send” from “advice for what a liberal consumer should accept”: for robustness, the specification of what a conforming producer should generate can be (if necessary) more restrictive than the specification of what some common applications accept.
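A minimal sketch of that separation, in Python. The function names `is_conservative_uri` and `lenient_accept` are hypothetical, and the lenient policy (trim surrounding whitespace, percent-encode literal spaces) is only an assumed example of consumer tolerance, not something any of the specs above defines. The producer check looks only at the RFC 3986 character repertoire and at %XX escapes, not at the full grammar.

```python
import re
from typing import Optional

# Character repertoire of RFC 3986 URIs: unreserved, reserved, and '%'.
_URI_CHARS = re.compile(r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]*")
# A '%' that is not followed by two hex digits is a malformed escape.
_BAD_PCT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def is_conservative_uri(s: str) -> bool:
    """Producer rule: emit only RFC 3986 characters with well-formed
    %XX escapes (repertoire check only, not the full RFC 3986 grammar)."""
    return _URI_CHARS.fullmatch(s) is not None and _BAD_PCT.search(s) is None


def lenient_accept(s: str) -> Optional[str]:
    """Hypothetical consumer rule: repair surrounding whitespace and
    literal spaces, then hold the repaired string to the producer rule."""
    repaired = s.strip().replace(" ", "%20")
    return repaired if is_conservative_uri(repaired) else None


if __name__ == "__main__":
    print(is_conservative_uri("http://example.com/a b"))
    print(lenient_accept("  http://example.com/a b "))
```

The repair stays confined to the consumer side; content producers are still held to the strict check, so accepting sloppy input does not widen what counts as "conforming".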

Minimize options and specifications: The split of URI and IRI into separate protocol elements, with two normative terms, “URI” and “IRI”, describing two variations of “resource identifiers”, was unfortunate; having multiple unnecessary non-terminals and terms is harmful. Adding further terms such as “LEIRI”, “Web Address” or “HREF” should be avoided if at all possible. (In some ways, “URI” was the term used to unify “URL” and “URN”.)

Unless necessary for other reasons above, avoid making existing, conforming, and widely implemented behavior non-conforming: Applications which accept URIs but not IRIs should not be made “non-conforming” by a redefinition of terms.

============================

Some issues (I'm sure there are many more)

•	Can IRI -> URI transformation be scheme independent? ((optional processing would allow
      non-uniform behavior and would not meet IDNA requirements))
•	Use of term “URL”  ((ambiguous terms))
•	Handling of extra processing rules for XML vs. HTML5 vs. the IRI document
•	Whether HTML5 references anything other than IRIbis
•	Updating the URI scheme registry to be clear that "URI schemes" are the same as “URL schemes” 
      and “IRI schemes”
•	Can different URI schemes allow different I18N processing rules?
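To make the first issue concrete, here is a minimal Python sketch of two hypothetical mappings (the names `iri_to_uri_generic` and `iri_to_uri_idna` are illustrative, not from any spec). The first is scheme independent, in the style of RFC 3987 section 3.1: percent-encode the UTF-8 bytes of every non-ASCII character. The second special-cases the host of an authority component and converts it with Python's built-in "idna" codec. That codec implements IDNA 2003 (RFC 3490), not the IDNABIS drafts. The sketch assumes no userinfo and no IPv6 literal in the authority.

```python
from urllib.parse import urlsplit, urlunsplit


def iri_to_uri_generic(iri: str) -> str:
    """Scheme-independent mapping (RFC 3987 section 3.1 style):
    percent-encode the UTF-8 bytes of each non-ASCII character and
    leave every ASCII character, including existing %XX escapes, as is."""
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
    return "".join(out)


def iri_to_uri_idna(iri: str) -> str:
    """Hypothetical scheme-aware variant: if the IRI has an authority,
    convert its host with IDNA 2003 (Python's 'idna' codec) and
    percent-encode the rest. Assumes no userinfo and no IPv6 literal."""
    parts = urlsplit(iri)
    if not parts.netloc:
        # No authority (e.g. mailto:): fall back to the generic mapping,
        # which does NOT apply IDNA to domains appearing elsewhere.
        return iri_to_uri_generic(iri)
    host = parts.hostname.encode("idna").decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    rest = urlunsplit(("", "", parts.path, parts.query, parts.fragment))
    return f"{parts.scheme}://{netloc}{iri_to_uri_generic(rest)}"


if __name__ == "__main__":
    for iri in ("http://bücher.example/straße", "mailto:user@bücher.example"):
        print(iri_to_uri_generic(iri), iri_to_uri_idna(iri))
```

With the generic mapping the host becomes `b%C3%BCcher.example`, which a DNS resolver cannot use; producing `xn--bcher-kva.example` requires knowing which part of the identifier is a host name, and that is scheme-specific knowledge. The sketch still percent-encodes the domain in the `mailto:` example, because it only recognizes hosts inside an authority component.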