Revision of Chris & Peter's URN paper
Chris Weider <clw@merit.edu> Tue, 19 October 1993 21:49 UTC
Date: Tue, 19 Oct 1993 12:04:46 -0400
Sender: ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Chris Weider <clw@merit.edu>
Message-Id: <9310191604.AA06476@merit.edu>
To: uri@bunyip.com
Subject: Revision of Chris & Peter's URN paper
Gang:

Here is a revision of the URN draft that we submitted earlier this year. It has a number of changes from the earlier draft. In addition, Peter has seen this revision but has not yet had an opportunity to comment; Alan suggested that I send it out anyway in time for Houston.

Changes from earlier draft:

1) The multiple colon syntax has been removed; individual fields are now separated by colons, with positional semantics indicating the individual fields and a 'colon-counting' parse technique.

2) The characters < and > are used as delimiters for the URN. Unlike the URL, this is built into the syntax of the URN; I still feel there is a need for termination characters, especially when these are going to be cut and pasted.

3) (Major change) A fifth field has been added, specifying the encoding scheme used for the opaque string. It seems to me that it would be unwise to limit the potential character sets usable for the opaque string, particularly if we're planning for the future. However, we still have those pesky mailers to deal with... so I have defined one encoding scheme, ASCII encoded ASCII. While this may seem rather redundant, think about other possibilities such as ASCII encoded UNICODE, ASCII encoded binary checksums, etc.

Points emphasized more in the current draft:

1) The fact that the naming authority identifier may be hierarchical in nature, and multi-leveled. Apparently this was not brought out enough in the previous draft, as we kept getting questions about it.

2) The fact that the primary function of the URN is to provide a persistent (location independent) identifier.

Controversial points still in the draft:

1) The fact that the naming authorities have complete control over the opaque string assignment, BUT that they are encouraged not to have semantically meaningful subparts in the opaque string, which would give the human reader an indication of its use.
I sent out a message about the dangers of placing comparative attributes (such as version number) in the URN itself; since I didn't get any response, I must assume that everyone agrees with me (wry grin)...

Well, with that, here's the latest draft.

Chris

INTERNET--DRAFT                                            Chris Weider
IETF URI Working Group                              Merit Network, Inc.
                                                          Peter Deutsch
                                       Bunyip Information Systems, Inc.
                                                          October, 1993

                        Uniform Resource Names

Status of this Memo

In this paper, the authors propose an identifier, called the Uniform Resource Name (URN), which is designed to provide persistent naming for resources and objects on the Internet.

This document is an Internet Draft. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "working draft" or "work in progress."

Please check the I-D abstract listing contained in each Internet Draft directory to learn the current status of this or any other Internet Draft.

This Internet Draft expires March 20, 1994.

1: Introduction

A Uniform Resource Name (URN) is an identifier which can be used to uniquely identify a resource, and is designed to provide persistent naming for networked objects. This name would stay the same no matter what the current location(s) of the object was.

2: Motivation

This work comes out of the discussions held at the Uniform Resource Identifier meetings at the IETF, and from further discussions among interested parties. Currently, the only standard identification scheme for resources on the Net is the Uniform Resource Locator (URL) [Berners-Lee 1993].
This "Locator" is designed to provide a uniform way of specifying location and retrieval information for networked objects. The URL, however, will not provide a stable, long-lived reference to a resource as the resources have a bad habit of moving out from under the locator. Also, a given resource may have multiple URLs if it resides at a number of different locations on the net, or is available under a number of different access methods. Thus it is difficult to tell, given two different URLs, whether the resources they point to are the same or different without retrieving both of them.

The Uniform Resource Name, or URN, has been designed to alleviate these problems.

INTERNET--DRAFT          Uniform Resource Names         Weider, Deutsch

3: The Uniform Resource Name (URN)

3.1 Functionality

The URN is designed to provide persistent naming for objects on the net. It is intended to be used in conjunction with a directory service, which can provide a URN -> URL mapping [Weider 1993]. This URN-URL architecture allows permanent references to be made to resources without worrying about their current locations. It is also intended to provide some detection of duplicates in responses to queries of various resource location services.

3.2 What URNs are *not*

URNs are not required to be human-readable in the sense that a human could look at the URN and determine anything about the contents of the resource. While the Naming Authority (q.v.) has the final determination of the contents (subject to the syntax constraints), the Naming Authority is STRONGLY discouraged from placing metainformation about the resource into the resource's URN, as the URNs are not expected to be read, and because this paper will specify only five consistent components of the URN.
Although there have been a number of proposals placing extensive semantics on the contents of the URN [Spero 1992, Kunze 1993], it was decided by the authors of all the proposals that all metainformation should be conveyed using another mechanism, and that the Naming Authority should assume that humans will never look at the contents of the URN to determine qualities of the resource they are retrieving, and would not be required to guess from a given URN the URN of a document which might be related.

3.3 Components of the URN

There are five components to the URN, separated by colons; the keyword 'URN', a code specifying the character set encoding of the rest of the URN, a naming authority scheme identifier, a naming authority identifier, and an opaque string. The URN is surrounded by the characters '<' and '>', which are part of the syntax. Each part is described below. No component of the URN can contain the characters ':', '>', ' ', or '\' unless they are escaped by a backslash character '\'.

3.3.1 URN examples

   <URN:ASCII:IANA:merit.edu:1929642>
   <URN::ISBN_Publisher_ID:0_201_12:xyzx\:mnopq>
   <URN:ASCII:IANA:12456:1\:<\>\:2345>

3.3.2 The character set encoding code

This string identifies the encoding scheme used for the rest of the URN. There is only one defined at this time, ASCII, which indicates that the rest of the URN is ASCII encoded ASCII. If this component is empty, the default encoding scheme is assumed. The default encoding scheme is ASCII.

3.3.3 The naming authority scheme identifier

The naming authority scheme identifier is a string which is the name of a protocol or organization which guarantees the uniqueness of the naming authority identifier which follows.
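
Editorial illustration (not part of the draft): the 'colon-counting' parse and backslash escaping described in section 3.3 can be sketched roughly as below. This is a minimal sketch under stated assumptions; the function name split_urn and the dictionary keys are hypothetical names chosen for the example, and the strictness of the error handling is an assumption rather than something the draft specifies.

```python
def split_urn(urn):
    """Split '<URN:enc:scheme:authority:opaque>' into its components.

    Hypothetical helper for illustration; names are not from the draft.
    """
    if not (urn.startswith("<") and urn.endswith(">")):
        raise ValueError("URN must be delimited by '<' and '>'")
    body = urn[1:-1]

    fields, current, i = [], [], 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            # A backslash escapes the next character, whatever it is.
            if i + 1 >= len(body):
                raise ValueError("dangling backslash at end of URN")
            current.append(body[i + 1])
            i += 2
        elif ch == ":":
            # An unescaped colon ends the current positional field.
            fields.append("".join(current))
            current = []
            i += 1
        elif ch in "> ":
            # '>' and ' ' may only appear escaped inside a component.
            raise ValueError("unescaped %r inside URN" % ch)
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))

    if len(fields) != 5 or fields[0] != "URN":
        raise ValueError("expected the keyword 'URN' plus four fields")
    _, encoding, scheme, authority, opaque = fields
    return {
        "encoding": encoding or "ASCII",  # empty field means the default
        "scheme": scheme,
        "authority": authority,
        "opaque": opaque,
    }


if __name__ == "__main__":
    # The three examples from section 3.3.1.
    for example in (
        r"<URN:ASCII:IANA:merit.edu:1929642>",
        r"<URN::ISBN_Publisher_ID:0_201_12:xyzx\:mnopq>",
        r"<URN:ASCII:IANA:12456:1\:<\>\:2345>",
    ):
        print(split_urn(example))
```

Scanning character by character, instead of calling a plain split on ':', is what lets an escaped colon such as the one in 'xyzx\:mnopq' stay inside the opaque string.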
Naming authority scheme identifiers defined at this time are

   IANA
   ISBN_Publisher_ID

3.3.4 The naming authority identifier

This string, along with the naming authority scheme identifier, identifies a naming authority that may assign URNs to resources. This string may have internal syntax depending on the naming authority scheme identifier associated with it; for example, the naming authority identifier space associated with IANA may be hierarchical and multi-leveled.

3.3.5 The Opaque String

The opaque string component of the URN is any string the Naming Authority wishes to assign to a given resource, subject only to the constraints of the character encoding scheme. As mentioned above, the Naming Authority should not assume that a human will ever read the URN. Also, the Naming Authority, in assigning an opaque string to a given resource, should keep the following guidelines in mind:

1: A given opaque string should be case-insensitive (for compatibility with very old systems).

2: A given opaque string, once assigned, should never be reused. These are expected to be persistent names for resources (think in terms of decades).

3: In assigning an opaque string, and thus creating a URN, the Naming Authority should make provisions for a URN -> URL mapping function. This need be nothing more than finding an organization which is already providing this service for other URNs and making arrangements to have them translate for the new URN, or could be as involved as creating a new software agent to provide this service. Remember that a name is no good without some way of getting a location.

4: URNs will be returned as pointers from a resource location service. (See [Weider 1993]). Consequently, a Naming Authority should give some thought to the assignment of new URNs for resources which are derived in some fashion from other resources to which that Authority has already assigned URNs. For example, should the Postscript version and the ASCII version of a paper have the same URN? While there are no universally applicable answers to questions like these (for example, should the Russian and English versions of a scientific paper have the same URN?), an Authority should keep in mind that users will want to weed out duplicate resources in the lists of URNs returned by a resource location service, and consequently will be doing a lot of equality testing on the URNs.

4: Setting up as a Naming Authority

There are 2 scheme identifiers listed here; others will no doubt be suggested and added as this draft circulates. They are:

   IANA
   ISBN_Publisher_ID

To set one's organization up as a Naming Authority, one can use the ISBN publisher ID one has been assigned, or one can apply for an Enterprise Number from the IANA (Internet Assigned Numbers Authority) if the organization does not already have one. The general syntax is listed in section 5.

5: Syntax

Below is a BNF-like description of the syntax of the URN. Spaces have been used here to separate components for readability; spaces are NOT ALLOWED in a syntactically correct URN unless they are escaped with the '\' character. Square brackets '[' and ']' are used to indicate optional parts; a vertical line "|" indicates alternatives. Single letters and digits stand for themselves. All words of more than one letter are either expanded further in the syntax or represent themselves.

   urn              <URN: Encoding_Scheme:Authority_Id : opaque_string >

   Authority_Id     Scheme_ID : [Individual ]

   Scheme_ID        IANA | ISBN_Publisher_ID | ISSN

   Individual       xalphas

   xalphas          xalpha [ xalphas ]

   xalpha           a | b | c | d | e | f | g | h | i | j | k | l | m |
                    n | o | p | q | r | s | t | u | v | w | x | y | z |
                    A | B | C | D | E | F | G | H | I | J | K | L | M |
                    N | O | P | Q | R | S | T | U | V | W | X | Y | Z |
                    1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0 |
                    - | _ | . | @

The allowed characters in the opaque string are determined by the character set encoding code. For the code ASCII, the allowable characters are xalphas as above, the character ':' encoded as '\:', the character '>' encoded as '\>', and the character '\' encoded as '\\'.

6: References

[Kunze 1993] Kunze, John, Resource Citations for Electronic Discovery and Retrieval, March, 1993. Circulated to ietf-uri mailing list.

[Spero 1992] Spero, Simon, Uniform Resource Numbers, November 1992. Circulated to ietf-uri mailing list.

[Weider 1993] Weider, Chris and Deutsch, Peter. A Vision of an Integrated Internet Information Service, March, 1993. Available as ftp://nic.merit.edu/documents/internet-drafts/draft-ietf-iiir-vision-00.txt

7: Authors' addresses

   Chris Weider
   clw@merit.edu
   Merit Network, Inc.
   2901 Hubbard, Pod G
   Ann Arbor, MI 48109
   Phone: (313) 747-2730
   Fax: (313) 747-3185

   Peter Deutsch
   peterd@bunyip.com
   Bunyip Information Systems
   310 St-Catherine St West
   suite 202,
   Montreal, Quebec H2X 2A1
   CANADA