Re-write of (formerly) URM paper!

Michael Mealling <michael@fuzzl.oit.gatech.edu> Thu, 04 November 1993 06:02 UTC

From: Michael Mealling <michael@fuzzl.oit.gatech.edu>
Message-Id: <9311040339.AA02103@fuzzl.oit.gatech.edu.noname>
Subject: Re-write of (formerly) URM paper!
To: uri@bunyip.com
Date: Wed, 03 Nov 1993 22:38:59 -0500

This is a rewrite of my URI paper. Most of it concerns deleting URMs entirely
and more thoroughly discussing URCs which were formerly URTs. Anyway,
have a read and let me know what you think:

Michael Mealling

Michael.Mealling@OIT.gatech.edu

Georgia Tech

October, 1993

Uniform Resource Characteristics

(NOTE: This paper makes the assumption that the intended audience has working
knowledge of URIs and the past work of the URI-WG of the IETF. It also uses
names for things that as yet have not been agreed upon within the working group.
If you don't agree with what a specific entity is called then please insert 
your favorite TLA where needed.) 

1: Introduction

Currently, there are two issues facing the URI working groups: encoding of
meta-information and Uniform Resource Name (URN) to Uniform Resource
Locator (URL) resolution. The first is causing considerable trouble within the
working groups because meta-information is among the most important
information to the user. The second, while not as volatile as meta-information, 
will soon be very important as many people start using the new URN 
specifications in real applications. (NOTE: For the rest of this paper the act 
of resolution will be depicted with the "->" notation, i.e.: URN->URL means 
URN to URL resolution.) 

Presented here is a set of items that should offer an acceptable solution to
both problems. For meta-information the author proposes the creation of an 
additional URI entity called a Uniform Resource Characteristic (URC). This 
entity will be used to encode meta-information such as filesize, type, title, 
author and version. The URC does not completely solve the above problems 
though. How do you associate meta-information with a given URL or URN? The 
simplest way to do this is to abstractly call a URN and a URL meta-information 
as well. This allows you to put URNs and URLs into a URC. The URC then solves 
two problems: URN/URL/URC encapsulation and URN/URL/meta-information resolution 
and transport. It does this by exploiting the fact that all URIs start with the 
common identifier "UR*:". This makes a URC a template that is usable as 
a whois++ template. The only additional aspect of a URC that is needed for a 
usable structure is to specify how a given URC expresses internal relationships
between specific URIs and meta-information. 

3: The Uniform Resource Characteristic(URC)

3.1: Functionality

A URC is a method for showing relationships between URNs, URLs and meta-
information in an encapsulated entity that can be passed around as one token. It
utilizes simple parsing rules based on URNs having precedence over URLs and URLs
having precedence over various pieces of meta-information. This allows a URC to
contain most of the information needed about a given network resource in one
cacheable chunk of data.

The format of a URC exploits the format of each individual component. URNs and
URCs both start with an identifier ending with a colon. By allowing for a "URL:"
wrapper for URLs we end up with a list of components that naturally fall into
template format. Currently the IETF URI Working Group has specified that URLs
must carry the "URL:" wrapper.
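
As a rough illustration of the wrapper idea (not part of any agreed
specification), the following Python sketch splits each line of a URC at its
first colon and labels it as a URN, a URL or a piece of meta-information. The
function name classify_line and the shape of the returned tuple are assumptions
made only for this example.

```python
def classify_line(line):
    """Split one URC line at its first colon and label it.

    Returns (kind, tag, value) where kind is "URN", "URL" or "META".
    The name and return shape are hypothetical, chosen for this sketch.
    """
    tag, sep, value = line.partition(":")
    if not sep:
        # No colon at all, so the line does not fit the template format.
        raise ValueError("not a template line: %r" % line)
    tag = tag.strip()
    value = value.strip()
    # Only the first colon matters; colons inside the value are kept as-is.
    kind = tag.upper() if tag.upper() in ("URN", "URL") else "META"
    return kind, tag, value
```

Because only the first colon is treated as the delimiter, a line such as
"URL:gopher://gopher.gatech.edu:2048/11/Computing.Resources" keeps its
scheme and port intact in the value.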

We can use the template format to our advantage by using the whois++ server
protocol as a way to resolve URNs to URLs and to cache meta-information with
URLs to make network access more efficient. Also, with the use of centroids, the
URCs can be searched globally (this depends on whether the IIIR group decides if
centroids scale or not).[1-Fullton] The use of whois++ is only a recommendation
and is an implementation issue. The URC can be used by different resources 
and directory lookup services, but the fact that it is standardized and 
has structure is what gives it value.

3.2: URC Contents

A URC can contain any number of URNs, URLs and specific meta-information
delimited by the associated wrapper for each entity. The order of each of these 
in the file is important since that is how a client would determine which entity
corresponds with which other entity.

What is not apparent is why multiple URNs could be allowed in the same URC. This
is useful for caching information about related resources. For example, the URC 
for The Declaration of Independence could also include a URN (and associated 
URLs and URCs) for the Federalist Papers. This saves the user from going back 
to the network to retrieve meta-information that is closely related to what 
they have already received.

How a client handles multiple URNs in a URC is deliberately left flexible.
Some clients may wish to ignore any other occurrences of URNs while 
others may wish to parse a very large URC with large numbers of related URNs. 
This is left up to client implementations. The only requirement is that they 
must at least be able to handle the file. There is no requirement that they 
keep or use any of the supplied information.

NOTE: There has been some violent disagreement with the above statement
concerning multiple URNs in the same URC. It is only offered for comment.
Nothing is lost by disallowing multiple URNs and only a small gain is 
accomplished.  Take it or leave it...

3.3: Ordering Rules

3.3.1 URI Rules of Precedence

In order for the numerous UR* in a URC to make sense, there must be order to the
sequence of items. The order that makes the most sense is based partly on 
expected time to live and partly on an arbitrary precedence scheme. A URN is 
meant to be unique over all time and eternity; therefore, the first occurrence 
of a URN must have precedence over all other UR* in that URC. A URL is meant 
to be unique to the location of the document. The document itself may change, 
which would cause its meta-information to change, but not its URL. Thus, a 
URL has precedence over a URC. Note there are situations where a URL can 
change but the meta-information may not. This case doesn't really matter since 
it would just mean updating the URC anyway. It is useful to think of a URC 
being even more transient than a URL since a URL can stay the same but the URC 
changes. Peter Deutsch said it nicely when he considered it to have a zero 
time-to-live. Finally, another occurrence of a URN denotes a new resource that 
has precedence over subsequent UR* in the URC.
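
One possible reading of these precedence rules, sketched in Python: each URN
opens a new resource, each URL opens a new location within the current
resource, and meta-information attaches to the most recent URL if there is one,
otherwise to the resource as a whole. The function name group_urc and the
dictionary layout are assumptions of this sketch, not a defined format.

```python
def group_urc(lines):
    """Group URC lines into resources following the precedence rules.

    Each resource is a dict of the form
    {"urn": str or None, "meta": [...], "locations": [{"url": ..., "meta": [...]}]}.
    This layout is hypothetical and exists only for illustration.
    """
    resources = []
    current = None    # resource currently being filled
    location = None   # URL entry currently being filled, if any
    for line in lines:
        tag, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        tag, value = tag.strip(), value.strip()
        if tag.upper() == "URN":
            # A URN has the highest precedence: it starts a new resource.
            current = {"urn": value, "meta": [], "locations": []}
            resources.append(current)
            location = None
            continue
        if current is None:
            # URLs or meta-information with no preceding URN.
            current = {"urn": None, "meta": [], "locations": []}
            resources.append(current)
        if tag.upper() == "URL":
            location = {"url": value, "meta": []}
            current["locations"].append(location)
        elif location is not None:
            location["meta"].append((tag, value))
        else:
            current["meta"].append((tag, value))
    return resources
```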

Also, a URC need not contain any URN, URL or meta-information to be a URC. A
URC can be made up of just URLs and meta-information without a corresponding
URN. Conversely, a URC can have only URNs and URLs, or just URNs, or just
URLs, or even just a collection of meta-information. You can even have a null 
URC which contains nothing.

3.3.2 URI Combination Rules

With a URC having some internal structure, certain scenarios become apparent 
when certain combinations of UR*s occur. Listed below are several different 
combinations of URNs, URLs and URCs that denote different resource 
relationships:

   URN, URL and meta-info denote one specific instantiation of a network
   resource at a specific location on the net. This is useful for pointing a 
   client at the closest source for a resource.

   URN and meta-information denote meta-information about a URN that is
   global to all occurrences of that URN. If a URL comes after that
   meta-information then any meta-information after that URL modifies the
   global meta-information only for that URL. This DOES NOT mean that the
   meta-information associated with the URN has the unchangeability attribute
   of a URN. 

   URN and URLs denote multiple resource locations with no
   meta-information.

   URL and meta-information specifies a location with its associated
   meta-information but with no URN. This is useful for resources that are too
   transient to deserve a URN or for users who know of a specific resource but
   don't know the URN for it.

   URN, URL, meta-information, URN and other UR* denotes a URN that
   has "related" URNs to show relationship between wholly different resources.
   This is used to cache closely related objects to reduce calls to the 
   network for related meta-information. 

   URNs only denotes a set of related resources.

   URLs only denotes a set of resources that describe some URN that we don't
   know about and that hopefully we can find.

   Meta-information only denotes a template that is used by a user to find a
   resource when they don't know a URL or URN. This is useful for users who
   are comfortable with the library based method of finding something by title
   and author. They simply build a template with those two pieces of
   information and pass it to some resolver. There is no guarantee that it will 
   work but it's a good first try.
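
The combinations listed above can be told apart mechanically by counting which
kinds of component a URC contains. The Python sketch below is one way a client
might do that; the function name describe_urc and the wording of the returned
labels are made up for this example and are not proposed terminology.

```python
def describe_urc(lines):
    """Return a short label for the combination of components in a URC.

    The labels are hypothetical; they paraphrase the combinations in the text.
    """
    kinds = []
    for line in lines:
        tag, sep, _ = line.partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        tag = tag.strip().upper()
        kinds.append(tag if tag in ("URN", "URL") else "META")
    urns = kinds.count("URN")
    urls = kinds.count("URL")
    meta = kinds.count("META")
    if not kinds:
        return "null URC"
    if urns and not urls and not meta:
        return "set of related resources"
    if urls and not urns and not meta:
        return "locations for an unknown URN"
    if meta and not urns and not urls:
        return "search template"
    if urns > 1:
        return "resource with related resources"
    if urns and urls and meta:
        return "resource with locations and meta-information"
    if urns and meta:
        return "global meta-information for a URN"
    if urns and urls:
        return "resource locations without meta-information"
    # Only URLs plus meta-information remain at this point.
    return "location(s) with meta-information but no URN"
```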

3.5 Example URCs

The following example URC is not exhaustive. It is only used to give a concrete
example of a URC in order to show the structure:

URN:1:IANA:626::Dir:6345
Author: Michael Mealling
Subject: OIT Computing Resources
URL:gopher://gopher.gatech.edu:2048/11/Computing.Resources
Cost: US$0.00
MIME_Format: application/postscript
URL:http://www.gatech.edu/Computing.Resources
Cost: US$0.50
MIME_Format: text/plain
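
To make the reading of this example concrete, here is a small Python sketch
that computes, for each URL, the meta-information a client would end up with:
the URN-level tags (Author, Subject) are inherited, and tags that follow a URL
(Cost, MIME_Format) apply only to that URL. The function name effective_meta
and the use of a plain dictionary are assumptions of the sketch.

```python
EXAMPLE = """\
URN:1:IANA:626::Dir:6345
Author: Michael Mealling
Subject: OIT Computing Resources
URL:gopher://gopher.gatech.edu:2048/11/Computing.Resources
Cost: US$0.00
MIME_Format: application/postscript
URL:http://www.gatech.edu/Computing.Resources
Cost: US$0.50
MIME_Format: text/plain
"""


def effective_meta(text):
    """Return {url: meta dict}, with URN-level meta merged under URL-level meta."""
    global_meta = {}    # meta-information seen before any URL of this URN
    per_url = {}
    current_url = None
    for line in text.splitlines():
        tag, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        tag, value = tag.strip(), value.strip()
        if tag.upper() == "URN":
            # A new URN starts a new resource, so global meta starts over.
            global_meta, current_url = {}, None
        elif tag.upper() == "URL":
            current_url = value
            per_url[current_url] = dict(global_meta)  # inherit a copy
        elif current_url is None:
            global_meta[tag] = value
        else:
            per_url[current_url][tag] = value  # URL-specific value wins
    return per_url


if __name__ == "__main__":
    for url, meta in effective_meta(EXAMPLE).items():
        print(url, meta["Cost"], meta["MIME_Format"])
```

A client could then compare the per-URL entries, for instance by Cost or
MIME_Format, when deciding which location to fetch.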

4: whois++ servers as URN->URC servers

Since a URC is simply a template and whois++ was specifically built to handle
anything in template form it seems logical to use whois++ as a resolution 
scheme. It allows the resolver to handle update records within the protocol
instead of as a separate function. It also has the added function of allowing 
for centroids that make global searches of meta-information easier and faster 
(NOTE: This assumes that centroids will scale!). For more information this 
paper defers to the whois++ specification [1-Fullton], the IETF WNILS Working 
Group and to a new paper by Mitra concerning URN->URC resolution [Mitra].

5: The problem of standardized meta-information

The problem of standardizing meta-information tags in the template format is a
VERY thorny problem. Currently the author believes that the IAFA templates with
the Non-Existant Data Elements Working Group modifications are some of the best
work in this area, with the caveat that this subject is far from being 
settled. This should be punted to the IETF IIIR Working Group. 

6: References as example URCs

[1-Weider 93]
URL:ftp://cnri.reston.va.us/internet-drafts/draft-ietf-iiir-transponders-00.txt
Author: Weider, Chris
Title: Resource Transponders
Date: March 1993
[2-Weider 93]
URL:ftp://cnri.reston.va.us/internet-drafts/draft-ietf-iiir-vision-00.txt
Author: Weider, Chris and Deutsch, Peter
Title: A Vision of an Integrated Internet Information Service
Date: March 1993
[3-Weider 93]
URL:ftp://cnri.reston.va.us/internet-drafts/draft-ietf-uri-resource-names-01.txt
Author: Weider, Chris and Deutsch, Peter
Title: Uniform Resource Names
Date: October 1993
[Berners-Lee 1993]
URL:ftp://cnri.reston.va.us/internet-drafts/draft-ietf-uri-url-01.txt
Author: Berners-Lee, Tim
Title: Uniform Resource Locators
Date: March 1993
[1-Fullton]
URL:ftp://cnri.reston.va.us/internet-drafts/draft-ietf-wnils-whois-01.txt
Author: Fullton, Jim, Weider, Chris and Spero, Simon
Title: Architecture of the Whois++ Index Service
Date: March 1993
[Mitra]
URL:ftp://ftp.path.net/pub/ietf/urn2urc-02.txt
Author: Mitra
Title: URN to URC resolution scenario
Date: November 1993