New URM paper with additions!
Michael Mealling <ccoprmm@oit.gatech.edu> Wed, 20 October 1993 00:31 UTC
Sender: ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Michael Mealling <ccoprmm@oit.gatech.edu>
Message-Id: <199310192100.AA10993@oit.oit.gatech.edu>
Subject: New URM paper with additions!
To: uri@bunyip.com
Date: Tue, 19 Oct 1993 17:00:29 -0400
X-Mailer: ELM [version 2.3 PL11]
This is a new version of the paper I submitted in Amsterdam. I will be in Houston this time to defend it. It is also available via <http://www.gatech.edu/urm.paper>. If you have any questions please copy the list so everyone can benefit.

----

Michael Mealling
Michael.Mealling@OIT.gatech.edu
Georgia Tech
July, 1993

Uniform Resource Identifiers: The Grand Menagerie

(NOTE: This paper assumes that the intended audience has a working knowledge of URIs and of the past work of the URI-WG of the IETF. It also uses names for things that have not yet been agreed upon within the working group. If you don't agree with what a specific entity is called, then please insert your favorite TLA where needed.)

1: Introduction

Currently, there are two issues facing the URI working groups: encoding of meta-information and Uniform Resource Name (URN) to Uniform Resource Locator (URL) resolution. The first is causing considerable trouble within the working groups because meta-information is among the most important information to the user. The second, while not as volatile as meta-information, will soon be very important as many people start using the new URN specifications in real applications. (NOTE: For the rest of this paper the act of resolution will be depicted with the "->" notation, i.e.: URN->URL means URN to URL resolution.)

Presented here is a set of items that should offer an acceptable solution to both problems. For meta-information the author proposes the creation of an additional URI entity called Uniform Resource Meta-information (URM). This entity will be used to encode meta-information such as filesize, type, title, author and version.

The URM does not completely solve this problem, though. How do you associate a URM with a given URL or URN? This is where a Uniform Resource Template (URT) comes into play. A URT is simply a template with only three valid attributes: URN, URL and URM.
The URT solves two problems: URN/URL/URM encapsulation and URN/URL/URM resolution and transport. Resolution and transport exploit the fact that a URT is a template that can easily be used by a whois++ [1-Fullton] server to search for a given URN.

2: The Uniform Resource Meta-information (URM)

2.1 Functionality

The URM is designed to provide a non-persistent meta-information encoding scheme. It is meant to be used in conjunction with items called transponders [1-Weider 93] and other network resources that maintain and use information that describes the resource itself. URMs are meant to be used specifically in conjunction with URLs as a locally cached entity that cuts down on the number of times a client requests information from the network.

URMs are meant to be human readable as well as machine readable. This means that certain fields can have specific internal syntax, but that this internal syntax is not to be defined here. This allows machine readable data to co-exist beside human readable data. One caveat to this is the possibility of having encoded data within the URM. While this probably will happen due to transmission encoding problems, it should not be encouraged.

2.2 URM Sections Explanation

A URM (like URLs and URNs) has distinct sections: the wrapper, the encoding format scheme, and the list of encoded items. The syntax is:

    URM:Format_Scheme::"Data_Item"::"Data_Item"...::"Data_Item":::

2.2.1 The wrapper

The wrapper consists of the 4 character header "URM:" and a 3 character trailer ":::", with the items in between these two. Note that, unlike a URN, in which the trailing colons are required, a URM doesn't really need to have the trailing colons. The end of a URM can just as easily be a double quote followed by a carriage return. The 3 colons are simply meant as a standardization. If the working group decides that the ending wrapper is not needed, then dropping it from this specification alters nothing.
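
As a rough illustration of the wrapper and section layout described above, the following Python sketch assembles a URM string from a Format_Scheme and a list of data items. The function names (quote_item, build_urm) and the "trailer" flag are hypothetical and not part of the proposal. The backslash quoting of embedded double quotes follows the rule this paper gives in section 2.2.3; nothing else is escaped, which is an assumption of this sketch.

```python
def quote_item(item: str) -> str:
    """Wrap one data item in double quotes, escaping embedded quotes with a backslash."""
    return '"' + item.replace('"', '\\"') + '"'


def build_urm(format_scheme: str, items: list[str], trailer: bool = True) -> str:
    """Assemble URM:Format_Scheme::"Item"::"Item"::: from its parts.

    `trailer` (a hypothetical flag) controls whether the closing ":::" is
    emitted, since the text treats it as a standardization, not a necessity.
    """
    if not items:
        raise ValueError("a URM needs at least one data item")
    # Items are individually quoted and separated by double colons.
    body = "::".join(quote_item(item) for item in items)
    urm = f"URM:{format_scheme}::{body}"
    return urm + ":::" if trailer else urm


if __name__ == "__main__":
    print(build_urm("IAFA:en_US.iso88591", ["Author: John Doe", 'Title: "My Book"']))
```

Keeping the trailer optional in the builder mirrors the point made in 2.2.1 that dropping the ending wrapper alters nothing else in the format.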

2.2.2 The Format_Scheme

The Format_Scheme is made up of three fields: the Format, language, and character set specifiers. Their format is:

    Format:language.character_set

Format is a single identifier drawn from the allowed meta-information encoding schemes. Recognizing both that other encoding schemes exist and that too many encoding schemes would render a URM useless, it is suggested that a very limited number of encoding schemes be allowed and that those allowed be registered with the IANA. This is for discussion among the IIIR working groups [2-Weider 93]. This paper puts forth one as a good solution to most encoding problems.

The IAFA working group of the IETF has developed a very large list of field names and allowed data elements that are used to describe the various attributes of an item on an FTP site. This list is comprehensive enough to be used as a URM encoding scheme. This paper suggests that the identifier string 'IAFA' be used as a Format_Scheme. This is dependent on the work of the "Data Elements Working Group" that may or may not exist at this writing.

Realizing that the data elements may or may not be an attainable goal, some have put forth the use of SGML as a method of encoding information without breaking everything. This would simply mean that, instead of "IAFA", the identifier would be "SGML". This paper only makes a few suggestions as to URM encoding schemes. URMs are a method to allow us to "black box" meta-information so the Working Groups can get something out that is useful.

Language specifies which language the resulting encoded information is in. This is specified in one of two formats. The first uses ISO 639 language and ISO 3166 country codes. The second uses the value "MIME" as a specifier, which denotes that the value within the double quotes is a MIME encoded set of meta-information. This allows other character formats to be encoded 7-bit clean for easy transmission.
The actual format of the data within the MIME package should only be one of the allowed Formats from the initial portion of the Format_Scheme. Below are two examples.

The non-MIME format is:

    languagecode_countrycode

For example, British English would be represented as:

    en_GB

Character_set should be the ISO name for each allowed character set. An example would be (NOTE: This makes the URM non-7bit-clean!):

    IAFA:en_US.iso88591

The following example forces the encoded MIME data between the double quotes to be a MIME encoded IAFA template. This assumes some sort of cooperation with the MIME working groups to specify what a MIME encoded IAFA template looks like.

    IAFA:MIME

2.2.3 The list of encoded items

The list consists of one or more data items surrounded by quotation marks and separated by double colons. This is the section where the actual data is encoded. White space of any type is allowed here. If quotation marks are needed within these items, then they should be quoted with a '\' in the C style of special character quoting. It should be noted that some transport protocols put restrictions on white space and non-printable characters. These should be taken into account when transporting URMs around the net. An example follows:

    URM:IAFA:en_US.iso88591::"Author: John Doe"::"
    Title: \"My Book\"
    "::"
    Format: PostScript
    ":::

(Note the Carriage Return at the beginning and end of some fields. This is simply an illustration of the inclusion of non-printable characters.)

2.4. Syntax Specifics

Below is a BNF-like syntax for a URM. Where spaces are allowed they are listed in addition to other characters. Square brackets '[' and ']' are used to indicate optional parts. Single letters and digits stand for themselves. All words of more than one letter are either expanded further in the syntax or represent themselves.

    urm               URM:Format_Scheme::Item[::Items]:::
    Format_Scheme     Format:Language.Character_Set
    Format            ascii
    Language          isoLanguageCode_isoCountryCode
    isoLanguageCode   ascii
    isoCountryCode    ascii
    Character_Set     ascii
    Items             Item[::Items]
    Item              "xalphas"
    alphas            alpha[alphas]
    xalphas           xalpha[xalphas]
    xalpha            alpha[:]
    alpha             any character defined in any ISO recognized
                      character set except for a ':'
    ascii             any printable ASCII character except ':'

3: The Uniform Resource Template (URT)

3.1: Functionality

A URT is a method for showing relationships between URNs, URLs, and URMs and encapsulating them all within an entity that can be passed around as one token. It utilizes simple parsing rules based on URNs having precedence over URLs and URLs having precedence over URMs. This allows a URT to contain most of the information needed about a given network resource in one cache-able chunk of data.

The format of a URT exploits the format of each individual component. URNs and URMs both start with an identifier ending with a colon. By allowing for a "URL:" wrapper for URLs we end up with a list of components that naturally falls into template format. We can use the template format to our advantage by using the whois++ server protocol [1-Fullton] as a way to resolve URNs to URLs and to cache meta-information with URLs to make network access more efficient. Also, with the use of centroids, the URTs can be searched globally (this depends on whether the IIIR group decides that centroids scale or not).

NOTE: Many on the mailing list have expressed concerns about requiring the URL: wrapper for URLs. Tim Berners-Lee has pointed out that URN: and URM: are nothing more than other transport schemes similar to http: and gopher: in URLs. This is an acceptable change since it would only require every client to know which items were URNs and which were URLs. The only problem the author sees with this is that new URIs would require software updates in order for clients to know what the new URI was.
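
The BNF-like syntax of section 2.4 and the wrapper-based identification of components described in this section can be sketched in Python as follows. This is only an illustration under stated assumptions: the names parse_urm and component_kind are hypothetical, the parser accepts a URM with or without the trailing ":::" (as 2.2.1 permits), it treats only a backslash followed by a double quote as an escape, and it assumes the "URL:" wrapper is in use.

```python
def parse_urm(text: str) -> dict:
    """Split a URM into format, language, character set and data items."""
    if not text.startswith("URM:"):
        raise ValueError("missing URM: wrapper")
    # The Format_Scheme runs up to the first double colon.
    scheme, sep, rest = text[4:].partition("::")
    if not sep:
        raise ValueError("missing '::' after Format_Scheme")
    fmt, _, lang_charset = scheme.partition(":")
    # For the "MIME" language form there is no character set part.
    language, _, charset = lang_charset.partition(".")

    items = []
    pos = 0
    while pos < len(rest) and rest[pos] == '"':
        pos += 1  # step over the opening quote
        chars = []
        while pos < len(rest) and rest[pos] != '"':
            # Assumption: only \" is an escape; other backslashes stay literal.
            if rest[pos] == "\\" and rest.startswith('"', pos + 1):
                pos += 1
            chars.append(rest[pos])
            pos += 1
        if pos >= len(rest):
            raise ValueError("unterminated data item")
        pos += 1  # step over the closing quote
        items.append("".join(chars))
        if rest.startswith("::", pos):
            pos += 2  # item separator, or the first two colons of ":::"
    if not items:
        raise ValueError("a URM needs at least one data item")
    # What remains is nothing, or the last colon of the optional trailer.
    if rest[pos:].strip() not in ("", ":"):
        raise ValueError("unexpected text after the last data item")
    return {"format": fmt, "language": language,
            "character_set": charset, "items": items}


def component_kind(component: str) -> str:
    """Classify a URT component by its leading wrapper (URN, URL or URM)."""
    for prefix in ("URN:", "URL:", "URM:"):
        if component.startswith(prefix):
            return prefix[:-1]
    raise ValueError("component does not start with a known wrapper")


if __name__ == "__main__":
    print(parse_urm('URM:IAFA:en_US.iso88591::"Author: John Doe":::'))
    print(component_kind("URL:http://www.gatech.edu/Computing.Resources"))
```

Because each component announces its own type through its wrapper, a client needs no surrounding context to decide how to treat it, which is what lets the components fall naturally into template form.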

3.2: URT Contents

A URT can contain any number of URNs, URLs and URMs delimited by the associated wrapper for each URI. The order of each of these in the file is important since that is how a client determines which URL goes with which URN.

What is not apparent is why multiple URNs should be allowed in the same file. This is useful for caching information about related resources. For example, the URT for The Declaration of Independence could also include a URN (and associated URLs and URMs) for the Federalist Papers. This saves the user from going back to the network to retrieve meta-information that is closely related to what they have already received.

Support for multiple URNs in a URT is a very flexible part of the implementation rules of a URT. Some clients may wish to ignore any other occurrences of URNs while others may wish to parse a very large URT with large numbers of related URNs. This is left up to client implementations. The only requirement is that they must at least be able to handle the file. There is no requirement that they keep or use the additional information.

3.3: Ordering Rules

3.3.1 URI Rules of Precedence

In order for the numerous UR* in a URT to make sense, there must be order to the sequence of items. The order that makes the most sense is based on expected time to live. A URN is meant to be unique over all time and eternity; therefore, the first occurrence of a URN must have precedence over all other UR* in that URT. A URL is meant to be unique to the location of the document. The document itself may change, which would cause its meta-information to change, but not its URL. Thus, a URL has precedence over a URM. Finally, another occurrence of a URN denotes a new resource that has precedence over subsequent UR* in the URT.

Also, a URT does not need 2 or more of any URN, URL or URM to be a URT. A URT can be made up of just URLs and URMs without a corresponding URN.
Conversely, a URT can have only URNs and URLs, or just URNs, or just URLs, or even just a collection of URMs. You can even have a null URT which contains nothing.

3.3.2 URI Combination Rules

With a URT having some internal structure, certain scenarios become apparent when certain combinations of UR*s occur. Listed below are several different combinations of URNs, URLs and URMs that denote different resource relationships:

URN, URL and URMs denote one specific instantiation of a network resource at a specific location on the net. This is useful for pointing a client at the closest source for a resource.

URN and URMs denote meta-information about a URN that is global to all occurrences of that URN. If a URL comes after that URM, then any URMs after that URL modify the global URM only for that URL.

URN and URLs denote multiple resource locations with no meta-information.

URL and URMs specify a location with its associated meta-information but with no URN. This is useful for resources that are too transient to deserve a URN.

URN, URL, URM, URN and UR* denote a URN that has "related" URNs, showing a relationship between wholly different resources. This is used to cache closely related objects to reduce calls to the network for related meta-information.

3.5 Example URTs

The following example URT is not exhaustive. It is only meant to give a concrete example of a URT in order to show the structure:

    URN:IANA:626::Dir:6345:::
    URL:gopher://gopher.gatech.edu:2048/11/Computing.Resources
    URM:IAFA:en_US.iso88591::" Author: Michael Mealling "::" Subject: OIT Computing Resources ":::
    URL:http://www.gatech.edu/Computing.Resources
    URM:IAFA:en_US.iso88591::" Author: Michael Mealling "::" Subject: OIT Computing Resources (OIT Home Page) "::" Size: 16k ":::

4: whois++ servers as URN->URT servers

Since a URT is simply a template and whois++ was specifically built to handle anything in template form, it seems logical to use whois++ as a resolution scheme.
It allows the resolver to handle update records within the protocol instead of as a separate function. It also has the added benefit of allowing for centroids that make global searches of meta-information easier and faster (NOTE: This assumes that centroids will scale!). For more information this paper defers to the whois++ specification [1-Fullton] and the WNILS-WG.

5: References

[1-Weider 93] Weider, Chris. Resource Transponders, March, 1993. Available as ftp://cnri.reston.va.us/internet-drafts/draft-ietf-iiir-transponders-00.txt

[2-Weider 93] Weider, Chris and Deutsch, Peter. A Vision of an Integrated Internet Information Service, March, 1993. Available as ftp://cnri.reston.va.us/internet-drafts/draft-ietf-iiir-vision-00.txt

[3-Weider 93] Weider, Chris and Deutsch, Peter. Uniform Resource Names, Oct, 1993. Available as ftp://cnri.reston.va.us/internet-drafts/draft-ietf-uri-resource-names-01.txt

[Berners-Lee 1993] Berners-Lee, Tim. Uniform Resource Locators, March, 1993. Available as ftp://cnri.reston.va.us/internet-drafts/draft-ietf-uri-url-01.txt

[1-Fullton] Fullton, Jim, Weider, Chris and Spero, Simon. Architecture of the Whois++ Index Service, March, 1993. Available as ftp://cnri.reston.va.us/internet-drafts/draft-ietf-wnils-whois-01.txt

--
------------------------------------------------------------------------------
Michael Mealling                  ! Hypermedia WWW, WAIS, and gopher will be
Georgia Institute of Technology   ! here soon via MIME.  Your view of the
Michael.Mealling@oit.gatech.edu   ! internet is about to change completely!