Uniform Resource Modifier: a meta-information encoding scheme
Michael Mealling <ccoprmm@oit.gatech.edu> Wed, 07 July 1993 23:01 UTC
From: Michael Mealling <ccoprmm@oit.gatech.edu>
Message-Id: <199307072211.AA07381@oit.gatech.edu>
Subject: Uniform Resource Modifier: a meta-information encoding scheme
To: uri@bunyip.com
Date: Wed, 07 Jul 1993 18:11:33 -0400
X-Mailer: ELM [version 2.3 PL11]
(Note: this document is also available <A HREF=http://www.gatech.edu/urm.paper> here</A>.)

Michael Mealling
Georgia Tech
July, 1993

Uniform Resource Modifier

In this paper, the author proposes a method for encoding information concerning the content and/or format of a network resource in the context of the URI method of naming and locating resources.

1: Introduction

A Uniform Resource Modifier (URM) is a way of encoding what is called a resource's meta-information [1-Weider 93]. This includes information such as the author's name, the resource's data format, and its expiration date.

2: Motivation

This paper was needed to separate the function of content/format specification of a resource from its location and naming functions. The naming function is taken care of by the Uniform Resource Name (URN) [3-Weider 93]. Its purpose is to uniquely name a resource. The location function is handled by the Uniform Resource Locator (URL) [Berners-Lee 1993]. Its purpose is to actually gain access to the resource. Neither of these items gives the user any clues as to the size, format, content, etc. of the resource. These are vital pieces of information that are contained outside of the resource itself.

The Uniform Resource Modifier (URM) is a method for encoding this meta-information in a way that will work together with the URL/URN encoding schemes. It is designed to be extensible and flexible since many methods could be developed for representing meta-information.

3: The Uniform Resource Modifier (URM)

3.1 Functionality

The URM is designed to provide a non-persistent meta-information encoding scheme. It is meant to be used in conjunction with items called transponders [1-Weider 93] and other network resources that need typing information. URMs are meant to be used specifically in conjunction with URLs as a locally cached entity that is used to give typing information to network clients. URMs are NOT persistent and may change.
They are meant to be human readable as well as machine readable. This means that certain fields can have a specific internal syntax, but that this internal syntax is not defined here and is not required. This allows machine readable data to co-exist beside human readable data.

3.2 URM Sections Explanation

A URM (like a URL or a URN) has distinct sections: the wrapper, the encoding format scheme, and the list of encoded items. The syntax is:

    URM:Format_Scheme::"Data_Item"::"Data_Item"::...::"Data_Item":::

3.2.1 The wrapper

The wrapper consists of the 4 character header "URM:" and the 3 trailing colons. These give separation from other Resource Identifiers and follow the convention of URLs and URNs. This also allows all three identifiers to be encoded into an easily manipulated template. This template is a subject for further investigation.

3.2.2 The Format_Scheme

The Format_Scheme is made up of three fields: the Format, language, and character set specifiers. Their format is:

    Format:language.character_set

Format is a single identifier chosen from the allowed meta-information encoding schemes. Recognizing that other encoding schemes exist, but that too many encoding schemes would render a URM useless, it is suggested that a very limited number of encoding schemes be allowed and that those allowed be registered with the IANA. This is for discussion among the IIIR working groups [2-Weider 93]. This paper puts forth one as a good solution to most encoding problems. The IAFA working group of the IETF has developed a very large list of field names and allowed data elements that are used to describe the various attributes of an item on an FTP site. This list is comprehensive enough to be used as a URM encoding scheme. This paper suggests that the identifier string 'IAFA' be used as a Format_Scheme.

Language specifies which language the resulting encoded information is in. This is specified in the standard format using ISO 639 language and ISO 3166 country codes.
The format is:

    languagecode_countrycode

For example, British English would be represented as:

    en_UK

Character_set should be the ISO name for each allowed character set. An example of a complete Format_Scheme would be:

    IAFA:en_US.iso88591

3.2.3 The list of encoded items

The list consists of one or more data items surrounded by quotation marks and separated by double colons. This is the section where the actual data is encoded. White space of any type is allowed here. If quotation marks are needed within these items then they should be quoted with a '\' in the C style of special character quoting. It should be noted that some transport protocols put restrictions on white space and non-printable characters. These should be taken into account when transporting URMs around the net. An example follows:

    URM:IAFA:en_US.iso88591::"
    Author: John Doe
    "::"
    Title: \"My Book\"
    "::"
    Format: PostScript
    ":::

(Note the Carriage Return at the beginning and end of some fields. This is simply an illustration of the inclusion of non-printable characters.)

4: Syntax Specifics

Below is a BNF-like syntax for a URM. Where spaces are allowed they are listed in addition to other characters. Square brackets '[' and ']' are used to indicate optional parts. Single letters and digits stand for themselves. All words of more than one letter are either expanded further in the syntax or represent themselves.

    urm               URM:Format_Scheme::Item[::Items]:::
    Format_Scheme     Format:Language.Character_Set
    Format            alphas
    Language          isoLanguageCode_isoCountryCode
    isoLanguageCode   alphas
    isoCountryCode    alphas
    Character_Set     alphas
    Items             Item[::Items]
    Item              "xalphas"
    alphas            alpha[alphas]
    xalphas           xalpha[xalphas]
    xalpha            alpha[:]
    alpha             any character defined in any ISO recognized
                      character set except for a ':'

5: References

[1-Weider 93] Weider, Chris. Resource Transponders, March 1993. Available as ftp://nic.merit.edu/documents/internet-drafts/draft-ietf-iiir-transponders-00.txt

[2-Weider 93] Weider, Chris and Deutsch, Peter.
A Vision of an Integrated Internet Information Service, March 1993. Available as ftp://nic.merit.edu/documents/internet-drafts/draft-ietf-iiir-vision-00.txt

[3-Weider 93] Weider, Chris. Uniform Resource Names, May 1993. Available as ftp://nic.merit.edu/documents/internet-drafts/draft-ietf-uri-resource-names-00.txt

[Berners-Lee 1993] Berners-Lee, Tim. Uniform Resource Locators, March 1993. Available as ftp://nic.merit.edu/documents/internet-drafts/draft-ietf-uri-url-00.txt

--
------------------------------------------------------------------------------
Michael Mealling                    ! Hypermedia WWW, WAIS, and gopher will be
Georgia Institute of Technology     ! here soon via MIME.  Your view of the
Michael.Mealling@oit.gatech.edu     ! internet is about to change completely!
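
As a rough illustration of the syntax described in sections 3.2 and 4, the following Python sketch parses and builds URM strings. It is not part of the proposal. The names Urm, parse_urm and format_urm are hypothetical, and the sketch assumes the Format:language.character_set form given in section 3.2.2, with the language written as languagecode_countrycode and a '\' escaping the character that follows it inside a data item.

```python
import re
from typing import List, NamedTuple


class Urm(NamedTuple):
    """Hypothetical container for the parts of a URM."""
    fmt: str          # Format, e.g. "IAFA"
    language: str     # language code, e.g. "en"
    country: str      # country code, e.g. "US"
    charset: str      # character set name, e.g. "iso88591"
    items: List[str]  # unescaped data items


# Wrapper header plus Format_Scheme, e.g. "URM:IAFA:en_US.iso88591::"
_HEADER = re.compile(r'URM:([^:]+):([^:_]+)_([^:.]+)\.([^:]+)::')
# One quoted data item; a backslash escapes the character after it.
_ITEM = re.compile(r'"((?:[^"\\]|\\.)*)"', re.DOTALL)


def parse_urm(text: str) -> Urm:
    """Parse a URM string, raising ValueError if it is malformed."""
    header = _HEADER.match(text)
    if header is None:
        raise ValueError("missing URM wrapper or Format_Scheme")
    fmt, language, country, charset = header.groups()
    pos = header.end()
    items = []
    while True:
        match = _ITEM.match(text, pos)
        if match is None:
            raise ValueError(f"expected a quoted data item at offset {pos}")
        # Drop the escaping backslashes: \" becomes " and \\ becomes \
        items.append(re.sub(r'\\(.)', r'\1', match.group(1), flags=re.DOTALL))
        pos = match.end()
        if text.startswith(":::", pos):      # trailing part of the wrapper
            if text[pos + 3:].strip():
                raise ValueError("unexpected data after the closing ':::'")
            return Urm(fmt, language, country, charset, items)
        if text.startswith("::", pos):       # separator before the next item
            pos += 2
        else:
            raise ValueError(f"expected '::' or ':::' at offset {pos}")


def format_urm(urm: Urm) -> str:
    """Build a URM string, escaping backslashes and quotation marks."""
    quoted = []
    for item in urm.items:
        escaped = item.replace("\\", "\\\\").replace('"', '\\"')
        quoted.append('"' + escaped + '"')
    scheme = f"{urm.fmt}:{urm.language}_{urm.country}.{urm.charset}"
    return f"URM:{scheme}::" + "::".join(quoted) + ":::"
```

Calling parse_urm on the example from section 3.2.3 yields three items, each keeping its leading and trailing line break, and format_urm turns the result back into the same string. Colons inside a data item are harmless here because the item boundaries are found from the quotation marks rather than from the colons.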