URL_Specification

URL_Specification_Snapshot

Tim Berners-Lee <timbl@ptpc00.cern.ch> Mon, 28 March 1994 15:09 UTC
Date: Mon, 28 Mar 1994 16:09:08 +0200
Sender: ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Tim Berners-Lee <timbl@ptpc00.cern.ch>
Message-Id: <9403281409.AA00416@ptpc00.cern.ch>
To: uri@bunyip.com
Subject: URL_Specification_Snapshot
Uniform Resource Locators (URL)                 Tim Berners-Lee
draft-ietf-uri-url-03.{ps,txt}                URI working Group
Expires 21 September 1994                         21 March 1994


                  Uniform Resource Locators (URL)
                
                  A Syntax for the Expression of
             Access Information of Objects on the Network


                         ABOUT THIS DOCUMENT
                                   
   This document specifies a Uniform Resource Locator (URL), the
   syntax and semantics of formalized information for location and
   access of resources on the Internet.
   
   This document was written by the URI working group of the Internet
   Engineering Task Force.  Comments may be addressed to the editor,
   Tim Berners-Lee <timbl@info.cern.ch>, or to the URI-WG
   <uri@bunyip.com>. Discussions of the group are archived at 
   
  <http://www.acl.lanl.gov/URI/archive/uri-archive.index.html>

   This document is bound by the Requirements Specification in
   preparation.
   
   The work is derived from concepts introduced by the World-Wide Web
   global information initiative, whose use of such objects dates
   from 1990 and is described in "Universal Resource Identifiers for
   the World-Wide Web", RFCXXX.
   
   This document is available in hypertext form, with links to
   background information, as: 
   
  <http://info.cern.ch/hypertext/WWW/Addressing/URL/Overview.html>

   .
   
  STATUS OF THIS MEMO
  
   This document is an Internet Draft. Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its Areas,
   and its Working Groups.  Note that other groups may also distribute
   working documents as Internet Drafts.  
   
   Internet Drafts are working documents valid for a maximum of six
   months. Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time.  It is not appropriate to use Internet
   Drafts as reference material or to cite them other than as a
   "working draft" or "work in progress".  
   
   Distribution of this document is unlimited. 



Berners-Lee                                                          1

Recommendations

   This section describes the syntax for "Uniform Resource Locators"
   (URLs): that is, basically physical addresses of objects which are
   retrievable using protocols already deployed on the net.  The
   generic syntax provides a framework for new schemes for names to be
   resolved using as yet undefined protocols.  
   
   The syntax is described in two parts. Firstly, we give the syntax
   rules of a completely specified name; secondly, we give the rules
   under which parts of the name may be omitted in a well-defined
   context. 
   
  URL SYNTAX  
  
   A complete URL consists of a naming scheme specifier followed by a
   string whose format is a function of the naming scheme. For
   locators of information on the internet, a common syntax is used
   for the IP address part. A BNF description of the URL syntax is
   given in a later section. The components are as follows.
   Fragment identifiers and partial URLs are not involved in the basic
   URL definition. 
   
    PrePrefix
    
   To be a Uniform Resource Locator as currently defined by the URI
   working group, the whole string must start with a constant prefix
   "URL:". Note that to save space in this document, some URLs may
   have been quoted throughout without this preprefix. 
   
    Scheme  
    
   Within the URL of an object, the first element is the name of the
   scheme, separated from the rest of the object by a colon. The rest
   of the URL follows the colon in a format depending on the scheme. 
   
    Internet protocol parts  
    
   Those schemes which refer to internet protocols mostly have a
   common syntax for the rest of the object name. This starts with a
   double slash "//" to indicate its presence, and continues until the
   following slash "/".  Within that section are   
   
  An optional user name, 
                          if required (as it is with a few FTP
                         servers). The password, if present, follows
                         the user name, separated from it by a colon;
                         the user name and optional password are
                         followed by a commercial at sign "@".  The
                         use of user names and passwords which are
                         public is discouraged. 
                         
  The internet domain name 
                          of the host in RFC1037 format (or,
                         optionally and less advisably, the IP address
                         as a set of four decimal numbers) 
                         
  The port number,        if it is not the default number for the
                         protocol, is given in decimal notation after
                         a colon. 
                         
  Path                    The rest of the locator is known as the
                         "path". It may define details of how the
                         client should communicate with the server,
                         including information to be passed
                         transparently to the server without any
                         processing by the client.   
                         
   The path is interpreted in a manner dependent on the scheme being
   used. Generally, the slash  "/" (ASCII 2F hex) denotes a level in a
   hierarchical structure, the higher level part to the left of the
   slash. 
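
   As an illustration only, the common internet form described above
   can be split into its parts as in the following Python sketch. The
   names InternetLocator and split_internet_url are hypothetical, the
   host name in the usage note is made up, and accepting the "URL:"
   preprefix in any letter case is an assumption of the sketch rather
   than a rule of this document.

```python
from typing import NamedTuple, Optional


class InternetLocator(NamedTuple):
    scheme: str
    user: Optional[str]
    password: Optional[str]
    host: str
    port: Optional[int]
    path: str


def split_internet_url(url: str) -> InternetLocator:
    """Split a URL of the common internet form into its parts."""
    # Drop the optional "URL:" preprefix (case-insensitive: an assumption).
    if url[:4].lower() == "url:":
        url = url[4:]
    # The scheme is everything up to the first colon.
    scheme, colon, rest = url.partition(":")
    if not colon or not rest.startswith("//"):
        raise ValueError("not in the common internet form: " + url)
    # The host section runs from "//" up to the following slash.
    section, _, path = rest[2:].partition("/")
    user = password = None
    if "@" in section:
        login, _, section = section.rpartition("@")
        user, colon, secret = login.partition(":")
        password = secret if colon else None
    host, colon, port_text = section.partition(":")
    port = int(port_text) if colon else None
    return InternetLocator(scheme, user, password, host, port, path)
```

   For example, under these assumptions
   split_internet_url("ftp://joe:secret@ftp.example.org:2121/pub/x")
   gives the scheme "ftp", the user "joe", the password "secret", the
   host "ftp.example.org", the port 2121 and the path "pub/x".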
   
  ENCODING PROHIBITED CHARACTERS
  
   When a system uses a local addressing scheme, it is useful to
   provide a mapping from local addresses into URLs so that objects
   within the addressing scheme may be referred to globally, and
   possibly accessed through gateway servers.
   
   Any mapping scheme may be defined provided it is unambiguous,
   reversible, and provides valid URLs. It is recommended that where
   hierarchical aspects to the local naming scheme exist, they be
   mapped onto the hierarchical URL path syntax in order to allow the
   partial form to be used.  
   
   The following encoding method shall be used for mapping WAIS, FTP,
   Prospero and Gopher addresses onto URLs. Where the local naming
   scheme uses characters which are not allowed in the URL,  these may
   be represented in the URL by a percent sign "%" followed by two
   hexadecimal digits (0-9, A-F) giving the ISO Latin 1 code for that
   character.  Character codes other than those allowed by the syntax
   shall not be used unencoded in a URL. 
   
   The same encoding method may be used for encoding characters whose
   use, although technically allowed in a URL, would be unwise due to
   problems of corruption by imperfect gateways or misrepresentation
   due to the use of variant character sets, or which would simply be
   awkward in a given environment.  Because a % sign always indicates
   an encoded character, a URL may be made safer simply by encoding
   any characters considered unsafe, while leaving already encoded
   characters still encoded.  Similarly, in cases where a larger set
   of characters is acceptable, % signs can be selectively and
   reversibly expanded.
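
   The encoding method above might be sketched in Python as follows.
   This is a minimal illustration, not a normative algorithm: the
   function names and the ALLOWED set are assumptions of the sketch,
   and the BNF section remains the authority on which characters a
   URL may contain.

```python
import re
import string

# Characters left unencoded in this sketch (an assumed, conservative set).
ALLOWED = set(string.ascii_letters + string.digits + "$-_@.&+/")

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def encode_local(address: str) -> str:
    """Map a local address onto URL characters using %XX escapes."""
    out = []
    for byte in address.encode("latin-1"):  # ISO Latin 1 code of each char
        ch = chr(byte)
        out.append(ch if ch in ALLOWED else "%%%02X" % byte)
    return "".join(out)


def decode_url(text: str) -> str:
    """Reverse encode_local by expanding every %XX escape."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def make_safer(url: str, unsafe: str) -> str:
    """Encode further characters, leaving existing escapes untouched."""
    # "%" is never re-encoded because it already marks an escape.
    return "".join(
        "%%%02X" % ord(ch) if ch in unsafe and ch != "%" else ch
        for ch in url
    )
```

   Because "%" itself is encoded as %25 by encode_local, the mapping
   stays reversible: decode_url(encode_local(s)) returns s for any
   string s of ISO Latin 1 characters.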
   
Specific Schemes  



   The mapping for some existing standard and experimental protocols
   is outlined in the BNF syntax definition.  Notes on particular
   protocols follow. The schemes covered are:
   
  http                    Hypertext Transfer Protocol 
                         
  ftp                     File Transfer Protocol 
                         
  gopher                  The Gopher protocol 
                         
  mailto                  Electronic mail address 
                         
  mid                     Message identifiers for electronic mail 
                         
  cid                     Content identifiers for MIME body part 
                         
  news                    Usenet news 
                         
  nntp                    Usenet news for local NNTP access only 
                         
  prospero                Access using the Prospero protocols 
                         
  telnet , rlogin and tn3270 
                          Reference to interactive sessions 
                         
  wais                    Wide Area Information Servers 
                         
   Other schemes may be specified by future specifications.
   
   New schemes may be registered at a later time.
   
  FTP  
  
   The ftp: prefix indicates that the FTP protocol is used, as defined
   in RFC959 or any successor. The port number, if present, gives the
   port of the FTP server if not the FTP default.  
   
      User name and password
      
   The syntax allows for the inclusion of a user name and even a
   password for those systems which do not use the anonymous FTP
   convention. The default, however, if no user or password is
   supplied, will be to use that convention, viz. that the user name
   is "anonymous" and the password the user's Internet-style mail
   address.
   
   Where possible, this mail address should correspond to a usable
   mail address for the user, and preferably give a DNS host name
   which resolves to the IP address of the client. Note that servers
   currently vary in their treatment of the anonymous password.  
   
      Path
      



   The FTP protocol allows for a sequence of CWD commands (change
   working directory) and a TYPE command prior to service commands
   such as RETR (retrieve) or NLST (name list), which actually access
   a file.
   
   The arguments of any CWD commands are successive segment parts of
   the URL delimited by slash, and the final segment is suitable as
   the filename argument to the RETR command for retrieval or the
   directory argument to NLST.
   
   For some file systems (Unix in particular), the "/" used to denote
   the hierarchical structure of the URL corresponds to the delimiter
   used to construct a file name hierarchy, and thus, the filename
   will look the same as the URL path. This does NOT mean that the URL
   is a Unix filename. 
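
   The mapping from path segments to FTP commands described above can
   be sketched as below. The function name ftp_commands and its
   "listing" flag are hypothetical, decoding each segment before use
   is an assumption of the sketch, and NLST is the RFC959 spelling of
   the name list command.

```python
from typing import List
from urllib.parse import unquote


def ftp_commands(url_path: str, listing: bool = False) -> List[str]:
    """Turn the path of an FTP URL into a sequence of FTP commands."""
    # Split on "/" first, then decode, so that an encoded slash inside
    # a segment is not mistaken for a hierarchy delimiter.
    segments = [unquote(s, encoding="latin-1") for s in url_path.split("/")]
    *directories, last = segments
    # One CWD per leading segment, then a single service command.
    commands = ["CWD " + name for name in directories]
    commands.append(("NLST " if listing else "RETR ") + last)
    return commands
```

   For the path "pub/docs/readme.txt" this yields "CWD pub", "CWD
   docs" and "RETR readme.txt", in that order.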
   
        Note: Retrieving subsequent URLs from the same host
        
   There is no common hierarchical model to the FTP protocol, so if a
   directory change command has been given, it is impossible in
   general to deduce what sequence should be given to navigate to
   another directory for a second retrieval, if the paths are
   different.  The only reliable algorithm is to disconnect and
   reestablish the control connection.   
   
      Data type
      
   The data content type of a file can only, in the general FTP case,
   be deduced from the name, normally the suffix of the name. This is
   not standardized. An alternative is for it to be transferred in
   information outside the URL. A suitable FTP transfer type (for
   example binary "I"  or text "A") must in turn be deduced from the
   data content type.  It is recommended that conventions for suffixes
   of public archives be established, but it is outside the scope of
   this standard.
   
   An FTP URL may optionally specify the FTP data transfer type by
   which an object is to be retrieved. Two of the methods correspond
   to the FTP "Data Types" ASCII and IMAGE for the retrieval of a
   document, as specified in FTP by the TYPE command. One method
   indicates directory access.
   
   The data type is specified by a suffix to the URL. Possible
   suffixes are: 
   
  ;type=<type-code>       Use FTP type as given to perform data
                         transfer. 
                         
  ;type=d                 Use FTP directory list commands to read
                         directory 
                         
   The type code is in the format defined in RFC959. 
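
   A minimal sketch of separating the type suffix from the rest of
   the path follows. The function names are hypothetical, and
   accepting "d" in either letter case is an assumption of the
   sketch.

```python
from typing import Optional, Tuple


def split_ftp_type(path: str) -> Tuple[str, Optional[str]]:
    """Separate a trailing ";type=<code>" suffix from an FTP URL path."""
    base, marker, code = path.rpartition(";type=")
    if not marker:
        # No suffix: the transfer type has to be deduced some other way.
        return path, None
    return base, code


def is_directory_request(code: Optional[str]) -> bool:
    """True when the suffix asks for a directory listing."""
    return code is not None and code.lower() == "d"
```

   With this sketch, "pub/README;type=A" splits into the path
   "pub/README" and the type code "A".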
   



      Transfer Mode
      
   Stream Mode is always used.
   
  HTTP  
  
   The HTTP protocol specifies that the path is handled transparently
   by those who handle URLs, except for the servers which de-reference
   them.   The path is passed by the client to the server with any
   request, but is not otherwise understood by the client.  The
   fragmentid part is not sent with the request.  The search part, if
   present, is sent. Spaces and control characters in URLs must be
   escaped for transmission in HTTP.
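
   The rule above, that the path and search part are sent while the
   fragmentid is not, might be illustrated as follows. The function
   name http_request_target is hypothetical, and sending an empty
   path as "/" is an assumption of the sketch; the input is expected
   to contain "://".

```python
def http_request_target(url: str) -> str:
    """Return the part of an HTTP URL that is passed to the server."""
    # The fragment identifier stays with the client.
    url = url.split("#", 1)[0]
    # Skip the scheme and the host section.
    rest = url.split("://", 1)[1]
    cut = min(i for i in (rest.find("/"), rest.find("?"), len(rest)) if i >= 0)
    tail = rest[cut:]
    if not tail.startswith("/"):
        tail = "/" + tail  # assumption: an empty path is sent as "/"
    # Escape spaces and control characters for transmission.
    return "".join(
        "%%%02X" % ord(ch) if ord(ch) <= 0x20 or ord(ch) == 0x7F else ch
        for ch in tail
    )
```

   The client does not otherwise interpret the path: apart from the
   escaping, the text after the host section is passed through
   unchanged.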
   
  GOPHER
  
   Gopher selector strings are, in general, interpreted as a sequence
   of 8-bit bytes which may contain any characters other than tab,
   return, or linefeed. It is necessary to encode any characters
   disallowed in a URL, including spaces and other binary data not in
   the allowed character set, using the standard convention of the "%"
    character followed by two hexadecimal digits.
   
    Note that slash "/" in gopher selector strings may not correspond 
   to a level in a  hierarchical structure.
   
    The format of a gopher URL is: 
   
      1. A single-character field to denote the Gopher type of the
      resource to which the URL refers. 
      
      2. The gopher selector string.  Note that some gopher selector
      strings begin with a copy of the gopher type character, in which
      case that character will occur twice consecutively. Also note
      that the gopher selector string may be an empty string since
      this is how  gopher clients refer to the top-level directory on
      a gopher server. 
      
   If the URL does not refer to a Gopher+ item and if there is no
   gopher search string then parts 3, 4, 5, and 6 of the URL are
   optional.
   
      3. An encoded tab character (%09) to separate the gopher
      selector string from the optional search string (see 4 below).  
      
      4. The gopher search string.  If the URL refers to a search to
      be submitted to a gopher search engine, the  search string is
      required. Otherwise this is an empty string. 
      
      5. An encoded tab character (%09) to separate the gopher search
      string from the optional gopher+ string (see 6 below).
      [suggestion: Note that if the URL refers to a gopher+ item and
      does not have a gopher search string, there will be two encoded
      tab characters in a row.] 
      
      6. The Gopher+ string. Gopher+ strings consist of one or more
      characters and are used to represent information required for
      retrieval of the Gopher+ item. Gopher+ items may have alternate
      views, arbitrary sets of attributes, and may have electronic
      forms associated with them. To accommodate the various Gopher+
      objects, the Gopher+ string in the URL must accommodate a
      mapping of the information a Gopher+ client sends to the server.
      This makes this section a bit long since we basically cover the
      entire Gopher+ protocol here. 
      
   When a Gopher server returns a directory listing to a client,
   Gopher+ items are tagged with either a "+" (denoting gopher+ items)
   or a "?" (denoting items  which have a +ASK form associated with
   them). A Gopher+ string which is only a  "+" refers to the default
   view (data representation) of the item. To retrieve  this item a
   gopher+ client should send 
   
       a_gopher_selector<tab>+<cr><lf>

   to the gopher+ server.
   
   Note that items which have a +ASK associated with them (i.e.
   Gopher+ items tagged with a "?") require the client to fetch the
   item's +ASK attribute to get the form definition, and then ask the
   user to fill out the form and return the user's responses along
   with the selector string to retrieve the item.  Gopher+ clients
   know how to do this but depend on the "?" tag in the gopher+ item
   description to know when to handle this case. The "?" is used in
   the Gopher+ string to be consistent with Gopher+ protocol's use of
   this symbol.
   
   To refer to the Gopher+ attributes of an item, the Gopher+ string
   might consist of "!" or "$". "!" refers to all of a gopher+
   item's attributes. "$" refers to all the item attributes for all
   items in a Gopher directory. To retrieve an item or directory's
   attributes, a gopher client will send: 
   
       a_gopher_selector<tab>!<cr><lf>

   for items or 
   
       a_gopher_selector<tab>$<cr><lf>

   for directories to the gopher+ server.
   
   To refer to specific attributes, the Gopher+ string is
   "!attribute_name" or "$attribute_name". For example, to refer to
   the attribute containing the abstract of an item, the Gopher+
   string would be "!+ABSTRACT". To refer to several attributes,
   clients send the server the attribute names separated by spaces, so
   in the URL it is necessary to separate the attribute names with
   coded spaces (%20).



   To retrieve a collection of item attributes specified with a
   gopher+ string of "!+ABSTRACT%20+SMELL" a gopher client would send 
   
       a_gopher_selector<tab>!+ABSTRACT +SMELL<cr><lf>

   to the gopher server.
   
   Gopher+ allows for optional alternate data representations
   (alternate views) of items. To retrieve a Gopher+ alternate view,
   the gopher+ client sends the appropriate view and language
   identifier (found in the item's +VIEW attribute). To refer to a
   specific Gopher+ alternate view, the URL's Gopher+ string would be
   in the form "+view_name%20language_name". For example, a gopher+
   string of "+application/postscript%20Es_ES" refers to the
   Spanish-language PostScript alternate view of a gopher+ item. To
   retrieve this alternate view the client would send 
   
       a_gopher_selector<tab>+application/postscript Es_ES<cr><lf>

   to the gopher server.
   
   The gopher+ string for a URL that refers to an item referenced by
   an ASK form  filled out with specific values is essentially a coded
   version of what the  client sends to the server. The gopher+ string
   will be of the form  
   
  +%091%0D%0A+-1%0D%0Aask_item1_value%0D%0Aask_item2_value%0D%0A.%0D%0A

   To retrieve this item, the gopher client sends: 
   
       a_gopher_selector<tab>+<tab>1<cr><lf>
       +-1<cr><lf>
       ask_item1_value<cr><lf>
       ask_item2_value<cr><lf>
       .<cr><lf>

   to the gopher server.
   
   For a really complex example, consider a URL that refers to an
   alternate view of an item that is referenced with a filled-out
   Gopher +ASK form. The  gopher+ string will be of the form:  
   
 
    +view_name%20language_name%091%0D%0A+-1%0D%0Aask_item1_value%0D%0Aask_item2_value%0D%0A.%0D%0A

   To retrieve this item, the gopher client sends: 
   
       a_gopher_selector<tab>+view_name language_name<tab>1<cr><lf>
       +-1<cr><lf>
       ask_item1_value<cr><lf>
       ask_item2_value<cr><lf>
       .<cr><lf>

   to the gopher server. 
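
   The correspondence shown in the examples above, between the
   encoded URL and what the client sends, can be sketched as follows.
   The function name gopher_request is hypothetical. Sending the
   search string and the Gopher+ string tab-separated when both are
   present is an assumption of the sketch, since the examples here
   only show one of the two at a time.

```python
from urllib.parse import unquote


def gopher_request(url_path: str) -> bytes:
    """Build the bytes a client sends for the path part of a gopher URL.

    url_path is everything after the slash that ends the host section.
    """
    # The first character is the gopher type; it is not transmitted.
    rest = url_path[1:]
    # At most three fields: selector, search string, Gopher+ string.
    # Any later %09 belongs to the Gopher+ string itself (ASK forms).
    fields = rest.split("%09", 2)
    fields += [""] * (3 - len(fields))
    selector, search, gplus = (unquote(f, encoding="latin-1") for f in fields)
    # Empty optional fields are left out of the transmitted line.
    line = "\t".join([selector] + [part for part in (search, gplus) if part])
    # A filled-out ASK form already ends with <cr><lf>.
    if not line.endswith("\r\n"):
        line += "\r\n"
    return line.encode("latin-1")
```

   Splitting on %09 before decoding keeps an encoded tab inside the
   Gopher+ string from being confused with the field separators.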
   
    Summary: gopher+ string part of Gopher URL
    
   To refer to an item which has an ASK form associated with it where
   the  intent is to allow the user to enter values into the form as
   part of the  retrieval process: 
   
   %3F [was: ?]  


   To refer to all or specific attributes of a gopher item: 
   
   ![attribute_name][%20attribute_name][%20attribute_name]...


   To refer to all or specific attributes of a gopher directory: 
   
   $[attribute_name][%20attribute_name][%20attribute_name]...


   To refer to the content of a gopher+ item (including an item
   referred to by specific values in a filled-out ASK form): 
   
   +[view_name[%20language_name]]
    [%091%0D%0A+-1%0D%0Aask_item1_value%0D%0Aask_item2_value...%0D%0A.%0D%0A]



    Overall summary and examples
    
   The general format of a Gopher URL path referring to a gopher type
   "T" item is: 
   
  gopher://host [port]/T[gopher_selector]%09[search_string]%09[gopher+_string]


      Examples:
      
   An example of a URL pointing to a gopher type 0 item (a document)
   is: 
   
  gopher://host [port]/0a_gopher_selector


   An example of a URL pointing to a gopher type 7 item (a search
   engine) where the string foobar is to be submitted to the search
   engine is: 
   



  gopher://host [port]/7a_gopher_selector%09foobar


   An example of a URL pointing to a Gopher+ type 0 item (a document)
   is: 
   
  gopher://host [port]/0a_gopher_selector%09%09some_gplus_stuff


   An example of a URL pointing to a Gopher+ type 0 (document) item's
   attribute  information is: 
   
  gopher://host [port]/0a_gopher_selector%09%09!


   An example of a URL pointing to a Gopher+ document's Spanish
   PostScript representation is: 
   
  gopher://host [port]/0a_gopher_selector%09%09+application/postscript%20Es_ES

   .
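
   As a rough illustration, the examples above could be produced by a
   small helper such as the following. The names gopher_url and KEEP
   are hypothetical, and the set of characters left unencoded is an
   assumption chosen to reproduce the examples; trailing encoded tabs
   are omitted when the fields after them are empty, as in the
   examples.

```python
import string
from typing import Optional

# Characters left unencoded in this sketch (an assumed set).
KEEP = set(string.ascii_letters + string.digits + "$-_@.&+/!")


def _encode(text: str) -> str:
    # Assumes every character is in the ISO Latin 1 range.
    return "".join(ch if ch in KEEP else "%%%02X" % ord(ch) for ch in text)


def gopher_url(host: str, gtype: str, selector: str = "", search: str = "",
               gplus: str = "", port: Optional[int] = None) -> str:
    """Assemble a gopher URL from its decoded parts."""
    hostport = host if port is None else "%s:%d" % (host, port)
    url = "gopher://%s/%s%s" % (hostport, gtype, _encode(selector))
    if search or gplus:
        url += "%09" + _encode(search)   # first encoded tab
    if gplus:
        url += "%09" + _encode(gplus)    # second encoded tab
    return url
```

   Note that a Gopher+ string of "?" comes out as %3F, in line with
   the summary of the gopher+ string part given earlier.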
   
  MAILTO
  
   This allows a URL to specify an RFC822 addr-spec mail address. 
   Note that a "%" in the address, for example as used in forming a
   gatewayed mail address, requires conversion to %25 in a URL.
   
  NEWS
  
   The news locators refer to either news group names or article
   message identifiers which must conform to the rules for a
   Message-ID of RFC 1036 (Horton 1987).  A message identifier may be
   distinguished from a news group name by the presence of the
   commercial at "@" character. These rules imply that within an
   article, a reference to a news group or to another article will be
   a valid URL (in the partial form). 
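
   The distinction drawn above can be sketched in a few lines. The
   function name classify_news is hypothetical, the group and message
   identifier in the usage note are made up, and the "*" case is
   taken from the groupart production in the BNF section.

```python
from typing import Tuple


def classify_news(url: str) -> Tuple[str, str]:
    """Say whether a news locator names a group or an article."""
    # Accept both the full form and the partial form without "news:".
    body = url[5:] if url.lower().startswith("news:") else url
    if body == "*":
        return ("all", body)
    # A message identifier is recognised by the commercial at sign.
    if "@" in body:
        return ("article", body)
    return ("group", body)
```

   For instance, "news:comp.example.misc" is classified as a group,
   while "news:1234@host.example.org" is classified as an article.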
   
   A news URL may be dereferenced using NNTP (RFC977, Kantor 86) (the
   ARTICLE by message-id command) or using any other protocol for the
   conveyance of usenet news articles, or by reference to a body of
   news articles already received. 
   
      Note 1:
      
   Among URLs the "news" URLs are anomalous in that they are
   location-independent. They are unsuitable as URN candidates because
   the NNTP architecture relies on the expiry of articles and
   therefore a small number of articles being available at any time. 
   When a news: URL is quoted, the assumption is that the reader will
   fetch the article or group from his or her local news host.  News
   host names are NOT part of news URLs. 



      Note 2:
      
   An outstanding problem is that the message identifier is
   insufficient to allow the retrieval of an expired article, as no
   algorithm exists for deriving an archive site and file name. The
   addition of the date and news group set to the article's URL would
   allow this if a directory existed of archive sites by news group.
   Suggested subject of study in conjunction with NNTP working group. 
   A possible further extension is to allow the naming of subject
   threads as addressable objects.
   
    NNTP
    
   This is an alternative form of reference for news articles,
   specifically to be used with NNTP servers, and particularly those
   incomplete server implementations which do not allow retrieval by
   message identifier.  In all other cases the "news" scheme should be
   used.
   
   The news server name, newsgroup name, and index number of an
   article within the newsgroup on that particular server are given.  
   The NNTP protocol must be used. 
   
      Note 1:
      
   This form of URL is not of global accessibility, as typically NNTP
   servers only allow access from local clients.   Note that the
   article numbers within groups vary from server to server.
   
   This form of URL should not be quoted outside this local area.  It
   should not be used within news articles for wider circulation than
   the one server.  This is a local identifier for a resource which is
   often available globally, and so is not recommended except in the
   case in which incomplete NNTP implementations on the local server
   force its adoption.
   
  PROSPERO  
  
   The Prospero (Neuman, 1991) directory service is used to resolve
   the URL yielding an access method for the object (which can then
   itself be represented as a URL if translated). The host part
   contains a host name or internet address.  The port part is
   optional.  
   
   The path part contains a host specific object name and an optional
   version number. If present, the version number is separated from
   the  host specific object name by the characters "%00" (percent
   zero zero), this being an escaped string terminator (null).
   External Prospero links are represented as URLs of the underlying
   access method and are not represented as Prospero URLs.
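
   A minimal sketch of separating the optional version number from
   the host specific object name is given below. The function name
   split_prospero_path is hypothetical, and the sketch ignores the
   optional attributes that the BNF section allows after the version.

```python
from typing import Optional, Tuple


def split_prospero_path(path: str) -> Tuple[str, Optional[str]]:
    """Split a Prospero path into object name and version number."""
    # "%00" is the escaped null that terminates the object name.
    name, marker, version = path.partition("%00")
    if not marker:
        return path, None
    return name, version
```

   The path must be split before any %XX decoding, since after
   decoding the separator would be a bare null character.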
   
  TELNET, RLOGIN, TN3270  
  



   The use of URLs to represent interactive sessions is a convenient
   extension to their uses for objects.  This allows access to
   information systems which only provide an interactive service, and
   no information server. As information within the service cannot be
   addressed individually or, in general, automatically retrieved,
   this is a less desirable, though currently common, solution.
   
  WAIS  
  
   The current public domain WAIS implementation requires that a
   client know the "type" of an object prior to retrieval. This value
   is returned along with the internal object identifier in the search
   response. It has been encoded into the path part of the URL in
   order to make the URL sufficient for the retrieval of the object.
   Within the WAIS world, names do not of course need to be prefixed
   by "wais:"  (by the partial form rules).
   
   The wpath of a WAIS URL consists of encoded fields of the WAIS
   identifier, in the same order as in the WAIS identifier. For each
   field, the identifier field number is the digits before the equals
   sign, and the field contents follow, encoded in the conventional
   encoding, terminated by ";".
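
   The wpath layout just described can be read back with a short
   routine such as this one. It is a sketch only: the name
   parse_wpath and the sample field values are hypothetical.

```python
from typing import List, Tuple
from urllib.parse import unquote


def parse_wpath(wpath: str) -> List[Tuple[int, str]]:
    """Decode a WAIS wpath into (field number, contents) pairs."""
    fields = []
    # Every field is terminated by ";", so the last chunk is empty.
    for chunk in wpath.split(";"):
        if not chunk:
            continue
        number, equals, contents = chunk.partition("=")
        if not equals or not number.isdigit():
            raise ValueError("malformed wpath field: " + chunk)
        # Contents are decoded only after splitting, so an encoded
        # ";" or "=" inside a field is preserved.
        fields.append((int(number), unquote(contents, encoding="latin-1")))
    return fields
```

   For example, the wpath "1=doc%20id;2=db.src;" yields field 1 with
   contents "doc id" followed by field 2 with contents "db.src".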
   
    
   
  REGISTRATION OF NAMING SCHEMES  
  
   A new naming scheme may be introduced by defining a mapping onto a
   conforming URL syntax, using a new prefix. Experimental prefixes
   may be used by mutual agreement between parties, and must start
   with the characters "x-".  The scheme name "urn:" is reserved for
   the work in progress on a scheme for more persistent names.  
   
   It is proposed that the Internet Assigned Numbers Authority (IANA)
   perform the function of registration of new schemes. Any submission
   of a new URI scheme must include a definition of an algorithm for
   the retrieval of any object within that scheme. The algorithm must
   take  the URI and produce either a set of URL(s) which will lead to
   the desired object, or the object itself, in a well-defined or
   determinable format.
   
   It is recommended that those proposing a new scheme demonstrate its
   utility and operability by the provision of a gateway which will
   provide images of objects in the new scheme for clients using an
   existing protocol. If the new scheme is not a locator scheme, then
   the properties of names in the new space should be clearly defined.
    It is likewise recommended that, where a protocol allows for
   retrieval by URL, the client software have provision for being
   configured to use specific gateway locators for indirect access
   through new naming schemes.
   
BNF for specific URL schemes




   This is a BNF-like description of the Uniform Resource Locator
   syntax. A vertical  line "|"  indicates alternatives, and
   [brackets]  indicate optional parts.  Spaces are represented by the
   word "space", and the vertical line character by "vline".   Single
   letters stand for single letters. All words of more than one letter
   below are entities described somewhere in this description.  
   
   The current IETF URI working group preference  is for the
   prefixedurl production. (Nov 1993. July 93: url).
   
   The "national" and "punctuation" characters do not appear in any
   productions and therefore may not appear in URLs.
   
   The "afsaddress" is left in as historical note, but is not a url
   production 
   
  prefixedurl             u r l : url 
                         
  url                     httpaddress | ftpaddress | newsaddress |
                         nntpaddress | prosperoaddress | telnetaddress
                          | gopheraddress | waisaddress |
                         mailtoaddress  | midaddress | cidaddress 
                         
  scheme                  ialpha   
                         
  httpaddress             h t t p :   / / hostport [  / path ] [ ?
                         search ]   
                         
  ftpaddress              f t p : / / login / path [ ! ftptype ] 
                         
  afsaddress              a f s : / / cellname / path   
                         
  newsaddress             n e w s : groupart   
                         
  nntpaddress             n n t p : group /  digits 
                         
  midaddress              m i d  :  addr-spec 
                         
  cidaddress              c i d : content-identifier 
                         
  mailtoaddress           m a i l t o : xalphas @ hostname 
                         
  waisaddress             waisindex | waisdoc  
                         
  waisindex               w a i s : / / hostport / database [ ? search
                         ]   
                         
  waisdoc                 w a i s : / / hostport / database / wtype  /
                         wpath 
                         
  wpath                  digits = path ;  [ wpath ] 
                         
  groupart                * | group | article   



  group                   ialpha [ . group ]   
                         
  article                 xalphas @ host   
                         
  database                xalphas   
                         
  wtype                   xalphas   
                         
  prosperoaddress         prosperolink   
                         
  prosperolink            p r o s p e r o : / / hostport / hsoname [ %
                          0 0 version [ attributes ] ]   
                         
  hsoname                 path   
                         
  version                 digits   
                         
  attributes              attribute [ attributes ]   
                         
  attribute               alphanums   
                         
  telnetaddress           t e l n e t : / / login 
                         
  gopheraddress           g o p h e r : / / hostport [/ gtype  [
                         selector ] ] [ ? search ]   
                         
  login                   [ user [ : password ] @ ] hostport 
                         
  hostport                host [ : port ]   
                         
  host                    hostname | hostnumber   
                         
  ftptype                 A | I | D 
                         
  cellname                hostname   
                         
  hostname                ialpha [  .  hostname ] 
                         
  hostnumber              digits . digits . digits . digits 
                         
  port                    digits   
                         
  selector                path   
                         
  path                    void |  segment  [  / path ] 
                         
  segment                 xpalphas 
                         
  search                  xalphas [ + search ]   
                         
  user                    xalphas  
                         
  password                xalphas 



  fragmentid              xalphas   
                         
  gtype                   xalpha   
                         
  xalpha                  alpha | digit | safe | extra | escape   
                         
  xalphas                 xalpha [ xalphas ]   
                         
  xpalpha                 xalpha | +   
                         
  xpalphas                xpalpha [ xpalphas ]   
                         
  ialpha                  alpha [ xalphas ] 
                         
  alpha                   a | b | c | d | e | f | g | h | i | j | k |
                         l | m | n | o  | p | q | r | s | t | u | v |
                         w | x | y | z | A | B | C  | D | E | F | G |
                         H | I | J | K | L | M | N | O | P |  Q | R |
                         S | T | U | V | W | X | Y | Z   
                         
  digit                   0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9   
                         
  safe                    $ | - | _ | @ | . | &  | + 
                         
  extra                   " |  ' | ( | )  | ,  | space   
                         
  reserved                 =  |  ;  |  /  |  #  |  ? |  : 
                         
  escape                  % hex hex   
                         
  hex                     digit | a | b | c | d | e | f | A | B | C |
                         D | E | F   
                         
  national                { | } | vline | [ | ] | \ | ^ | ~   
                         
  punctuation             < | > 
                         
  digits                  digit [ digits ]   
                         
  alphanum                alpha | digit   
                         
  alphanums               alphanum [ alphanums ] 
                         
  void                   
                         
   (end of URL BNF)      
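
   As an informal illustration, which is not part of the
   specification, the hostport, host, hostname, hostnumber and port
   productions above can be transcribed into a regular expression.
   The sketch below uses Python; the names XALPHA, HOSTPORT_RE and
   parse_hostport are hypothetical and exist only for this example.
   It follows the grammar literally, so it accepts every character
   that xalpha allows, including space.

```python
import re

# xalpha = alpha | digit | safe | extra | escape, as listed in the BNF.
XALPHA = r"""(?:[A-Za-z0-9$\-_@.&+"'(), ]|%[0-9A-Fa-f]{2})"""

# ialpha = alpha [ xalphas ]
IALPHA = rf"[A-Za-z]{XALPHA}*"

# hostname = ialpha [ . hostname ].  Because "safe" already contains ".",
# this production accepts the same strings as ialpha on its own.
HOSTNAME = IALPHA

# hostnumber = digits . digits . digits . digits
HOSTNUMBER = r"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+"

# hostport = host [ : port ], with host = hostname | hostnumber
HOSTPORT_RE = re.compile(
    rf"(?P<host>{HOSTNUMBER}|{HOSTNAME})(?::(?P<port>[0-9]+))?"
)


def parse_hostport(text):
    """Return (host, port) for a string matching hostport, else None.

    port is an int, or None when the optional ": port" part is absent.
    """
    match = HOSTPORT_RE.fullmatch(text)
    if match is None:
        return None
    port = match.group("port")
    return match.group("host"), (int(port) if port is not None else None)
```

   For example, parse_hostport("info.cern.ch:80") yields the host
   "info.cern.ch" and the port 80, while a string that starts with a
   digit but is not a dotted hostnumber is rejected, because ialpha
   must begin with a letter.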
   
Security considerations  

   The URL scheme does not in itself pose a security threat. Users
   should beware that there is no general guarantee that a URL which
   at one time points to a given object will continue to do so, or
   that it will not at some later time point to a different object,
   because objects may be moved on servers.
   
   A URL-related security threat is that it is sometimes possible to
   construct a URL such that an attempt to perform a harmless
   idempotent operation such as the retrieval of the object will in
   fact cause a possibly damaging remote operation to occur.  The
   unsafe URL is typically constructed by specifying a port number
   other than that reserved for the network protocol in question.  The
   client unwittingly contacts a server which is in fact running a
   different protocol.  The content of the URL contains instructions
   which when interpreted according to this other protocol cause an
   unexpected operation. An example has been the use of gopher URLs
   to cause a rude message to be sent via an SMTP server.  Caution
   should be used when using any URL which specifies a port number
   other than the default for the protocol, especially when it is a
   number within the reserved space.
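
   The caution above can be sketched as a simple check made before a
   client dereferences a URL. The following Python fragment is only an
   illustration: the function name port_risk is hypothetical, the
   table of default ports is an assumption based on the commonly
   assigned port numbers rather than on this section, and the
   boundary of 1024 for the reserved space is likewise an assumption.

```python
from urllib.parse import urlsplit

# Assumed default ports for some of the schemes in this document.
DEFAULT_PORTS = {"ftp": 21, "telnet": 23, "gopher": 70, "http": 80, "wais": 210}

# Assumption: ports below this value are treated as the reserved space.
RESERVED_LIMIT = 1024


def port_risk(url):
    """Classify the port of a URL relative to its scheme's default.

    Returns "default", "non-default-reserved" or "non-default".
    """
    parts = urlsplit(url)
    port = parts.port                      # None when no port is given
    default = DEFAULT_PORTS.get(parts.scheme)
    if port is None or port == default:
        return "default"
    if port < RESERVED_LIMIT:
        return "non-default-reserved"      # the case singled out above
    return "non-default"
```

   A gopher URL naming port 25, for instance, would be reported as
   "non-default-reserved", and a client could then refuse it or ask
   the user for confirmation.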
   
   Care should be taken when URLs contain embedded encoded delimiters
   for a given protocol (for example, CR and LF characters for telnet
   protocols) that these are not decoded before transmission.  Decoding
   them would violate the protocol but could be used to simulate an
   extra operation or parameter, again causing an unexpected and
   possibly harmful remote operation to be performed.
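
   One way a client might act on this advice is to look for escaped
   CR and LF characters before it decodes anything. The Python sketch
   below is illustrative only; the function name
   has_encoded_line_break is hypothetical, and a real client would
   need to consider whichever delimiters matter for the protocol in
   use.

```python
import re

# %0D is an escaped CR and %0A an escaped LF; hex digits may be in
# either case according to the "hex" production.
_ENCODED_CR_OR_LF = re.compile(r"%0[AaDd]")


def has_encoded_line_break(url):
    """Return True if the URL contains an escaped CR or LF character."""
    return _ENCODED_CR_OR_LF.search(url) is not None
```

   A client could reject such a URL outright, or keep the escapes
   encoded when building the request sent to the server.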
   
   The use of URLs containing passwords is clearly unwise.
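
   Where a URL of the form described by the login production does
   carry a password, software that displays or logs the URL may wish
   to drop it first. The following Python sketch shows one possible
   approach; the function name strip_password and the example host
   are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_password(url):
    """Return the URL with any ":password" removed from the login part."""
    parts = urlsplit(url)
    # login = [ user [ : password ] @ ] hostport
    userinfo, at, hostport = parts.netloc.rpartition("@")
    if not at or ":" not in userinfo:
        return url                         # no password present
    user = userinfo.split(":", 1)[0]
    return urlunsplit(parts._replace(netloc=f"{user}@{hostport}"))
```

   Applied to "ftp://anonymous:secret@ftp.example.org/pub/file", this
   returns "ftp://anonymous@ftp.example.org/pub/file"; URLs without a
   password are returned unchanged.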
   
Acknowledgements  

   This paper builds on the basic W3 design and much discussion of
   these issues by many people on the network. The discussion was
   particularly stimulated by articles by Clifford Lynch (1991),
   Brewster Kahle (1991) and Wengyik Yeong (1991b). Contributions from
   John Curran (NEARnet), Clifford Neuman (ISI), Ed Vielmetti (MSEN)
   and later the IETF URL BOF and URI working group have been
   incorporated into this issue of this paper.  
   
   The draft url4 (Internet Draft 00) was generated from url3
   following discussion and overall approval of the URL working group
   on 29 March 1993. The paper url3 had been generated from udi2 in
   the light of discussion at the UDI BOF meeting at the Boston IETF
   in July 1992. Draft url5 incorporated changes suggested by
   Clifford Neuman, and draft url6 (ID 01) incorporated character
   group changes and a few other fixes defined by the IETF URI WG in
   submitting it as a proposed standard. URL7 (Internet Draft 02)
   incorporated changes introduced at the Amsterdam IETF and refined
   in net discussion.
   
   Draft 03 includes changes made at Houston in November 1993, and on
   the net before Seattle in March 1994.
   
                              APPENDICES
                                   
   The following are not formally part of this document.



Wrappers for URIs in plain text

   This section does not formally form part of the URL specification.
   
   URIs, including URLs, will ideally be transmitted through protocols
   which accept them and data formats which define a context for them.
   However, in practice there are many occasions when URLs are
   included in plain ASCII non-marked-up text such as electronic mail
   and usenet news messages.
   
   In this case, it is convenient to have a separate wrapper syntax to
   define delimiters which will enable the human or automated reader
   to recognize that the URI is a URI.
   
   The recommendation is that the angle brackets (less than and
   greater than signs) of the ASCII set be used for this purpose.
   
   These wrappers do not form part of the URL, are not mandatory, and
   should not be used in contexts (such as SGML parameters, HTTP
   requests, etc.) in which delimiters are already specified. 
   
    Example
    
    Yes, Jim, I found it under <ftp://info.cern.ch/pub/www/doc> but
    you can probably pick it up from <ftp://ds.internic.net/rfc>.
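
   An automated reader could recognize such wrapped URIs with a short
   pattern match. The Python sketch below is an illustration only: the
   names WRAPPED_URI and extract_wrapped_uris are hypothetical, and
   the requirement that the bracketed text begin with a letter and
   contain a colon is an assumption made here to skip other
   angle-bracketed text such as mail addresses.

```python
import re

# "<" scheme ":" rest ">", with no whitespace or angle brackets inside.
WRAPPED_URI = re.compile(r"<([A-Za-z][^<>\s:]*:[^<>\s]*)>")


def extract_wrapped_uris(text):
    """Return the URIs found between angle brackets, without the brackets."""
    return WRAPPED_URI.findall(text)
```

   Applied to the example message above, the function returns the two
   ftp URLs in the order in which they appear.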



                              REFERENCES
                                   
  Alberti, R., et al. (1991) 
                          "Notes on the Internet Gopher Protocol",
                         University of Minnesota, December 1991, 
                         <ftp://boombox.micro.umn.edu/pub/gopher/gopher_protocol>.
                         See also 
                         <gopher://gopher.micro.umn.edu/00/Information
                         About Gopher/About Gopher> 
                         
  Berners-Lee, T., (1991) 
                          "Hypertext Transfer Protocol (HTTP)", CERN,
                         December 1991, as updated from time to time, 
                         <ftp://info.cern.ch/pub/www/doc/http-spec.txt>
                         
  Crocker                "Standard for ARPA Internet Text Messages",
                         David H. Crocker, RFC822.
                         
  Davis, F., et al., (1990) 
                          "WAIS Interface Protocol: Prototype
                         Functional Specification", Thinking Machines
                         Corporation, April 23, 1990,
                         <ftp://quake.think.com/pub/wais/doc/protspec.txt>



  International Standards Organization, (1991) 
                          Information and Documentation - Search and
                         Retrieve Application Protocol Specification
                         for Open Systems Interconnection, ISO-10163 
                         
  Horton (1987)           M. Horton, R. Adams, "Standard for
                         interchange of USENET messages", Internet RFC
                         1036 , 12/01/1987. 
                         
  Huitema, C., (1991)     "Naming: strategies and techniques", 
                         Computer Networks and ISDN Systems 23 (1991)
                         107-110. 
                         
  Kahle, Brewster, (1991)  
                         "Document Identifiers, or International
                         Standard Book Numbers for the Electronic
                         Age",
                         <ftp://quake.think.com/pub/wais/doc/doc-ids.txt>
                         
  Kantor, B., and Lapsley, P., (1986) 
                         "A proposed standard for  the stream-based
                         transmission of news" , Internet RFC-977,
                         February 1986.
                         <ftp://ds.internic.net/rfc/rfc977.txt> 
                         
  Kunze, 1994             J. Kunze, Requirements for URLs, to be
                         published. 
                         
  Lynch, C., Coalition for Networked Information: (1991)   
                         "Workshop on ID and Reference Structures for
                         Networked Information", November 1991. See
                         <wais://quake.think.com/wais-discussion-archives?lynch>
                         
  Mockapetris, P., (1987) 
                          "Domain names - concepts and facilities",
                         RFC-1034, USC-ISI, November 1987, 
                         <ftp://ds.internic.net/rfc/rfc1034.txt> 
                         
  Neuman, B. Clifford, (1992) 
                          "Prospero: A Tool for Organizing  Internet
                         Resources", Electronic Networking: Research,
                         Applications and Policy, Vol 1 No 2, Meckler
                         Westport CT  USA.  See also 
                         <ftp://prospero.isi.edu/pub/prospero/oir.ps> 
                         
  Postel, J. and Reynolds, J. (1985) 
                         "File Transfer Protocol  (FTP)", Internet
                         RFC-959, October 1985.
                         <ftp://ds.internic.net/rfc/rfc959.txt> 
                         
  Sollins 1994            K. Sollins and L. Masinter, Requirements for
                         URNs, to be published. 
                         
  Yeong, W., (1991a)      "Towards Networked Information Retrieval", 
                         Technical report 91-06-25-01, June 1991,
                         Performance Systems International, Inc. 
                         <ftp://uu.psi.com/wp/nir.txt> 
                         
  Yeong, W., (1991b),     "Representing Public Archives in the 
                         Directory", Internet Draft, November 1991,
                         now expired. 
                         
                          EDITOR'S ADDRESS  
                                   
                           Tim Berners-Lee  
                Address:   World-Wide Web project  
                           CERN,
                           1211 Geneva 23,
                           Switzerland
 
                Telephone: +41 (22)767 3755
                Fax:       +41 (22)767 7155 
                Email:     timbl@info.cern.ch





   


























