Larry Masinter <masinter@parc.xerox.com> Sat, 26 March 1994 10:37 UTC
To: timbl@ptpc00.cern.ch
Cc: uri@bunyip.com
In-Reply-To: timbl@ptpc00.cern.ch's message of Fri, 25 Mar 1994 17:07:03 -0800 <94Mar25.170711pst.2732@golden.parc.xerox.com>
Sender: ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Larry Masinter <masinter@parc.xerox.com>
Fake-Sender: masinter@parc.xerox.com
Message-Id: <94Mar25.200229pst.2732@golden.parc.xerox.com>
Date: Fri, 25 Mar 1994 20:02:17 -0800
Some wording suggestions. I hope you don't think it is too
presumptuous of me to suggest such massive rewrites, but I believe all
of my suggestions are of wording and not of content, with the
exception of the suggestion to use:
	type=ascii
	type=binary
	type=directory

instead of type=a, type=i, type=d

================================================================
> An optional user name,
>                         if this must be quoted to the server,
>                         followed by  a commercial at sign "@".  (Use
>                         of this field is discouraged. Provision of
>                         encoding a password after the user name,
>                         delimited by a colon, could  be made but
>                         obviously is only useful when the password is
>                         public, in  which case it should not be
>                         necessary, so that is also discouraged.)
                         
a) I think "supplied" is better than "quoted".
b) there are cases (e.g., "guest" password "guest") where the password
   is present and necessary, even though it is public. I'd like to
   avoid the "should".
			

How about:

<<================================================================
An optional user name (with optional password)
			if required (as it is with a few FTP servers).
			The password, if present, follows the user
			name delimited by a colon; the user name
			and optional password are followed by a
			commercial at-sign "@". The use of user names
			and passwords (which are public) is
			discouraged.
================================================================>>
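
To make the proposed syntax concrete, here is a minimal sketch in Python
(not part of the draft). It relies on urlsplit from the standard
urllib.parse module; the helper name split_userinfo and the host
ftp.example.com are made up for illustration.

```python
from urllib.parse import urlsplit

def split_userinfo(url):
    """Return (user, password, host); parts absent from the URL are None."""
    parts = urlsplit(url)
    # The password, if present, follows the user name after a colon;
    # both precede the "@" that introduces the host.
    return parts.username, parts.password, parts.hostname

print(split_userinfo("ftp://guest:guest@ftp.example.com/pub/README"))
print(split_userinfo("ftp://ftp.example.com/pub/README"))
```

The second call shows the common case in which both fields are omitted.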

>   The path is interpreted in a manner dependent on the protocol being
>   used. However, when it contains slashes, these must imply a
>   hierarchical structure.

How about just taking out "must", especially since Gopher violates it
later on?

<<================================================================
While the protocol determines the interpretation of the path,
generally, the slash "/" denotes a level in a hierarchical structure.
================================================================>>

>   The following encoding method shall be used for mapping WAIS, FTP,
>   Prospero and Gopher addresses onto URLs. Where the local naming
>   scheme uses ASCII characters which are not allowed in the URL,
>   these may be represented in the URL by a percent sign "%" followed

I'd suggest "... scheme uses characters which are not allowed in a URL, ..."
lest we debate whether %FF is ASCII, and change "the" to "a".
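
As an illustration of why "characters" is the safer word, here is a
rough Python sketch of the "%" plus two hexadecimal digits convention
applied to arbitrary octets, including one outside ASCII. The helper
names encode_octets and decode_octets are hypothetical; the functions
they wrap are from the standard urllib.parse module.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

def encode_octets(data: bytes) -> str:
    """Percent-encode every octet that is not a letter, digit, or one of _.-~"""
    return quote_from_bytes(data, safe="")

def decode_octets(text: str) -> bytes:
    """Reverse the encoding, yielding raw octets rather than text."""
    return unquote_to_bytes(text)

print(encode_octets(b"a b"))    # the space is not allowed in a URL
print(encode_octets(b"\xff"))   # an octet that is not ASCII at all
print(decode_octets("%FF"))
```

Decoding to bytes rather than to a string sidesteps the question of
which character set an octet such as %FF belongs to.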

>   (Note: If a new naming scheme is introduced which encodes binary
>   data as opposed to text, then a more compact encoding such as pure
>   hexadecimal or base 64 would be more appropriate.)

At this point, I think this is irrelevant and distracting, and I
suggest you take it out.

>  mid                     Message identifiers for electroni mail
                                                            ^c

>   The schemes for x.500, network management database and whois++ have
>   not been specified and may be the subject of futher study.

I don't like to see temporal references to activities in standard
documents, since the document will last longer than the activities.
How about just:

<<================================================================
   Other schemes may be specified by future specifications.
================================================================>>

>   The url: prefix is reserved for use in encoding a Uniform Resource
>   Name when that has been developed by the IETF working group.

You mean 'urn:', don't you? I'd just take this out. The URN standard
will reserve it, this document doesn't need to.

>    The ftp: prefix indicates a file which is to be picked up from the
>    file system of the given host. The FTP protocol is used, as defined
>   in RFC957 or any successor. The port number, if present, gives the
>   port of the FTP server if not the FTP default. (A client may in
>   practice use local file access to retrieve objects which are
>   available though more efficient means such as local file open or
>   NFS mounting, where this is available and equivalent).


Not really. FTP urls are perfectly useful for retrieving data from FTP
servers that do things like automatically uncompressing data or making
.tar.Z files. As we've seen, most anonymous FTP servers do NOT export
their root file system, and the advice about smart clients is
misplaced and confusing. It is RFC 959, not 957. How about:

<<================================================================
 The ftp: prefix indicates data to be retrieved using the FTP protocol
 (RFC 959).  The port number, if present, gives the port of the FTP
 server if not the FTP default.
================================================================>>

>      Path
>
>   The FTP protocol allows for a sequence of CWD commands (change
>   working directory) prior to a RETR (retrieve) which actually
>   accesses a file.  The arguments of any CWD commands are successive
>   segment parts of the URL, and the filename argument to the RETR
>   command is the final segment of the URL path.

RETR (retrieve) or NLST (list). This should also mention 'type'.

>        Note
>
>   In the case in which the file system of the server is known or
>   guessed by the client, the path may possibly converted into a
>   filename.  This may (in some cases)  allow the file to be retrieved
>   in one RETR command with no CWD command. In the case of unix, the
>   filename will in fact look the same as the URI path.  This must NOT
>   be taken to indicate that the URL is a unix filename.   In
>   practice, as many FTP servers in fact have or emulate unix file
>   systems, it may in fact be time-efficient to attempt first a direct
>   retrieval guessing unix syntax, and, if that fails, to attempt the
>   official sequence of succession of directory changes followed by a
>   RETR command.

>   There is no common hierarchical model to the FTP protocol, so if a
>   directory change command has been given, it is impossible in
>   general to deduce what sequence should be given to navigate to
>   another directory for a second retrieval, if the paths are
>   different.  The only reliable algorithm is to disconnect and
>   reestablish the control connection.  However, if no directory
>   changes have been made, but direct retrieval has been done, then
>   the control connection may be kept.  Another possible
>   uninvestigated method is to use CDUP on the trial assumption of a
>   hierarchical structure to return a point in common between the
>   first and second URLs.

>   (This note previously read ...) 

I would take out all of the advice about how to write efficient
clients; I think it is misplaced in a specification of the MEANING of
URLs. (Also, you spelled URL as URI once.) You don't want to leave the
'this note previously read' section in the final draft, of course.
That would leave it with:

<<================================================================
     Path

  The path in an FTP URL specifies a sequence of CWD (change working
  directory) commands, followed either by a TYPE (data type) and RETR
  (retrieve) command, or else a NLST (name list) command.

  The arguments of any CWD commands are successive segments of the URL
  delimited by slash "/"; the filename argument to the RETR command is
  the final segment of the URL path.

  For some file systems (Unix in particular), the "/" used to denote
  the hierarchical structure of the URL corresponds to the delimiter
  used to construct a file name hierarchy, and thus, the filename will
  look the same as the URL path. This does NOT mean that the URL is a
  Unix filename.

  (A note for clients retrieving subsequent URLs from the same
  host: There is no common hierarchical model to the FTP protocol, so
  if a CWD command has been given, it is impossible in general to
  deduce what sequence should be given to navigate to another
  directory for a second retrieval, if the paths are different.  The
  only reliable algorithm is to disconnect and reestablish the control
  connection.)
================================================================>>
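
The mapping from path to commands could be sketched roughly as follows
in Python. This is an illustration only, not wording for the draft: the
function name ftp_commands and the example host are made up, the TYPE
command is left out, and treating an empty final segment as a request
for NLST is an assumption of this sketch.

```python
from urllib.parse import urlsplit, unquote

def ftp_commands(url):
    """List the CWD/RETR/NLST commands implied by the path of an ftp: URL."""
    path = urlsplit(url).path
    # Drop the single slash that separates the host from the path.
    if path.startswith("/"):
        path = path[1:]
    # Split on "/" BEFORE decoding, so an encoded %2F stays inside a segment.
    segments = [unquote(s) for s in path.split("/")]
    *dirs, last = segments
    commands = ["CWD " + d for d in dirs]
    # Assumption of this sketch: an empty final segment means a listing.
    commands.append("RETR " + last if last else "NLST")
    return commands

print(ftp_commands("ftp://ftp.example.com/pub/docs/url.txt"))
print(ftp_commands("ftp://ftp.example.com/pub/"))
```

Nothing here joins the segments back into a file name, which is the
point: the URL path is a list of command arguments, not a Unix path.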

>      Data type
>      
>   The data format of a file can only, in the general FTP case, be
>   deduced from the name, normally the suffix of the name. This is not
>   standardized. An alternative is for it to be transferred in
>   information outside the URL. The transfer mode (binary or text)
>   must in turn be deduced from the data format.  It is recommended
>   that conventions for suffixes of public archives be established,
>   but it is outside the scope of this paper.

I think, given the potential confusion of `media type' vs `FTP
data type' that it's good to be explicit. Also, the FTP parameter
in RFC959 allowed for E (EBCDIC) as well as A and I, and various
parameters having to do with Telnet format effectors, non-print, local
byte size, etc. I don't think we need any of those, but we should say
so. I suggest:

<<<================================================================
   FTP data type or directory specification

   The FTP protocol specification allowed for a wide variety of data
representation types and transfer modes. The URL specification
currently only allows three modifiers to the path:

	;type=ascii
	;type=binary
	;type=directory

The first two correspond to the "TYPE A" and "TYPE I" directives in
the FTP protocol; the third designates that the last component of the
URL is not a path name but a file group designator, to be used by the
"NLST" command. The type specification is optional; if omitted, it is
assumed that type=directory for URLs that end with a slash (i.e., the
last slash-separated segment is empty), and that the client must guess
the type for all others, based on the last component of the path.


================================================================>>>
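
A small Python sketch of how a client might apply the three modifiers
and the fallback rule, under the assumption that the modifier is simply
the text after ";type=" at the end of the path. The name split_type and
the choice to reject any other value with ValueError are mine, not part
of the proposal.

```python
# Map each proposed modifier to the FTP command it selects.
TYPE_COMMANDS = {"ascii": "TYPE A", "binary": "TYPE I", "directory": "NLST"}

def split_type(path):
    """Return (path_without_modifier, command); command None means 'guess'."""
    base, sep, value = path.partition(";type=")
    if sep:
        if value not in TYPE_COMMANDS:
            raise ValueError("unsupported type: " + value)
        return base, TYPE_COMMANDS[value]
    # No modifier: a trailing slash is taken to mean type=directory.
    if base.endswith("/"):
        return base, "NLST"
    # Otherwise the client must guess from the last path component.
    return base, None

print(split_type("/pub/README;type=ascii"))
print(split_type("/pub/"))
print(split_type("/pub/file.tar.Z"))
```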
...

>  GOPHER
>  
>   Gopher selector strings may contain any characters other than tab,
>   return, or  linefeed, so it is important to encode all disallowed
>   characters and encode any  space characters so these characters are
>   not altered during transport of the  URL. Note that since gopher
>   selector string are opaque and in many cases map to  native file
>   system of the gopher server, so encoding of disallowed characters
>   in the selector string is to map to binary codes rather than ISO
>   character  sets. In other words, the "%" character followed by two
>   hexadecimal digits is  used to encode binary data. Clients shall
>   not interpret gopher selector strings. While many Gopher servers
>   map to Unix file systems, you cannot assume that "/"  characters
>   imply a heirarchy since Gopher servers on non-Unix file systems may
>    use the "/" as part of a file name.

Just some word-smithing; avoiding repetition.
<<================================================================
  GOPHER
  
   Gopher selector strings are, in general, interpreted as a sequence
   of 8-bit bytes which may contain any characters other than tab,
   return, or linefeed. It is necessary to encode any disallowed
   characters, including spaces and other binary data not in the
   allowed character set, using the standard convention of the "%"
   character followed by two hexadecimal digits.

   Note that slash "/" in gopher selector strings may not correspond
   to a hierarchical structure.
================================================================>>

>      3. An encoded tab character (%09) to seperate the gopher
>      selector string from the optional search string (see 4 below).
>      
>      4. If the URL does not refer to a Gopher+ item and if there is
>      no gopher search  string then parts 3, 4, 5, and 6 of the URL
>      are optional
>
>      4.) The gopher search string.  If the URL refers to a search to
>      be submitted to a gopher search engine, the  search string is
>      required. Otherwise this is an empty string.
      

This is backward, and you have two items numbered 4. You want:

<<<================================================================
       If the URL does not refer to a Gopher+ item and there is no
       gopher search string, then parts 3, 4, 5 and 6 are optional.

	3. An encoded tab ... etc..

	...

================================================================>>>
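
For what it is worth, here is a rough Python sketch of the selector and
search parts only. It treats the selector as opaque octets, and appends
the encoded tab (%09) and search string only when a search string is
given. The function name gopher_selector_part is hypothetical, and
leaving "/" unencoded in the selector is an assumption of the sketch.

```python
from urllib.parse import quote_from_bytes

def gopher_selector_part(selector: bytes, search: bytes = b"") -> str:
    """Encode a selector, plus '%09' and a search string if one is given."""
    # Assumption: "/" is left as-is; it carries no hierarchical meaning here.
    encoded = quote_from_bytes(selector, safe="/")
    if search:
        # The encoded tab separates the selector from the search string.
        encoded += "%09" + quote_from_bytes(search, safe="")
    return encoded

print(gopher_selector_part(b"1/My Files"))
print(gopher_selector_part(b"7/index", b"url draft"))
```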


>  MAILTO
>  
>   This allows a URL to specify an RFC822 addr-spec mail address.
>   Note that use of % , for example as used in forming a gatewayed
>   mail address, requires conversion to %25 in a URL.
>   
>   This semantics may be considered to be that the object referred to
>   by the mailto: URL is the set of messages sent to or from that
>   address. There is no algorithm to retrieve this set, but the SMTP
>   protocol allows messages to be added to it, and any given user may
>   be aware of a subset of its members.

I already sent a correction to this in a previous message.

I'm running out of steam, and coming to things that I've already sent
mail about, so I think I'll send this off now.
Larry Masinter