Larry Masinter <masinter@parc.xerox.com> Sat, 26 March 1994 10:37 UTC
Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa00901; 26 Mar 94 5:37 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa00897; 26 Mar 94 5:37 EST
Received: from mocha.bunyip.com by CNRI.Reston.VA.US id aa03273; 26 Mar 94 5:37 EST
Received: by mocha.bunyip.com (5.65a/IDA-1.4.2b/CC-Guru-2b) id AA05040 on Fri, 25 Mar 94 23:03:32 -0500
Received: from alpha.Xerox.COM by mocha.bunyip.com with SMTP (5.65a/IDA-1.4.2b/CC-Guru-2b) id AA05036 (mail destined for /usr/lib/sendmail -odq -oi -furi-request uri-out) on Fri, 25 Mar 94 23:03:01 -0500
Received: from golden.parc.xerox.com ([13.1.100.139]) by alpha.xerox.com with SMTP id <14502(3)>; Fri, 25 Mar 1994 20:02:24 PST
Received: by golden.parc.xerox.com id <2732>; Fri, 25 Mar 1994 20:02:29 -0800
To: timbl@ptpc00.cern.ch
Cc: uri@bunyip.com
In-Reply-To: timbl@ptpc00.cern.ch's message of Fri, 25 Mar 1994 17:07:03 -0800 <94Mar25.170711pst.2732@golden.parc.xerox.com>
Sender: ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Larry Masinter <masinter@parc.xerox.com>
X-Orig-Sender: Larry Masinter <masinter@parc.xerox.com>
Fake-Sender: masinter@parc.xerox.com
Message-Id: <94Mar25.200229pst.2732@golden.parc.xerox.com>
Date: Fri, 25 Mar 1994 20:02:17 -0800
Some wording suggestions. I hope you don't think it is too
presumptuous of me to suggest such massive rewrites, but I believe all
of my suggestions are of wording and not of content, with the
exception of the suggestion to use:

    type=ascii
    type=binary
    type=directory

instead of type=a, type=i, type=d

================================================================

> An optional user name,
> if this must be quoted to the server,
> followed by a commercial at sign "@". (Use
> of this field is discouraged. Provision of
> encoding a password after the user name,
> delimited by a colon, could be made but
> obviously is only useful when the password is
> public, in which case it should not be
> necessary, so that is also discouraged.)

a) I think "supplied" is better than "quoted".
b) there are cases (e.g., "guest" password "guest") where the password
   is present and necessary, even though it is public. I'd like to
   avoid the "should".

How about:

<<================================================================
An optional user name (with optional password) if required (as it is
with a few FTP servers). The password, if present, follows the user
name delimited by a colon; the user name and optional password are
followed by a commercial at-sign "@". The use of user name and
passwords (which are public) is discouraged.
================================================================>>

> The path is interpreted in a manner dependent on the protocol being
> used. However, when it contains slashes, these must imply a
> hierarchical structure.

How about just taking out "must", especially since it is violated
later on by Gopher.

<<================================================================
While the protocol determines the interpretation of the path,
generally, the slash "/" denotes a level in a hierarchical structure.
================================================================>>

> The following encoding method shall be used for mapping WAIS, FTP,
> Prospero and Gopher addresses onto URLs. Where the local naming
> scheme uses ASCII characters which are not allowed in the URL,
> these may be represented in the URL by a percent sign "%" followed

I'd suggest "... scheme uses characters which are not allowed in a
URL, ..." lest we debate whether %FF is ASCII, and change "the" to
"a".

> (Note: If a new naming scheme is introduced which encodes binary
> data as opposed to text, then a more compact encoding such as pure
> hexadecimal or base 64 would be more appropriate.)

At this point, I think this is irrelevant and distracting, and I
suggest you take it out.

> mid Message identifiers for electroni mail
                                       ^c

> The schemes for x.500, network management database and whois++ have
> not been specified and may be the subject of futher study.

I don't like to see temporal references to activities in standard
documents, since the document will last longer than the activities.
How about just:

<<================================================================
Other schemes may be specified by future specifications.
================================================================>>

> The url: prefix is reserved for use in encoding a Uniform Resource
> Name when that has been developed by the IETF working group.

You mean 'urn:', don't you? I'd just take this out. The URN standard
will reserve it, this document doesn't need to.

> The ftp: prefix indicates a file which is to be picked up from the
> file system of the given host. The FTP protocol is used, as defined
> in RFC957 or any successor. The port number, if present, gives the
> port of the FTP server if not the FTP default. (A client may in
> practice use local file access to retrieve objects which are
> available though more efficient means such as local file open or
> NFS mounting, where this is available and equivalent).

Not really.
FTP urls are perfectly useful for retrieving data from FTP servers
that do things like automatically uncompressing data or making .tar.Z
files. As we've seen, most anonymous FTP servers do NOT export their
root file system, and the advice about smart clients is misplaced and
confusing. It is RFC 959, not 957. How about:

<<================================================================
The ftp: prefix indicates data to be retrieved using the FTP protocol
(RFC 959). The port number, if present, gives the port of the FTP
server if not the FTP default.
================================================================>>

> Path
>
> The FTP protocol allows for a sequence of CWD commands (change
> working directory) prior to a RETR (retrieve) which actually
> accesses a file. The arguments of any CWD commands are successive
> segment parts of the URL, and the filename argument to the RETR
> command is the final segment of the URL path.

RETR (retrieve) or NLST (list). This should also mention 'type'.

> Note
>
> In the case in which the file system of the server is known or
> guessed by the client, the path may possibly converted into a
> filename. This may (in some cases) allow the file to be retrieved
> in one RETR command with no CWD command. In the case of unix, the
> filename will in fact look the same as the URI path. This must NOT
> be taken to indicate that the URL is a unix filename. In
> practice, as many FTP servers in fact have or emulate unix file
> systems, it may in fact be time-efficient to attempt first a direct
> retrieval guessing unix syntax, and, if that fails, to attempt the
> official sequence of succession of directory changes followed by a
> RETR command.
> There is no common hierarchical model to the FTP protocol, so if a
> directory change command has been given, it is impossible in
> general to deduce what sequence should be given to navigate to
> another directory for a second retrieval, if the paths are
> different.
> The only reliable algorithm is to disconnect and
> reestablish the control connection. However, if no directory
> changes have been made, but direct retrieval has been done, then
> the control connection may be kept. Another possible
> uninvestigated method is to use CDUP on the trial assumption of a
> hierarchical structure to return a point in common between the
> first and second URLs.
> (This note previously read ...)

I would take out all of the advice about how to write efficient
clients; I think it is misplaced in a specification of the MEANING of
URLs. (Also, you spelled URL as URI once.) You don't want to leave
the 'this note previously read' section in the final draft, of
course. That would leave it with:

<<================================================================
Path

The path in a FTP URL specifies a sequence of CWD (change working
directory) commands, followed either by a TYPE (data type) and RETR
(retrieve) command, or else a NLST (name list) command. The arguments
of any CWD commands are successive segments of the URL delimited by
slash "/"; the filename argument to the RETR command is the final
segment of the URL path.

For some file systems (Unix in particular), the "/" used to denote
the hierarchical structure of the URL corresponds to the delimiter
used to construct a file name hierarchy, and thus, the filename will
look the same as the URL path. This does NOT mean that the URL is a
Unix filename.

(A note for clients retrieving subsequent URLs from the same host:
There is no common hierarchical model to the FTP protocol, so if a
CWD command has been given, it is impossible in general to deduce
what sequence should be given to navigate to another directory for a
second retrieval, if the paths are different. The only reliable
algorithm is to disconnect and reestablish the control connection.)
================================================================>>

> Data type
>
> The data format of a file can only, in the general FTP case, be
> deduced from the name, normally the suffix of the name.
> This is not
> standardized. An alternative is for it to be transferred in
> information outside the URL. The transfer mode (binary or text)
> must in turn be deduced from the data format. It is recommended
> that conventions for suffixes of public archives be established,
> but it is outside the scope of this paper.

I think, given the potential confusion of `media type' vs `FTP data
type' that it's good to be explicit. Also, the FTP parameter in RFC959
allowed for E (EBCDIC) as well as A and I, and various parameters
having to do with Telnet format effectors, non-print, local byte
size, etc. I don't think we need any of those, but we should say so.
I suggest:

<<<================================================================
FTP data type or directory specification

The FTP protocol specification allowed for a wide variety of data
representation types and transfer modes. The URL specification
currently only allows three modifiers to the path:

    ;type=ascii
    ;type=binary
    ;type=directory

The first two correspond to the "TYPE A" and "TYPE I" directives in
the FTP protocol; the third designates that the last component of the
URL is not a path name but a file group designator, to be used by the
"NLST" command.

The type specification is optional; if omitted, it is assumed that
type=directory for URLs that end with a slash (i.e., the last
slash-separated segment is empty), and that the client must guess the
type for all others, based on the last component of the path.
================================================================>>>

...

> GOPHER
>
> Gopher selector strings may contain any characters other than tab,
> return, or linefeed, so it is important to encode all disallowed
> characters and encode any space characters so these characters are
> not altered during transport of the URL.
> Note that since gopher
> selector string are opaque and in many cases map to native file
> system of the gopher server, so encoding of disallowed characters
> in the selector string is to map to binary codes rather than ISO
> character sets. In other words, the "%" character followed by two
> hexadecimal digits is used to encode binary data. Clients shall
> not interpret gopher selector strings. While many Gopher servers
> map to Unix file systems, you cannot assume that "/" characters
> imply a heirarchy since Gopher servers on non-Unix file systems may
> use the "/" as part of a file name.

Just some word-smithing; avoiding repetition.

<<================================================================
GOPHER

Gopher selector strings are, in general, interpreted as a sequence of
8-bit bytes which may contain any characters other than tab, return,
or linefeed. It is necessary to encode any disallowed characters,
including spaces and other binary data not in the allowed character
set, using the standard convention of the "%" character followed by
two hexadecimal digits.

Note that slash "/" in gopher selector strings may not correspond to
a hierarchical structure.
================================================================>>>

> 3. An encoded tab character (%09) to seperate the gopher
> selector string from the optional search string (see 4 below).
>
> 4. If the URL does not refer to a Gopher+ item and if there is
> no gopher search string then parts 3, 4, 5, and 6 of the URL
> are optional
>
> 4.) The gopher search string. If the URL refers to a search to
> be submitted to a gopher search engine, the search string is
> required. Otherwise this is an empty string.

This is backward, and you have two number (4)s. You want:

<<<================================================================
If the URL does not refer to a Gopher+ item and there is no gopher
search string, then parts 3, 4, 5 and 6 are optional.

3. An encoded tab ... etc..
...
================================================================>>>

> MAILTO
>
> This allows a URL to specify an RFC822 addr-spec mail address.
> Note that use of % , for example as used in forming a gatewayed
> mail address, requires conversion to %25 in a URL.
>
> This semantics may be considered to be that the object referred to
> by the mailto: URL is the set of messages sent to or from that
> address. There is no algorithm to retrieve this set, but the SMTP
> protocol allows messages to be added to it, and any given user may
> be aware of a subset of its members.

I already sent a correction to this in a previous message.

I'm running out of steam, and coming to things that I've already sent
mail about, so I think I'll send this off now.
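
As an illustration only, the FTP path and type wording suggested above
(CWD for each leading segment, then TYPE and RETR, or else NLST) and
the Gopher "%" convention might be sketched roughly as follows in
Python. The names ftp_commands, encode_selector and FTP_TYPES, and the
host host.example, are hypothetical and appear in no draft; the sketch
also assumes that a URL with no ;type= modifier is retrieved without a
TYPE command, leaving the guess about the data type to the caller.

```python
from urllib.parse import quote, unquote, urlsplit

# Suggested ";type=" modifiers mapped to FTP TYPE arguments.
# "directory" has no TYPE argument: it selects NLST instead of RETR.
FTP_TYPES = {"ascii": "A", "binary": "I", "directory": None}


def ftp_commands(url):
    """Return the FTP commands implied by an ftp: URL (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        raise ValueError("not an ftp: URL")
    # urlsplit leaves any ";type=..." modifier attached to the path.
    path, _, modifier = parts.path.partition(";type=")
    segments = path.split("/")[1:] or [""]  # drop the piece before the leading "/"
    *dirs, last = segments
    if modifier:
        if modifier not in FTP_TYPES:
            raise ValueError("unknown type modifier: " + modifier)
        kind = modifier
    elif last == "":
        kind = "directory"  # URL ends with a slash
    else:
        kind = "guess"      # assumption: caller guesses the type from the name
    commands = ["CWD " + unquote(d) for d in dirs]
    if kind == "directory":
        commands.append(("NLST " + unquote(last)).strip())
    else:
        if kind != "guess":
            commands.append("TYPE " + FTP_TYPES[kind])
        commands.append("RETR " + unquote(last))
    return commands


def encode_selector(selector):
    """Percent-encode a Gopher selector given as bytes (hypothetical helper).

    "/" is left as-is; it is not taken to imply any hierarchy.
    """
    return quote(selector, safe="/")


if __name__ == "__main__":
    print(ftp_commands("ftp://host.example/pub/docs/readme.txt;type=ascii"))
    print(ftp_commands("ftp://host.example/pub/"))
    print(encode_selector(b"1/My File\xff"))
```

Under these assumptions, a URL ending in a slash yields only CWD
commands followed by NLST, and a byte such as 0xFF in a selector
comes out as %FF regardless of any character set.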