Re: URL spec changes

Tim Berners-Lee <timbl@ptpc00.cern.ch> Mon, 28 March 1994 15:46 UTC

Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa29469; 28 Mar 94 10:46 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa29465; 28 Mar 94 10:46 EST
Received: from mocha.bunyip.com by CNRI.Reston.VA.US id aa16890; 28 Mar 94 10:45 EST
Received: by mocha.bunyip.com (5.65a/IDA-1.4.2b/CC-Guru-2b) id AA23512 on Mon, 28 Mar 94 09:14:36 -0500
Received: from dxmint.cern.ch by mocha.bunyip.com with SMTP (5.65a/IDA-1.4.2b/CC-Guru-2b) id AA23508 (mail destined for /usr/lib/sendmail -odq -oi -furi-request uri-out) on Mon, 28 Mar 94 09:14:26 -0500
Received: from ptpc00.cern.ch by dxmint.cern.ch (5.65/DEC-Ultrix/4.3) id AA12869; Mon, 28 Mar 1994 16:14:23 +0200
Received: by ptpc00.cern.ch (NX5.67d/NX3.0S) id AA00427; Mon, 28 Mar 94 16:13:50 +0200
Date: Mon, 28 Mar 1994 16:13:50 +0200
Sender: ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Tim Berners-Lee <timbl@ptpc00.cern.ch>
Message-Id: <9403281413.AA00427@ptpc00.cern.ch>
Received: by NeXT.Mailer (1.95)
Received: by NeXT Mailer (1.95)
To: Larry Masinter <masinter@parc.xerox.com>
Subject: Re: URL spec changes
Cc: uri@bunyip.com
Reply-To: timbl@www0.cern.ch

New hypertext and text and postscript versions of the URL spec are online,
as <URL:http://info.cern.ch/hypertext/WWW/Addressing/Addressing.html>
and <URL:ftp://info.cern.ch/pub/www/doc/draft-uri-url-03.{txt,ps}>
taking into account the following.
__________________________________________________________________

> From: Larry Masinter <masinter@parc.xerox.com>
> Date: 	Fri, 25 Mar 1994 20:02:17 PST
> 

> Some wording suggestions. I hope you don't think it is too
> presumptuous of me to suggest such massive rewrites,

Larry,
  I am grateful for the effort you have put in reading through
the document!  I have put in almost all of your comments. Not
this first one.

> but I believe all
> of my suggestions are of wording and not of content, with the
> exception of the suggestion to use:
> 	type=ascii
> 	type=binary
> 	type=directory
> 

> instead of type=a, type=i, type=d

Maybe a show of hands on Tuesday?
BTW "tenex" isn't mentioned in RFC959 -- there are RFCs about
it: RFC458, RFC478, RFC571.  The alternatives to A and I modes
are LOCAL(byte) and EBCDIC. This is from RFC959:

            <type-code> ::= A [<sp> <form-code>]
                          | E [<sp> <form-code>]
                          | I
                          | L <sp> <byte-size>

Wanna put those in? Why not just leave the field opaque to it? 


	; type = <type-code>

The URL isn't a recommended use profile for FTP.  It's an
encoding.
Why reencode something which already has a standard encoding?
After all, we've almost done it with %09 in gopher.
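
As an illustration of leaving the field opaque and simply reusing the RFC959 encoding, here is a minimal sketch of a checker for the <type-code> grammar quoted above. The function name parse_type_code is hypothetical; the form codes N, T, C and the byte-size range 1..255 are taken from RFC959 itself, not from the URL draft.

```python
import re

# <type-code> ::= A [<sp> <form-code>] | E [<sp> <form-code>] | I | L <sp> <byte-size>
# Form codes (N, T, C) and the byte-size range (1..255) follow RFC 959.
_TYPE_CODE = re.compile(r"([AE])(?: ([NTC]))?|(I)|(L) (\d{1,3})")


def parse_type_code(text):
    """Return (type, parameter) for an RFC 959 type-code string.

    parameter is the form code for A/E (None if omitted), None for I,
    and the integer byte size for L. Raises ValueError otherwise.
    """
    m = _TYPE_CODE.fullmatch(text)
    if m is None:
        raise ValueError("not an RFC 959 type-code: %r" % (text,))
    ascii_or_ebcdic, form, image, local, size = m.groups()
    if ascii_or_ebcdic:
        return ascii_or_ebcdic, form
    if image:
        return "I", None
    byte_size = int(size)
    if not 1 <= byte_size <= 255:
        raise ValueError("byte size out of range: %d" % byte_size)
    return "L", byte_size
```

Note that under this grammar a value such as "D" is rejected, since it is not an RFC959 type code.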

> ================================================================
> > An optional user name,
> >                         if this must be quoted to the server,
> >                         followed by  a commercial at sign "@".  (Use
> >                         of this field is discouraged. Provision of
> >                         encoding a password after the user name,
> >                         delimited by a colon, could  be made but
> >                         obviously is only useful when the password is
> >                         public, in  which case it should not be
> >                         necessary, so that is also discouraged.)
>                          

> a) I think "supplied" is better than "quoted".
> b) there are cases (e.g., "guest" password "guest") where the password
>    is present and necessary, even though it is public. I'd like to
>    avoid the "should".
> 			
> 

> How about:
> 

> <<================================================================
> An optional user name (with optional password)
> 			if required (as it is with a few FTP servers).
> 			The password, if present, follows the user
> 			name delimited by a colon; the user name
> 			and optional password are followed by a
> 			commercial at-sign "@". The use of user name
> 			and passwords (which are public) is
> 			discouraged.

Slight change to

An optional user name, if required (as it is with a few FTP servers). The
password, if present, follows the user name, separated from it by a colon;
the user name and optional password are followed by a commercial at sign "@".
The use of user names and passwords which are public is discouraged.

[as "delimited" sometimes implies something on the end, not the beginning, and
as the internal (private use) of URLs containing private passwords
is not what we are discouraging. We aren't stating that passwords
are public.]
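
A minimal sketch of how a client might split that part of a URL, assuming the wording above: an optional user name, an optional password separated from it by a colon, then "@" and the host. The function name split_login and the example host are hypothetical, and any port is left attached to the host.

```python
from urllib.parse import unquote


def split_login(authority):
    """Split 'user:password@host' into (user, password, host).

    user and password are None when absent. Percent-escapes in the
    user name and password are decoded after splitting, so an encoded
    ':' or '@' inside them does not confuse the split.
    """
    userinfo, at, host = authority.rpartition("@")
    if not at:
        # No "@": the whole string is the host.
        return None, None, host
    user, colon, password = userinfo.partition(":")
    return unquote(user), (unquote(password) if colon else None), host
```

For example, split_login("guest:guest@ftp.example.org") yields the user "guest", the password "guest", and the host "ftp.example.org".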


> <<================================================================
> While the protocol determines the interpretation of the path,
> generally, the slash "/" denotes a level in a hierarchical structure.
> ================================================================>>

Ok.


> I'd suggest "... scheme uses characters which are not allowed in a URL,..."
> lest we debate whether %FF is ASCII, and change "the" to "a".


OK


> >   (Note: If a new naming scheme is introduced which encodes binary
> >   data as opposed to text, then a more compact encoding such as pure
> >   hexadecimal or base 64 would be more appropriate.)
> 

> At this point, I think this is irrelevant and distracting, and I
> suggest you take it out.

Gone into hyperspace.

> >  mid                     Message identifiers for electroni mail
>                                                             ^c

Thanks

> >   The schemes for x.500, network management database and whois++ have
> >   not been specified and may be the subject of further study.
> 

> I don't like to see temporal references to activities in standard
> documents, since the document will last longer than the activities.
> How about just:
> 

> <<================================================================
>    Other schemes may be specified by future specifications.
> ================================================================>>

OK



> >   The url: prefix is reserved for use in encoding a Uniform Resource
> >   Name when that has been developed by the IETF working group.
> 

> You mean 'urn:', don't you?

	Yes -- oops
	
>	I'd just take this out. The URN standard
> will reserve it, this document doesn't need to.

	OK.
		
	[But now we have the "url:" prefix, encoding a URN in the same
	space as the URN is out of the question.  There is no document
	but an understanding that the prefixes "URL", "URN"
	and "TelephoneNumber" are distinct.  WWW can still use
	its URI concept via a simple mapping.]
____________________________________________________________________
> >    The ftp: prefix indicates a file which is to be picked up from the
> >    file system of the given host. The FTP protocol is used, as defined
> >   in RFC957 or any successor. The port number, if present, gives the
> >   port of the FTP server if not the FTP default. (A client may in
> >   practice use local file access to retrieve objects which are
> >   available though more efficient means such as local file open or
> >   NFS mounting, where this is available and equivalent).
> 

> 

> Not really. FTP urls are perfectly useful for retrieving data from FTP
> servers that do things like automatically uncompressing data or making
> .tar.Z files. As we've seen, most anonymous FTP servers do NOT export
> their root file system, and the advice about smart clients is
> misplaced and confusing. It is RFC 959, not 957. How about:

I've taken it out, although, for example, for my anonymous FTP
server, equivalent local access is available to me and I use it.
But people will figure that out.


> <<================================================================
>  The ftp: prefix indicates data to be retrieved using the FTP protocol
					^^^^^^^^^
>  (RFC 959).  The port number, if present, gives the port of the FTP
>  server if not the FTP default.
> ================================================================>>

OK - I have left "FTP used" rather than "retrieved" as FTP objects can be
stored and replaced too.

> >      Path
> >
> >   The FTP protocol allows for a sequence of CWD commands (change
> >   working directory) prior to a RETR (retrieve) which actually
> >   accesses a file.  The arguments of any CWD commands are successive
> >   segment parts of the URL, and the filename argument to the RETR
> >   command is the final segment of the URL path.
> 

> RETR (retrieve) or NLST (list). This should also mention 'type'.

OK.  Remember though that we are defining a sequence which will work
 to retrieve a file, but we are leaving it up to the user what [s]he
 does with it.  Like [s]he may want to rename it. We can't define that
 here.
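
To make the sequence concrete, here is a minimal sketch of the mapping described in the quoted Path text: each path segment but the last becomes a CWD argument, and the last becomes the RETR argument. The function name ftp_command_sequence is hypothetical, and the sketch assumes a URL with no type parameter and a non-empty final segment; NLST and storing are not covered.

```python
from urllib.parse import urlsplit, unquote


def ftp_command_sequence(url):
    """Return the CWD/RETR commands implied by an ftp: URL path."""
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        raise ValueError("not an ftp URL: %r" % (url,))
    # Split on "/" first and decode afterwards, so that an encoded
    # slash (%2F) inside a segment is not treated as a separator.
    segments = [unquote(s) for s in parts.path.split("/")[1:]]
    if not segments or segments[-1] == "":
        raise ValueError("URL path does not name a file: %r" % (url,))
    directories, filename = segments[:-1], segments[-1]
    return ["CWD " + d for d in directories] + ["RETR " + filename]
```

For the text version of the spec mentioned at the top of this message, the sketch gives CWD pub, CWD www, CWD doc, then RETR draft-uri-url-03.txt.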

...]
> 

> >   (This note previously read ...) 

> 

> I would take out all of the advice about how to write efficient
> clients; I think it is misplaced in a specification of the MEANING of
> URLs. (Also, you spelled URL as URI once.) You don't want to leave the
> 'this note previously read' section in the final draft, of course.
> That would leave it with:
> 

[...]

OK -- though I'll stick to a heading of "note" as it's document style.


> >      Data type
> >      

> >   The data format of a file can only, in the general FTP case, be
> >   deduced from the name, normally the suffix of the name. This is not
> >   standardized. An alternative is for it to be transferred in
> >   information outside the URL. The transfer mode (binary or text)
> >   must in turn be deduced from the data format.  It is recommended
> >   that conventions for suffixes of public archives be established,
> >   but it is outside the scope of this paper.
> 

> I think, given the potential confusion of `media type' vs `FTP
> data type' that it's good to be explicit.

	I have used the phrase "content type" as it is that used
	by MIME.  "media type" sounds like tape vs disk to me.
	
> Also, the FTP parameter
> in RFC959 allowed for E (EBCDIC) as well as A and I, and various
> parameters having to do with Telnet format effectors, non-print, local
> byte size, etc. I don't think we need any of those, but we should say
> so. I suggest:

I guess this can be discussed but as I said above, to make the spec
transparent to this definition is safest and cleanest.


> ================================================================>>>
> ...
> 

> >  GOPHER
> >  

> >   Gopher selector strings may contain any characters other than tab,
> >   return, or  linefeed, so it is important to encode all disallowed
> >   characters and encode any  space characters so these characters are
> >   not altered during transport of the  URL. Note that since gopher
> >   selector string are opaque and in many cases map to  native file
> >   system of the gopher server, so encoding of disallowed characters
> >   in the selector string is to map to binary codes rather than ISO
> >   character  sets. In other words, the "%" character followed by two
> >   hexadecimal digits is  used to encode binary data. Clients shall
> >   not interpret gopher selector strings. While many Gopher servers
> >   map to Unix file systems, you cannot assume that "/"  characters
> >   imply a heirarchy since Gopher servers on non-Unix file systems may
> >    use the "/" as part of a file name.
> 

> Just some word-smithing; avoiding repetition.
> <<================================================================
>   GOPHER
>   

>    Gopher selector strings are, in general, interpreted as a sequence
>    of 8-bit bytes which may contain any characters other than tab,
>    return, or linefeed. It is necessary to encode any disallowed
>    characters, including spaces and other binary data not in the
>    allowed character set, using the standard convention of the "%"
>    character followed by two hexadecimal digits.
> 

>    Note that slash "/" in gopher selector strings may not correspond
>    to a hierarchical structure.

OK (basically... I put "disallowed in a URL", and "level in a"
hierarchical...)
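
A minimal sketch of the "%" plus two hexadecimal digits convention applied to a gopher selector treated as a sequence of 8-bit bytes. The function names are hypothetical, and the set of characters left unencoded (letters, digits, "_", ".", "-", "~" and "/") is an assumption borrowed from Python's urllib defaults, not the allowed set defined in the URL draft.

```python
from urllib.parse import quote, unquote_to_bytes


def encode_selector(selector):
    """Percent-encode a gopher selector given as bytes.

    Every byte outside the assumed safe set, including space and
    bytes above 0x7F, becomes "%" followed by two hex digits.
    """
    return quote(selector, safe="/")


def decode_selector(encoded):
    """Reverse encode_selector, returning the original bytes."""
    return unquote_to_bytes(encoded)
```

The selector is handled as opaque bytes throughout; the sketch attaches no hierarchical meaning to "/".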

> 

[..]
> This is backward, and you have two number (4)s. You want:
> 

> <<<================================================================
>        If the URL does not refer to a Gopher+ item and there is no
>        gopher search string, then parts 3, 4, 5 and 6 are optional.
> 

> 	3. An encoded tab ... etc..


OK
> 

> 

> >  MAILTO
> >  

> >   This allows a URL to specify an RFC822 addr-spec mail address.
> >   Note that use of % , for example as used in forming a gatewayed
> >   mail address, requires conversion to %25 in a URL.
> >   

> >   This semantics may be considered to be that the object referred to
> >   by the mailto: URL is the set of messages sent to or from that
> >   address. There is no algorithm to retrieve this set, but the SMTP
> >   protocol allows messages to be added to it, and any given user may
> >   be aware of a subset of its members.
> 

> I already sent a correction to this in a previous message.


OK. Incorporated in previous message of Friday afternoon.

> I'm running out of steam, and coming to things that I've already sent
> mail about, so I think I'll send this off now.

Thanks for all the input, Larry.


Tim