URL-Reference / "empty URL" question

Klaus Weide <kweide@tezcat.com> Mon, 12 May 1997 22:05 UTC

Date: Mon, 12 May 1997 16:35:10 -0500
From: Klaus Weide <kweide@tezcat.com>
Reply-To: Klaus Weide <kweide@tezcat.com>
To: fielding@ics.uci.edu
cc: uri@bunyip.com
Subject: URL-Reference / "empty URL" question
Message-ID: <Pine.SUN.3.95.970512144020.17794C-100000@huitzilo.tezcat.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset="US-ASCII"
Sender: owner-uri@bunyip.com
Precedence: bulk

The following question came up for Lynx development.  I am not so sure
it is actually a URI topic, but the current url-syntax draft is the
only reference I could find which is specific about this.  I would
especially like to hear the draft authors' opinion on whether I am
reading something into it that it wasn't meant to say.

Newer versions of Lynx now try to support the meaning of fragment-only
URL-References as given by draft-fielding-url-syntax-05: (within HTML)
they refer to the current document (and are not resolved with respect
to a Content-Base header, BASE tag, etc.).

Consider a document retrieved by the client from
<http://a.host/a.file.html>, which contains a named anchor and the
following three hyperlinks:

 <A NAME="top">Top</A>
 <A NAME="link-1" HREF="http://a.host/a.file.html"    >link one   </A>
 <A NAME="link-2" HREF="http://a.host/a.file.html#top">link two   </A>
 <A NAME="link-3" HREF="#top"                         >link three </A>

For simplicity I use absolute URLs, and also assume there is no
Content-Base header or BASE tag in effect.

Now this document is received under some condition that makes it
uncacheable, say a "Cache-Control: no-cache" header.  (Something like
this is needed to make a difference between link-2 and link-3 observable
rather than purely theoretical.)

At least with the Lynx code currently under development, activating
("following") link-1 will result in a new network request.  Activating
link-3 will not, but will just change the view of the current document,
in accordance with the last sentence of this passage from section 3 of
the draft:

   A URL reference which does not contain a URL is a reference to the
   current document.  In other words, an empty URL reference within a
   document is interpreted as a reference to the start of that document,
   and a reference containing only a fragment identifier is a reference
   to the identified fragment of that document.  Traversal of such a
   reference should not result in an additional retrieval action.

The question is, what happens with link-2: should following it result
in a new request, as for link-1, or just repositioning within the
already loaded document as for link-3?

It appears to me that this should go with the link-1 case.  The fact
that the URL of the target coincides with that of the current document
should be taken as pure coincidence, and no meaning should be inferred
from it.  The draft does not say anything about this case, but the
presence or absence of a fragment in a URL-Reference should not change
whether a new request of the resource is implied.

The opposite opinion is demonstrated on the test page
<http://www.wfbr.edu/lynx/no-cache.html>.  The reason given is
essentially (translated to my example above) that link-2 and link-3
resolve to the same thing, so they must have the same effect.
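
To make the difference between the two readings concrete, here is a
minimal sketch in Python.  It is not Lynx code; the function names and
the use of urllib.parse are assumptions made for illustration only.
The "resolve first" function is my guess at the behaviour the test page
argues for, and it assumes that link-1 (no fragment) still causes a
reload under that reading.

```python
from urllib.parse import urldefrag, urljoin, urlsplit

# URL of the currently displayed document, taken from the example.
CURRENT_DOC = "http://a.host/a.file.html"

# The three hyperlinks from the example, as written in the HREF attributes.
LINKS = {
    "link-1": "http://a.host/a.file.html",
    "link-2": "http://a.host/a.file.html#top",
    "link-3": "#top",
}


def new_request_reference_first(reference: str) -> bool:
    """Reading argued for in this message: only the written form counts.

    A reference with no scheme, site, path or query (empty or
    fragment-only) stays within the current document; anything else
    implies a retrieval.
    """
    parts = urlsplit(reference)
    same_document = not (parts.scheme or parts.netloc or parts.path or parts.query)
    return not same_document


def new_request_resolve_first(reference: str, current: str = CURRENT_DOC) -> bool:
    """Opposite reading (assumed): resolve first, then compare URLs.

    If the resolved reference carries a fragment and its URL part equals
    the current document's URL, only the view is repositioned.
    """
    url, fragment = urldefrag(urljoin(current, reference))
    return not (fragment and url == current)


for name, href in LINKS.items():
    print(name, new_request_reference_first(href), new_request_resolve_first(href))
```

The two functions agree on link-1 and link-3 and differ only on link-2,
which is exactly the case in question.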

My reading of the draft is that they do not resolve to the same thing,
and that implementing things this way (first "resolve" a given
URL-Reference into a "full" URL-Reference with a non-empty absolute
URL, then do all further processing with that) actually contradicts
the draft - although it probably is used by a lot of implementations.
The relevant (for my example) steps of the algorithm in 6.2:
   2) If the path component is empty and the scheme, site, and query
      components are undefined, then it is a reference to the current
      document and we are done.
   3) If the scheme component is defined, indicating that the reference
      starts with a scheme name, then the reference is interpreted as an
      absolute URL and we are done.  Otherwise, [...]
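
As a rough illustration of how I read steps 2 and 3, here is a Python
sketch.  The function name and the returned labels are hypothetical,
and the steps elided above are deliberately not reproduced; the
"relative" result only marks that the algorithm would continue.
Treating an empty site component as undefined is a simplification of
this sketch.

```python
from urllib.parse import urlsplit


def classify_reference(reference: str) -> str:
    """Apply only steps 2 and 3 of the quoted algorithm to a reference."""
    # Set the fragment aside; it is not part of the URL proper.
    url_part = reference.split("#", 1)[0]
    parts = urlsplit(url_part)

    # urlsplit() cannot tell an empty query ("?") from an undefined one,
    # so check for the delimiter directly.
    query_defined = "?" in url_part

    # Step 2: empty path, and scheme, site and query undefined.
    if not (parts.path or parts.scheme or parts.netloc or query_defined):
        return "current-document"

    # Step 3: a scheme name means the reference is an absolute URL.
    if parts.scheme:
        return "absolute"

    # Anything else is left to the later steps, which are elided here.
    return "relative"


print(classify_reference("#top"))
print(classify_reference("http://a.host/a.file.html#top"))
```

Under this reading "#top" stops at step 2 and the absolute form stops
at step 3; neither step rewrites the reference into the other form.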

The first "we are done" is not followed by anything like constructing a
new URL-Reference from the original one by combining the current
document's URL with the given fragment.  The reference in this case
already IS what it is and need not be further "resolved".
This also seems to be the only interpretation which would make the
distinction in the last paragraph of section 3 meaningful:

  However, if the URL reference occurs in a 
  [specific context, like FORM, IMG,  ... ]                  then
  an empty URL reference represents the URL of the current document
  and should be replaced by that URL when transformed into a request.

Any comments appreciated.