Re: #295: Applying original fragment to "plain" redirected URI (also #43)

"Martin J. Dürst" <duerst@it.aoyama.ac.jp> Wed, 04 January 2012 10:29 UTC

Message-ID: <4F0429A8.3090008@it.aoyama.ac.jp>
Date: Wed, 04 Jan 2012 19:27:52 +0900
From: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
Organization: Aoyama Gakuin University
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.9) Gecko/20100722 Eudora/3.0.4
MIME-Version: 1.0
To: Julian Reschke <julian.reschke@gmx.de>
CC: httpbis Group <ietf-http-wg@w3.org>
References: <6A53E99A-019D-4F6D-A33D-24524CD34E17@mnot.net> <4EFDFA17.4080804@gmx.de> <4F031419.1050708@gmx.de>
In-Reply-To: <4F031419.1050708@gmx.de>
Content-Type: text/plain; charset="UTF-8"; format="flowed"
Content-Transfer-Encoding: 7bit
Archived-At: <http://www.w3.org/mid/4F0429A8.3090008@it.aoyama.ac.jp>

Hello Julian, others,

On 2012/01/03 23:43, Julian Reschke wrote:
> On 2011-12-30 18:51, Julian Reschke wrote:
>> ...
>> Indeed; see my tests at
>> <http://greenbytes.de/tech/tc/httpredirects/#l-fragments> (note that
>> Safari appears to have funny issues filling the iframes; but navigating
>> to the linked resource gets you proper results).
>> ...
>
> I just realized that the rule we would need to describe *almost* is the
> one defined in the URI spec
> (<http://greenbytes.de/tech/webdav/rfc3986.html#rfc.section.5.2>) as
> "relative resolution":

> "Almost", because it doesn't use Base.fragment when R.fragment is undefined.
>
> a) Should we try to describe the algorithm based on RFC 3986 ("do relative
> resolution as defined by ..., then, if the result doesn't have a
> fragment, add the one from the Base URI")?

I'm not at all sure that this description is correct. It would mean that 
I can have something like:
Request URI:   http://1.example.org/path1/file1.ext
Redirect URI:  http://2.example.org#frag2

and the result would be:
http://2.example.org/path1/file1.ext#frag2

As you can see in the result, there is a mixture of components from the 
request URI (1) and the redirect URI (2). The way that relative 
resolution works otherwise is that in the result, all components from 
(2) precede components from (1).

Below is the change to the algorithm that I think is correct. In
logical terms, it's straightforward: use the fragment from the base
only if nothing before the fragment comes from the reference (R).
However, in terms of actual code, there are quite a few places to 
change. This is because the if/else hierarchy gets deeper and deeper for 
the later parts of the URI. In the algorithm, scheme is set in two 
locations, authority in three, and so on. The structure of the code gets 
even more regular if you change
    if (R.path == "") then
to
    if (R.path != "") then
(which is equivalent to "if defined(R.path) then") and exchange the 
respective code blocks. The only irregularity in the structure then is the
    if (R.path starts-with "/") then
condition; this could be regularized by separating path (without the 
actual final name of the resource) and pure resource (file) name.
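
The logical rule can be sketched in a few lines of Python. This is only an
illustration: the function name inherits_base_fragment and the use of None
for an undefined component are assumptions made for the sketch, not
anything taken from RFC 3986.

```python
from typing import Optional


def inherits_base_fragment(
    scheme: Optional[str],
    authority: Optional[str],
    path: str,
    query: Optional[str],
    fragment: Optional[str],
) -> bool:
    """Return True if the target should take Base.fragment.

    The arguments are the five parsed components of the reference R;
    None stands for "undefined". The base fragment is used only when R
    contributes nothing before the fragment and has no fragment itself.
    """
    return (
        scheme is None
        and authority is None
        and path == ""
        and query is None
        and fragment is None
    )
```

An empty reference satisfies the condition; a reference such as "#frag2",
"?q" or "other.ext" does not, so in those cases the target fragment is
simply R.fragment.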

 >>>>
    -- The URI reference is parsed into the five URI components
    --
    (R.scheme, R.authority, R.path, R.query, R.fragment) = parse(R);

    -- A non-strict parser may ignore a scheme in the reference
    -- if it is identical to the base URI's scheme.
    --
    if ((not strict) and (R.scheme == Base.scheme)) then
       undefine(R.scheme);
    endif;

    if defined(R.scheme) then
       T.scheme    = R.scheme;
       T.authority = R.authority;
       T.path      = remove_dot_segments(R.path);
       T.query     = R.query;
       T.fragment  = R.fragment;               -- this line added
    else
       if defined(R.authority) then
          T.authority = R.authority;
          T.path      = remove_dot_segments(R.path);
          T.query     = R.query;
          T.fragment  = R.fragment;            -- this line added
       else
          if (R.path == "") then
             T.path = Base.path;
             if defined(R.query) then
                T.query = R.query;
                T.fragment = R.fragment;       -- this line added
             else
                T.query = Base.query;
                if defined(R.fragment) then    -- this line added
                   T.fragment  = R.fragment;   -- this line added
                else                           -- this line added
                   T.fragment = Base.fragment; -- this line added
                endif;                         -- this line added
             endif;
          else
             if (R.path starts-with "/") then
                T.path = remove_dot_segments(R.path);
             else
                T.path = merge(Base.path, R.path);
                T.path = remove_dot_segments(T.path);
             endif;
             T.query = R.query;
             T.fragment = R.fragment;          -- this line added
          endif;
          T.authority = Base.authority;
       endif;
       T.scheme = Base.scheme;
    endif;

    -- T.fragment = R.fragment;                -- this line commented out
 >>>>

It's also possible to rewrite this as:

 >>>>
    -- The URI reference is parsed into the five URI components
    --
    (R.scheme, R.authority, R.path, R.query, R.fragment) = parse(R);
    T.fragment = undefined;                    -- this line added

    -- A non-strict parser may ignore a scheme in the reference
    -- if it is identical to the base URI's scheme.
    --
    if ((not strict) and (R.scheme == Base.scheme)) then
       undefine(R.scheme);
    endif;

    if defined(R.scheme) then
       T.scheme    = R.scheme;
       T.authority = R.authority;
       T.path      = remove_dot_segments(R.path);
       T.query     = R.query;
    else
       if defined(R.authority) then
          T.authority = R.authority;
          T.path      = remove_dot_segments(R.path);
          T.query     = R.query;
       else
          if (R.path == "") then
             T.path = Base.path;
             if defined(R.query) then
                T.query = R.query;
             else
                T.query = Base.query;
                if not defined(R.fragment) then  -- this line added
                   T.fragment = Base.fragment;   -- this line added
                endif;                           -- this line added
             endif;
          else
             if (R.path starts-with "/") then
                T.path = remove_dot_segments(R.path);
             else
                T.path = merge(Base.path, R.path);
                T.path = remove_dot_segments(T.path);
             endif;
             T.query = R.query;
          endif;
          T.authority = Base.authority;
       endif;
       T.scheme = Base.scheme;
    endif;

    if not defined(T.fragment) then            -- this line added
       T.fragment = R.fragment;
    endif;                                     -- this line added
 >>>>

This localizes the changes better and can probably serve as the base (no 
pun intended) for spec text.
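
For anyone who wants to experiment, here is a rough Python transcription
of the rewritten pseudocode. It is a sketch, not a tested implementation:
the function names and the use of None for "undefined" are assumptions,
the parser is just the regular expression from RFC 3986 Appendix B, and
no validation or normalization is attempted.

```python
import re
from typing import Optional, Tuple

Parts = Tuple[Optional[str], Optional[str], str, Optional[str], Optional[str]]

# Component-splitting regular expression from RFC 3986, Appendix B.
_URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def parse(uri: str) -> Parts:
    """Split into (scheme, authority, path, query, fragment); None = undefined."""
    m = _URI_RE.match(uri)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


def remove_dot_segments(path: str) -> str:
    """Remove "." and ".." segments as described in RFC 3986, Section 5.2.4."""
    out = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../") or path == "/..":
            path = "/" + path[4:]
            if out:
                out.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to the output.
            end = path.find("/", 1)
            end = len(path) if end == -1 else end
            out.append(path[:end])
            path = path[end:]
    return "".join(out)


def merge(base_authority: Optional[str], base_path: str, ref_path: str) -> str:
    """Merge a relative-path reference with the base path (Section 5.2.3)."""
    if base_authority is not None and base_path == "":
        return "/" + ref_path
    return base_path[: base_path.rfind("/") + 1] + ref_path


def resolve(base: str, ref: str, strict: bool = True) -> str:
    b_scheme, b_auth, b_path, b_query, b_frag = parse(base)
    r_scheme, r_auth, r_path, r_query, r_frag = parse(ref)
    t_frag = None                                  # added
    if not strict and r_scheme == b_scheme:
        r_scheme = None
    if r_scheme is not None:
        t_scheme, t_auth, t_query = r_scheme, r_auth, r_query
        t_path = remove_dot_segments(r_path)
    else:
        if r_auth is not None:
            t_auth, t_query = r_auth, r_query
            t_path = remove_dot_segments(r_path)
        else:
            if r_path == "":
                t_path = b_path
                if r_query is not None:
                    t_query = r_query
                else:
                    t_query = b_query
                    if r_frag is None:             # added
                        t_frag = b_frag            # added
            else:
                if r_path.startswith("/"):
                    t_path = remove_dot_segments(r_path)
                else:
                    t_path = remove_dot_segments(merge(b_auth, b_path, r_path))
                t_query = r_query
            t_auth = b_auth
        t_scheme = b_scheme
    if t_frag is None:                             # added
        t_frag = r_frag
    # Recompose the target as in RFC 3986, Section 5.3.
    result = ""
    if t_scheme is not None:
        result += t_scheme + ":"
    if t_auth is not None:
        result += "//" + t_auth
    result += t_path
    if t_query is not None:
        result += "?" + t_query
    if t_frag is not None:
        result += "#" + t_frag
    return result
```

With this sketch, resolve("http://example.org/a/b?q=1#frag1", "") keeps
"#frag1", while resolving "c", "?y" or an absolute URI against the same
base yields a target without the base fragment.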


> b) Is this potentially an erratum for RFC 3986?

I would say NO. My understanding is that something like
    <a href="">a link</a>
always refers to the resource itself, not a subresource. If the erratum 
went through, there would be no short way to refer to a resource itself.

Regards,    Martin.