Re: [apps-discuss] [link-relations] Fwd: I-D Action: draft-ohye-canonical-link-relation-00.txt

Julian Reschke <julian.reschke@gmx.de> Sun, 03 July 2011 07:56 UTC

Message-ID: <4E10208C.6090209@gmx.de>
Date: Sun, 03 Jul 2011 09:55:56 +0200
From: Julian Reschke <julian.reschke@gmx.de>
To: Maile Ohye <maileko@gmail.com>
Cc: "link-relations@ietf.org" <link-relations@ietf.org>, joachim@kupke.za.net, IETF Apps Discuss <apps-discuss@ietf.org>, Bjartur Thorlacius <svartman95@gmail.com>

On 2011-07-03 03:23, Maile Ohye wrote:
> 1. OPEN. F. Ellermann:
> A relative canonical URL can't be a good idea.  If there is more than one
> "content URL" (in the terminology of the draft) this would result in more
> than one canonical URL, defeat the purpose, and worse, this could make
> googlebot angry.
> --response by F. Ellermann: “... But now I see that relative can be
> perfectly fine if and only if all incarnations exist on the same server,
> e.g., http://example/xyzzy.html?any-query could get a relative canonical
> URL xyzzy.html?default or similar.  A better example in the draft
> explaining when a relative canonical URL is okay could help.”
> --response by M. Ohye: “We could add a relative URL in the Examples section:
> Then duplicate content URIs such as:
> http://www.example.com/page.php?item=purse&category=bags
> http://www.example.com/page.php?item=purse&category=bags&sid=1234
> may designate the canonical link relation in HTML as specified in
>    [RFC5988]:
> <link rel="canonical"
>            href="http://www.example.com/page.php?item=purse" />
> or <link rel="canonical" href="page.php?item=purse" />
> or alternatively, in the HTTP Header...”

+1
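
To illustrate why the relative form works when all duplicates live on the same server, here is a minimal sketch that resolves the relative reference from the proposed example against each content URI. It only uses standard reference resolution; the mirror host at the end is a hypothetical name I made up to show the case F. Ellermann was worried about.

```python
from urllib.parse import urljoin

# Duplicate content URIs from the proposed Examples text.
content_uris = [
    "http://www.example.com/page.php?item=purse&category=bags",
    "http://www.example.com/page.php?item=purse&category=bags&sid=1234",
]

# Relative reference as it would appear in <link rel="canonical" href="...">.
relative_href = "page.php?item=purse"

# Each content URI acts as the base URI for resolving the reference.
resolved = {urljoin(uri, relative_href) for uri in content_uris}
print(resolved)  # {'http://www.example.com/page.php?item=purse'}

# Hypothetical duplicate on another host: the same relative reference
# now resolves to a different canonical URI.
mirror_uri = "http://mirror.example.net/page.php?item=purse&category=bags"
mirror_resolved = urljoin(mirror_uri, relative_href)
print(mirror_resolved)  # http://mirror.example.net/page.php?item=purse
```

Both same-server duplicates end up with one canonical URI, while the copy on the other host does not, which is the "if and only if all incarnations exist on the same server" condition.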

> 2. OPEN. F. Ellermann:
> The draft could s/SHOULD NOT/MUST NOT/, I don't see any good reason to
> violate a SHOULD NOT, and if that's correct MUST NOT is clearer.
> --seconded by M. Yevstifeyev
> --response by M. Ohye and J. Kupke: “We prefer SHOULD NOT for a few
> reasons. 1) If we outlawed multiple canonicals using MUST NOT, we would
> effectively call the HTML invalid. In reality, the HTML will still be
> processed, though it’s likely that search engines will ignore both/all
> rel=canonicals. 2) Worse, for the cases where somebody might
> rel=canonical to a 404, etc., if we use MUST NOT, it would place a huge
> (and entirely unrealistic) burden on the site owner to ensure that
> search engines recrawl pages in such an order that all rel=canonical
> sources are updated before a page may become a 404.”

I'm not sure I understand the response. Is there a use case where an 
author would legitimately add multiple instances of the link relation?
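
For what it's worth, the consumer behaviour described in the response (ignoring all rel=canonical hints when a page carries several) could be sketched roughly as below. This is only an illustration under my own assumptions: the class and function names are made up, and it is not a description of what any particular search engine does.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalCollector(HTMLParser):
    """Collects the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel is a space-separated, case-insensitive list of relation types.
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.hrefs.append(attr["href"])


def pick_canonical(html, base_uri):
    """Return the single canonical URI, or None if absent or ambiguous.

    Assumption: the hint is ignored entirely when the document names
    more than one distinct target.
    """
    parser = CanonicalCollector()
    parser.feed(html)
    targets = {urljoin(base_uri, href) for href in parser.hrefs}
    return targets.pop() if len(targets) == 1 else None
```

With a single instance the function returns the resolved target; with two instances pointing at different targets it returns None, so the document is still processed but the hint is dropped.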

> ...
> 4. OPEN. M. Yevstifeyev:
>>  The canonical link relation specifies the preferred version of a URI
> I think some introductory text on linking, probably based on RFC 5988,
> should go here.
> --response by J. Reschke "Why? It defines a link relation as defined by
> RFC 5988, so why repeat text from over there?"
> --response by M. Yevstifeyev "It should be mentioned (1) what is link
> relation at all and (2) that RFC 5988 is a specification of that
> technology which this document depends on.  RFC 5988 is first mentioned
> in Examples."
> --response by M. Ohye. “We could modify to:
> “The canonical link relation (Link Relation Types reference <xref
> target="RFC5988"/>) specifies the preferred version of a URI...”

+1

> 5. OPEN. M. Yevstifeyev:
>>  Presence of the canonical link relation indicates to applications,
> such as search engines, that they MAY:
> I wonder why it's MAY; in this case implementations (explicitly, those
> apps which interpret Link: headers and corresponding construction in
> HTML) will be free to ignore it.  I think normative SHOULD should be OK
> (sorry for pun).
> --response by J. Reschke "I think this link relation is purely advisory,
> so a better approach might be to replace "MAY" by "can"."
> --response by M. Yevstifeyev "Yes, advisory, which suits RFC 2119
> definition for SHOULD: 'SHOULD   This word, or the adjective
> "RECOMMENDED", mean that there may exist valid reasons in particular
> circumstances to ignore a particular item, but the full implications
> must be understood and carefully weighed before choosing a different
> course.'
> and natural meaning of should - advice/recommendation."
> --response by M. Ohye: “Thanks, in discussion with Joachim Kupke.”

No, it's really not a SHOULD.

> ...
> 7. OPEN. M. Yevstifeyev:
>    o Exist on a different protocol: http to https, or vice versa
> You probably meant URI scheme here, since https isn't a separate
> protocol.  As before these points we had "The value of the
> target/canonical URI MAY" or, if you consider my comment above, "The
> target/canonical URI MAY", this point may be reworded as "Have different
> scheme names" (which suits the second variant of a preface to this list
> better).
> --agreed by J. Reschke
> --response by M. Ohye/J. Kupke: “Good catch, Mykyta. We’re fine to
> change the draft to “scheme”:
> Have different scheme names: such as http to https, or vice versa
> Do we now need to expand the draft for ftp:// and gopher:// URIs? For
> example, ftp:// and gopher:// URIs:
> 1) Do not come with the equivalent of RFC 5988, so a non-HTML document
> available at any such URI won't be able to make use of <link
> rel="canonical">.
> 2) Have corresponding GOPHER error code (item type 3) or an FTP error
> 550, which like HTTP 404, is forbidden from being served for the target
> of a <link rel="canonical">.”

a) A non-HTML document at a non-HTTP(s) URI may not be able to specify
the link relation, but it could still be the target of a link relation,
right? (That could be an example.)

b) Future protocols might have other means to specify a link relation, btw.
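
As a rough sketch of point a), an HTTP response for an HTML page could carry a Link header whose target uses a different scheme. The URIs and the helper name below are hypothetical, chosen only for illustration; the header syntax is the RFC 5988 form.

```python
from urllib.parse import urlsplit


def canonical_link_header(target_uri):
    """Format an RFC 5988 Link header field value for rel="canonical"."""
    return f'<{target_uri}>; rel="canonical"'


# Hypothetical context (the HTML page) and target (a plain-text file on FTP).
context_uri = "http://www.example.com/spec.html"
target_uri = "ftp://ftp.example.com/pub/spec.txt"

header_value = canonical_link_header(target_uri)
print("Link:", header_value)

# The context and target schemes differ, which the draft's list allows.
schemes_differ = urlsplit(context_uri).scheme != urlsplit(target_uri).scheme
print(schemes_differ)
```

The FTP resource never has to state the relation itself; it is only the target.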

> 8. OPEN. M. Yevstifeyev:
> Reading sections 3 and 5 of the draft, it seems that it mandates use of
> HTTP when referring to canonical URIs.  And what is the situation when
> the target URI is an 'ftp' or 'gopher' URI?  Section 3 allows different
> scheme names in context/target URIs, if I understand it correctly.
> Therefore, unless it is deliberate, I think any mention of HTTP
> should be replaced by more generic regulations.
> --response by J. Reschke "Nope; I think the HTTP examples are very
> useful. But maybe we can have an additional statement that the link
> relation isn't specific to HTTP."
> --response by M. Yevstifeyev "Currently we have normative reference to
> RFC 2616 and normative requirements with respect to HTTP.  HTTP examples
> are OK; but it's redundant in Section 3.  I suppose in Section 3 we may
> replace HTTP-related stuff with something in the way like:
>
> Old:
>    o  The source URI of a "300 Multiple Choices" URI (Section 10.3.1 of
>       [RFC2616]) or a permanent redirect (Section 10.3.2 of [RFC2616]).
> New:
>    o  The source URI, which defines a resource which provides choice
>       in different representations of a given resource, identified by
>       the context URI, or is a link which has been permanently replaced
>       by another one.
> etc."
> --response by B. Thorlacius: “Your wording seems overly confusing. Which
> is the resource that "provides choice in different representations of a
> given resource?" A standard could be assigned the URI
> <http://example.org/spec>. An HTTP GET /spec might be answered with an
> HTTP/1.1 300 (Multiple Choices) response and an entity linking to
> /spec.node.html,
> /spec.html, /spec.pdf, and /spec.txt. The resource (the standard, that
> is) would in no way provide this choice. The HTTP server simply offered
> multiple representations.”
> --response by M. Yevstifeyev: “First, this was an example only.  Next,
> my point was that the document makes HTTP/'http' scheme mandatory in
> context/target URIs, which I don't think is appropriate, since a canonical
> URI may refer to a resource accessible via another protocol.  Even though
> HTTP is going to be the most common use case of the canonical link relation,
> we shouldn't exclude other protocols.”
> --response by B. Thorlacius: “I agree. However, I don't understand the
> need for forbidding canonical links to resources with multiple
> representations. Are there not to be canonical links from
> representations of a resource to the resource (i.e. from /spec.html and
> /spec.txt to /spec)?”
> --response by M. Yevstifeyev: “Probably such restriction is set because
> multiple representation choice may ultimately refer the user to a
> resource which is not canonical.  A _definite_ canonical resource is
> necessary and required.”
> ...

...my general feeling here is that having specific HTTP examples is 
good, as long as the spec doesn't make the reader think it's the only 
protocol that qualifies.

Best regards, Julian