Re: [apps-discuss] [link-relations] Fwd: I-D Action: draft-ohye-canonical-link-relation-00.txt

Maile Ohye <maileko@gmail.com> Fri, 15 July 2011 07:29 UTC

In-Reply-To: <4E1818B9.8030804@gmx.de>
References: <4E083D3F.6030200@gmx.de> <4E0D3EA5.7010803@gmail.com> <4E0DCFEF.20206@gmx.de> <4E0DEA77.3050007@gmail.com> <4E0E0E76.2080007@gmail.com> <4E0E995A.7060800@gmail.com> <4E0F1058.3050201@gmail.com> <1309613470.2807.17.camel@mackerel> <4E0F1F2F.8020504@gmail.com> <CAGKau1GyaxpgZsZmUcqZp1iUG6wrvSG3LHM3Pq52AjXfZz900Q@mail.gmail.com> <4E10208C.6090209@gmx.de> <CAKACZovTrCEkFRvN94BW4NChko3_J=FzsAmc37jAJ6YnnjeOeg@mail.gmail.com> <4E1818B9.8030804@gmx.de>
Date: Fri, 15 Jul 2011 00:29:49 -0700
Message-ID: <CAGKau1HzJAtLwPxjSGJ8rmJy+pVNKOuOHtbR=Ox-93CeG-M_cA@mail.gmail.com>
From: Maile Ohye <maileko@gmail.com>
To: Julian Reschke <julian.reschke@gmx.de>
Content-Type: multipart/alternative; boundary=bcaec544eee65b906c04a8169db0
X-Mailman-Approved-At: Fri, 15 Jul 2011 08:02:19 -0700
Cc: "link-relations@ietf.org" <link-relations@ietf.org>, Joachim Kupke <joachim@kupke.za.net>, IETF Apps Discuss <apps-discuss@ietf.org>, Bjartur Thorlacius <svartman95@gmail.com>
Subject: Re: [apps-discuss] [link-relations] Fwd: I-D Action: draft-ohye-canonical-link-relation-00.txt
List-Id: General discussion of application-layer protocols <apps-discuss.ietf.org>

Hi everyone, thanks again for the feedback!

Julian, MAY :) we submit a new draft with the changes discussed in this
thread?

Our comments to the open items are listed below (highlighted in yellow). All
comments for draft-00 are tracked in this doc:
https://docs.google.com/document/d/1SkGEFKILZKTD6r9D2Oz76LwxuXbbqtRnWB7y2NTaDt8/edit?hl=en_US

Thanks again,
Maile

2. OPEN. F. Ellermann:
The draft could s/SHOULD NOT/MUST NOT/, I don't see any good reason
to violate a SHOULD NOT, and if that's correct MUST NOT is clearer.
--seconded by M. Yevstifeyev
--response by M. Ohye and J. Kupke: “We prefer SHOULD NOT for a few reasons.
1) If we outlawed multiple canonicals using MUST NOT, we would effectively
call the HTML invalid. In reality, the HTML will still be processed, though
it’s likely that search engines will ignore both/all rel=canonicals. 2)
Worse, for the cases where somebody might rel=canonical to a 404, etc., if
we use MUST NOT, it would place a huge (and entirely unrealistic) burden on
the site owner to ensure that search engines recrawl pages in such an order
that all rel=canonical sources are updated before a page may become a 404.”
--response by J. Reschke “I'm not sure I understand the response. Is there a
use case where an author would legitimately add multiple instances of the
link relation?”
--response by J. Kupke “It is quite conceivable that an author might want to
designate multiple instances of the relation with different attributes, e.g.:
<link rel="canonical" href="http://en.wikipedia.org/wiki/Randomness" media="screen"/>
<link rel="canonical" href="http://en.m.wikipedia.org/wiki/Randomness" media="handheld"/>
It is recommended not to do that because it would foist additional
complexity on implementations, and---without being presumptuous---it appears
that encoding attributes such as media into the URI itself can do more harm
than good.
This recommendation might become less valid over time.  For example, it has
become rather common to encode the language (or country) of a resource into
the URI when there would have been other means of their designation as well
(such as, the Accept-Language HTTP header).”
--response by F. Ellermann “If you are very sure that violations of the
SHOULD NOT are only *ignored* I'm fine with it.  But search engines could
also treat violations as some kind of "link farming" and punish authors for
their intended or unintended violations.

In that case authors would be better off with a MUST NOT clearly indicating
that it is their own fault if they get rel="canonical" wrong -- after all
they are not forced to use rel="canonical".

IOW, if you have "only" a SHOULD NOT for publishers you might need a
corresponding "MUST ignore violations" for web crawlers.”
--response by M. Ohye “In the new introduction, we’d like to state:
The canonical link relation specifies the preferred URI from a set of
identical or vastly similar content accessible on multiple URIs. This
designation MAY be used for future references to this resource, and clients
with link editing capabilities MAY automatically re-link references to the
context URI to the designated URI...

As we’re using “MAY” in the introduction, later dictating search engine
behavior with “you MUST ignore violations” seems like it would be very
strong language.”
--response by F. Ellermann with regard to M. Ohye’s [Worse, for the cases
where somebody might rel=canonical to a 404, etc., if we use MUST NOT, it
would place a huge (and entirely unrealistic) burden on the site owner to
ensure that search engines recrawl pages in such an order that all
rel=canonical sources are updated before a page may become a 404.]

“Ugh.  A typical use case is a simple site with a mirror, in that case a 404
should be temporary:  New pages on the mirror(s) not yet available under
their canonical URL, or old pages deleted under their canonical URL still
found on the mirror(s).
You are right, a MUST NOT does not fly for this scenario.”
--response by M. Ohye “We’d like to leave this as SHOULD NOT.” Please let us
know if you have further arguments.
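
To make the "ignore both/all rel=canonicals" case above concrete, here is a
minimal Python sketch. It is an illustration under stated assumptions, not
text from the draft and not any search engine's actual implementation: the
names CanonicalLinkCollector and pick_canonical are hypothetical, and the
policy of ignoring conflicting designations is only the likely behavior
mentioned above.

```python
from html.parser import HTMLParser


class CanonicalLinkCollector(HTMLParser):
    """Collects href values of <link> elements whose rel includes "canonical"."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <link ... />.
        if tag != "link":
            return
        attr_map = dict(attrs)
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "canonical" in rel_tokens and href:
            self.targets.append(href)


def pick_canonical(html_text):
    """Return the single designated target, or None if absent or conflicting."""
    collector = CanonicalLinkCollector()
    collector.feed(html_text)
    distinct = set(collector.targets)
    # Assumed policy: conflicting designations are ignored, not penalized.
    return distinct.pop() if len(distinct) == 1 else None
```

With the two Wikipedia links from J. Kupke's example as input, this sketch
would return None, since the two designations name different targets.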

4. OPEN. M. Yevstifeyev:
> The canonical link relation specifies the preferred version of a URI
I think some introductory text on linking, probably based on RFC 5988,
should go here.
--response by J. Reschke "Why? It defines a link relation as defined by RFC
5988, so why repeat text from over there?"
--response by M. Yevstifeyev "It should be mentioned (1) what is link
relation at all and (2) that RFC 5988 is a specification of that technology
which this document depends on.  RFC 5988 is first mentioned in Examples."
--response by M. Ohye “We could modify to:
‘The canonical link relation (Link Relation Types reference <xref
target="RFC5988"/>) specifies the preferred version of a URI...’”
--response by J. Reschke “+1”
--response by M. Yevstifeyev “I proposed to add something like:
RFC 5988 [RFC5988] specified the mechanism which is used to indicate
relationships between the links on the Internet.  This document defined a
new type of such relationships - canonical link relation.
in Section 1 as 1st para; other paragraphs retain in this case.”
--response by M. Ohye “We researched other RFCs and it seems best to start
the abstract with the main subject, such as for DC1:
‘The Dublin Core [DC1] is a small set of metadata elements for describing
information resources.’
or for HTML:
‘The Hypertext Markup Language (HTML) is a simple markup language used...’
so we’d prefer to start with a mention of the canonical link relation and
reference RFC5988 following the initial mention.
The canonical link relation, developed from <xref target="RFC5988"/> which
indicates relationships between Internet links, specifies the preferred URI
from a set of identical or vastly similar content accessible on multiple
URIs.

Then, similar to the first paragraph of Section 10.3.2 of RFC 2616:

This designation MAY be used for future references to this resource, and
clients with link editing capabilities MAY automatically re-link references
to the context URI to the designated URI.”

5. CLOSED. M. Yevstifeyev:
> Presence of the canonical link relation indicates to applications, such as
search engines, that they MAY:
I wonder why it's MAY; in this case implementations (explicitly, those apps
which interpret Link: headers and corresponding construction in HTML) will
be free to ignore it.  I think normative SHOULD should be OK (sorry for
pun).
--response by J. Reschke "I think this link relation is purely advisory, so
a better approach might be to replace "MAY" by "can"."
--response by M. Yevstifeyev "Yes, advisory, which suits RFC 2119 definition
for SHOULD: 'SHOULD   This word, or the adjective "RECOMMENDED", mean that
there may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and carefully
weighed before choosing a different course.'
and natural meaning of should - advice/recommendation."
--response by M. Ohye “Thanks, in discussion with Joachim Kupke.”
--response by J. Reschke “No, it's really not a SHOULD.”
--response by J. Kupke “Reading paragraph 5 of RFC 2119, I don't see
anything wrong with ‘MAY.’  How is ‘can’ better? It's indeed not a ‘SHOULD’ since a typical
implementation, say in a search engine, will want to go to some length to
find evidence for misguided usage of the relationship and in the presence of
such evidence decide to ignore it.  The point of ‘MAY’ is to make clear that
in the absence of such evidence, the responsibility not to advertise an
erroneous ‘canonical’ relationship lies with the author of the context URI.”
--response by J. Reschke “See RFC 2119, Section 6:

6. Guidance in the use of these Imperatives

 Imperatives of the type defined in this memo must be used with care
 and sparingly.  In particular, they MUST only be used where it is
 actually required for interoperation or to limit behavior which has
 potential for causing harm (e.g., limiting retransmisssions)  For
 example, they must not be used to try to impose a particular method
 on implementors where the method is not required for
 interoperability.

So my recommendation is NOT to use these kinds of keywords unless when
needed as described above (but I realize that many people disagree with
me...)”
--response by M. Ohye “Left as ‘MAY’ in the draft, but further debate
welcomed.”

7. OPEN. M. Yevstifeyev:
  o  Exist on a different protocol: http to https, or vice versa
You probably meant URI scheme here, since https isn't a separate protocol.
 As before these points we had "The value of the target/canonical URI MAY"
or, if you consider my comment above, "The target/canonical URI MAY", this
point may be reworded as "Have different scheme names" (which suits the
second variant of a preface to this list better).
--agreed by J. Reschke
--response by M. Ohye/J. Kupke: “Good catch, Mykyta. We’re fine to change
the draft to ‘scheme’:
Have different scheme names: such as http to https, or vice versa
Do we now need to expand the draft for ftp:// and gopher:// URIs? For
example, ftp:// and gopher:// URIs:
1) Do not come with the equivalent of RFC 5988, so a non-HTML document
available at any such URI won't be able to make use of <link
rel="canonical">.
2) Have a corresponding GOPHER error code (item type 3) or FTP error 550,
which, like HTTP 404, is forbidden from being served for the target of a
<link rel="canonical">.”
--response by M. Yevstifeyev “I don't think we should make such changes.  If
we consider ftp and gopher URIs, we should also consider all other.  I
proposed to add generic statements which would be applicable to almost all
application protocols, not only HTTP, as in current version of the draft.”
--response by J. Reschke “a) A non-HTML document at a non-HTTP(s) URI may
not be able to specify the link relation, but it could be the target of a
link relation, right? (that could be an example)
--response by J. Kupke “Correct.”
b) Future protocols might have other means to specify a link relation, btw.”

--response by J. Kupke “Is there terminology that reasonably unifies errors
across protocols (HTTP response code 4xx, GOPHER item type 3, FTP response
code 5xx)?”

--response by J. Reschke “I don't think so...”
--response by M. Ohye “Changed the draft to:
The target/canonical URI MAY:
* Have different scheme names: such as HTTP to HTTPS, or gopher to FTP”

8. OPEN. M. Yevstifeyev:
Reading Sections 3 and 5 of the draft, it seems that it mandates use of HTTP
when referring to canonical URIs.  And what is the situation when the target
URI is an 'ftp' or 'gopher' URI?  Section 3 allows different scheme names in
context/target URIs, if I understand it correctly.  Therefore, unless it is
deliberate, I think any mention of HTTP should be replaced by more generic
regulations.
--response by J. Reschke "Nope; I think the HTTP examples are very useful.
But maybe we can have an additional statement that the link relation isn't
specific to HTTP."
--response by M. Yevstifeyev "Currently we have normative reference to RFC
2616 and normative requirements with respect to HTTP.  HTTP examples are OK;
but it's redundant in Section 3.  I suppose in Section 3 we may replace
HTTP-related stuff with something in the way like:

Old:
  o  The source URI of a "300 Multiple Choices" URI (Section 10.3.1 of
     [RFC2616]) or a permanent redirect (Section 10.3.2 of [RFC2616]).
New:
  o  The source URI, which defines a resource which provides choice
     in different representations of a given resource, identified by
     the context URI, or is a link which has been permanently replaced
     by another one.
etc."
--response by B. Thorlacius: “Your wording seems overly confusing. Which is
the resource that "provides choice in different representations of a given
resource?" A standard could be assigned the URI <http://example.org/spec>.
An HTTP GET /spec might be responded with an HTTP/1.1 300 choice, and an
entity linking to /spec.node.html, /spec.html, /spec.pdf, and /spec.txt. The
resource (the standard, that is) would in no way provide this choice. The
HTTP server simply offered multiple representations.”
--response by M. Yevstifeyev: “First, this was an example only.  Next, my
point was that the document makes HTTP/'http' scheme mandatory in
context/target URIs, which I don't think is appropriate, since canonical URI
may refer to a resource accessible via other protocol.  Even though HTTP is
going to be the most often use case of canonical link relation, we shouldn't
exclude other protocols.”
--response by B. Thorlacius: “I agree. However, I don't understand the need
for forbidding canonical links to resources with multiple representations.
Are there not to be canonical links from representations of a resource to
the resource (i.e. from /spec.html and /spec.txt to /spec)?”
--response by M. Yevstifeyev: “Probably such restriction is set because
multiple representation choice may ultimately refer the user to a resource
which is not canonical.  A _definite_ canonical resource is necessary and
required.”

----response by  J. Cormack: “I think this relation (which is useful) needs
to be called something else, as it is performing a different function to
canonical, which is about relations between representations of resources,
rather than between representations of resources and the resource itself
like /spec.

There does not seem to be a discussion of what is similar though in terms
of media types - is /spec.txt similar enough that /spec.html could be a
canonical link? One could certainly have a PNG version of an SVG image with
a canonical link I would presume.”

------response by  M. Yevstifeyev: “I personally think it is possible.  For
example, the authoritative RFC source is the .txt file on the RFC Editor's
site, while we have a number of other sources for RFCs, not in .txt, such as
"tools.ietf.org/html/rfcXXXX" in HTML.  Designating a "canonical" link to
"http://www.rfc-editor.org/rfc/rfcXXXX.txt" seems to be OK.  So I think we
have no problem here.”
--response by M. Ohye: “We can change the draft to include corresponding
GOPHER or FTP error codes to demonstrate non-favoritism to HTTP.”
--response by M. Yevstifeyev “I don't think it would be useful (see above).
 It's better, as I've already mentioned in my response to (7), to provide
rules which would be fine for almost any application protocol and which
would be equivalent to those provided for HTTP now.  For example:

OLD: "Be the source URI of a 302, 303, or 307 redirect (Sections 10.3.3,
10.3.4, and 10.3.8, respectively, of [RFC2616])."
NEW: "Provide a redirect to the other resource because it is temporarily
unavailable."

This is what HTTP 302 and 307 responses stand for.  The 303 response is used
in HTTP only and there is no equivalent to it in other protocols.”
--response by J. Reschke “...my general feeling here is that having specific
HTTP examples is good, as long as the spec doesn't make the reader think
it's the only protocol that qualifies.”
--response by M. Ohye “Given Julian’s comments in #15 below, and M.
Yevstifeyev’s original concern of being too HTTP specific, this now reads:
Be the source URI of a temporary redirect. For HTTP, this refers to status
codes 302, 303, or 307 (Sections 10.3.3, 10.3.4, and 10.3.8, respectively,
of <xref target="RFC2616"/>).”
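
As a rough aid for readers mapping the HTTP examples in this item to status
codes, here is a small Python sketch. It only labels the status codes named
in this thread (300, 301 as the permanent redirect of Section 10.3.2 of RFC
2616, 302/303/307, and 404). The function name describe_target_status and
the label strings are hypothetical, and the sketch deliberately does not
encode which requirement level (MUST NOT, SHOULD NOT, MAY) the draft attaches
to each case.

```python
# Temporary redirects as listed in this item (RFC 2616, Sections 10.3.3,
# 10.3.4, and 10.3.8).
TEMPORARY_REDIRECTS = {302, 303, 307}


def describe_target_status(status_code):
    """Label the HTTP status a candidate target URI answers with."""
    if status_code == 300:
        return "multiple choices"
    if status_code == 301:
        return "permanent redirect"
    if status_code in TEMPORARY_REDIRECTS:
        return "temporary redirect"
    if status_code == 404:
        return "not found"
    # Anything else is outside the cases discussed in this thread.
    return "other"
```

A protocol-neutral version would need an equivalent table per protocol,
which is the difficulty J. Kupke and J. Reschke note under item 7.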

10. OPEN. J. Reschke:
This specification defines the canonical link relation -- an element which
designates the preferred version of content/URI from a set of duplicate or
near duplicate pages.
Maybe put "canonical" in double quotes? (similar in other places)
--response by M. Ohye “How adamant are you about double quoting “canonical”
in “canonical link relation” throughout the draft? That’s a lot of double
quotes (18 of them :). I think these double quotes would be more distracting
than helpful, but if that’s how other RFCs are written, we’ll certainly
reconsider.”

11. CLOSED. J. Reschke:
The canonical link relation specifies the preferred version of a URI from a
set of identical or vastly similar content that may be
maybe just "preferred URI"?
--response by M. Ohye “Changed. Here’s the new abstract:
The canonical link relation, developed from <xref target="RFC5988"/> which
indicates relationships between Internet links, specifies the preferred URI
from a set of identical or vastly similar content accessible on multiple
URIs.”

12. CLOSED. J. Reschke:
The canonical (target URI) MUST contain content that duplicates
"canonical (target) URI MUST identify content" (it's not the content of the
URI, it's the content of the identified resource)
--response by M. Ohye “Updated in draft.”

13. CLOSED. J. Reschke:
extremely similar, or is a superset of the content in the context
s/in/at/:
--response by M. Ohye “Updated in draft.”

14. OPEN. J. Reschke:
The value of the target/canonical URI MAY:

 o  Be self-referential (context URI identical to target URI)

 o  Specify a relative or absolute URI

"Specify a URI Reference (see [RFC3986], Section 4.1), i.e., a full URI or a
relative reference"
--response by M. Ohye “How about the edit below?
Specify a URI Reference (see [RFC3986], Section 4.1), i.e., an absolute URI
or a relative reference”
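
For readers less familiar with the "URI Reference" terminology, the
following Python sketch shows one way a client might resolve either form
against the context URI, using the standard library's urllib.parse. The
function names are hypothetical and the example.com URIs are placeholders;
this is an illustration, not behavior mandated by the draft.

```python
from urllib.parse import urljoin, urlsplit


def resolve_canonical(context_uri, reference):
    """Resolve an absolute URI or a relative reference against the context URI."""
    return urljoin(context_uri, reference)


def is_self_referential(context_uri, reference):
    """True when the resolved target is identical to the context URI."""
    return resolve_canonical(context_uri, reference) == context_uri


def schemes_differ(context_uri, reference):
    """True when context and resolved target use different scheme names."""
    target = resolve_canonical(context_uri, reference)
    return urlsplit(context_uri).scheme != urlsplit(target).scheme
```

For instance, resolving the relative reference "page.php?item=purse" against
"http://www.example.com/page.php?item=purse&sort=price" yields an absolute
target on the same host, while an https target on the same host would be
reported by schemes_differ as a scheme change.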

15. OPEN. J. Reschke:
o  Be the source URI of a 302, 303, or 307 redirect (Sections 10.3.3,
    10.3.4, and 10.3.8, respectively, of [RFC2616]).

Maybe "Be the source URI of a temporary redirect, such as..."?
--response by M. Ohye “Changed to
Be the source URI of a temporary redirect. Using HTTP, this refers to status
codes 302, 303, or 307 (Sections 10.3.3, 10.3.4, and 10.3.8, respectively,
of <xref target="RFC2616"/>).
Julian, please let me know if you’d like this phrased differently.”

16. OPEN. J. Reschke:
may designate the canonical link relation in HTML as specified in [RFC5988]
Why do we cite RFC 5988 here? Should this ref HTML?
--response by M. Ohye “Changed to:
may designate the canonical link relation in HTML as specified in <xref
target="RFC1866"/>
I created this reference, but I couldn’t find an organization or email
address for Dan Connolly.
<reference anchor="RFC1866">
 <front>
   <title>Hypertext Markup Language - 2.0</title>
   <author initials="T." surname="Berners-Lee" fullname="T. Berners-Lee">
     <organization>MIT/W3C</organization>
     <address><email>timbl@w3.org</email></address>
   </author>
   <author initials="D." surname="Connolly" fullname="Dan Connolly">
   </author>
   <date month="November" year="1995"/>
 </front>
 <seriesInfo name="RFC" value="1866"/>
</reference>
Julian, should I have done anything differently?”

17. CLOSED. J. Reschke:
<link rel="canonical" href="http://www.example.com/page.php?item=purse" />
 or alternatively, in the HTTP Header as specified in Section 5 of
 [RFC5988]:
s/Header/header field/
--response by M. Ohye “Updated in the draft.”
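
For the header field form, a consumer might extract the target roughly as in
the Python sketch below. It is a simplified illustration, not a complete
RFC 5988 parser: it assumes parameter values contain no commas or
semicolons, and the function name canonical_targets is hypothetical.

```python
import re

# One link-value: <URI-Reference> followed by its parameters, up to the next comma.
_LINK_VALUE = re.compile(r'<([^>]*)>([^,]*)')


def canonical_targets(link_header_value):
    """Return the targets of all link-values whose rel includes "canonical"."""
    targets = []
    for uri, params in _LINK_VALUE.findall(link_header_value):
        for param in params.split(";"):
            name, _, value = param.partition("=")
            if name.strip().lower() != "rel":
                continue
            rel_tokens = value.strip().strip('"').lower().split()
            if "canonical" in rel_tokens:
                targets.append(uri)
            # Only the first rel parameter of a link-value is considered.
            break
    return targets
```

Given the field value
<http://www.example.com/page.php?item=purse>; rel="canonical"
this sketch would yield the single target URI between the angle brackets.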

18. CLOSED. J. Reschke:
Recommendations
Before implementing the canonical link relation, verification of the

Maybe s/implementing/adding/?
--response by M. Ohye “Updated in the draft.”


On Sat, Jul 9, 2011 at 2:00 AM, Julian Reschke <julian.reschke@gmx.de>
wrote:
> On 2011-07-09 04:44, Joachim Kupke wrote:
>>
>> ...
>>>>
>>>> 5. OPEN. M. Yevstifeyev:
>>>>>
>>>>>  Presence of the canonical link relation indicates to applications,
>>>>
>>>> such as search engines, that they MAY:
>>>> I wonder why it's MAY; in this case implementations (explicitly, those
>>>> apps which interpret Link: headers and corresponding construction in
>>>> HTML) will be free to ignore it.  I think normative SHOULD should be OK
>>>> (sorry for pun).
>>>> --response by J. Reschke "I think this link relation is purely advisory,
>>>> so a better approach might be to replace "MAY" by "can"."
>>
>> Reading paragraph 5 of RFC 2119, I don't see anything wrong with
>> "MAY."  How is "can" better?
>> ...
>
> See RFC 2119, Section 6:
>
>> 6. Guidance in the use of these Imperatives
>>
>>   Imperatives of the type defined in this memo must be used with care
>>   and sparingly.  In particular, they MUST only be used where it is
>>   actually required for interoperation or to limit behavior which has
>>   potential for causing harm (e.g., limiting retransmisssions)  For
>>   example, they must not be used to try to impose a particular method
>>   on implementors where the method is not required for
>>   interoperability.
>
> So my recommendation is NOT to use these kinds of keywords unless when
> needed as described above (but I realize that many people disagree with
> me...)
>
>> ...
>>>
>>> b) Future protocols might have other means to specify a link relation,
>>> btw.
>>
>> Is there terminology that reasonably unifies errors across protocols
>> (HTTP response code 4xx, GOPHER item type 3, FTP response code 5xx)?
>> ...
>
> I don't think so...
>
> Best regards, Julian
>