Re: rel="shortlink" proposal for advertising short URLs in HTML/HTTP

Keith Moore <moore@cs.utk.edu> Tue, 14 April 2009 23:51 UTC

Message-ID: <49E521DB.8080403@cs.utk.edu>
Date: Tue, 14 Apr 2009 19:52:59 -0400
From: Keith Moore <moore@cs.utk.edu>
User-Agent: Thunderbird 2.0.0.21 (Macintosh/20090302)
MIME-Version: 1.0
To: Sam Johnston <samj@samj.net>
Subject: Re: rel="shortlink" proposal for advertising short URLs in HTML/HTTP
References: <21606dcf0904141153t3433975fh2bacf75f37353beb@mail.gmail.com>
In-Reply-To: <21606dcf0904141153t3433975fh2bacf75f37353beb@mail.gmail.com>
Cc: apps-discuss@ietf.org

I think it's interesting that PURLs started out with the notion that
URLs via a 3rd party referrer could be more persistent than URLs from
the original domain, and now we're proposing that an original domain's
URLs might be more persistent than those through a 3rd party service. 
Both are correct, of course, depending on circumstances.

In general I think that the fewer human-meaningful components an
identifier has, the more persistent you can make the binding between
that identifier and whatever it originally referred to.  Having the
components of the identifier be meaningless reduces the temptation to
change the binding associated with an identifier (from that originally
established) to something that is more "current".  (There are lots of
versions of this.  If you use file and directory names in URLs, at some
point you inevitably feel the need to reorganize the directory tree -
which tends to invalidate URLs based on the old tree.)

What's not immediately clear to me is why you'd use anything other than
the stable identifier in any links within documents.  I can understand
wanting to keep a set of "human friendly" identifiers that are easy for
users to remember, that might change over time, but I have a more
difficult time understanding why you'd want to use those in links.

So anyway, I think that having a "stable" or "persistent" link in the
HTTP and/or document header is a good idea.  (Note: putting this in the
HTTP header is more general: don't restrict this to just HTML!  And
putting it in the document header is somewhat risky as the link might
need to change (or not) when the document is updated - existing tools
certainly won't do the right thing in all cases.) 

Ideally, software would recognize these headers and use the stable URL
in preference to other URLs when appropriate.  When you create a
bookmark to a web page, the bookmark should point to the stable link by
default.  Likewise, when you are editing a document and you create a
link to another document, you should get the stable link by default.  I think it
should be possible to override the defaults, but it ought to be clear to
the user/editor that he's not using the stable URL.
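
A minimal sketch of that default-with-override behaviour, in Python.
The function name and its parameters are hypothetical and are not part
of any proposal; the returned flag exists only so that a user interface
could make it clear when the stored URL is not the stable one.

```python
from typing import Optional, Tuple


def bookmark_target(visited_url: str,
                    stable_url: Optional[str],
                    prefer_stable: bool = True) -> Tuple[str, bool]:
    """Return (url_to_store, is_stable) for a new bookmark or link.

    The stable URL wins by default. The caller may override that, and
    the second element reports whether the stored URL is the stable one.
    """
    if stable_url and prefer_stable:
        return stable_url, True
    return visited_url, False
```
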

I don't think this mechanism can work well without some sort of content
management system (it can be a very primitive one) that automatically
creates unique, persistent URLs for things.  So it's important for web
servers to _not_ generate these headers by default, but rather, to do so
only when explicitly configured (presumably by a content management system).
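
To illustrate, here is a very primitive sketch in Python, assuming a
toy content management system that hands out sequential identifiers.
TinyCMS and link_header are hypothetical names, and rel=shortlink is
borrowed from the quoted proposal below. The point is only that no
header is produced unless a CMS is configured and has actually
published the path.

```python
import itertools
from typing import Dict, Optional


class TinyCMS:
    """Toy CMS that assigns each published path one opaque, stable URL."""

    def __init__(self, base: str) -> None:
        self._base = base.rstrip("/")
        self._ids = itertools.count(1)
        self._stable: Dict[str, str] = {}

    def publish(self, path: str) -> str:
        # Publishing the same path again returns the URL assigned earlier.
        if path not in self._stable:
            self._stable[path] = "%s/%d" % (self._base, next(self._ids))
        return self._stable[path]

    def stable_url(self, path: str) -> Optional[str]:
        return self._stable.get(path)


def link_header(cms: Optional[TinyCMS], path: str) -> Optional[str]:
    """Return a Link header value, or None when nothing is configured."""
    if cms is None:
        return None  # server default: do not generate the header
    url = cms.stable_url(path)
    if url is None:
        return None  # the CMS never published this path
    return "<%s>; rel=shortlink" % url
```
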

Of course, this is trying to solve the same problems for which URNs were
invented, and I still think that URNs are a good approach.  I'd like to
see any mechanism for advertising a persistent identifier be compatible
with URNs.  But it doesn't bother me that people are still interested in
using URLs for this purpose.

Keith

Sam Johnston wrote:
> Evening all,
>
> I think it's time to call the grown-ups into a discussion started by a
> handful of web developers trying to kill off URL shorteners and
> associated linkrot, opaque URLs, etc. Perhaps it's just accelerating
> change but the decidedly ordinary rev="canonical" proposal has already
> taken on a life of its own, spawning a blog[1], a "30 minutes or
> less" PoC[2], a busy twitter hashtag, even a slashdot article[3]
> (submitted by the "inventor" himself no less) - but more worryingly a
> handful of high profile implementations at sites like Ars Technica
> (even if there are no clients yet that will read them).
>
> The concept is simple: rather than forcing users to rely on third
> parties like bit.ly, tinyurl.com etc. for short links the publishers
> themselves can suggest link(s) in the HTTP headers and/or HTML code.
> The resulting URL (e.g. http://example.com/promo) will generally live
> as long as the content does and won't vanish when the redirector
> disappears (as some invariably will - there's at least a three-figure
> count of them now and probably a dozen new ones every day). It's also
> a good deal more useful to users as it can show the source (domain)
> and subject (path), and when the link is exposed via HTTP HEAD the
> performance is at least as good as third party shorteners.
>
> A bunch of alternatives to rev="canonical" have been proposed
> including rel="short|shorter|shortcut|short_url|short_url|alternate|self|...",
> but for various reasons[4] I don't think any of them are suitable. I
> think rel="shortlink" would work nicely and is impossible to confuse
> with anything else (I got here via rel="short" and rel="shortcut").
> I've roughed up a specification[5] so you can see the technical
> details (copied below).
>
> I'm not necessarily all that fussed about this - the concept looks
> good but I was basically just driving by on the weekend and saw an
> accident about to happen (e.g. people confusing rel and rev and
> knocking sites like Ars out of the search engines). Seems it caught
> mnot's eye too[6] and I should point out that I had every intention[7]
> of bringing this to your attention after having extracted consensus in
> the group[8]. I think link relations come relatively cheap and, unless
> there's something I've missed, helping to put another nail in the
> coffin of the URL shorteners is arguably a good thing.
>
> Thoughts?
>
> Sam
>
> 1. http://revcanonical.wordpress.com/
> 2. http://revcanonical.appspot.com/
> 3. http://developers.slashdot.org/article.pl?sid=09/04/12/1834205
> 4. http://code.google.com/p/shortlink/wiki/Alternatives
> 5. http://code.google.com/p/shortlink/wiki/Specification
> 6. http://www.mnot.net/blog/2009/04/14/rev_canonical_bad
> 7. http://samj.net/2009/04/introducing-relshort-better-alternative.html
> 8. http://groups.google.com/group/shortlink
>
> shortlink Specification
>
> Technical specification for the HTML/HTTP "shortlink" relation
>
> Introduction
>
> The shortlink relation allows webmasters to specify a short link to
> use for the resource, thereby avoiding having to obtain one from a
> potentially unreliable third party URL shortening service such as
> tinyurl.com.
>
> Such links are useful for space-constrained applications (e.g.
> microblogging including Twitter and mobile Internet) as well as any
> time URLs need to be manually entered (e.g. when they are printed or
> spoken).
>
> Note: Until such time as shortlink is officially standardised
> http://purl.org/net/shortlink should be used for standards compliance.
>
> Details
>
> The shortlink appears in two places:
>
> within the HEAD section of the HTML document:
>
> <link rel="shortlink" href="http://example.com/promo">
>
> in the Link: HTTP header:
>
> Link: <http://example.com/promo>; rel=shortlink
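
The two forms above can be produced from the same URL. A minimal
sketch in Python, where html_link and http_link_header are
hypothetical helper names and not part of the specification:

```python
from html import escape


def html_link(url: str) -> str:
    """Render the <link> element for the HEAD section of an HTML page."""
    # escape() protects the attribute value if the URL contains & or quotes.
    return '<link rel="shortlink" href="%s">' % escape(url, quote=True)


def http_link_header(url: str) -> str:
    """Render the equivalent HTTP response header line."""
    return "Link: <%s>; rel=shortlink" % url
```
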
>
> Implementation
>
> Servers
>
> Servers should implement both HTML and HTTP links for efficiency and
> performance reasons.
>
> The shortlink should default to an automatically generated stable URI
> based on an existing unique identifier (e.g. http://example.com/123).
> Such identifiers may be compressed using base32 or similar (e.g.
> http://example.com/3r). URIs should be case-insensitive and avoid
> symbols that look or sound similar (e.g. 1 vs l), particularly when
> manual entry will be required (e.g. printed, spoken).
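
A sketch of the identifier compression described above, in Python.
With the plain digit set 0-9a-v, the identifier 123 becomes "3r",
matching the example in the text. The second alphabet drops i, l, o
and u so that no symbol resembles 1 or 0; that particular choice is an
assumption of this sketch, not something the specification mandates.
Decoding lower-cases its input so that the result is case-insensitive.

```python
PLAIN_BASE32 = "0123456789abcdefghijklmnopqrstuv"
SAFE_BASE32 = "0123456789abcdefghjkmnpqrstvwxyz"  # no i, l, o, u


def encode_id(number: int, alphabet: str = SAFE_BASE32) -> str:
    """Write a non-negative integer using the given digit alphabet."""
    if number < 0:
        raise ValueError("identifier must be non-negative")
    if number == 0:
        return alphabet[0]
    digits = []
    while number:
        number, remainder = divmod(number, len(alphabet))
        digits.append(alphabet[remainder])
    return "".join(reversed(digits))


def decode_id(text: str, alphabet: str = SAFE_BASE32) -> int:
    """Inverse of encode_id; raises ValueError on an unknown symbol."""
    number = 0
    for symbol in text.lower():
        number = number * len(alphabet) + alphabet.index(symbol)
    return number
```
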
>
> Publishers should also be given the option to specify a human-friendly
> slug (e.g. http://example.com/promo), as users should be able to
> derive information about the resource (path) and its source (domain)
> from the URL.
>
> Where a shortlink is changed the previous URL should not be broken as
> it may have been stored by users. Typically this requires maintaining
> a register of mappings.
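
One possible shape for such a register, sketched in Python. The class
and method names are hypothetical. Old slugs are never deleted, so a
URL that users have already stored keeps resolving after the shortlink
is changed.

```python
from typing import Dict, Optional


class ShortlinkRegister:
    """Maps slugs to resources and keeps every slug ever assigned."""

    def __init__(self) -> None:
        self._target: Dict[str, str] = {}   # slug -> resource, never removed
        self._current: Dict[str, str] = {}  # resource -> slug to advertise

    def assign(self, resource: str, slug: str) -> None:
        slug = slug.lower()  # treat slugs as case-insensitive
        owner = self._target.get(slug)
        if owner is not None and owner != resource:
            raise ValueError("slug already in use: %s" % slug)
        self._target[slug] = resource
        self._current[resource] = slug

    def resolve(self, slug: str) -> Optional[str]:
        return self._target.get(slug.lower())

    def current(self, resource: str) -> Optional[str]:
        return self._current.get(resource)
```
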
>
> Clients
>
> Clients that have already retrieved the document (e.g. web browsers,
> news readers) should parse it to discover the link rel="shortlink"
> element(s) and extract the href attribute from each.
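
A minimal sketch of that client step using Python's standard
html.parser module. ShortlinkFinder and find_shortlinks are
hypothetical names. The rel attribute is treated as a space-separated,
case-insensitive list, which is an assumption based on how HTML link
types generally work rather than something stated in the
specification.

```python
from html.parser import HTMLParser
from typing import List


class ShortlinkFinder(HTMLParser):
    """Collects href values of <link> elements whose rel has 'shortlink'."""

    def __init__(self) -> None:
        super().__init__()
        self.shortlinks: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before this call.
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "shortlink" in rels and href:
            self.shortlinks.append(href)


def find_shortlinks(html_text: str) -> List[str]:
    finder = ShortlinkFinder()
    finder.feed(html_text)
    finder.close()
    return finder.shortlinks
```
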
>
> Clients that have the URL but not the document (e.g. microblogging
> software) should conduct an HTTP HEAD request and extract any Link:
> headers from the response. Clients should not retrieve and parse the
> document unless the user specifically requests it.
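
A sketch of the HEAD-based lookup in Python, using only the standard
library. The function names are hypothetical, and the Link header
parsing is deliberately simplified: it assumes that no quoted
parameter value contains "<" or another rel= string, so it is not a
complete parser for the header syntax.

```python
import re
import urllib.request
from typing import Iterable, List

_LINK_VALUE = re.compile(r'<([^>]*)>([^<]*)')
_REL_PARAM = re.compile(r';\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,"]+))',
                        re.IGNORECASE)


def shortlinks_from_link_headers(values: Iterable[str]) -> List[str]:
    """Extract targets whose rel list contains 'shortlink'."""
    found = []
    for value in values:
        # Each match is one <target> plus the parameters that follow it.
        for target, params in _LINK_VALUE.findall(value):
            match = _REL_PARAM.search(params)
            if not match:
                continue
            rels = (match.group(1) or match.group(2) or "").lower().split()
            if "shortlink" in rels:
                found.append(target)
    return found


def fetch_shortlinks(url: str, timeout: float = 10.0) -> List[str]:
    """Issue a HEAD request and read only the Link response headers."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        values = response.headers.get_all("Link") or []
    return shortlinks_from_link_headers(values)
```

Only fetch_shortlinks touches the network; the parsing function can be
exercised on its own with header values taken from any source.
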
>
> In the event that there are multiple shortlinks then the client may
> choose one itself or offer the user the choice (e.g. in a drop-down
> list). If the client chooses one it may do so randomly, by order
> (first vs last) or by some quality of the URL (length, readability,
> etc.).
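
One of the selection policies mentioned above (shortest URL, with
ties broken by order of appearance) could be sketched in Python as
follows. choose_shortlink is a hypothetical name, and this is only one
of the policies the text allows.

```python
from typing import Optional, Sequence


def choose_shortlink(candidates: Sequence[str]) -> Optional[str]:
    """Pick the shortest candidate; the earliest one wins a tie."""
    if not candidates:
        return None
    # min() returns the first minimal element, so order breaks ties.
    return min(candidates, key=len)
```
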