Re: rel="shortlink" proposal for advertising short URLs in HTML/HTTP

Sam Johnston <samj@samj.net> Wed, 15 April 2009 12:07 UTC

In-Reply-To: <49E521DB.8080403@cs.utk.edu>
References: <21606dcf0904141153t3433975fh2bacf75f37353beb@mail.gmail.com> <49E521DB.8080403@cs.utk.edu>
Date: Wed, 15 Apr 2009 14:08:12 +0200
Message-ID: <21606dcf0904150508k210991b6gf8001c262d305a3f@mail.gmail.com>
Subject: Re: rel="shortlink" proposal for advertising short URLs in HTML/HTTP
From: Sam Johnston <samj@samj.net>
To: Keith Moore <moore@cs.utk.edu>
Cc: apps-discuss@ietf.org

Keith,

Oh my, it seems we have yet another type of link relation to cater for:
immutable unique identifiers à la atom:id :)

On Wed, Apr 15, 2009 at 1:52 AM, Keith Moore <moore@cs.utk.edu> wrote:

> I think it's interesting that PURLs started out with the notion that
> URLs via a 3rd party referrer could be more persistent than URLs from
> the original domain, and now we're proposing that an original domain's
> URLs might be more persistent than those through a 3rd party service.
> Both are correct, of course, depending on circumstances.
>

Persistency is not the key requirement here - shortness is (and for some
applications, human-friendliness). Canonical URLs (which are generally long
and stuffed with SEO juice) are very much subject to change. A human
friendly shortlink like http://nike.com/just-do-it is less likely to change
while shortlinks based on immutable identifiers like
http://nike.com/123 should never change (though the resource they point
at may).

> In general I think that the fewer human meaningful components to an
> identifier, the more persistent you can make the binding between that
> identifier and whatever it originally referred to.  Having the
> components of the identifier be meaningless reduces the temptation to
> change the binding associated with an identifier (from that originally
> established) to something that is more "current".  (There are lots of
> versions of this.  If you use file and directory names in URLs, at some
> point you inevitably feel the need to reorganize the directory tree -
> which tends to invalidate URLs based on the old tree.)
>

Agreed, the more information in the URL the less persistent it will be. URLs
committed to dead-tree versions (e.g. references in an academic paper) are
by definition immutable, so it should be possible to update where they point,
to the most recent/best version of the intended resource.

> What's not immediately clear to me is why you'd use anything other than
> the stable identifier in any links within documents.  I can understand
> wanting to keep a set of "human friendly" identifiers that are easy for
> users to remember, that might change over time, but I have a more
> difficult time understanding why you'd want to use those in links.
>

Say I'm microblogging or texting about some campaign, for example
Greenpeace's Save the Whales. Both my users and I are going to much prefer
seeing http://greenpeace.com/whales to http://tinyurl.com/aZq4b - and I'm
far more likely to remember the former than the latter (even if it is
longer).

> So anyway, I think that having a "stable" or "persistent" link in the
> HTTP and/or document header is a good idea.  (Note: putting this in the
> HTTP header is more general: don't restrict this to just HTML!  And
> putting it in the document header is somewhat risky as the link might
> need to change (or not) when the document is updated - existing tools
> certainly won't do the right thing in all cases.)
>

Agreed, except that I'm not sure the unique ID need be resolvable to the
resource - just that it allow differentiating between resources. I actually
think specifying rel=canonical (that's rel, not rev) solves many of the real
world problems we see today. If people want to break their own links then
that's their problem.


> Ideally, software would recognize these headers and use the stable URL
> in preference to other URLs when appropriate. So that when you create a
> bookmark to a web page, the bookmark points to the stable link by
> default. Or when you are editing a document and you create a link to
> another document, you should get the stable link by default.  I think it
> should be possible to override the defaults, but it ought to be clear to
> the user/editor that he's not using the stable URL.
>

Again this sounds like a job for the canonical URL - I'd much rather see
search engines and bookmarks having URLs like
http://greenpeace.com/blog/2009/04/greenpeace-helps-save-the-whales.html
than http://greenpeace.com/123.


> I don't think this mechanism can work well without some sort of content
> management system (it can be a very primitive one) that automatically
> creates unique, persistent URLs for things.  So it's important for web
> servers to _not_ generate these headers by default, but rather, to do so
> only when explicitly configured (presumably by a content management
> system).
>

Sure, for persistent IDs.


> Of course, this is trying to solve the same problems for which URNs were
> invented, and I still think that URNs are a good approach.  I'd like to
> see any mechanism for advertising a persistent identifier be compatible
> with URNs.  But it doesn't bother me that people are still interested in
> using URLs for this purpose.
>

I personally prefer UUIDv4 (random) based URNs for this, but I think it's a
completely separate need from that of shortlinks. Does that make sense?
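
A minimal sketch of what minting such an identifier could look like, using
only the Python standard library. The function name new_persistent_id is
made up for illustration; the urn:uuid: form itself is the one defined in
RFC 4122.

```python
import uuid


def new_persistent_id() -> str:
    """Return a random (version 4) UUID formatted as a urn:uuid: URN."""
    return uuid.uuid4().urn


def is_uuid4_urn(value: str) -> bool:
    """Check that a string is a urn:uuid: URN carrying a version 4 UUID."""
    if not value.lower().startswith("urn:uuid:"):
        return False
    try:
        return uuid.UUID(value).version == 4
    except ValueError:
        return False
```

Such an ID only differentiates resources; nothing here makes it resolvable,
which is the separation from shortlinks argued for above.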

Sam

> Sam Johnston wrote:
> > Evening all,
> >
> > I think it's time to call the grown-ups into a discussion started by a
> > handful of web developers trying to kill off URL shorteners and
> > associated linkrot, opaque URLs, etc. Perhaps it's just accelerating
> > change but the decidedly ordinary rev="canonical" proposal has already
> > taken a life of its own, springing up a blog[1], a "30 minutes or
> > less" PoC[2], a busy twitter hashtag, even a slashdot article[3]
> > (submitted by the "inventor" himself no less) - but more worryingly a
> > handful of high profile implementations at sites like Ars Technica
> > (even if there's no clients yet that will read them).
> >
> > The concept is simple: rather than forcing users to rely on third
> > parties like bit.ly, tinyurl.com etc. for short links the publishers
> > themselves can suggest link(s) in the HTTP headers and/or HTML code.
> > The resulting URL (e.g. http://example.com/promo) will generally live
> > as long as the content does and won't vanish when the redirector
> > disappears (as some invariably will - there's at least a 3 figure
> > count of them now and probably a dozen new ones every day). It's also
> > a good deal more useful to users as it can show the source (domain)
> > and subject (path), and when the link is exposed via HTTP HEAD the
> > performance is at least as good as third party shorteners.
> >
> > A bunch of alternatives to rev="canonical" have been proposed
> > including
> > rel="short|shorter|shortcut|short_url|short_url|alternate|self|...",
> > but for various reasons[4] I don't think any of them are suitable. I
> > think rel="shortlink" would work nicely and is impossible to confuse
> > with anything else (I got here via rel="short" and rel="shortcut").
> > I've roughed up a specification[5] so you can see the technical
> > details (copied below).
> >
> > I'm not necessarily all that fussed about this - the concept looks
> > good but I was basically just driving by on the weekend and saw an
> > accident about to happen (e.g. people confusing rel and rev and
> > knocking sites like Ars out of the search engines). Seems it caught
> > mnot's eye too[6] and I should point out that I had every intention[7]
> > of bringing this to your attention after having extracted consensus in
> > the group[8]. I think link relations come relatively cheap and unless
> > there's something I've missed helping to put another nail in the
> > coffin of the URL shorteners is arguably a good thing.
> >
> > Thoughts?
> >
> > Sam
> >
> > 1. http://revcanonical.wordpress.com/
> > 2. http://revcanonical.appspot.com/
> > 3. http://developers.slashdot.org/article.pl?sid=09/04/12/1834205
> > 4. http://code.google.com/p/shortlink/wiki/Alternatives
> > 5. http://code.google.com/p/shortlink/wiki/Specification
> > 6. http://www.mnot.net/blog/2009/04/14/rev_canonical_bad
> > 7. http://samj.net/2009/04/introducing-relshort-better-alternative.html
> > 8. http://groups.google.com/group/shortlink
> >
> > shortlink Specification
> >
> > Technical specification for the HTML/HTTP "shortlink" relation
> >
> > Introduction
> >
> > The shortlink relation allows webmasters to specify a short link to
> > use for the resource, thereby avoiding having to obtain one from a
> > potentially unreliable third party URL shortening service such as
> > tinyurl.com.
> >
> > Such links are useful for space-constrained applications (e.g.
> > microblogging including Twitter and mobile Internet) as well as any
> > time URLs need to be manually entered (e.g. when they are printed or
> > spoken).
> >
> > Note: Until such time as shortlink is officially standardised
> > http://purl.org/net/shortlink should be used for standards compliance.
> >
> > Details
> >
> > The shortlink appears in two places:
> >
> > within the HEAD section of the HTML document:
> >
> > <link rel="shortlink" href="http://example.com/promo">
> >
> > in the Link: HTTP header:
> >
> > Link: <http://example.com/promo>; rel=shortlink
> >
> > Implementation
> >
> > Servers
> >
> > Servers should implement both HTML and HTTP links for efficiency and
> > performance reasons.
> >
> > The shortlink should default to an automatically generated stable URI
> > based on an existing unique identifier (e.g. http://example.com/123).
> > Such identifiers may be compressed using base32 or similar (e.g.
> > http://example.com/3r). URIs should be case-insensitive and avoid
> > symbols that look or sound similar (e.g. 1 vs l), particularly when
> > manual entry will be required (e.g. printed, spoken).
> >
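
A rough sketch of the compression step described above, assuming the
existing unique identifier is a non-negative integer. The alphabet is an
assumption: it is Crockford's base32 (lowercased), chosen here because it
drops i, l, o and u to avoid look-alikes. The specification does not name
an alphabet, and its own example (123 -> "3r") matches a plain 0-9a-v
alphabet instead; with the alphabet below 123 becomes "3v".

```python
# Crockford-style base32 alphabet, lowercased: no i, l, o or u.
ALPHABET = "0123456789abcdefghjkmnpqrstvwxyz"

# Decoding is case-insensitive and forgiving about look-alike characters.
_DECODE = {ch: i for i, ch in enumerate(ALPHABET)}
_DECODE.update({"i": 1, "l": 1, "o": 0})


def encode_id(n: int) -> str:
    """Encode a non-negative integer ID as a short base32 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 32)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def decode_id(s: str) -> int:
    """Decode a short string back to the integer ID, ignoring case."""
    if not s:
        raise ValueError("empty identifier")
    n = 0
    for ch in s.lower():
        if ch not in _DECODE:
            raise ValueError(f"invalid character: {ch!r}")
        n = n * 32 + _DECODE[ch]
    return n
```

Usage: encode_id(123) gives "3v", and decode_id("3V") gives 123 again.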
> > Publishers should also be given the option to specify a human-friendly
> > slug (e.g. http://example.com/promo), as users should be able to
> > derive information about the resource (path) and its source (domain)
> > from the URL.
> >
> > Where a shortlink is changed the previous URL should not be broken as
> > it may have been stored by users. Typically this requires maintaining
> > a register of mappings.
> >
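
One way to picture the "register of mappings" is a table that never
forgets a slug once it has been issued. This is only an in-memory sketch;
the class and method names are hypothetical, and a real server would keep
the table in persistent storage.

```python
class ShortlinkRegister:
    """Remembers every slug ever issued so old shortlinks keep resolving."""

    def __init__(self):
        self._resource_by_slug = {}  # every slug ever issued -> resource ID
        self._current_slug = {}      # resource ID -> slug advertised now

    def assign(self, resource_id, slug):
        """Make slug the advertised shortlink; earlier slugs stay valid."""
        owner = self._resource_by_slug.get(slug)
        if owner is not None and owner != resource_id:
            raise ValueError(f"slug {slug!r} already belongs to {owner!r}")
        self._resource_by_slug[slug] = resource_id
        self._current_slug[resource_id] = slug

    def resolve(self, slug):
        """Return the resource ID for any current or former slug, or None."""
        return self._resource_by_slug.get(slug)

    def current(self, resource_id):
        """Return the slug currently advertised for a resource, or None."""
        return self._current_slug.get(resource_id)
```

After assign(123, "3v") followed by assign(123, "promo"), both "3v" and
"promo" resolve to 123, while only "promo" is advertised.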
> > Clients
> >
> > Clients that have already retrieved the document (e.g. web browsers,
> > news readers) should parse it to discover the link rel="shortlink"
> > element(s) and extract the href attribute from each.
> >
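
For a client that already holds the document, discovery could look roughly
like the following, using Python's standard html.parser module. The names
ShortlinkParser and find_shortlinks are illustrative. The sketch treats
rel as a space-separated, case-insensitive token list, as HTML does.

```python
from html.parser import HTMLParser


class ShortlinkParser(HTMLParser):
    """Collects the href of every <link> whose rel includes "shortlink"."""

    def __init__(self):
        super().__init__()
        self.shortlinks = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if "shortlink" in rels and href:
            self.shortlinks.append(href)


def find_shortlinks(html):
    """Return all shortlink hrefs found in the document, in document order."""
    parser = ShortlinkParser()
    parser.feed(html)
    return parser.shortlinks
```

Relative href values are returned as-is; a real client would resolve them
against the document's URL.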
> > Clients that have the URL but not the document (e.g. microblogging
> > software) should conduct a HTTP HEAD request and extract any Link:
> > headers from the response. Clients should not retrieve and parse the
> > document unless the user specifically requests it.
> >
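
A sketch of the HEAD-based path, again standard library only. The Link
header parsing here is deliberately simplified (it does not handle commas
or semicolons inside quoted parameter values) and is an assumption about
what is good enough for this case, not a full implementation of the Link
header grammar. Function names are illustrative.

```python
import re
import urllib.request

# One link-value: <target> followed by zero or more ;name=value parameters.
_LINK_RE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[\w\-*]+(?:\s*=\s*(?:"[^"]*"|[^;,\s]*))?)*)'
)
_REL_RE = re.compile(r';\s*rel\s*=\s*(?:"([^"]*)"|([^;,\s]*))', re.IGNORECASE)


def shortlinks_from_link_header(value):
    """Extract targets whose rel includes "shortlink" from one Link value."""
    found = []
    for target, params in _LINK_RE.findall(value):
        m = _REL_RE.search(params)
        if not m:
            continue
        rels = (m.group(1) or m.group(2) or "").lower().split()
        if "shortlink" in rels:
            found.append(target)
    return found


def fetch_shortlinks(url, timeout=10.0):
    """Issue an HTTP HEAD request and return any advertised shortlinks."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        values = resp.headers.get_all("Link") or []
    links = []
    for value in values:
        links.extend(shortlinks_from_link_header(value))
    return links
```

Only headers are transferred, so the document body is never retrieved or
parsed, in line with the recommendation above.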
> > In the event that there are multiple shortlinks then the client may
> > choose one itself or offer the user the choice (e.g. in a drop-down
> > list). If the client chooses one it may do so randomly, by order
> > (first vs last) or by some quality of the URL (length, readability,
> > etc.).
>
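
As one possible policy for the multiple-shortlink case, a client could
pick by length and fall back to order. The function name and the choice of
"shortest wins, earliest breaks ties" are assumptions; the specification
leaves the choice open.

```python
def choose_shortlink(links):
    """Return the shortest link; ties go to the earliest. None if empty."""
    if not links:
        return None
    # min() returns the first minimal item, which gives the tie-break.
    return min(links, key=len)
```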

