rel="shortlink" proposal for advertising short URLs in HTML/HTTP

Sam Johnston <samj@samj.net> Tue, 14 April 2009 18:52 UTC

Date: Tue, 14 Apr 2009 20:53:49 +0200
Message-ID: <21606dcf0904141153t3433975fh2bacf75f37353beb@mail.gmail.com>
Subject: rel="shortlink" proposal for advertising short URLs in HTML/HTTP
From: Sam Johnston <samj@samj.net>
To: apps-discuss@ietf.org

Evening all,

I think it's time to call the grown-ups into a discussion started by a
handful of web developers trying to kill off URL shorteners and the
associated linkrot, opaque URLs, etc. Perhaps it's just accelerating
change, but the decidedly ordinary rev="canonical" proposal has already
taken on a life of its own, spawning a blog[1], a "30 minutes or
less" PoC[2], a busy Twitter hashtag and even a Slashdot article[3]
(submitted by the "inventor" himself, no less) - but, more worryingly, a
handful of high-profile implementations at sites like Ars Technica
(even if there are no clients yet that will read them).

The concept is simple: rather than forcing users to rely on third
parties like bit.ly, tinyurl.com etc. for short links, the publishers
themselves can suggest link(s) in the HTTP headers and/or HTML code.
The resulting URL (e.g. http://example.com/promo) will generally live
as long as the content does and won't vanish when the redirector
disappears (as some inevitably will - there's at least a three-figure
count of them now and probably a dozen new ones every day). It's also
a good deal more useful to users as it can show the source (domain)
and subject (path), and when the link is exposed via HTTP HEAD the
performance is at least as good as that of third-party shorteners.

A bunch of alternatives to rev="canonical" have been proposed
including rel="short|shorter|shortcut|short_url|alternate|self|...",
but for various reasons[4] I don't think any of them are suitable. I
think rel="shortlink" would work nicely and is impossible to confuse
with anything else (I got here via rel="short" and rel="shortcut").
I've roughed out a specification[5] so you can see the technical
details (copied below).

I'm not necessarily all that fussed about this - the concept looks
good, but I was basically just driving by on the weekend and saw an
accident about to happen (e.g. people confusing rel and rev and
knocking sites like Ars out of the search engines). It seems to have
caught mnot's eye too[6], and I should point out that I had every
intention[7] of bringing this to your attention after having extracted
consensus in the group[8]. I think link relations come relatively
cheap and, unless there's something I've missed, helping to put
another nail in the coffin of the URL shorteners is arguably a good
thing.

Thoughts?

Sam

1. http://revcanonical.wordpress.com/
2. http://revcanonical.appspot.com/
3. http://developers.slashdot.org/article.pl?sid=09/04/12/1834205
4. http://code.google.com/p/shortlink/wiki/Alternatives
5. http://code.google.com/p/shortlink/wiki/Specification
6. http://www.mnot.net/blog/2009/04/14/rev_canonical_bad
7. http://samj.net/2009/04/introducing-relshort-better-alternative.html
8. http://groups.google.com/group/shortlink

shortlink Specification

Technical specification for the HTML/HTTP "shortlink" relation

Introduction

The shortlink relation allows webmasters to specify a short link to
use for the resource, thereby avoiding having to obtain one from a
potentially unreliable third-party URL-shortening service such as
tinyurl.com.

Such links are useful for space-constrained applications (e.g.
microblogging services such as Twitter, and the mobile Internet) as
well as whenever URLs need to be entered manually (e.g. when they are
printed or spoken).

Note: Until such time as shortlink is officially standardised,
http://purl.org/net/shortlink should be used for standards compliance.

Details

The shortlink appears in two places:

within the HEAD section of the HTML document:

<link rel="shortlink" href="http://example.com/promo">

in the Link: HTTP header:

Link: <http://example.com/promo>; rel=shortlink
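
As a rough illustration of the two forms above, a server-side helper
might produce both from the same URL. This is only a sketch: the
function names shortlink_html and shortlink_header are hypothetical
and not part of the specification.

```python
from html import escape
from typing import Tuple


def shortlink_html(url: str) -> str:
    """Return the <link> element for the HEAD section of the HTML document."""
    # Escape the URL so that characters such as & or " cannot break the attribute.
    return '<link rel="shortlink" href="{}">'.format(escape(url, quote=True))


def shortlink_header(url: str) -> Tuple[str, str]:
    """Return the (name, value) pair for the HTTP Link header."""
    return ("Link", "<{}>; rel=shortlink".format(url))


if __name__ == "__main__":
    print(shortlink_html("http://example.com/promo"))
    name, value = shortlink_header("http://example.com/promo")
    print("{}: {}".format(name, value))
```

Generating both forms from one function call per URL keeps the HTML
and HTTP advertisements from drifting apart.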

Implementation

Servers

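Servers should implement both the HTML and the HTTP link for
efficiency and performance reasons: clients that already hold the
document can read the link from the markup, while clients that hold
only the URL can discover it with a HEAD request instead of
downloading the body.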

The shortlink should default to an automatically generated stable URI
based on an existing unique identifier (e.g. http://example.com/123).
Such identifiers may be shortened by re-encoding them in base32 or
similar (e.g. http://example.com/3r). URIs should be case-insensitive
and avoid symbols that look or sound similar (e.g. 1 vs l),
particularly when manual entry will be required (e.g. printed, spoken).
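
A minimal sketch of the base32 shortening described above. The
specification does not say which alphabet to use; the digits 0-9
followed by a-v are assumed here only because that choice turns the
identifier 123 into the "3r" of the example. The names ALPHABET,
encode_id and decode_id are hypothetical.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuv"  # assumed alphabet: 32 lower-case symbols


def encode_id(number: int) -> str:
    """Encode a non-negative integer identifier as a base32 slug."""
    if number < 0:
        raise ValueError("identifier must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, 32)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def decode_id(slug: str) -> int:
    """Decode a slug back to the integer, ignoring case."""
    if not slug:
        raise ValueError("empty slug")
    number = 0
    for symbol in slug.lower():
        # str.index raises ValueError for symbols outside the alphabet.
        number = number * 32 + ALPHABET.index(symbol)
    return number


if __name__ == "__main__":
    print("http://example.com/" + encode_id(123))
```

Note that this assumed alphabet still contains look-alike pairs such
as 1 and l or 0 and o; a deployment that expects manual entry could
swap in an alphabet that leaves those symbols out.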

Publishers should also be given the option to specify a human-friendly
slug (e.g. http://example.com/promo), as users should be able to
derive information about the resource (path) and its source (domain)
from the URL.

Where a shortlink is changed, the previous URL should not be broken, as
it may have been stored by users. Typically this requires maintaining
a register of mappings.
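
One way such a register could look, as an in-memory sketch only. The
class ShortlinkRegister and its methods are hypothetical, and a real
site would presumably keep the mappings in its database. Old slugs are
never deleted, so links that users have stored keep resolving after a
new slug is assigned.

```python
from typing import Dict, Optional


class ShortlinkRegister:
    """Keeps every slug ever assigned so that old shortlinks keep working."""

    def __init__(self) -> None:
        self._targets: Dict[str, str] = {}  # slug -> resource URL (all slugs, old and new)
        self._current: Dict[str, str] = {}  # resource URL -> slug currently advertised

    def assign(self, slug: str, resource: str) -> None:
        """Make slug the advertised shortlink for resource, keeping earlier slugs."""
        key = slug.lower()  # slugs are treated as case-insensitive
        owner = self._targets.get(key)
        if owner is not None and owner != resource:
            raise ValueError("slug already points at another resource")
        self._targets[key] = resource
        self._current[resource] = key

    def resolve(self, slug: str) -> Optional[str]:
        """Return the resource for any current or previous slug, or None."""
        return self._targets.get(slug.lower())

    def current(self, resource: str) -> Optional[str]:
        """Return the slug that should be advertised for resource, or None."""
        return self._current.get(resource)


if __name__ == "__main__":
    register = ShortlinkRegister()
    register.assign("123", "http://example.com/articles/123")
    register.assign("promo", "http://example.com/articles/123")
    print(register.resolve("123"), register.current("http://example.com/articles/123"))
```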

Clients

Clients that have already retrieved the document (e.g. web browsers,
news readers) should parse it to discover the link rel="shortlink"
element(s) and extract the href attribute from each.
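
For a client that already has the document, discovery could be
sketched with the standard library's HTMLParser as below. The names
ShortlinkParser and find_shortlinks are hypothetical, and the rel
attribute is treated as a space-separated, case-insensitive list of
values.

```python
from html.parser import HTMLParser
from typing import List


class ShortlinkParser(HTMLParser):
    """Collects the href of every <link> element whose rel includes shortlink."""

    def __init__(self) -> None:
        super().__init__()
        self.shortlinks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values; valueless attributes are None.
        rels = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "shortlink" in rels and href:
            self.shortlinks.append(href)


def find_shortlinks(html_text: str) -> List[str]:
    """Return all shortlink hrefs found in html_text, in document order."""
    parser = ShortlinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.shortlinks


if __name__ == "__main__":
    page = '<html><head><link rel="shortlink" href="http://example.com/promo"></head></html>'
    print(find_shortlinks(page))
```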

Clients that have the URL but not the document (e.g. microblogging
software) should issue an HTTP HEAD request and extract any Link:
headers from the response. Clients should not retrieve and parse the
document unless the user specifically requests it.
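
The HEAD-based discovery could be sketched as follows. The function
names are hypothetical, and the header parsing is deliberately
simplified: it assumes that no parameter value contains a "<"
character and is not a full parser for the Link header grammar.

```python
import re
import urllib.request
from typing import Iterable, List

# One link-value: <target> followed by its parameters, up to the next "<".
_LINK_VALUE = re.compile(r'<([^>]*)>([^<]*)')
# The rel parameter, either quoted or as a bare token.
_REL = re.compile(r';\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,"]+))', re.IGNORECASE)


def shortlinks_from_link_headers(values: Iterable[str]) -> List[str]:
    """Extract shortlink targets from the values of Link: headers."""
    found = []
    for value in values:
        for target, params in _LINK_VALUE.findall(value):
            match = _REL.search(params)
            if not match:
                continue
            rels = (match.group(1) or match.group(2) or "").lower().split()
            if "shortlink" in rels:
                found.append(target)
    return found


def fetch_shortlinks(url: str, timeout: float = 10.0) -> List[str]:
    """Issue a HEAD request for url and return any advertised shortlinks."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return shortlinks_from_link_headers(response.headers.get_all("Link") or [])


if __name__ == "__main__":
    print(shortlinks_from_link_headers(["<http://example.com/promo>; rel=shortlink"]))
```

Only the headers are inspected here; the document body is never
downloaded, in line with the guidance above.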

If there are multiple shortlinks, the client may choose one itself or
offer the user the choice (e.g. in a drop-down list). If the client
chooses one it may do so randomly, by order (first vs last) or by some
quality of the URL (length, readability, etc.).
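
The selection strategies mentioned above might be sketched like this.
The function choose_shortlink and its strategy names are hypothetical,
and "shortest" stands in for "some quality of the URL"; readability
would need a heuristic that the specification does not define.

```python
import random
from typing import List, Optional


def choose_shortlink(candidates: List[str], strategy: str = "shortest") -> Optional[str]:
    """Pick one shortlink from candidates, or return None if there are none."""
    if not candidates:
        return None
    if strategy == "first":
        return candidates[0]
    if strategy == "last":
        return candidates[-1]
    if strategy == "random":
        return random.choice(candidates)
    if strategy == "shortest":
        # min() keeps the earliest candidate when several share the same length.
        return min(candidates, key=len)
    raise ValueError("unknown strategy: " + strategy)


if __name__ == "__main__":
    links = ["http://example.com/123", "http://example.com/3r", "http://example.com/promo"]
    print(choose_shortlink(links))
```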