[urn] URN fragments

Juha Hakala <juha.hakala@helsinki.fi> Thu, 21 February 2013 11:36 UTC

Message-ID: <512606BC.7020902@helsinki.fi>
Date: Thu, 21 Feb 2013 13:36:28 +0200
From: Juha Hakala <juha.hakala@helsinki.fi>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130107 Thunderbird/17.0.2
MIME-Version: 1.0
To: "urn@ietf.org" <urn@ietf.org>, Peter Saint-Andre <stpeter@stpeter.im>
Content-Type: multipart/alternative; boundary="------------060301030606030704000506"
Subject: [urn] URN fragments

Hello Peter; all,

Concerning this:
> Could you also explain (perhaps in a separate thread) the intended use
> of the fragment identifier?
It took a while before the bibliographic community got a sufficient grasp of how fragments can best be used in the URN context. Initially we thought that the fragment should be part of the URN namespace specific string (NSS), but we soon realized that this would complicate things a lot. Since many (probably most) standard namespaces do not allow add-ons to identifier strings, this:

  Ex. 1: http://urn.fi/URN:ISBN:978-952-10-7670-1

would have been possible, but not this (NOTE: this is only an example):

  Ex. 2: http://urn.fi/URN:ISBN:978-952-10-7670-1#chapter2

Moreover, adding a fragment to an existing URN would in this case have been an act of identifier assignment, which in some namespaces is a managed process, so very few people would have been allowed to use fragments with URNs.

Eventually Alfred Hoenes, I and others concluded that the best solution is to separate identification and URN assignment from fragment assignment, that is, not to include the fragment in the NSS. This means that fragments can be added to the URN by anybody, at any time after the URN has been assigned and the document made available (via URN resolution) on the Internet.

Then a typical use case is a scientist who wants to cite a structured (and identified) resource. For instance, a citation to chapter 2 of the book with ISBN 978-952-10-7670-1 <http://urn.fi/URN:ISBN:978-952-10-7670-1> could now be made with the HTTP URI in Example 2 above. Of course, using a fragment is only possible if the URN is actionable, supports resolution to the resource, and the file format of that resource supports fragments in the spirit of RFC 3986. There are a lot of ifs here, but if this infrastructure is maintained by a national library or archive, the service is likely to persist for quite some time.
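
As a rough illustration of keeping the fragment outside the NSS, the following Python sketch splits a citation URI into the URN and the fragment, and adds a fragment to an already assigned URN without touching the identifier. The function names are hypothetical, and the sketch assumes the URN is carried as the path of an HTTP URI at a resolver such as urn.fi, as in the examples above; it does not perform any resolution.

```python
from typing import Tuple
from urllib.parse import urldefrag, urlsplit


def split_citation(uri: str) -> Tuple[str, str]:
    """Return (urn, fragment) for an HTTP URI of the form
    <resolver>/<URN>#<fragment>. The fragment may be empty."""
    # urldefrag separates the fragment from the rest of the URI,
    # so the URN (including its NSS) is left untouched.
    base, fragment = urldefrag(uri)
    # Assumption: the resolver exposes the URN as the URI path.
    urn = urlsplit(base).path.lstrip("/")
    return urn, fragment


def add_fragment(urn_uri: str, fragment: str) -> str:
    """Attach a fragment to an actionable URN. This is not an act of
    identifier assignment: the URN part is returned unchanged."""
    base, _ = urldefrag(urn_uri)
    return f"{base}#{fragment}"


if __name__ == "__main__":
    cited = add_fragment("http://urn.fi/URN:ISBN:978-952-10-7670-1", "chapter2")
    print(cited)
    print(split_citation(cited))
```

Whether "chapter2" actually points anywhere still depends on the file format delivered by the resolver, as noted above.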

As an aside, standard guidelines for citing & referencing are out of date. When PIDs are not assigned to start with, people have to use URLs when citing networked resources. In some cases URLs are used even if there is a resolvable PID. Most of us have probably seen 5-10 year old documents in which most URL links in the reference list are already dead. This makes it very difficult to confirm that citing and referencing have been done correctly, or that the cited document ever even existed. It is important to improve the situation by using PIDs such as URNs and DOIs, and by providing guidelines on how to use them in citing & referencing.

When fragment usage was last discussed on the list, the IETF community disapproved of the idea (if I remember correctly) on the basis that URNs will not be bound to a single manifestation of a resource, such as a PDF version of a book. When the file format changes, the probability that the fragment no longer works is too close for comfort to 1.

There are two responses to this. The obvious one is that if the fragment is not part of the URN itself (but just tells the browser a location where to go within the identified resource) the URN will nevertheless still be valid, and the user will get a modernized version of the resource. The other one, with which the bibliographic community did not get anywhere, was that there are namespaces where identifiers must be assigned separately for each manifestation of a resource. ISBN is an example of this, and although some publishers do misuse the system, the vast majority of ISBNs have been correctly assigned - which means that we have a good reason to believe that fragments would work fine longer than the browsers are able to cope with the ancient file formats. Guaranteeing access to original versions of resources, or digital archaeology, may well keep the memory institutions busy in the future...

There may of course exist namespaces where fragments cannot be used. There are identifier systems for immaterial works (libraries do distinguish between the original English version of "Hamlet", its expressions in other languages, and physical manifestations of these). There are identifiers for names, collections, metadata elements, and so on. These identifiers do not have URN namespace registrations yet, but such registrations can be made in the future. Therefore namespace registrations should say whether fragment usage is applicable, at least in theory. In the same way, they could provide examples of resolution services that can be supported.

Even if the IETF URN syntax specification does not say anything about the usage of fragments, the Finnish standard will. Given that RFC 3986 specifies fragment usage and all popular browsers support this functionality, it is more or less certain that people will start using fragments with URNs (expressed as HTTP URIs) when, e.g., citing documents. If guidelines for doing this are given in RFCs or other standards, it will be easier to co-ordinate and perhaps even foster this process, and to renew for instance ISO 690, the standard for bibliographic referencing (http://en.wikipedia.org/wiki/ISO_690).

Best regards,

Juha

-- 

  Juha Hakala
  Senior advisor

  The National Library of Finland
  P.O.Box 15 (Unioninkatu 36, room 503)
  FIN-00014 Helsinki University
  tel +358 9 191 44293