Re: [urn] A way forward for rfc2141bis and rfc3406bis -- comments to way forward & the proposal

Juha Hakala <juha.hakala@helsinki.fi> Tue, 17 July 2012 08:11 UTC

Date: Tue, 17 Jul 2012 11:12:16 +0300
From: Juha Hakala <juha.hakala@helsinki.fi>
To: "Svensson, Lars" <L.Svensson@dnb.de>
Cc: Maarit Huttunen <Maarit.Huttunen@helsinki.fi>, "Kett, Jürgen" <J.Kett@dnb.de>, "urn@ietf.org" <urn@ietf.org>, Stella Griffiths <stella@isbn-international.org>, "Geipel, Markus" <M.Geipel@dnb.de>

Hello Lars,

On 13.7.2012 12:49, Svensson, Lars wrote:
> Juha, all,
>
> Much of the current discussion -- particularly the use of fragments -- revolves around the consequences for urn:isbn (RFC 3187bis). In my opinion, that discussion prevents us from making progress in other areas. In short, I consider the use of ISBNs to create URNs flawed and suggest that, for the time being, we ignore requirements arising from the use of urn:isbn.

Your message touches on a much broader issue: what kind of usage of 
(traditional) identifiers is acceptable, in general and more specifically 
from the URN point of view. Because identifier assignment is usually a 
manual process, human errors are possible, and perhaps inevitable. ISBN 
is no exception in this respect, and taking a closer look at the 
problems within this namespace is useful. But these problems do not 
undermine the value of the ISBN system as a whole. The biggest challenge 
to the URN system is posed by namespaces where there is no control over 
identifier assignment at all.

The ISBN community has detailed identifier assignment rules expressed in 
the ISBN Manual, and national ISBN centers are responsible for 
communicating these rules to the publishers. In most countries these 
rules are well respected, but in some countries there have been 
pressures to cut corners, for instance as regards the requirement 
to give different ISBNs to different formats.

In spite of its problems, ISBN is one of the best-managed bibliographic 
identifier systems in use. When ISBNs are assigned according to the ISBN 
Manual, there should be no serious conflicts with the URN principles, 
although in some areas the ISBN community should perhaps investigate 
more closely the practical implications of its existing policies from 
the URN point of view.

Please see some comments below.
>
> Longer version:
>
> As an answer to Alfred, Juha wrote:
>>> Unlike for other URIs, URNs in general are dedicated to be media-
>>> and technology-independent, as almost necessitated by the target of
>>> long-term, global scope, uniqueness, and persistence (RFC 1737,
>>> Section 2).
>>
>> I agree on technology independence, but media independence is a more
>> complex issue. Many traditional identifier systems are media dependent.
>> For instance, each manifestation of a book (hardback, paperback, PDF)
>> must get its own ISBN. So any URN:ISBN will be forever tied to a
>> single manifestation of the book. When the book in PDF is migrated to
>> a more modern format, that updated manifestation shall receive a new ISBN.
>> These two ISBNs / URN:ISBNs will be interlinked in metadata so the
>> users can travel forward and backward in time, depending on their
>> preferences.
>
> This statement is dependent on two axioms/assumptions specified in URN documents:
>
> 	(1)	"Global uniqueness: The same URN will never be assigned to two different
> 		resources." (RFC 2141bis-02 sec 1.2)
>
> 	(2)	"Assumption #1:  Assignment of a URN is a managed process.” (RFC 3406bis-02)
> 		To me this means that if we create (assign) URNs merely by prefixing an
> 		existing identifier – as by putting “urn:issn:” in front of an existing ISSN
> 		– the process assigning the already existing identifier needs to be managed
> 		as well.
>
> My take is that urn:isbn does not adhere to either of those criteria, at least not globally.

IMHO, it does - if the system is used correctly. This interpretation is 
of course dependent on how we understand "different": is an unaltered 
reprint of a book a different resource than the original or not?

I would prefer that rfc2141bis not elaborate on the meaning of "different" 
in axiom 1, since too strict an interpretation might mean that some 
traditional identifiers could not be used as URNs. That would be 
harmful both to the URN system and to these identifiers.
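To make the prefixing model in axiom (2) concrete: a URN:ISBN is formed by putting "urn:isbn:" in front of an existing ISBN, and the URN layer itself can only decide whether two such strings are lexically equivalent. The Python sketch below is a minimal illustration under stated assumptions, not part of any draft: `make_urn`, `lexically_normalize` and `lexically_equivalent` are hypothetical names, and only the generic lexical-equivalence rules of RFC 2141 are applied (case of the leading "urn:" token, case of the NID, case of %-escapes). Any ISBN-specific rule, such as ignoring hyphens, would belong to the namespace registration and is deliberately left out.

```python
import re

# A %-escape such as "%2f"; under RFC 2141 only the case of its hex digits
# is normalized, the escape itself is never decoded.
_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def make_urn(nid: str, identifier: str) -> str:
    """Hypothetical helper: form a URN by prefixing an existing identifier.

    No namespace-specific normalization (e.g. of ISBN hyphens) is applied.
    """
    return f"urn:{nid}:{identifier}"


def lexically_normalize(urn: str) -> str:
    """Apply the generic lexical-equivalence steps of RFC 2141."""
    parts = urn.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn":
        raise ValueError(f"not a URN: {urn!r}")
    _, nid, nss = parts
    # The leading "urn:" token and the NID are case-insensitive;
    # %-escapes get upper-case hex digits; the rest of the NSS is untouched.
    nss = _ESCAPE.sub(lambda m: m.group(0).upper(), nss)
    return f"urn:{nid.lower()}:{nss}"


def lexically_equivalent(a: str, b: str) -> bool:
    return lexically_normalize(a) == lexically_normalize(b)


if __name__ == "__main__":
    u = make_urn("isbn", "3-332-00079-9")
    print(u)                                                  # urn:isbn:3-332-00079-9
    print(lexically_equivalent(u, "URN:ISBN:3-332-00079-9"))  # True
    # Without an ISBN-specific rule, dropping the hyphens gives a different URN.
    print(lexically_equivalent(u, "urn:isbn:3332000799"))     # False
```

Whatever the comparison rules, the URN inherits its uniqueness from the underlying identifier: whether urn:isbn:3-332-00079-9 denotes one publication or two is decided entirely by how the ISBN was assigned.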

> 1)	Uniqueness: An analysis of ISBNs in the catalogue of the Deutsche Nationalbibliothek (German National Library, DNB) revealed more than 200,000 ISBNs shared by more than one bibliographic record. Some of those instances are OK according to the ISBN Manual [ISBN Manual], although it must be noted that -- for someone used to the language of RFC 2119 -- the ISBN Manual's language is somewhat fuzzy, using a mixture of shall, should, must, etc., where an IETF document would use MUST. Nonetheless, an application relying on the uniqueness of urn:isbn would then assume that all records having the same ISBN describe the same resource, which is not true.

For the sake of comparison, how many ISBNs have been correctly assigned? 
And of these 200,000, how many are OK according to the ISBN Manual?

> 2)	Managed process: The assignment of ISBNs to individual publications is in many (most?) jurisdictions specified by the ISBN Manual [ISBN Manual] but delegated to the publisher. A publisher wishing to use ISBNs to identify his publications buys a number range from a national ISBN centre. In many cases, the national centre has no control over how the publisher uses the assigned number range and if he (willingly or not) assigns the same ISBN more than once. A national library _could_ be a control point, but that assumes that a) legal deposit is in place, so that the national library can be certain that _all_ publications can be checked and b) that the national library is in a position to check ISBNs _before_ the book is published. National libraries are generally documentation centres (i. e. we document what has been published) so when a publication arrives it is often too late to change an improper ISBN anyway...
>
> The above statements about uniqueness are based on an ad hoc analysis of data from the DNB. The first step was to collect all internal record identifiers sharing a common ISBN and to output a list of ISBNs pointing to more than one internal identifier. The second step was to manually evaluate a randomly chosen set of identifiers and check whether the identified records actually are different. I could identify (at least) five different cases:
> 1)	The publisher has re-used an ISBN for two entirely different publications. Example: urn:isbn:3-332-00079-9 would identify [1] and [2]. This violates 5.1 of the ISBN Manual [ISBN Manual].

This is a human error and these cases should be rare. The national 
library or other organisation maintaining the national bibliography will 
spot these errors, and may have a policy for fixing them.

> 2)	The publisher has re-used an ISBN for the same publication in two different formats (e. g. print and electronic). Example: urn:isbn:3-8272-6986-5 would identify [3] and [4]. This violates 5.4 of the ISBN Manual.

In the worst case (if the principles of the ISBN Manual are not 
followed) these cases may become increasingly common. The library 
community and book stores must try to convince the publishers that this 
practice may have dire consequences.

> 3)	The publisher has used the same ISBN for two editions of a publication, where there has been no change of content (unchanged reprints). Example: urn:isbn:3-10-048198-4 would identify [5] and [6]. This is OK according to 5.2 of the ISBN Manual.

From the URN resolution point of view, this should not be a serious 
problem. There is no need to reprint a digital book. And if there are 
reprints of a printed book, there may be just one bibliographic record 
describing all of these versions.

> 4)	The publisher has used the same ISBN for two editions of a publication where there has been a (significant) change in the content (augmented or revised reprints). Example: urn:isbn:3-921885-30-2 would identify [7] and [8]. This violates 5.2 of the ISBN Manual.

The publishers should not do this, so these cases ought to be relatively 
rare. The national library or other organisation maintaining the 
national bibliography should be able to spot these errors.

> 5)	The publisher has assigned an ISBN to a media bundle (e. g. a book with an accompanying CD or a collection of books sold as a bundle, such as the complete Works of William Shakespeare). Examples: urn:isbn:978-3-609-62394-8 would identify the media bundle [9] consisting of [10] and [11] (a book with a CD-ROM). This is OK according to 5.6 of the ISBN Manual.

Individual parts of the bundle often receive their own ISBNs. An example 
of this is a (book) series and its individual parts (e.g. The Lord of the 
Rings and its three parts). There should be no problems in resolving 
these URN:ISBNs in an appropriate manner. If the parts do not have their 
own ISBNs, other identifiers can be assigned.
>
> A more thorough evaluation will provide data on the relative frequencies of those five cases and possibly identify further ones. And yes, the data indicates that some publishers are more notorious than others...

I look forward to this additional information.
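The first step of the analysis Lars describes (collecting internal record identifiers per ISBN and listing ISBNs that point to more than one record) could be sketched as below. This is a minimal Python sketch under stated assumptions, not the DNB's actual procedure: the input format, a list of (record ID, ISBN) pairs, is assumed; hyphens and spaces are ignored; ISBN-10s are converted to their 978-prefixed ISBN-13 form (with a recomputed check digit) so that both forms of the same number end up in one group; and values failing their check digit are simply skipped. The record IDs in `SAMPLE_RECORDS` are the d-nb.info numbers cited as [1]-[4] and [9] in Lars's message; "rec-hypothetical" is invented to show ISBN-10 and ISBN-13 forms being grouped together.

```python
import re
from collections import defaultdict
from typing import Iterable, Optional


def isbn10_is_valid(d: str) -> bool:
    """ISBN-10 check: sum of digits weighted 10..1 must be divisible by 11 ("X" = 10)."""
    if not re.fullmatch(r"[0-9]{9}[0-9X]", d):
        return False
    values = [int(c) for c in d[:9]] + [10 if d[9] == "X" else int(d[9])]
    return sum(w * v for w, v in zip(range(10, 0, -1), values)) % 11 == 0


def ean13_check_digit(first12: str) -> str:
    """ISBN-13 (EAN-13) check digit: weights alternate 1, 3, 1, 3, ..."""
    total = sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(first12))
    return str((10 - total % 10) % 10)


def normalize_isbn(raw: str) -> Optional[str]:
    """Return the 13-digit form of an ISBN, or None if it fails its check digit."""
    d = raw.replace("-", "").replace(" ", "").upper()
    if len(d) == 10:
        if not isbn10_is_valid(d):
            return None
        core = "978" + d[:9]  # ISBN-10 -> ISBN-13: prefix 978, recompute check digit
        return core + ean13_check_digit(core)
    if re.fullmatch(r"[0-9]{13}", d):
        return d if ean13_check_digit(d[:12]) == d[12] else None
    return None


def shared_isbns(records: Iterable[tuple[str, str]]) -> dict[str, list[str]]:
    """Step 1 of the analysis: ISBNs that point to more than one record ID."""
    by_isbn: dict[str, list[str]] = defaultdict(list)
    for record_id, raw_isbn in records:
        isbn = normalize_isbn(raw_isbn)
        if isbn is not None:  # invalid ISBNs are skipped in this sketch
            by_isbn[isbn].append(record_id)
    return {isbn: ids for isbn, ids in by_isbn.items() if len(ids) > 1}


# (record ID, ISBN) pairs; IDs are the d-nb.info numbers cited as [1]-[4] and [9].
# "rec-hypothetical" is invented to show ISBN-10/ISBN-13 forms being grouped.
SAMPLE_RECORDS = [
    ("870687158", "3-332-00079-9"),
    ("881221112", "3-332-00079-9"),
    ("1015716695", "3-8272-6986-5"),
    ("976818019", "3-8272-6986-5"),
    ("988495430", "978-3-609-62394-8"),
    ("rec-hypothetical", "978-3-332-00079-5"),
]

if __name__ == "__main__":
    for isbn, ids in shared_isbns(SAMPLE_RECORDS).items():
        print(isbn, ids)
```

A valid check digit only guards against transcription errors; it cannot reveal that an ISBN has been reused. Every ISBN reported by such a script therefore still needs the manual second step, because a shared ISBN can be legitimate (unchanged reprints, case 3) as well as an error (cases 1, 2 and 4).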

Does the Deutsche Nationalbibliothek have plans to contact those 
publishers who are the worst offenders?

I don't know if we could provide a comparable analysis of the situation 
in Finland. And I doubt it would be of interest on the urn list.

> So what does this mean?
> 1)	We should stop using urn:isbns as globally unique identifiers, because they aren’t. The only statement we can make is that urn:isbn:xxxx identifies whatever the owner of xxxx says it identifies, which might not be uniquely determined. There is no way the URN community, the IETF or any other agency can globally control ISBN assignment, much less enforce the rules laid down in the ISBN Manual [ISBN Manual]. If you cannot control uniqueness, you should not regulate it (and still less should you build applications that depend on it).

There will be no URN namespaces where IETF or any other organization can 
fully control the uniqueness and persistence of an identifier.

If a traditional identifier could get a URN namespace only when it 
could not be misused, whether by mistake or on purpose, the URN system 
could never become popular.

> 2)	I propose that we deprecate RFC 3187 and cease work on 3187bis, at least until the questions of uniqueness and process management have been clarified. The process question requires close co-operation with the International ISBN Agency (which Juha does anyway, as far as I know). Alternatively, RFC 3187bis can state that the use of urn:isbn is restricted to certain registration group elements (e. g. 951 and 952 since it seems to work in Finland), but I doubt that that is feasible.

IETF must not deprecate URN namespaces which are already in use. Any 
changes that would undermine existing URN services are out of the scope 
of URNbis.

Any fundamental issues with a namespace concerning the uniqueness and 
persistence of the identifiers should be discussed in the namespace 
registration request. IMHO, human errors in identifier assignment are 
not such issues. In the namespace registration I would not go beyond 
stating whether the community has rules for identifier assignment, and perhaps
providing an overview of these principles. Obviously these rules should 
not be in conflict with the URN principles. In the case of ISBN, I see 
no such conflicts, although there are cases in which we should 
investigate the practical consequences of existing (acceptable) practices.

The URN community may be able to support ISBN International and the 
national ISBN centers in their work of enforcing the guidelines of the 
ISBN Manual, by showing that if the same ISBN is given to different 
formats and / or editions of a book, the URN resolution process can be 
compromised.

> 3)	When discussing the use of fragment identifiers in RFC 2141bis, we should ignore potential use cases stemming from the use of urn:isbn.

This is fine with me.

Best regards,

Juha
>
> All the best,
>
> Lars
>
> [1] http://d-nb.info/870687158
> [2] http://d-nb.info/881221112
> [3] http://d-nb.info/1015716695
> [4] http://d-nb.info/976818019
> [5] http://d-nb.info/946281599
> [6] http://d-nb.info/870379119
> [7] http://d-nb.info/800228464
> [8] http://d-nb.info/810433923
> [9] http://d-nb.info/988495430
> [10] http://d-nb.info/986478970
> [11] http://d-nb.info/988495589
> [ISBN Manual] ISBN User's Manual. International Edition. Sixth Edition. London 2012. http://www.isbn-international.org/pages/media/Usermanuals/ISBN%20Manual%202012%20-corr.pdf
>
> --
> Dr. Lars G. Svensson
> Deutsche Nationalbibliothek / Informationstechnik http://www.dnb.de/ l.svensson@dnb.de http://www.dnb.de/100jahre
>

-- 

  Juha Hakala
  Senior advisor, standardisation and IT

  The National Library of Finland
  P.O.Box 15 (Unioninkatu 36, room 503), FIN-00014 Helsinki University
  Email juha.hakala@helsinki.fi, tel +358 50 382 7678