Re: [urn] A way forward for rfc2141bis and rfc3406bis -- comments to way forward & the proposal
Juha Hakala <juha.hakala@helsinki.fi> Tue, 17 July 2012 08:11 UTC
Message-ID: <50051E60.90002@helsinki.fi>
Date: Tue, 17 Jul 2012 11:12:16 +0300
From: Juha Hakala <juha.hakala@helsinki.fi>
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:10.0.5) Gecko/20120605 Thunderbird/10.0.5
MIME-Version: 1.0
To: "Svensson, Lars" <L.Svensson@dnb.de>
References: <201207050926.LAA08015@TR-Sys.de> <4FF6DAAB.8090800@helsinki.fi> <24637769D123E644A105A0AF0E1F92EF24697EA4@dnbf-ex1.AD.DDB.DE>
In-Reply-To: <24637769D123E644A105A0AF0E1F92EF24697EA4@dnbf-ex1.AD.DDB.DE>
Content-Type: text/plain; charset="UTF-8"; format="flowed"
Content-Transfer-Encoding: 8bit
Cc: Maarit Huttunen <Maarit.Huttunen@helsinki.fi>, "\"Kett, Jürgen\"" <J.Kett@dnb.de>, "urn@ietf.org" <urn@ietf.org>, Stella Griffiths <stella@isbn-international.org>, "Geipel, Markus" <M.Geipel@dnb.de>
Hello Lars,

On 13.7.2012 12:49, Svensson, Lars wrote:
> Juha, all,
>
> Much of the current discussion -- particularly the use of fragments -- revolves around the consequences for urn:isbn (RFC 3187bis). In my opinion, that discussion prevents us from making progress in other areas. In short, I consider the use of ISBNs to create URNs flawed and suggest that we for the time being ignore requirements arising from the use of urn:isbn.

Your message is related to the much broader issue of what kind of usage of (traditional) identifiers is acceptable, both generally and more specifically from the URN point of view.

Because identifier assignment is usually a manual process, human errors are possible, and perhaps inevitable. ISBN is no exception in this respect, and taking a closer look at the problems within this namespace is useful. But these problems do not undermine the value of the ISBN system as a whole. The biggest challenge to the URN system comes from namespaces where there is no control over identifier assignment at all.

The ISBN community has detailed identifier assignment rules expressed in the ISBN manual, and national ISBN centers are responsible for communicating these rules to the publishers. In most countries these rules are well respected, but there are also countries where there has been pressure to cut corners, for instance as regards the requirement to give different ISBNs to different formats.

In spite of its problems, ISBN is one of the best managed bibliographic identifier systems in use. When ISBNs are assigned according to the ISBN manual, there should be no serious conflicts with the URN principles, although in some areas the ISBN community should perhaps investigate more closely, from the URN point of view, the practical implications of the existing policies.

Please see some comments below.

> Longer version:
>
> As an answer to Alfred, Juha wrote:
>>> Unlike for other URIs, URNs in general are dedicated to be media-
>>> and technology-independent, as almost necessitated by the target of
>>> long-term, global scope, uniqueness, and persistence (RFC 1737,
>>> Section 2).
>>
>> I agree on technology independence, but media independence is a more
>> complex issue. Many traditional identifier systems are media dependent.
>> For instance, each manifestation of a book (hard back, paperback, PDF)
>> must get its own ISBN. So any URN:ISBN will be forever tied to a
>> single manifestation of the book. When the book in PDF is migrated to
>> a more modern format, that updated manifestation shall receive a new ISBN.
>> These two ISBNs / URN:ISBNs will be interlinked in metadata so the
>> users can travel forward and backward in time, depending on their
>> preferences.
>
> This statement is dependent on two axioms/assumptions specified in URN documents:
>
> (1) "Global uniqueness: The same URN will never be assigned to two different
> resources." (RFC 2141bis-02 sec 1.2)
>
> (2) "Assumption #1: Assignment of a URN is a managed process." (RFC 3406bis-02)
> To me this means that if we create (assign) URNs merely by prefixing an
> existing identifier – as by putting “urn:issn:” in front of an existing ISSN
> – the process assigning the already existing identifier needs to be managed
> as well.
>
> My take is that urn:isbn does not adhere to any of those criteria, at least not globally.

IMHO, it does - if the system is used correctly. This interpretation is of course dependent on how we understand "different": is an unaltered reprint of a book a different resource than the original or not? I would prefer not to elaborate in rfc2141bis on the meaning of "different" in axiom 1, since too strict an interpretation might mean that some traditional identifiers would not be suitable as URNs. And this would be harmful both to the URN system and to these identifiers.

> 1) Uniqueness: An analysis of ISBNs in the catalogue of the Deutsche Nationalbibliothek (German National Library, DNB) revealed more than 200,000 ISBNs where more than one bibliographic record shared the same ISBN. Some of those instances are OK according to the ISBN Manual [ISBN Manual], whereby it must be noted, that -- for someone used to the language of RFC 2119 -- the ISBN Manual's language is somewhat fuzzy, using a mixture of shall, should, must etc where an IETF document would use MUST. Nonetheless, an application building on the uniqueness of urn:isbn would then entail that all records having the same ISBN are the same resource, which is not true.

For the sake of comparison, how many ISBNs have been correctly assigned? And of these 200,000, how many are OK according to the ISBN manual?

> 2) Managed process: The assignment of ISBNs to individual publications is in many (most?) jurisdictions specified by the ISBN Manual [ISBN Manual] but delegated to the publisher. A publisher wishing to use ISBNs to identify his publications buys a number range from a national ISBN centre. In many cases, the national centre has no control over how the publisher uses the assigned number range and if he (willingly or not) assigns the same ISBN more than once. A national library _could_ be a control point, but that assumes that a) legal deposit is in place, so that the national library can be certain that _all_ publications can be checked and b) that the national library is in a position to check ISBNs _before_ the book is published. National libraries are generally documentation centres (i. e. we document what has been published) so when a publication arrives it is often too late to change an improper ISBN anyway...
>
> The above statements about uniqueness are based on an ad hoc-analysis of data from the DNB. The first step was to collect all internal record identifiers sharing a common ISBN and to output a list of ISBNs pointing to more than one internal identifier.
> The second step was to manually evaluate a randomly chosen set of identifiers and see if the identified records actually are different or not. I could identify (at least) five different cases:
> 1) The publisher has re-used an ISBN for two entirely different publications. Example: urn:isbn:3-332-00079-9 would identify [1] and [2]. This violates 5.1 of the ISBN Manual [ISBN Manual]

This is a human error and these cases should be rare. The national library or other organisation maintaining the national bibliography will spot these errors, and may have a policy for fixing them.

> 2) The publisher has re-used an ISBN for the same publication in two different formats (e. g. print and electronic). Example: urn:isbn:3-8272-6986-5 would identify [3] and [4]. This violates 5.4 of the ISBN Manual.

In the worst case (if the principles of the ISBN Manual are not followed) these cases may become increasingly common. The library community and book stores must try to convince the publishers that this practice may have dire consequences.

> 3) The publisher has used the same ISBN for two editions of a publication, where there has been no change of content (unchanged reprints). Example: urn:isbn:3-10-048198-4 would identify [5] and [6]. This is OK according to 5.2 of the ISBN Manual.

From the URN resolution point of view, this should not be a serious problem. There is no need to reprint a digital book. And if there are reprints of a printed book, there may be just one bibliographic record describing all of these versions.

> 4) The publisher has used the same ISBN for two editions of a publication where there has been a (significant) change in the content (augmented or revised reprints). Example: urn:isbn:3-921885-30-2 would identify [7] and [8]. This violates 5.2 of the ISBN Manual.

The publishers should not do this, so these cases ought to be relatively rare. The national library or other organisation maintaining the national bibliography should be able to spot these errors.

> 5) The publisher has assigned an ISBN to a media bundle (e. g. a book with an accompanying CD or a collection of books sold as a bundle, such as the complete Works of William Shakespeare). Examples: urn:isbn:978-3-609-62394-8 would identify the media bundle [9] consisting of [10] and [11] (a book with a CD-ROM). This is OK according to 5.6 of the ISBN Manual.

Individual parts of the bundle often receive their own ISBNs. One example is a (book) series and its individual parts (e.g. The Lord of the Rings and its three parts). There should be no problems in resolving these URN:ISBNs in an appropriate manner. If the parts do not have their own ISBNs, other identifiers can be assigned.

> A more thorough evaluation will provide data on the relative frequencies of those four cases and possibly identify further ones. And yes, the data indicates that some publishers are more notorious than others...

I look forward to this additional information. Does the Deutsche Nationalbibliothek have plans to contact those publishers who are the worst offenders?

I don't know if we could provide a comparable analysis of the situation in Finland. And I doubt it would be of interest on the urn list.

> So what does this mean?
> 1) We should stop using urn:isbns as globally unique identifiers, because they aren’t. The only statement we can make is that urn:isbn:xxxx identifies whatever the owner of xxxx says it identifies, which might not be uniquely determined. There is no way the urn community, the IETF or any other agency can globally control ISBN assignment and much less enforce the rules laid down in the ISBN Manual [ISBN Manual]. If you cannot control uniqueness, you should not regulate it (and much less you should build applications that depend on it).

There will be no URN namespaces where IETF or any other organization can fully control the uniqueness and persistence of an identifier.
If a traditional identifier could only get a URN namespace when the identifier cannot be misused, by mistake or on purpose, then the URN system could never become popular.

> 2) I propose that we deprecate RFC 3187 and cease work on 3187bis, at least until the questions of uniqueness and process management have been clarified. The process question requires close co-operation with the International ISBN Agency (which Juha does anyway, as far as I know). Alternatively, RFC 3187bis can state that the use of urn:isbn is restricted to certain registration group elements (e. g. 951 and 952 since it seems to work in Finland), but I doubt that that is feasible.

IETF must not deprecate URN namespaces which are already in use. Any changes that would undermine existing URN services are out of scope for URNbis.

Any fundamental issues with a namespace concerning the uniqueness and persistence of the identifiers should be discussed in the namespace registration request. IMHO, human errors in identifier assignment are not such issues. In the namespace registration I would not go beyond telling whether the community has rules for identifier assignment, and perhaps providing an overview of these principles. Obviously these rules should not be in conflict with the URN principles. In the case of ISBN, I see no such conflicts, although there are cases in which we should investigate the practical consequences of existing (acceptable) practices.

The URN community may be able to support the International ISBN Agency and the national ISBN centers in their work of enforcing the guidelines of the ISBN manual, by showing that if the same ISBN is given to different formats and/or editions of a book, the URN resolution process can be compromised.

> 3) When discussing the use of fragment identifiers in RFC 2141bis, we should ignore potential use cases stemming from the use of urn:isbn.

This is fine with me.

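For illustration only, and not taken from the DNB analysis itself: the first step Lars describes (collect all internal record identifiers sharing a common ISBN, then list the ISBNs pointing to more than one identifier) could be sketched in Python roughly as below. The sample rows reuse the ISBNs and d-nb.info record numbers cited in cases 1, 3 and 5; the function names and the normalisation rule (dropping hyphens and spaces) are assumptions of this sketch.

```python
from collections import defaultdict


def normalize_isbn(isbn: str) -> str:
    """Drop hyphens and spaces so differently formatted copies compare equal.

    This normalisation rule is an assumption of the sketch.
    """
    return isbn.replace("-", "").replace(" ", "").upper()


def find_shared_isbns(records):
    """Return {isbn: sorted record ids} for ISBNs attached to more than one record.

    `records` is an iterable of (record_id, isbn) pairs.
    """
    by_isbn = defaultdict(set)
    for record_id, isbn in records:
        by_isbn[normalize_isbn(isbn)].add(record_id)
    return {isbn: sorted(ids) for isbn, ids in by_isbn.items() if len(ids) > 1}


# Record ids are the d-nb.info numbers from references [1], [2], [5], [6], [9].
records = [
    ("870687158", "3-332-00079-9"),      # case 1, [1]
    ("881221112", "3-332-00079-9"),      # case 1, [2]
    ("946281599", "3-10-048198-4"),      # case 3, [5]
    ("870379119", "3-10-048198-4"),      # case 3, [6]
    ("988495430", "978-3-609-62394-8"),  # case 5, bundle record [9] only
]

shared = find_shared_isbns(records)
for isbn, ids in sorted(shared.items()):
    print(f"urn:isbn:{isbn} -> {ids}")
```

Such a list only flags candidates. As the second step in the analysis makes clear, a manual check is still needed to tell acceptable cases (e.g. unchanged reprints, case 3) from real violations (e.g. case 1).
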
Best regards,

Juha

> All the best,
>
> Lars
>
> [1] http://d-nb.info/870687158
> [2] http://d-nb.info/881221112
> [3] http://d-nb.info/1015716695
> [4] http://d-nb.info/976818019
> [5] http://d-nb.info/946281599
> [6] http://d-nb.info/870379119
> [7] http://d-nb.info/800228464
> [8] http://d-nb.info/810433923
> [9] http://d-nb.info/988495430
> [10] http://d-nb.info/986478970
> [11] http://d-nb.info/988495589
> [ISBN Manual] ISBN User's Manual. International Edition. Sixth Edition. London 2012. http://www.isbn-international.org/pages/media/Usermanuals/ISBN%20Manual%202012%20-corr.pdf
>
> ***Lesen. Hören. Wissen. 100 Jahre Deutsche Nationalbibliothek*** ***Reading. Listening. Understanding. A century of the German National Library***
>
> --
> Dr. Lars G. Svensson
> Deutsche Nationalbibliothek / Informationstechnik http://www.dnb.de/ l.svensson@dnb.de http://www.dnb.de/100jahre
>

--
Juha Hakala
Senior advisor, standardisation and IT
The National Library of Finland
P.O.Box 15 (Unioninkatu 36, room 503), FIN-00014 Helsinki University
Email juha.hakala@helsinki.fi, tel +358 50 382 7678