[URN] Bibliographic URN's

Cecilia Preston <cecilia@well.com> Wed, 16 April 1997 20:08 UTC

Received: (from daemon@localhost) by services.bunyip.com (8.8.5/8.8.5) id QAA16986 for urn-ietf-out; Wed, 16 Apr 1997 16:08:09 -0400 (EDT)
Received: from mocha.bunyip.com (mocha.Bunyip.Com [192.197.208.1]) by services.bunyip.com (8.8.5/8.8.5) with SMTP id QAA16973 for <urn-ietf@services.bunyip.com>; Wed, 16 Apr 1997 16:08:04 -0400 (EDT)
Received: from ranga.SIMS.Berkeley.EDU by mocha.bunyip.com with SMTP (5.65a/IDA-1.4.2b/CC-Guru-2b) id AA25057 (mail destined for urn-ietf@services.bunyip.com); Wed, 16 Apr 97 16:07:56 -0400
Received: from d15.ucop.edu by ranga.SIMS.Berkeley.EDU; (5.65/1.1.8.2/11Aug95-1134AM) id AA15463; Wed, 16 Apr 1997 13:03:53 -0700
X-Sender: cpreston@briet.sims.berkeley.edu
Message-Id: <v03007805af7ae1c4ffcb@[128.48.100.36]>
Mime-Version: 1.0
Content-Type: text/enriched; charset="us-ascii"
Date: Wed, 16 Apr 1997 13:12:08 -0700
To: urn-ietf@bunyip.com
From: Cecilia Preston <cecilia@well.com>
Subject: [URN] Bibliographic URN's
Sender: owner-urn-ietf@Bunyip.Com
Precedence: bulk
Reply-To: Cecilia Preston <cecilia@well.com>
Errors-To: owner-urn-ietf@Bunyip.Com

At the meeting in Memphis, we decided to put this out to the list again,
this time being very careful that the text be in plain old ASCII.  It
seems that what looked just fine to me as I composed the message didn't
always come out that way.  Networks....


The file says this is text only.  If anyone has any problems reading
this let me know and we will work something out.


--Cecilia


<fontfamily><param>Times</param><bigger><bigger>

Internet Draft                                Clifford Lynch

draft-ietf-urn-biblio-00.txt        University of California

22 March 1997                                Cecilia Preston

Expires in six months                        Preston & Lynch

                                              Ron Daniel Jr.

                              Los Alamos National Laboratory



          Using Existing Bibliographic Identifiers

                             as 

                   Uniform Resource Names



Status of this Document


This document is an Internet-Draft.  Internet-Drafts are 

working documents of the Internet Engineering Task Force 

(IETF), its areas, and its working groups.  Note that other 

groups may also distribute working documents as Internet-

Drafts.


Internet-Drafts are draft documents valid for a maximum of 

six months and may be updated, replaced or made obsolete by 

other documents at any time.  It is inappropriate to use 

Internet-Drafts as reference material or to cite them other 

than as works in progress. 


Distribution of this document is unlimited.  Please send 

comments to clifford.lynch@ucop.edu and cecilia@well.com.


This document does not specify a standard; it is purely 

informational.



0. Abstract


A system for Uniform Resource Names (URNs) must be capable

of supporting identifiers from existing widely-used naming 

systems.  This document discusses how three major 

bibliographic identifiers (the ISBN, ISSN and SICI) can be 

supported within the URN framework and the currently 

proposed syntax for URNs.






INTERNET DRAFT:Bibliographic Identifiers as URNs     3/1997



1. Introduction


The ongoing work of several IETF working groups, most 

recently in the Uniform Resource Names working group, has 

culminated in the development of a syntax for Uniform Resource 

Names (URNs).   The functional requirements and overall 

framework for Uniform Resource Names are specified in RFC 

1737 [Sollins & Masinter] and the current proposal for the 

URN syntax is draft-ietf-urn-syntax-04.txt [Moats].


As part of the validation process for the development of 

URNs the IETF working group has agreed that it is important 

to demonstrate that the current URN syntax proposal can 

accommodate existing identifiers from well-managed 

namespaces.  One such well-established infrastructure for 

assigning and managing names comes from the bibliographic 

community.  Bibliographic identifiers function as names for 

objects that exist both in print and, increasingly, in 

electronic formats.  This Internet draft demonstrates the 

feasibility of supporting three representative bibliographic 

identifiers within the currently proposed URN framework and 

syntax.


Note that this document does not purport to define the 

"official" standard way of doing so; it merely demonstrates 

feasibility.  It has not been developed in consultation with 

the standards bodies and maintenance agencies that oversee 

the existing bibliographic identifiers.  Any actual Internet 

standard for encoding these bibliographic identifiers as 

URNs will need to be developed in consultation with the 

responsible standards bodies and maintenance agencies.


In addition, there are several open questions with regard to 

the management and registry of Namespace Identifiers (NIDs) 

for URNs.  For purposes of illustration, we have used the 

three NIDs "ISBN", "ISSN" and "SICI" for the three 

corresponding bibliographic identifiers discussed in this 

document.  While we believe this to be the most appropriate 

choice, it is not the only one.  The NIDs could be based on 

the standards body and standard number (e.g. "US-ANSI-NISO-

Z39.56-1997" rather than "SICI").  Alternatively, one could 

lump all bibliographic identifiers into a single 

"BIBLIOGRAPHIC" namespace, and structure the 

namespace-specific string to specify which identifier is being used.

We do not believe that these are advantageous approaches, 

but must wait for the outcome of namespace management 

discussions in the working group. 


For the purposes of this document, we have selected three 

major bibliographic identifiers (national and international) 

to fit within the URN framework.  These are the 

International Standard Book Number (ISBN) [ISO1], the 

International Standard Serial Number (ISSN) [NISO1, ISO2, 

ISO3], and the Serial Item and Contribution Identifier 

(SICI) [NISO2].  ISBNs are used to identify monographs 

(books).  ISSNs are used to identify serial publications 

(journals, newspapers) as a whole.  SICIs augment the ISSN 

in order to identify individual issues of serial 

publications, or components within those issues (such as an 

individual article, or the table of contents of a given 

issue).  The ISBN and ISSN are defined in the United States 

by standards issued by the National Information Standards 

Organization (NISO) and also by parallel international 

standards issued under the auspices of the International 

Organization for Standardization (ISO).  NISO is the ANSI-

accredited standards body serving libraries, publishers and 

information services.  The SICI code is defined by a NISO 

document in the United States and does not have a parallel 

international standards document at present. 


Many other bibliographic identifiers are in common use (for 

example, the CODEN, numbers assigned by major bibliographic 

utilities such as OCLC and RLG, national library numbers 

such as the Library of Congress Control Number) or are under 

development.  While we do not discuss them in this document, 

many of these will also need to be supported within the URN 

framework as it moves to large scale implementation.  The 

issues involved in supporting those additional identifiers 

are anticipated to be broadly similar to those involved in 

supporting ISBNs, ISSNs, and SICIs.



2. Identification vs. Resolution


It is important to distinguish between the resource 

identified by a URN and the resources that can reasonably be 

provided when attempting to resolve an identifier.  For

example, the ISSN 0040-781X identifies the popular magazine

"Time": all of it, every issue from the start of 

publication to the present.  Resolving such an identifier should 

not result in the equivalent of hundreds of thousands of 

pages of text and photos being dumped to the user's machine.

It is more reasonable for ISSNs to resolve to a navigational 

system, such as an HTML-based search form, so the user may 

select issues or articles of interest.  ISBNs and SICIs, on

the other hand, do identify finite, manageably-sized 

objects, but they may still be large enough that resolution 

to a hierarchical system is appropriate.  


In addition, the materials identified by an ISSN, ISBN or 

SICI may exist only in printed or other physical form, not 

electronically.  The best that a resolver may be able to 

offer is information about where to get the physical 

resource, such as library holdings or a bookstore or 

publisher order form.  The URN Framework provides resolution 

services that may be used to describe any differences 

between the resource identified by a URN and the resource 

that would be returned as a result of resolving that URN.



3. International Standard Book Numbers


3.1 Overview


An International Standard Book Number (ISBN) identifies an 

edition of a monographic work.  The ISBN is defined by the 

standard NISO/ANSI/ISO 2108:1992 [ISO1].


Basically, an ISBN is a ten-digit number (actually, the last 

digit can be the letter "X" as well, as described below) 

which is divided into four variable length parts usually 

separated by hyphens when printed.  The parts are as follows 

(in this order): 


* a group identifier which specifies a group of publishers, 

based on national, geographic or some other criteria,


* the publisher identifier,


* the title identifier,


* and a modulus 11 check digit, using X in place of 10.
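
As an illustration of the check digit listed above, the
following Python sketch computes and verifies it.  The
weighting used here (10 down to 2 across the first nine
digits) is not spelled out in this document; it reflects our
understanding of the ISBN standard and should be checked
against [ISO1].  The function names are hypothetical.

```python
DIGITS = set("0123456789")


def isbn10_check_char(first_nine: str) -> str:
    """Return the check character for the first nine ISBN digits."""
    if len(first_nine) != 9 or any(c not in DIGITS for c in first_nine):
        raise ValueError("expected exactly nine decimal digits")
    # Assumed weighting: 10, 9, ..., 2 across the nine digits.
    total = sum(int(d) * w for d, w in zip(first_nine, range(10, 1, -1)))
    # The check value makes the full weighted sum divisible by 11.
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_isbn10(isbn: str) -> bool:
    """Check a ten-character ISBN, written with or without hyphens."""
    compact = isbn.replace("-", "").upper()
    if len(compact) != 10:
        return False
    try:
        return isbn10_check_char(compact[:9]) == compact[9]
    except ValueError:
        return False


print(is_valid_isbn10("0-395-36341-1"))
```

Because the hyphens carry no information needed for the
calculation, the sketch drops them before checking.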


The group and publisher number assignments are managed in 

such a way that the hyphens are not needed to parse the ISBN 

unambiguously into its constituent parts.  However, the ISBN 

is normally transmitted and displayed with hyphens to make 

it easy for human beings to recognize these parts without 

having to make reference to or have knowledge of the number 

assignments for group and publisher identifiers.


3.2 Encoding Considerations


Embedding ISBNs within the URN framework presents no 

particular coding problems, since all of the characters that 

can appear in an ISBN are valid in the identifier segment of 

the URN.  %-encoding is never needed.


Example: URN:ISBN:0-395-36341-1


For the ISBN namespace, some additional equivalence rules 

are appropriate.  Prior to comparing two ISBN URNs for 

equivalence, it is appropriate to remove all hyphens, and to 

convert any occurrences of the letter X to upper case.
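
The equivalence rule above can be sketched in Python as
follows.  This is only an illustration: the function names
are hypothetical, and treating the leading "URN" and the
"ISBN" namespace identifier as case-insensitive is an
assumption about the URN syntax, not something stated here.

```python
def isbn_urn_key(urn: str) -> str:
    """Reduce an ISBN URN to a key suitable for comparison."""
    parts = urn.split(":", 2)
    # Assumption: "URN" and the NID compare case-insensitively.
    if len(parts) != 3 or parts[0].upper() != "URN" or parts[1].upper() != "ISBN":
        raise ValueError("not an ISBN URN: %r" % urn)
    # Rule from the text: remove hyphens, convert any 'x' to upper case.
    return parts[2].replace("-", "").upper()


def isbn_urns_equivalent(a: str, b: str) -> bool:
    """Compare two ISBN URNs after normalization."""
    return isbn_urn_key(a) == isbn_urn_key(b)


print(isbn_urns_equivalent("URN:ISBN:0-395-36341-1", "URN:ISBN:0395363411"))
```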


3.3 Additional Considerations


The ISBN standard and related community implementation 

guidelines define when different versions of a work should 

be assigned the same or differing ISBNs.  In actuality, 

however, practice varies somewhat depending on publisher as 

to whether different ISBNs are assigned for paperbound vs. 

hardbound versions of the same work, electronic vs. printed 

versions of the same work, or versions of the same work 

published for example in the US and in Europe.  The choice 

of whether to assign a new ISBN or to reuse an existing one 

when publishing a revised printing of an existing edition of 

a work or even a revised edition of a work is somewhat

subjective.  Practice varies from publisher to publisher 

(indeed, the distinction between a revised printing and a 

new edition is itself somewhat subjective).  The use of 

ISBNs within the URN framework simply reflects these 

existing practices.  Note that it is likely that an ISBN URN 

will often resolve to many instances of the work (many 

URLs).



4. International Standard Serial Numbers


4.1 Overview


International Standard Serial Numbers (ISSNs) identify a 

work that is published on a continuing basis in issues; 

they identify the entire work (often open-ended, in the case 

of an actively published serial).  ISSNs are defined by the 

standards ISO 3297:1986 [ISO2] and ISO/DIS 3297 [ISO3] and 

within the United States by NISO Z39.9-1992 [NISO1].  The 

ISSN International Centre is located in Paris and 

coordinates a network of regional centers.  The National 

Serials Data Program within the Library of Congress is the 

US Center of this network.


ISSNs have the form NNNN-NNNN, where N is a digit; the last 

digit may be an upper case X as the result of the check 

character calculation.  Unlike the ISBN, the ISSN components 

do not have much structure; blocks of numbers are passed out 

to the regional centers and publishers.
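
The check character calculation mentioned above is not
described in this document.  The Python sketch below shows
one common description of it (weights 8 down to 2 across the
first seven digits, modulus 11, with X standing for 10);
that weighting is an assumption to be verified against
[ISO2], and the function names are hypothetical.

```python
DIGITS = set("0123456789")


def issn_check_char(first_seven: str) -> str:
    """Return the check character for the first seven ISSN digits."""
    if len(first_seven) != 7 or any(c not in DIGITS for c in first_seven):
        raise ValueError("expected exactly seven decimal digits")
    # Assumed weighting: 8, 7, ..., 2 across the seven digits.
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_issn(issn: str) -> bool:
    """Check an ISSN written as NNNN-NNNN or as eight characters."""
    compact = issn.replace("-", "").upper()
    if len(compact) != 8:
        return False
    try:
        return issn_check_char(compact[:7]) == compact[7]
    except ValueError:
        return False


print(is_valid_issn("0040-781X"))
```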


4.2 Encoding Considerations


Again, there is no problem representing ISSNs in the 

namespace-specific string of URNs since all characters valid 

in the ISSN are valid in the namespace-specific URN string, 

and %-encoding is never required. 


Example: URN:ISSN:1046-8188


Supplementary comparison rules are also appropriate for the 

ISSN namespace.  Just as for ISBNs, hyphens should be 

dropped prior to comparison and occurrences of 'x' 

normalized to uppercase.



4.3 Additional Considerations


The ISSN standard and related community implementation 

guidelines specify when new ISSNs should be assigned vs. 

continuing to use an existing one.  There are some 

publications where practice within the bibliographic 

community varies from site to site, such as annuals or 

annual conference proceedings.  In some cases these are 

treated as serials and ISSNs are used, and in some cases 

they are treated as monographs and ISBNs are used.  For 

example, SIGMOD Record volume 24 number 2 June 1995 contains 

the Proceedings of the 1995 ACM SIGMOD International 

Conference on Management of Data.  If you subscribe to the 

journal (ISSN 0163-5808) this is simply the June issue.  On 

the other hand, you may have acquired this volume as the 

conference proceedings (a monograph) and as such would use 

the ISBN 0-89791-731-6 to identify the work.  There are also 

varying practices within the publishing community as to when 

new ISSNs are assigned due to the change in the name of a 

periodical (Atlantic becomes Atlantic Monthly); or when a 

periodical is published both in printed and electronic 

versions (The New York Times).  The use of ISSNs as URNs 

will reflect these judgments and practices.



5. Serial Item and Contribution Identifiers


5.1 Overview


The standard for Serial Item and Contribution Identifiers 

(SICI) has recently been extensively revised and is defined 

by NISO/ANSI Z39.56-1997 [NISO2].  The maintenance agency 

for the SICI code is the UnCover Corporation.


SICI codes can be used to identify an issue of a serial, or 

a specific contribution (i.e., an article, or the table of 

contents) within an issue of a serial.  SICI codes are not 

assigned; they are constructed based on information about 

the issue or issue component in question.


The complete syntax for the SICI code will not be discussed 

here; see NISO/ANSI Z39.56-1997 for details.  However, an 

example and brief review of the major components is needed

to understand the relationship with the ISSN and how this 

identifier differs.  An example of a SICI code is:


0015-6914(19960101)157:1<62:KTSW>2.0.TX;2-F


The first nine characters are the ISSN identifying the 

serial title.  The second component, in parentheses, is the 

chronology information giving the date the particular serial 

issue was published.  In this example that date was January 

1, 1996.  The third component, 157:1, is enumeration 

information (volume, number) on the particular issue of the 

serial.  These three components comprise the "item segment" 

of a SICI code.  By augmenting the ISSN with the chronology 

and/or enumeration information, specific issues of the 

serial can be identified.  The next segment, <62:KTSW>, 

identifies a particular contribution within the issue.  In 

this example we provide the starting page number and a title 

code constructed from the initial characters of the title. 

Identifiers assigned to a contribution can be used in the 

contribution segment if page numbers are inappropriate.  The 

rest of the identifier is the control segment, which 

includes a check character.  Interested readers are 

encouraged to consult the standard for an explanation of the 

fields in that segment.
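
To make the segment boundaries described above concrete, the
following Python sketch splits a SICI code into its ISSN,
chronology, enumeration, contribution and control parts.  It
is a simplified illustration only: the regular expression
and the names are ours, they do not implement the full
Z39.56 grammar, and the control segment is left unparsed.

```python
import re
from typing import NamedTuple, Optional


class SiciParts(NamedTuple):
    issn: str
    chronology: str
    enumeration: str
    contribution: str
    control: str


# Simplified pattern: ISSN, (chronology), enumeration,
# <contribution>, then everything else as the control segment.
_SICI_RE = re.compile(
    r"^(?P<issn>[0-9]{4}-[0-9]{3}[0-9X])"
    r"\((?P<chronology>[^)]*)\)"
    r"(?P<enumeration>[^<]*)"
    r"<(?P<contribution>[^>]*)>"
    r"(?P<control>.+)$"
)


def split_sici(sici: str) -> Optional[SiciParts]:
    """Return the major parts of a SICI, or None if it does not match."""
    match = _SICI_RE.match(sici)
    if match is None:
        return None
    return SiciParts(**match.groupdict())


print(split_sici("0015-6914(19960101)157:1<62:KTSW>2.0.TX;2-F"))
```

An issue-level SICI has an empty contribution segment, so
the contribution field comes back as an empty string.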


5.2 Encoding Considerations


The character set for SICIs is intended to be email-

transport-transparent, so it does not present major 

problems.  However, all printable excluded and reserved 

characters from the URN syntax draft are valid in the SICI 

character set and must be %-encoded.


Example of a SICI for an issue of a journal


     URN:SICI:1046-8188(199501)13:1%3C%3E1.0.TX;2-F


For an article contained within that issue


     URN:SICI:1046-8188(199501)13:1%3C69:FTTHBI%3E2.0.TX;2-4
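
The encoding step behind the two examples above can be
sketched in Python as follows.  The set of characters left
unencoded is an assumption on our part (letters, digits and
a small group of punctuation marks); the authoritative list
is the one in the URN syntax draft [Moats].  The function
name is hypothetical.

```python
import string

# Assumed set of characters that may appear unencoded in the
# namespace-specific string; verify against the URN syntax draft.
URN_SAFE = set(string.ascii_letters + string.digits + "()+,-.:=@;$_!*'")


def sici_to_urn(sici: str) -> str:
    """Build a SICI URN, %-encoding every character outside URN_SAFE."""
    encoded = []
    # SICI codes are ASCII; encode() raises on anything else.
    for byte in sici.encode("ascii"):
        ch = chr(byte)
        encoded.append(ch if ch in URN_SAFE else "%{:02X}".format(byte))
    return "URN:SICI:" + "".join(encoded)


print(sici_to_urn("1046-8188(199501)13:1<69:FTTHBI>2.0.TX;2-4"))
```

In these examples only the angle brackets fall outside the
assumed safe set, which is why they appear as %3C and %3E.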



Special equivalence rules for SICIs are not appropriate for 

definition as part of the namespace and incorporation in 

areas such as cache management algorithms.  These are best 

left to resolver systems which try to determine if two SICIs 

refer to the same content.  Consequently, we do not propose 

any specific rules for equivalence testing through lexical 

manipulation.


5.3 Additional Considerations


Since the serial is identified by an ISSN, some of the 

ambiguity currently found in the assignment of ISSNs carries 

over into SICI codes.  In cases where an ISSN may refer to a 

serial that exists in multiple formats, the SICI contains a 

qualifier that specifies the format type (for example, 

print, microform, or electronic).  SICI codes may be 

constructed from a variety of sources (the actual issue of 

the serial, a citation or a record from an abstracting 

service) and, as such, are based on the principle of using 

all available information, so there may be multiple SICI 

codes representing the same article [NISO2, Appendix D].  

For example, one code might be constructed with access to 

both chronology and enumeration (that is, date of issue and 

volume, issue and page number); another code might be 

constructed based only on enumeration information and 

without benefit of chronology.  Systems that use SICI codes 

employ complex matching algorithms to try to match SICI 

codes constructed from incomplete information to SICI codes 

constructed with the benefit of all relevant information.


6. Security Considerations


This document proposes means of encoding several existing 

bibliographic identifiers within the URN framework.  It does 

not discuss resolution; thus questions of secure or 

authenticated resolution mechanisms are out of scope.  It 

does not address means of validating the integrity or 

authenticating the source or provenance of URNs that contain 

bibliographic identifiers.  Issues regarding intellectual 

property rights associated with objects identified by the

various bibliographic identifiers are also beyond the scope 

of this document, as are questions about rights to the 

databases that might be used to construct resolvers.



7. References


[ISO1] NISO/ANSI/ISO 2108:1992 Information and documentation 

       -- International standard book number (ISBN)

[ISO2] ISO 3297:1986 Documentation -- International standard 

       serial numbering (ISSN)

[ISO3] ISO/DIS 3297 Information and documentation -- 

       International standard serial numbering (ISSN) 

       (Revision of ISO 3297:1986)

[Moats] R. Moats, "URN Syntax", draft-ietf-urn-syntax-04.txt,

       March 1997.

[NISO1] NISO/ANSI Z39.9-1992 International standard serial

       numbering (ISSN)

[NISO2] NISO/ANSI Z39.56-1997 Serial Item and Contribution

       Identifier

[Sollins & Masinter] K. Sollins and L. Masinter, "Functional

       Requirements for Uniform Resource Names", RFC 1737,

       December 1994.


8. Authors' Addresses


Clifford Lynch

University of California Office of the President

300 Lakeside Drive, 8th floor

Oakland CA 94612-3550

clifford.lynch@ucop.edu


Cecilia Preston

Preston & Lynch

PO Box 8310

Emeryville, CA 94662

cecilia@well.com


Ron Daniel Jr.

Advanced Computing Lab, MS B287

Los Alamos National Laboratory

Los Alamos, NM, 87545

voice: +1 505 665 0597

fax: +1 505 665 4939

http://www.acl.lanl.gov/~rdaniel







</bigger></bigger></fontfamily>