Re: [urn] A way forward for rfc2141bis and rfc3406bis -- comments to way forward & the proposal

"Svensson, Lars" <L.Svensson@dnb.de> Fri, 13 July 2012 15:43 UTC

From: "Svensson, Lars" <L.Svensson@dnb.de>
To: Juha Hakala <juha.hakala@helsinki.fi>, "urn@ietf.org" <urn@ietf.org>
Date: Fri, 13 Jul 2012 15:44:01 +0000
Message-ID: <24637769D123E644A105A0AF0E1F92EF2469815D@dnbf-ex1.AD.DDB.DE>
References: <201207050926.LAA08015@TR-Sys.de> <4FF6DAAB.8090800@helsinki.fi>
In-Reply-To: <4FF6DAAB.8090800@helsinki.fi>

Juha, all,

On July 6, Juha wrote:

> This is the second part of my comments.

And of my comments, too. They are not intended to be off-putting, but simply an attempt to nail down problems so that we can move forward.
 
> On 5.7.2012 12:26, Alfred wrote:

[...]
> > One stems from the chartered restriction to not revise the
> > "strategical" RFCs laying the foundation of URNs and to presently
> > abstain from work on services/methods and details of URN resolution
> > and URN services.  There seems to be consensus that some parts of
> > these RFCs are outdated by more than a decade of experience with
> > URNs.
> 
> True. There are for instance many resolution services that are
> definitely relevant but which are not defined in RFC 2483.

Yes, and Alfred gives us a very nice set of requirements for RFC 2483bis below.

[...]
> > Since we aim our work towards bringing our documents
> > forward on the Standards Track, we cannot make Normative references
> > to past, Informational RFCs.  So, as elaborated upon in the URNbis
> > chartering discussion, we need to incorporate selected text from,
> > e.g. RFC 1737, verbatim in order to remind the readers (including
> > prospective stakeholders of URN Namespaces) of what we now deem still
> > particularly valid and important for URNs.
> 
> OK.

Agreed.
 
> > Likewise, experience shows that we need to provide a more precise
> > framework for the establishment of URN services for URN Namespaces,
> > in order to further a uniform style -- to the benefit of generic URN
> > handling applications.
> 
> Creating the technical framework for establishment of URN resolution
> services has been one of the weaknesses of the URN effort. There is no
> widely used open source software package like the one that has existed
> for many years for the Handle system. And locally developed resolvers
> usually provide only the basic URN resolution services, such as mapping
> the URN to (single) URL.

I believe that most institutions interested in this are waiting for RFC 2483bis so that they know which services to provide. It's probably a chicken-and-egg problem, with everyone waiting for someone else to make the first move.

> > Unlike for other URIs, URNs in general are dedicated to be media- and
> > technology-independent, as almost necessitated by the target of
> > long-term, global scope, uniqueness, and persistence (RFC 1737,
> > Section 2).
> 
> I agree on technology independence, but media independence is a more
> complex issue. Many traditional identifier systems are media dependent.
> For instance, each manifestation of a book (hard back, paperback, PDF)
> must get its own ISBN. So any URN:ISBN will be forever tied to a single
> manifestation of the book. When the book in PDF is migrated to a more
> modern format, that updated manifestation shall receive a new ISBN.
> These two ISBNs / URN:ISBNs will be interlinked in metadata so the
> users can travel forward and backward in time, depending on their
> preferences.

I commented on this in my previous mail: In order to go forward we should ignore ISBN for the time being.

> The national libraries are of course storing all the versions, so as to
> protect ourselves from mistakes made during migrations.
> 
> There are also identifiers which relate to immaterial works and are
> therefore media independent. ISTC (International Standard Text Code) is
> an example of this. URN:ISTC therefore fulfills the spirit of RFC 1737
> fully. Those URNs will link to the metadata record which contains links
> to all the manifestations.
> 
> However, the URN system as a whole is only functional when it combines
> the work level and manifestation level identifiers.

As an aside: ISTC works at the frbr:Expression level (different language versions receive different ISTCs); cf. Sec. 7.1 of the ISTC Manual [ISTC Manual].

> > Since there are various services applicable to URNs, resolution of a
> > URN does not have the same media orientation properties as is
> > common in an HTTP/HTML context. The objects/resources named by URNs
> > might be structured, complex, and inter-related with the details
> > perhaps evolving over time, whereas the abstract object and its
> > naming (as done by the assignment of {NID}:{NSS}) needs to be stable.
> 
> Based on what I said above, I cannot agree with this. In all those
> namespaces which belong to manifestation identifiers such as ISBN,
> resolution is very much media oriented but nevertheless stable. National
> libraries and archives will have users who, for the sake of
> authenticity, even after hundreds of years, still want the original
> version of the digital document. Naming of these documents, given the
> assignment policy of ISBN and other systems, is as stable as it gets.

This is only true if we can rely on everyone playing according to the rules. Since we can't, we should not rely on an ISBN being coupled to an object of a specific media type.

[...] <Snipped discussion about urn:isbn>

> > In the meantime, it has become clear that the subsequent text in
> > Section 3.5 of RFC 3986 is incompatible with the goals, since it
> > calls for URI users to strip the fragment identifier component before
> > forwarding a URI reference for resolution, and to apply the fragment
> > identifier, in a media-type dependent manner, to the returned
> > content.
> 
> As I see it, section 3.5, or the way HTTP deals with the fragments, is
> not incompatible with the goals of the bibliographic community. We
> would not use fragments to actually identify anything, but to help the
> user to get into a certain location within the identified resource.
> This would be very helpful for e.g. citing purposes.

I agree that the possibility to use a persistent identifier to point to a particular location in a resource will be _extremely_ helpful to researchers and other people who want to cite the sources they use. However, if we do not use the fragment to identify anything, we should find a name other than "fragment identifier". On the other hand, if we want to help the user to get to a certain location within the resource, we need something that lets the system find that location. That something is an identifier (identifying that location).
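
To make the RFC 3986 Section 3.5 behaviour under discussion concrete, here is a minimal Python sketch of what a client does with a fragment: it splits the reference at the first "#", forwards only the part before it for resolution, and keeps the fragment to apply to whatever comes back. The function name split_fragment is an assumption made for illustration, and the URN is the intentionally invalid example from Alfred's proposal quoted further down.

```python
def split_fragment(uri_reference: str) -> tuple[str, str]:
    """Split a URI reference at the first '#'.

    Returns (part forwarded for resolution, fragment kept by the client).
    """
    base, _, fragment = uri_reference.partition("#")
    return base, fragment


refs = [
    "urn:isbn:987-65-4321-678-9#toc",
    "urn:isbn:987-65-4321-678-9#foreword",
    "urn:isbn:987-65-4321-678-9",
]

# Only the part before '#' would ever reach a resolver, so all three
# references above lead to a request for one and the same URN.
sent_to_resolver = {split_fragment(ref)[0] for ref in refs}
print(sent_to_resolver)
```

Whatever the fragment is called, the resolver never sees it under these rules; finding the location is left to the client and the returned media type.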
 
> > In the bibliographic context, components might
> > be archived in different media items over time to maintain their
> > accessibility, and they might be subject to diverse distribution
> > restrictions; so in general, it will be impossible or impractical to
> > return an all-encompassing response and allow the client to select
> > the required part.
> 
> This applies to logical fragments, but it is not our intention to apply
> URI fragments to them. URI fragments will be applied to physical
> fragments of documents, to which they are applicable.

What is the difference between a "logical fragment", a "URI fragment", and a "physical fragment"? Which is the one referred to in RFC 3986 sec 3.5?

> > An additional restriction of the use
> > of fragment identifiers is that, in practice, media types and/or
> > common browsers do not support "picking a component" from the
> > returned resource, but represent the whole resource, pointing to a
> > particular spot therein, thus maintaining the user perception of a
> > "fragment identifier" essentially being used as a pointer to a
> > particular point in the returned media, not a particular part of the
> > resource.
> 
> Yes, this is why fragment is not part of the NSS, and why, if you have
> a base urn:isbn and you attach 10 different fragments to it, you will
> still have just one URN, but you have access to 10 different places
> within that particular manifestation of a resource.
> 
> And this is also why you should never use a fragment in those URN
> namespaces where the identifier is not tied to a particular
> manifestation of a resource or does not identify single documents.
> This means no fragments for ISCI (International Standard Collection
> Identifier) or ISSN (identifier for serials).

Or ISBN, because no one can ensure that there is a bijection between an ISBN and a particular manifestation.

> > The IETF has recently put emphasis on this
> > particular, strict media-type dependence of the fragment part of
> > URIs, and we need to accommodate that and established practice in
> > browsers.
> 
> The relevant text portions in rfc2141bis and elsewhere need to be
> clarified. The most important thing is to say that the fragment is not
> part of the NSS, and draw conclusions from that.

Yes.

> >
> > In order to avoid recurrence of this issue, explaining text on
> > fragment use with URNs IMO needs to be present in rfc2141bis, _and_
> > we need to provide a uniform working scheme for the identified
> > requirements.

Yes.

> > ** the proposal **
> >
> > Study of RFCs and off-list conversations with folks from the
> > bibliographic community has led to a model of how these goals could
> > be achieved by a common-style usage of the <query> URI part, and I
> > want to present this to the WG as a way forward for discussion before
> > going to work out the details in the next version of the rfc2141bis
> > and rfc3406bis I-Ds.
> 
> Sorry - I am unwilling to put any fragment related data into query.

With "fragment" here, do you mean "logical fragment", "URI fragment", or "physical fragment"?

> > Let me explain the idea with a very hypothetical (intentionally
> > invalid) example:
> >
> > Say a book has been assigned the ISBN (ISBN-13) 987-65-4321-678-9.
> > Thus, per the rfc3187bis I-D, it gets assigned the URN,
> >              urn:isbn:987-65-4321-678-9
> 
> This ISBN would belong to a particular manifestation of a book, say a
> PDF version, in its entirety.

Or a printed book in your bookshelf. Or a package of books (complete works), or -- against the rules but done anyway -- a printed book and its e-book version.

> > A resolution service might be able to provide the bibliographic
> > record of the book and point to reproductions of selected parts of
> > it, say
> >      - an image of the front page (cover page),
> >      - a text version of the table of contents,
> >      - some rich text copy (e.g. HTML or PDF) of the foreword,
> >      - the list of references included in the book
> >        (e.g. in the form of a set of shortened bibliographic records),
> >      all of the above available for free, without restrictions to
> >      anyone; and
> >      - the Introduction section of the book (in PDF)
> >      available to registered (authenticated and authorized) users
> >      of a specific community only.
> 
> I am afraid that the URN resolution service would not and will not be
> able to do all of this. For the time being they are simple tools with
> only a limited supporting role.

I think this is a very good set of requirements for resolution services.

I will snip the rest of the discussion on resolution services for urn:isbn.

> Resolution service may help the user to retrieve a bibliographic record
> describing the book, and those records nowadays often provide a link to
> the image of the book. Table of contents may be part of the
> bibliographic record, and the record may also contain links to Amazon
> and elsewhere where excerpts of the book are stored.

We could integrate those services into our library catalogues.

> Adjusting the resolution services and bibliographic information systems
> in such a way that the user could request various data elements one at
> a time may be technically possible, but libraries probably prefer to
> supply this information from bibliographic systems, and not to extend
> radically the role of the resolution services.

What if someone else decides to build his own library catalogue from the linked open data libraries supply?

[...]

> >      urn:isbn:987-65-4321-678-9?s=I2R&c=toc
> >        returns the table of contents;
> >      urn:isbn:987-65-4321-678-9?s=I2L&c=foreword
> >        returns a URL for the foreword of the book;
> 
> In some situations, the same effect can be achieved by

In which situations?

> >      urn:isbn:987-65-4321-678-9#toc
> >        takes the user to the beginning of the table of contents;
> >      urn:isbn:987-65-4321-678-9#foreword
> >        takes the user to the beginning of the foreword.
> 
> Suitable structural elements could be harvested from the source
> document, and the resolver could be made aware of them. If the resource
> is not structured in the URI syntax sense, or the wanted structural
> elements are missing, metadata in the library system may contain the
> toc and reveal the location of the foreword. Alas, maintaining these
> URLs (pointing to e.g. the publisher's web site) will be difficult in
> the long term.
> 
> >      urn:isbn:987-65-4321-678-9?s=I2Ns&c=reflist
> >        returns a URI list (text/uri-list per RFC 2483)
> >        with the URNs of the references included in the book;
> >      urn:isbn:987-65-4321-678-9?s=I2L&c=sec.1
> >        returns a URL pointing to the Introduction (Section 1)
> >        of the book, which can only be resolved by authorized users.
> 
> Libraries use another mechanism (OpenURL) for dynamic linking which
> checks whether the users are authorized to use the resource.

That could be integrated into the resolver. OpenURL [OpenURL] is one way of doing that and not all libraries use it.
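
For concreteness, the <query> forms quoted above ("s=" for the service, "c=" for the component) could be taken apart by a resolver front end roughly as follows. This is only a sketch under assumptions: the function name parse_urn_query and the returned dictionary layout are made up for illustration, nothing here is specified in rfc2141bis, and the URN is again the intentionally invalid example.

```python
from urllib.parse import parse_qs


def parse_urn_query(urn_reference: str) -> dict:
    """Split a URN reference into the URN proper, service ('s') and component ('c')."""
    # A fragment, if present, is not part of the query and is dropped here.
    without_fragment, _, _ = urn_reference.partition("#")
    urn, _, query = without_fragment.partition("?")
    params = parse_qs(query)  # e.g. {'s': ['I2R'], 'c': ['toc']}
    return {
        "urn": urn,
        "service": params.get("s", [None])[0],
        "component": params.get("c", [None])[0],
    }


print(parse_urn_query("urn:isbn:987-65-4321-678-9?s=I2R&c=toc"))
print(parse_urn_query("urn:isbn:987-65-4321-678-9"))
```

A reference without a query yields None for both keys, which is where a namespace's default service would have to come in.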

> > This solution, in a nutshell, would consist of the following elements
> > for rfc2141bis and rfc3406bis:


[...] 
Agree with the snipped text and Juha's comments.
 
> >     - that supported "c=" values need to be specified per URN
> >       namespace.
> 
> I am not sure if it is a good idea to do this, given that the list of
> services and service components will grow. But there must be a way with
> which a user can check from the resolver which services and service
> components it supports.

Would it work for the URN namespace registration to point to a separate (changing) document where the components are listed? Then we wouldn't need to update the RFC every time something changes.
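
A rough sketch of what I mean, assuming the separately maintained document boils down to a list of component names per namespace. The table contents are hypothetical (the names are simply the "c=" values from the examples quoted above), and the function name component_supported is made up for illustration.

```python
# Hypothetical contents of a per-namespace "supported components" document,
# maintained outside the RFC so that it can change without a new RFC.
SUPPORTED_COMPONENTS = {
    "isbn": {"toc", "foreword", "reflist", "sec.1"},
}


def component_supported(nid: str, component: str) -> bool:
    """Check whether a 'c=' value is listed for the given namespace identifier."""
    # NIDs are case-insensitive, so normalise before the lookup.
    return component in SUPPORTED_COMPONENTS.get(nid.lower(), set())


print(component_supported("ISBN", "toc"))
print(component_supported("isbn", "index"))
print(component_supported("issn", "toc"))
```

A resolver could expose the same table to answer Juha's question of which services and components it supports.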

> >     Further, rfc2141bis will indicate that future URN Namespace
> >     registration documents (as per rfc3406bis) need to specify the
> >     support of the above <query> syntax by its resolution service(s),
> >     supported/applicable services, the default service provided,
> >     and the usage of "c=" (if applicable) and any other potential
> >     keywords for that URN Namespace and supported service.
> 
> Namespace registrations should make it clear if fragment usage is
> allowed. This is based on what is being identified. If the target is a
> single manifestation of a resource, fine. If the identified object can
> be anything, then common sense can be used. If the object can never be
> something to which URI fragments in the RFC 3986 sense can be applied,
> forget it.

As I understand it, fragments in the RFC 3986 sense can always be used. The question is whether they make sense for all URN namespaces. The namespace registration should make that clear.

> The registration can also list other services that may be supported. I
> don't know if we can say that some services must be supported.

Some services probably should be mandatory.

> >     Explanatory material related to the issues (described above) with
> >     the use of <fragment> identifiers as in some recent prototype URN
> >     service implementations will stay in Appendices of rfc2141bis;
> >     this includes the mention of the choices URN namespace designers
> >     have for support of hierarchical (and cross-linked) resources:
> >     - include component identifier in registered identifier,
> >       making it a (perhaps distinguishable) part of the NSS;
> 
> This will be a common approach for logical fragments. In many
> namespaces the component identifiers will be identical to identifiers
> assigned to whole documents, and - of course - always part of the NSS.

Again: what are logical fragments?
 
> >     - support/use <query> with "c=", so the NSS registry for the
> >       namespace doesn't have to deal with the component information
> >       (which will be added value by the resolution services);
> 
> I am not sure - yet - how useful this might be.
> 
> >     - use <fragment> (if media types returned for particular NID
> >       are long-term stable and allow to support that).
> 
> There will be namespaces and documents to which this functionality is
> very useful.
> 
> >     The proper use of <fragment> will be emphasized in the main body
> >     of rfc2141bis, with pointers to other specs, including the
> >     work-in-progress RFC 4288bis from APPSAWG.
> 
> I'll draft something to this effect.
> 
> > o  rfc3406bis specifies the details for the above scheme expected to
> >     be specified in registration documents, including new entries in
> >     the URN Namespace registration template for supported services
> >     (per the "s" value IANA registry) and the usage and rules for
> >     "c=" (if applicable) and any other <query> keywords, including
> >     possible IANA registration of new keywords.
> 
> OK.
> >
> > o  The definition of new service labels, and an update to the
> >     existing definitions is left to future work on a rfc2483bis
> >     document.  The unofficial rfc2483bis pre-draft circulated
> >     can be stripped of the definition of the URN service label
> >     IANA registry (then done in rfc2141bis) and focus on updates
> >     of service descriptions and the new services that have been
> >     identified in practice as being needed.
> 
> Once we have agreed that this is the way to go, I will modify the
> rfc2483bis accordingly.
> 
> > Please discuss this constructive proposal for a way forward  --
> > preferably by on-list comments.

Yes, a very helpful proposal!

All the best,

Lars

P.S. I shall be offline until August 8, so don't hold your breath for answers from me...

[ISTC Manual] http://www.istc-international.org/html/multimedia/pdfs/ISTC_User_Manual_2010v1.2.pdf
[OpenURL] http://www.niso.org/apps/group_public/download.php/6640/The%20OpenURL%20Framework%20for%20Context-Sensitive%20Services.pdf 

***Lesen. Hören. Wissen. 100 Jahre Deutsche Nationalbibliothek***
***Reading. Listening. Understanding. A century of the German National Library***

-- 
Dr. Lars G. Svensson
Deutsche Nationalbibliothek / Informationstechnik
http://www.dnb.de/
l.svensson@dnb.de
http://www.dnb.de/100jahre
