Re: [urn] A way forward for rfc2141bis and rfc3406bis -- comments to way forward & the proposal

Juha Hakala <juha.hakala@helsinki.fi> Fri, 06 July 2012 12:31 UTC

Message-ID: <4FF6DAAB.8090800@helsinki.fi>
Date: Fri, 06 Jul 2012 15:31:39 +0300
From: Juha Hakala <juha.hakala@helsinki.fi>
To: urn@ietf.org
References: <201207050926.LAA08015@TR-Sys.de>
In-Reply-To: <201207050926.LAA08015@TR-Sys.de>

Hello,

This is the second part of my comments.

On 5.7.2012 12:26, Alfred wrote:
> URN folks,
>
> thanks to all for reviving the discussion on the rfc2141bis and
> rfc3406bis I-Ds.  As the editor of both drafts, I try to sum up
> below and provide a perspective for a way forward; I'll respond
> individually in more detail ASAP (see endnote).
>
> ** towards a way forward **
>
> We face several general kinds of problems to deal with.

The trick is to select and solve only those problems that the charter 
requires us to solve, and leave the rest for later WGs or individual 
contributions. Having said this, I need to add that there is not much 
in this message that we can definitely leave for later.

> One stems from the chartered restriction to not revise the
> "strategical" RFCs laying the foundation of URNs and to presently
> abstain from work on services/methods and details of URN resolution
> and URN services.  There seems to be consensus that some parts of
> these RFCs are outdated by more than a decade of experience with
> URNs.

True. There are for instance many resolution services that are 
definitely relevant but which are not defined in RFC 2483.

But we do not need to solve this problem now, and the charter does not 
have a mandate for that. I have been working on an individual 
contribution which will specify a novel way of establishing URN 
resolution services, but that work can / should wait until 2141bis and 
3406bis have been completed.

> Since we aim our work towards bringing our documents
> forward on the Standards Track, we cannot make Normative
> references to past, Informational RFCs.  So, as elaborated upon
> in the URNbis chartering discussion, we need to incorporate
> selected text from, e.g. RFC 1737, verbatim in order to remind
> the readers (including prospective stakeholders of URN Namespaces)
> of what we now deem still particularly valid and important for URNs.

OK.

> Likewise, experience shows that we need to provide a more precise
> framework for the establishment of URN services for URN Namespaces,
> in order to further a uniform style -- to the benefit of generic
> URN handling applications.

Creating the technical framework for establishment of URN resolution 
services has been one of the weaknesses of the URN effort. There is no 
widely used open source software package like the one that has existed 
for many years for the Handle system. And locally developed resolvers 
usually provide only the basic URN resolution services, such as mapping 
the URN to a (single) URL.

> Unlike for other URIs, URNs in general are dedicated to be media-
> and technology-independent, as almost necessitated by the target
> of long-term, global scope, uniqueness, and persistence (RFC 1737,
> Section 2).

I agree on technology independence, but media independence is a more 
complex issue. Many traditional identifier systems are media dependent. 
For instance, each manifestation of a book (hard back, paperback, PDF) 
must get its own ISBN. So any URN:ISBN will be forever tied to a single 
manifestation of the book. When the book in PDF is migrated to a more 
modern format, that updated manifestation shall receive a new ISBN. 
These two ISBNs / URN:ISBNs will be interlinked in metadata so the users 
can travel forward and backward in time, depending on their preferences. 
The national libraries are of course storing all the versions, so as to 
protect ourselves from mistakes made during migrations.

There are also identifiers which relate to immaterial works and are 
therefore media independent. ISTC (International Standard Text Code) is an 
example of this. URN:ISTC therefore fully meets the spirit of RFC 1737. 
Those URNs will link to the metadata record which contains links 
to all the manifestations.

However, the URN system as a whole is only functional when it combines 
the work level and manifestation level identifiers.

> Since there are various services applicable to URNs, resolution
> of a URN does not have the same media orientation properties like
> it is common in a HTTP/HTML context. The objects/resources named
> by URNs might be structured, complex, and inter-related with the
> details perhaps evolving over time, whereas the abstract object
> and its naming (as done by the assignment of {NID}:{NSS}) needs
> to be stable.

Based on what I said above, I cannot agree with this. In all those 
namespaces which are based on manifestation identifiers such as ISBN, 
resolution is very much media oriented, yet stable. National 
libraries and archives will have users who, for the sake of 
authenticity, even after hundreds of years, still want the original 
version of the digital document. Naming of these documents, given the 
assignment policy of ISBN and other systems, is as stable as it gets.

> Our effort has been driven by a major class of mass URN "customers",
> the bibliographic community.  That community has identified the
> urgent need to identify in a uniform way, in the URN resolution
> process, object/resource components and/or related resources.
> The description in the first paragraph of Section 3.5 in RFC 3986
> has led to a URN service/resolution implementation attempt based
> on the usage of fragment identifiers, and that has been reflected
> in the rfc2141bis draft since its beginning as an individual I-D.

In fact there are two different approaches. Each namespace may have its 
own policy for identification of logical fragments which are not 
fragments in the URI syntax sense. For instance, both the entire book 
and each of its chapters may receive ISBNs. When these chapters are 
published as individual files, resolving those URN:ISBNs to URLs is 
a piece of cake.

When the entire book is published as a single structured file, there 
will be one ISBN for the book. Then it is possible to use fragments to 
specify the beginning of each chapter. This is based on physical 
fragments in the URI syntax sense of the word. From the ISBN point of 
view, no new ISBNs have been assigned. From the URN point of view, there is 
only a single URN, since the fragment is not part of the NSS.

In principle the chapters may receive ISBNs also in the second case. 
But using them as URN:ISBNs would not make much sense, since all these 
URN:ISBNs would resolve to the same resource.
>
> In the meantime, it has become clear that the subsequent text
> in Section 3.5 of RFC 3986 is incompatible with the goals, since
> it calls for URI users to strip the fragment identifier component
> before forwarding a URI reference for resolution, and to apply
> the fragment identifier, in a media-type dependent manner, to the
> returned content.

As I see it, section 3.5, or the way HTTP deals with the fragments, is 
not incompatible with the goals of the bibliographic community. We would 
not use fragments to actually identify anything, but to help the user 
reach a certain location within the identified resource. This would 
be very helpful for e.g. citing purposes.
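To make the section 3.5 behaviour concrete, here is a minimal sketch in Python of how a client could handle such a reference: the fragment is split off before resolution, and reattached to whatever URL the resolver returns. The lookup table, the repository URL and the function name are hypothetical, invented only for illustration; a real resolver would be a network service, and the ISBN is the intentionally invalid one used later in this message.

```python
# Hypothetical resolver data: base URN -> current location of that manifestation.
# The URL is made up for illustration only.
RESOLVER_TABLE = {
    "urn:isbn:987-65-4321-678-9":
        "https://repository.example/books/987-65-4321-678-9.html",
}


def resolve_reference(urn_reference: str) -> str:
    """Resolve a URN reference, handling the fragment on the client side.

    The fragment is never sent to the resolver (RFC 3986, section 3.5);
    it is only applied to the location the resolver returns.
    """
    base_urn, _, fragment = urn_reference.partition("#")
    url = RESOLVER_TABLE[base_urn]  # raises KeyError for unknown URNs
    return f"{url}#{fragment}" if fragment else url


if __name__ == "__main__":
    print(resolve_reference("urn:isbn:987-65-4321-678-9#toc"))
```

The resolver only ever sees the base URN, so nothing is identified by the fragment; it merely positions the user inside the retrieved document.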


> In the bibliographic context, components might
> be archived in different media items over time to maintain their
> accessibility, and they might be subject to diverse distribution
> restrictions; so in general, it will be impossible or impractical
> to return an all-encompassing response and allow the client to
> select the required part.

This applies to logical fragments, but it is not our intention to apply 
URI fragments to them. URI fragments will be applied to physical 
fragments of documents, to which they are applicable.

> An additional restriction of the use
> of fragment identifiers is that, in practice, media types and/or
> common browsers do not support to "pick a component" from the
> returned resource, but represent the whole resource, pointing to
> a particular spot therein; this maintains the user perception of
> a "fragment identifier" essentially being used as a pointer to
> a particular point in the returned media, not a particular part
> of the resource.

Yes, this is why the fragment is not part of the NSS, and why, if you 
have a base urn:isbn and you attach 10 different fragments to it, you will 
still have just one URN, but you have access to 10 different places 
within that particular manifestation of a resource.
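As a small illustration of this point, the following Python sketch splits a URN reference into NID, NSS and fragment and shows that ten references differing only in the fragment share one URN. The function names and the "chapterN" fragment labels are my own assumptions, and the sketch ignores any query part for simplicity.

```python
from typing import NamedTuple


class UrnReference(NamedTuple):
    nid: str       # namespace identifier, e.g. "isbn"
    nss: str       # namespace specific string
    fragment: str  # empty string if the reference has no fragment


def split_urn_reference(reference: str) -> UrnReference:
    """Split 'urn:<NID>:<NSS>[#fragment]' into its parts (no query handling)."""
    without_fragment, _, fragment = reference.partition("#")
    scheme, nid, nss = without_fragment.split(":", 2)
    if scheme.lower() != "urn":
        raise ValueError(f"not a URN reference: {reference!r}")
    return UrnReference(nid.lower(), nss, fragment)


def base_urn(reference: str) -> str:
    """Return the URN proper; the fragment is not part of the NSS."""
    parts = split_urn_reference(reference)
    return f"urn:{parts.nid}:{parts.nss}"


# Ten references into the same manifestation (hypothetical fragment names).
references = [f"urn:isbn:987-65-4321-678-9#chapter{n}" for n in range(1, 11)]
distinct_urns = {base_urn(r) for r in references}

if __name__ == "__main__":
    print(len(references), "references,", len(distinct_urns), "URN")
```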

And this is also why you should never use a fragment in those URN 
namespaces where the identifier is not tied to a particular manifestation 
of a resource or the identifier does not identify single documents. This 
means no fragments for ISCI (International Standard Collection 
Identifier) or ISSN (identifier for serials).

> The IETF has recently put emphasis on this
> particular, strict media-type dependence of the fragment part
> of URIs, and we need to accommodate that and established practice
> in browsers.

The relevant text portions in rfc2141bis and elsewhere need to be 
clarified. The most important thing is to say that the fragment is not 
part of the NSS, and draw conclusions from that.
>
> In order to avoid recurrence of this issue, explaining text on
> fragment use with URNs IMO needs to be present in rfc2141bis,
> _and_ we need to provide a uniform working scheme for the
> identified requirements.
>
>
> ** the proposal **
>
> Study of RFCs and off-list conversations with folks from the
> bibliographic community has led to a model how these goals could
> be achieved by a common-style usage of the <query> URI part,
> and I want to present this to the WG as a way forward for
> discussion before going to work out the details in the next
> version of the rfc2141bis and rfc3406bis I-Ds.

Sorry - I am unwilling to put any fragment related data into query.

> Let me explain the idea with a very hypothetical (intentionally
> invalid) example:
>
> Say a book has been assigned the ISBN (ISBN-13) 987-65-4321-678-9.
> Thus, per the rfc3187bis I-D, it gets assigned the URN,
>              urn:isbn:987-65-4321-678-9

This ISBN would belong to a particular manifestation of a book, say a 
PDF version, in its entirety.
>
> A resolution service might be able to provide the bibliographic
> record of the book and point to reproductions of selected parts
> of it, say
>      - an image of the front page (cover page),
>      - a text version of the table of contents,
>      - some rich text copy (e.g. HTML or PDF) of the foreword,
>      - the list of references included in the book
>        (e.g. in the form of a set of shortened bibliographic records),
>      all of the above available for free, without restrictions to anyone;
>      and
>      - the Introduction section of the book (in PDF)
>      available to registered (authenticated and authorized) users
>      of a specific community only.

I am afraid that URN resolution services would not and will not be 
able to do all of this. For the time being they are simple tools with only 
a limited supporting role.

A resolution service may help the user to retrieve a bibliographic record 
describing the book, and those records nowadays often provide a link to 
the image of the book. The table of contents may be part of the 
bibliographic record, and the record may also contain links to Amazon 
and elsewhere where excerpts of the book are stored.

Adjusting the resolution services and bibliographic information systems 
in such a way that the user could request various data elements one at 
a time may be technically possible, but libraries probably prefer to 
supply this information from bibliographic systems, and not to extend 
radically the role of the resolution services.
>
> Then, specific URI references to the above URN can direct its consumer
> to steer the resolution process, using the fragment part of the URI
> reference:
>
>      urn:isbn:987-65-4321-678-9
>        returns the metadata for the book (default);

While no default behaviour has been specified, this would usually return 
the entire book. If the user wants just information about the book, 
asking for metadata (and you can have descriptive, administrative, and 
structural metadata) is a good start.

>      urn:isbn:987-65-4321-678-9?s=I2R&c=toc
>        returns the table of contents;
>      urn:isbn:987-65-4321-678-9?s=I2L&c=foreword
>        returns a URL for the foreword of the book;

In some situations, the same effect can be achieved by

 >      urn:isbn:987-65-4321-678-9#toc
 >        takes the user to the beginning of the table of contents;
 >      urn:isbn:987-65-4321-678-9#foreword
 >        takes the user to the beginning of the foreword.

Suitable structural elements could be harvested from the source 
document, and the resolver could be made aware of them. If the resource 
is not structured in the URI syntax sense, or the wanted structural 
elements are missing, metadata in the library system may contain the toc 
and reveal the location of the foreword. Alas, maintaining these URLs 
(pointing to e.g. the publisher's web site) will be difficult in the 
long term.

>      urn:isbn:987-65-4321-678-9?s=I2Ns&c=reflist
>        returns a URI list (text/uri-list per RFC 2483)
>        with the URNs of the references included in the book;
>      urn:isbn:987-65-4321-678-9?s=I2L&c=sec.1
>        returns a URL pointing to the Introduction (Section 1)
>        of the book, which can only be resolved by authorized users.

Libraries use another mechanism (OpenURL) for dynamic linking which 
checks whether the users are authorized to use the resource.

> This solution, in a nutshell, would consist of the following elements
> for rfc2141bis and rfc3406bis:

This is a rather large nutshell :-).
>
> o  rfc2141bis specifies
>     - the forms-like syntax of the <query> component in URN references,
>       as a sequence of  keyword=value  items separated by "&" chars;
>       I suggest that for simplicity both <keyword> and <value> should
>       be simple fixed tokens (or follow simple patterns), i.e. kind of
>       enumerated value protocol elements, and hence not subject to
>       internationalization
>       (<keyword>s are case-insensitive, in the spirit of RFC 1737,
>       <value>s should preferably be case-insensitive as well, but
>       namespace-specific considerations might dictate allowance for
>       case-sensitivity);

Some of the URN resolution services may become complex. For instance, 
when a user asks for metadata about the resource, this metadata may come 
in many different formats, and the user may need to specify her 
preference. So the mechanism suggested here will be relevant in any 
case, with or without fragment functionality.

>     - the handling rules: single instance of items with a particular
>       <keyword> only, semantics independent of order of the items,
>       items with unknown (or falsely repeated) <keyword> are to be
>       ignored by the resolution service, "sensical" fallback in case
>       of unknown / unsupported / not applicable <value> needed
>       (these rules will support easy introduction and future extension
>       of the repertoire supported by URN services);

This information will also be relevant for people who develop resolver 
applications.
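For resolver developers, the handling rules quoted above could be sketched roughly as follows in Python. This is only my reading of the proposal, not an agreed specification: the function name is invented, the keyword set contains just the two proposed entries "s" and "c", and I have assumed that a "falsely repeated" keyword means all of its instances are dropped. The fallback for unsupported values is not covered.

```python
# The two keywords the proposal would register initially.
KNOWN_KEYWORDS = {"s", "c"}


def parse_urn_query(query: str) -> dict[str, str]:
    """Parse 'keyword=value&keyword=value' per the proposed handling rules.

    - keywords are case-insensitive; values are kept as given
    - the order of items carries no meaning
    - items with unknown keywords are ignored
    - assumption: a repeated keyword is ignored entirely
    """
    accepted: dict[str, str] = {}
    repeated: set[str] = set()
    for item in query.split("&"):
        keyword, separator, value = item.partition("=")
        keyword = keyword.lower()
        if not separator or keyword not in KNOWN_KEYWORDS:
            continue  # malformed item or unknown keyword
        if keyword in accepted or keyword in repeated:
            repeated.add(keyword)
            accepted.pop(keyword, None)  # drop the earlier instance as well
            continue
        accepted[keyword] = value
    return accepted


if __name__ == "__main__":
    print(parse_urn_query("s=I2R&c=toc"))
    print(parse_urn_query("C=toc&x=1&S=I2R"))
    print(parse_urn_query("s=I2R&s=I2L&c=toc"))
```

Ignoring unknown keywords rather than rejecting the request is what would allow new keywords to be introduced later without breaking older resolvers.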

>     - a new IANA registry of "URN Resolution Query Tokens" with a
>       sub-registry for <keyword>s to be used with URI references
>       to URNs -- either for general use or specific to particular
>       URN Namespaces --, which will be initialized with two entries
>       explained below;
>     - the <keyword> "s" (Service) to indicate the label of the desired
>       URN resolution service;
>     - the <keyword> "c" (component) to indicate the desire to obtain
>       information about a particular component of the object/resource
>       designated by the base URN;
>     - another sub-registry of the above new IANA registry for
>       <value>s used for the "s" keyword, (i.e. the "service labels"),
>       which will be provisionally populated by the service identifiers
>       from RFC 2483 -- leaving details to a future rfc2483bis;

We need this registry as well. At the moment services are carved in 
stone in RFC 2483, but during the 10+ years since it was written 
technology has changed, and there are many more services needed now.

>     - that supported "c=" values need to be specified per URN namespace.

I am not sure if it is a good idea to do this, given that the list of 
services and service components will grow. But there must be a way for 
a user to ask the resolver which services and service components it 
supports.

>     Further, rfc2141bis will indicate that future URN Namespace
>     registration documents (as per rfc3406bis) need to specify the
>     support of the above <query> syntax by its resolution service(s),
>     supported/applicable services, the default service provided,
>     and the usage of "c=" (if applicable) and any other potential
>     keywords for that URN Namespace and supported service.

Namespace registrations should make it clear if fragment usage is 
allowed. This is based on what is being identified. If the target is a 
single manifestation of a resource, fine. If the identified object can 
be anything, then common sense can be used. If the object can never be 
something to which URI fragments in the RFC 3986 sense can be applied, 
forget it.
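A resolver or client could then apply the registered policy with something like the Python sketch below. The policy table is only my illustration of the cases discussed in this message (fragments fine for ISBN, not for ISSN or ISCI); it is not taken from any registration document. Treating unlisted namespaces as "no fragments" is a conservative assumption of mine, and the "common sense" middle case is deliberately left out. The ISSN in the usage lines is just a sample value.

```python
# Hypothetical per-namespace policy, as a registration might declare it.
FRAGMENT_POLICY = {
    "isbn": True,   # identifies a single manifestation of a book
    "issn": False,  # identifies a serial, not a single document
    "isci": False,  # identifies a collection
}


def fragment_permitted(reference: str) -> bool:
    """Check a URN reference against the namespace's fragment policy.

    References without a fragment are always acceptable. Namespaces that
    are not listed are treated as not allowing fragments (an assumption).
    """
    base, _, fragment = reference.partition("#")
    if not fragment:
        return True
    nid = base.split(":", 2)[1].lower()
    return FRAGMENT_POLICY.get(nid, False)


if __name__ == "__main__":
    print(fragment_permitted("urn:isbn:987-65-4321-678-9#toc"))
    print(fragment_permitted("urn:issn:1234-5679#toc"))
```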

The registration can also list other services that may be supported. I 
don't know if we can say that some services must be supported.

>     Explanatory material related to the issues (described above) with
>     the use of <fragment> identifiers as in some recent prototype URN
>     service implementations will stay in Appendices of rfc2141bis;
>     this includes the mention of the choices URN namespace designers
>     have for support of hierarchical (and cross-linked) resources:
>     - include component identifier in registered identifier,
>       making it a (perhaps distinguishable) part of the NSS;

This will be a common approach for logical fragments. In many namespaces 
the component identifiers will be identical to identifiers assigned to 
whole documents, and - of course - always part of the NSS.

>     - support/use <query> with "c=", so the NSS registry for the
>       namespace doesn't have to deal with the component information
>       (which will be added value by the resolution services);

I am not sure - yet - how useful this might be.

>     - use <fragment> (if media types returned for particular NID
>       are long-term stable and allow to support that).

There will be namespaces and documents for which this functionality is 
very useful.

>     The proper use of <fragment> will be emphasized in the main body
>     of rfc2141bis, with pointers to other specs, including the
>     work-in-progress RFC 4288bis from APPSAWG.

I'll draft something to this effect.

> o  rfc3406bis specifies the details for the above scheme expected to
>     be specified in registration documents, including new entries in
>     the URN Namespace registration template for supported services
>     (per the "s" value IANA registry) and the usage and rules for
>     "c=" (if applicable) and any other <query> keywords, including
>     possible IANA registration of new keywords.

OK.
>
> o  The definition of new service labels, and an update to the
>     existing definitions is left to future work on a rfc2483bis
>     document.  The unofficial rfc2483bis pre-draft circulated
>     can be stripped of the definition of the URN service label
>     IANA registry (then done in rfc2141bis) and focus on updates
>     of service descriptions and the new services that have been
>     identified in practice as being needed.

Once we have agreed that this is the way to go, I will modify 
rfc2483bis accordingly.

Best regards,

Juha
>
>
> Please discuss this constructive proposal for a way forward  --
> preferably by on-list comments.
>
> Since I'll be unable to go online for the rest of July, I'll
> evaluate the list discussion and comments sent in private
> communications subsequently ASAP, and then provide feedback and
> update the drafts accordingly; so please stay patient.
>
>
> Best regards,
>    Alfred.
>

-- 

  Juha Hakala
  Senior advisor, standardisation and IT

  The National Library of Finland
  P.O.Box 15 (Unioninkatu 36, room 503), FIN-00014 Helsinki University
  Email juha.hakala@helsinki.fi, tel +358 50 382 7678