[urn] A way forward for rfc2141bis and rfc3406bis -- was: Finalizing items ...

Alfred Hönes <ah@TR-Sys.de> Thu, 05 July 2012 09:27 UTC

URN folks,

thanks to all for reviving the discussion on the rfc2141bis and
rfc3406bis I-Ds.  As the editor of both drafts, I will try to sum up
below and offer a perspective on a way forward; I'll respond
individually in more detail ASAP (see endnote).


** general **

Unfortunately, stakeholders of URN Namespaces seem, for various
reasons, to feel discouraged from participating in the on-list
discussion, which is now dominated by a few long-time "IETF
professional" contributors.  Part of the frustration I observe also
seems to stem from the lack of constructive proposals on the list so
far that could replace the options being voted against.

We should not forget that the prime target audience of these URN
core documents is the rather diverse family of (present and)
prospective stakeholders of URN Namespaces, who frequently
have not been in contact with the IETF, but possibly with other
persistent ID schemes, _or_ who are working within the IETF,
but with a focus on a particular technology and mostly unaware
of relevant work on URNs.
Hence, the goal of the "core URN documents" should be to give
concise guidance to _that_ audience, in order to avoid having
to explain the same context again and again individually
(as I have done in the past couple of years).
This IMO needs to include a bit of a "marketing" effort for the
URN framework against other, competing identifier systems.
Recently, being able to point prospective applicants of
new URN Namespaces (from inside and outside of the IETF) to
the material in the present Introductions of the rfc2141bis
and rfc3406bis drafts has been very helpful in giving guidance
on technical/document/historical context; such material is
sought by prospective stakeholders in the core documents.


** towards a way forward **

We face several general kinds of problems to deal with.
One stems from the chartered restriction not to revise the
"strategic" RFCs laying the foundation of URNs and to abstain,
for now, from work on the methods and details of URN resolution
and URN services.  There seems to be consensus that some parts of
these RFCs have been outdated by more than a decade of experience
with URNs.  Since we aim to bring our documents
forward on the Standards Track, we cannot make Normative
references to past, Informational RFCs.  So, as elaborated upon
in the URNbis chartering discussion, we need to incorporate
selected text from, e.g. RFC 1737, verbatim in order to remind
the readers (including prospective stakeholders of URN Namespaces)
of what we now deem still particularly valid and important for URNs.
Likewise, experience shows that we need to provide a more precise
framework for the establishment of URN services for URN Namespaces,
in order to further a uniform style -- to the benefit of generic
URN handling applications.

Unlike other URIs, URNs in general are intended to be media-
and technology-independent, as is almost necessitated by the goals
of long-term, global scope, uniqueness, and persistence (RFC 1737,
Section 2).
Since there are various services applicable to URNs, resolution
of a URN does not have the same media orientation as
is common in an HTTP/HTML context.  The objects/resources named
by URNs might be structured, complex, and inter-related, with the
details perhaps evolving over time, whereas the abstract object
and its naming (as done by the assignment of {NID}:{NSS}) need
to be stable.

Our effort has been driven by a major class of mass URN "customers",
the bibliographic community.  That community has identified the
urgent need to identify in a uniform way, in the URN resolution
process, object/resource components and/or related resources.
The description in the first paragraph of Section 3.5 of RFC 3986
has led to a URN service/resolution implementation attempt based
on the usage of fragment identifiers, and that has been reflected
in the rfc2141bis draft since its beginning as an individual I-D.

In the meantime, it has become clear that the subsequent text
in Section 3.5 of RFC 3986 is incompatible with the goals, since
it calls for URI users to strip the fragment identifier component
before forwarding a URI reference for resolution, and to apply
the fragment identifier, in a media-type dependent manner, to the
returned content.  In the bibliographic context, components might
be archived in different media items over time to maintain their
accessibility, and they might be subject to diverse distribution
restrictions; so in general, it will be impossible or impractical
to return an all-encompassing response and let the client
select the required part.  An additional restriction on the use
of fragment identifiers is that, in practice, media types and/or
common browsers do not support "picking a component" from the
returned resource, but present the whole resource, pointing to
a particular spot therein, thus maintaining the user perception of
a "fragment identifier" essentially being used as a pointer to
a particular point in the returned media, not a particular part
of the resource.  The IETF has recently put emphasis on this
particular, strict media-type dependence of the fragment part
of URIs, and we need to accommodate that and established practice
in browsers.
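
To make the conflict concrete, here is a minimal Python sketch of the
client-side behaviour that Section 3.5 of RFC 3986 calls for: the
fragment is split off before resolution and never reaches the
resolver.  The function name and the "#toc" fragment are made up for
illustration only; they are not taken from any draft.

```python
def split_for_resolution(uri_reference: str):
    """Split a URI reference at the first '#'.

    Returns (what is forwarded for resolution, what stays on the client).
    The fragment would only be applied afterwards, in a media-type
    dependent manner, to whatever content comes back.
    """
    base, _, fragment = uri_reference.partition("#")
    return base, fragment


# Hypothetical reference using a fragment to name a component.
to_resolver, kept_by_client = split_for_resolution(
    "urn:isbn:987-65-4321-678-9#toc"
)
print("forwarded:", to_resolver)
print("kept by client:", kept_by_client)
```

Since the resolver only ever sees the base URN, it cannot select a
component on the client's behalf.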

In order to avoid recurrence of this issue, explanatory text on
fragment use with URNs IMO needs to be present in rfc2141bis,
_and_ we need to provide a uniform working scheme for the
identified requirements.


** the proposal **

Study of RFCs and off-list conversations with folks from the
bibliographic community have led to a model of how these goals could
be achieved by a common-style usage of the <query> URI part,
and I want to present this to the WG as a way forward for
discussion before working out the details in the next
versions of the rfc2141bis and rfc3406bis I-Ds.

Let me explain the idea with a very hypothetical (intentionally
invalid) example:

Say a book has been assigned the ISBN (ISBN-13) 987-65-4321-678-9.
Thus, per the rfc3187bis I-D, it gets assigned the URN,
            urn:isbn:987-65-4321-678-9

A resolution service might be able to provide the bibliographic
record of the book and point to reproductions of selected parts
of it, say
    - an image of the front page (cover page),
    - a text version of the table of contents,
    - some rich text copy (e.g. HTML or PDF) of the foreword,
    - the list of references included in the book
      (e.g. in the form of a set of shortened bibliographic records),
    all of the above available for free, without restrictions to anyone;
    and
    - the Introduction section of the book (in PDF)
    available to registered (authenticated and authorized) users
    of a specific community only.

Then, specific URI references to the above URN let the consumer
steer the resolution process, using the query part of the URI
reference:

    urn:isbn:987-65-4321-678-9
      returns the metadata for the book (default);
    urn:isbn:987-65-4321-678-9?s=I2R&c=toc
      returns the table of contents;
    urn:isbn:987-65-4321-678-9?s=I2L&c=foreword
      returns a URL for the foreword of the book;
    urn:isbn:987-65-4321-678-9?s=I2Ns&c=reflist
      returns a URI list (text/uri-list per RFC 2483)
      with the URNs of the references included in the book;
    urn:isbn:987-65-4321-678-9?s=I2L&c=sec.1
      returns a URL pointing to the Introduction (Section 1)
      of the book, which can only be resolved by authorized users.
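
As a rough illustration of how a generic URN handling application
might take such references apart, here is a small Python sketch.  It
only assumes the urn:<NID>:<NSS>[?<query>] shape used in the examples
above; the function name is hypothetical, and the NID is lower-cased
because RFC 2141 treats it as case-insensitive.

```python
def split_urn_reference(ref: str):
    """Split 'urn:<NID>:<NSS>[?<query>]' into (nid, nss, query)."""
    base, _, query = ref.partition("?")
    parts = base.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn":
        raise ValueError("not a URN reference: " + ref)
    _, nid, nss = parts
    return nid.lower(), nss, query


# The (intentionally invalid) example references from above.
for ref in (
    "urn:isbn:987-65-4321-678-9",
    "urn:isbn:987-65-4321-678-9?s=I2R&c=toc",
    "urn:isbn:987-65-4321-678-9?s=I2Ns&c=reflist",
):
    print(split_urn_reference(ref))
```

The base URN (NID and NSS) stays the same in all cases; only the
query string differs between the references.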

This solution, in a nutshell, would consist of the following elements
for rfc2141bis and rfc3406bis:

o  rfc2141bis specifies
   - the forms-like syntax of the <query> component in URN references,
     as a sequence of  keyword=value  items separated by "&" chars;
     I suggest that for simplicity both <keyword> and <value> should
     be simple fixed tokens (or follow simple patterns), i.e. kind of
     enumerated value protocol elements, and hence not subject to
     internationalization
     (<keyword>s are case-insensitive, in the spirit of RFC 1737,
     <value>s should preferably be case-insensitive as well, but
     namespace-specific considerations might dictate allowance for
     case-sensitivity);
   - the handling rules: single instance of items with a particular
     <keyword> only, semantics independent of order of the items,
     items with unknown (or falsely repeated) <keyword> are to be
     ignored by the resolution service, "sensical" fallback in case
     of unknown / unsupported / not applicable <value> needed
     (these rules will support easy introduction and future extension
     of the repertoire supported by URN services);
   - a new IANA registry of "URN Resolution Query Tokens" with a
     sub-registry for <keyword>s to be used with URI references
     to URNs -- either for general use or specific to particular
     URN Namespaces --, which will be initialized with two entries
     explained below;
   - the <keyword> "s" (Service) to indicate the label of the desired
     URN resolution service;
   - the <keyword> "c" (component) to indicate the desire to obtain
     information about a particular component of the object/resource
     designated by the base URN;
   - another sub-registry of the above new IANA registry for
     <value>s used for the "s" keyword, (i.e. the "service labels"),
     which will be provisionally populated by the service identifiers
     from RFC 2483 -- leaving details to a future rfc2483bis;
   - that supported "c=" values need to be specified per URN namespace.

   Further, rfc2141bis will indicate that future URN Namespace
   registration documents (as per rfc3406bis) need to specify the
   support of the above <query> syntax by its resolution service(s),
   supported/applicable services, the default service provided,
   and the usage of "c=" (if applicable) and any other potential
   keywords for that URN Namespace and supported service.

   Explanatory material related to the issues (described above) with
   the use of <fragment> identifiers as in some recent prototype URN
   service implementations will stay in Appendices of rfc2141bis;
   this includes the mention of the choices URN namespace designers
   have for support of hierarchical (and cross-linked) resources:
   - include component identifier in registered identifier,
     making it a (perhaps distinguishable) part of the NSS;
   - support/use <query> with "c=", so the NSS registry for the
     namespace doesn't have to deal with the component information
     (which will be added value by the resolution services);
   - use <fragment> (if the media types returned for a particular NID
     are long-term stable and allow that to be supported).

   The proper use of <fragment> will be emphasized in the main body
   of rfc2141bis, with pointers to other specs, including the
   work-in-progress RFC 4288bis from APPSAWG.

o  rfc3406bis specifies the details for the above scheme expected to
   be specified in registration documents, including new entries in
   the URN Namespace registration template for supported services
   (per the "s" value IANA registry) and the usage and rules for
   "c=" (if applicable) and any other <query> keywords, including
   possible IANA registration of new keywords.

o  The definition of new service labels, and an update to the
   existing definitions, is left to future work on an rfc2483bis
   document.  The unofficial rfc2483bis pre-draft circulated
   can be stripped of the definition of the URN service label
   IANA registry (then done in rfc2141bis) and focus on updates
   of service descriptions and the new services that have been
   identified in practice as being needed.
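
To show that the handling rules sketched above for the <query>
component are cheap to implement, here is a minimal Python sketch of
what a resolution service could do.  Everything in it is an
assumption for discussion, not draft text: the function names, the
choice to drop *all* instances of a falsely repeated keyword, the set
of supported services, and "I2C" as the default service.

```python
KNOWN_KEYWORDS = {"s", "c"}  # the two initial registry entries proposed


def parse_urn_query(query: str, known=KNOWN_KEYWORDS):
    """Parse 'keyword=value&...' following the proposed handling rules."""
    seen = {}
    repeated = set()
    for item in query.split("&"):
        keyword, sep, value = item.partition("=")
        keyword = keyword.lower()          # keywords are case-insensitive
        if not sep or keyword not in known:
            continue                       # unknown or malformed: ignore
        if keyword in seen:
            repeated.add(keyword)          # falsely repeated keyword
            continue
        seen[keyword] = value
    for keyword in repeated:               # assumption: drop every instance
        del seen[keyword]
    return seen                            # a dict, so item order is irrelevant


def select_service(params, supported=("I2C", "I2R", "I2L", "I2Ns"),
                   default="I2C"):
    """Pick the requested service, falling back to a default (assumed)."""
    requested = params.get("s", "").lower()
    for label in supported:
        if label.lower() == requested:     # assumption: case-insensitive values
            return label
    return default


params = parse_urn_query("c=toc&S=I2R&x=ignored")
print(params, select_service(params))
```

Because unknown keywords are silently ignored and unsupported values
fall back to a default, new keywords and service labels could be
introduced later without breaking existing resolvers.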


Please discuss this constructive proposal for a way forward  --
preferably by on-list comments.

Since I'll be unable to go online for the rest of July, I'll
evaluate the list discussion and comments sent in private
communications subsequently ASAP, and then provide feedback and
update the drafts accordingly; so please stay patient.


Best regards,
  Alfred.