Re: [urn] [apps-discuss] URNs are not URIs (another look at RFC 3986)

"Svensson, Lars" <L.Svensson@dnb.de> Mon, 05 May 2014 09:58 UTC

From: "Svensson, Lars" <L.Svensson@dnb.de>
To: Juha Hakala <juha.hakala@helsinki.fi>, John C Klensin <john-ietf@jck.com>, "jehakala@mappi.helsinki.fi" <jehakala@mappi.helsinki.fi>
Date: Mon, 05 May 2014 09:57:58 +0000
Archived-At: http://mailarchive.ietf.org/arch/msg/urn/LXIjdu_4yJbF9F9aassJ7G8zwiQ
Cc: "julian.reschke@gmx.de" <julian.reschke@gmx.de>, "urn@ietf.org" <urn@ietf.org>, Graham Klyne <GK@ninebynine.org>
Subject: Re: [urn] [apps-discuss] URNs are not URIs (another look at RFC 3986)

All,

This discussion has been very helpful to me, since it has forced me to reflect on my own views on persistent identification, managed processes, resolution services and other features of urn:s (the lower-case RFC 2141 ones). I have tried to follow the discussion closely, as I work for a memory organisation offering urn services, but I cannot deny that I have often had the impression that any attempt to find consensus would be futile, and that I have often completely lost track of the direction in which this WG is going. Many thanks to John and Juha for focusing the discussion on the features we expect from urn:s. This mail was triggered by the discussion on Friday, May 2nd, 2014; I use it both to comment and to check whether I have understood the arguments correctly.

On May 2nd, 9:56 PM, Juha Hakala wrote:

> Hello,
> 
> On 2.5.2014 20:16, John C Klensin wrote:
> > (apps-discuss list dropped)
> >
> > Juha,
> >
> > I found this (both the part quoted below and the rest of your
> > note) very helpful.  Thanks.  One question below...
> >
> > --On Friday, May 02, 2014 18:06 +0300 jehakala@mappi.helsinki.fi
> > wrote:
> >
> > I would assume that, if the poem is published as part of a
> > collection, the collection appears in a book, and the book is
> > identified with an ISBN, the mechanism for finding the poem
> > within the book becomes part of a specification about the book
> > and its identifier.
> Sort of. I left out some details from my description, but add them here
> because similar techniques are being put into use not only in libraries
> but in other organizations as well.
> 
> In our slang, there is a host record describing the book, and component
> part records describing the poem, article, image or any other thing in
> the book. There is a bidirectional link between host and component part
> records.

Yes, that is the model we envision, although it will take some time before we reach that stage. Traditionally, libraries have only catalogued the physical item at hand, so there would usually be metadata available about the book (e.g. an anthology of poems by different authors) but not about its individual parts (i.e. individual poems or illustrations). So it might be more accurate to rewrite the above statement as follows (emphasis added):

[[
In our slang, there is a host record describing the book, and there *might be* component part records describing the poem, article, image or any other thing in the book. *In that case*, there is a bidirectional link between host and component part records.
]]

We must keep in mind that cataloguing in libraries is extremely heterogeneous and that often only very coarse data is available. We're getting better, though.
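The optional host/component model above can be sketched as a small Python data structure. This is a minimal illustration under stated assumptions: the `Record` class, its fields and the example identifier strings are hypothetical and do not reflect MARC or any other real cataloguing format. A host record may exist with no component records at all; when a component record is added, the link is kept in both directions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare records by identity; the links form cycles
class Record:
    """Hypothetical catalogue record (not a real cataloguing format)."""
    title: str
    # A record may carry several persistent identifiers (e.g. an ISBN and a URN).
    identifiers: List[str] = field(default_factory=list)
    host: Optional["Record"] = None
    components: List["Record"] = field(default_factory=list)

    def add_component(self, part: "Record") -> None:
        """Attach a component part and keep the link bidirectional."""
        if part.host is not None and part.host is not self:
            raise ValueError("component already belongs to another host")
        part.host = self
        if part not in self.components:
            self.components.append(part)


# A book catalogued only at the host level has no component records.
bare_book = Record("Another anthology", identifiers=["urn:nbn:example-host-2"])

# A book whose poem has its own record, linked in both directions.
book = Record("Anthology of poems", identifiers=["urn:nbn:example-host"])
poem = Record("A poem", identifiers=["urn:nbn:example-part"])
book.add_component(poem)
```

Using `eq=False` makes membership tests compare records by identity, which avoids recursing through the cyclic host/component references.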

> Similar linking techniques are in use when libraries describe
> for instance serials and serial articles and CDs & tracks in them. The
> work level record of a poem, track, article or any other component part
> will be linked directly to the component part record. Persistent
> identifiers are required in all levels, and there can be several of them
> - for instance, periodical articles often have embedded images.
> 
> These component parts are sometimes components only in logical sense.
> Each track or article may be available as a separate file. But if a file
> contains many component resources, the file syntax may reveal this
> internal structure. For instance, when the National Library of Finland
> digitizes serials we often create structured METS/ALTO XML files where
> encoding shows the logical structure of the issue.

Yes. And we must be careful about when and how we use that as a basis for creating (persistent) identifiers. Are you suggesting that we specify in some RFC that specific syntaxes (in this case METS/ALTO) be used to describe the internal structure of a resource? If so, I must say that I consider it a mistake to depend on a particular technology to represent things, given that that technology might be obsolete in 500 years...

[...]
> > While the distinctions are real, the details may be a matter of
> > convention.  For example, if the book's table of contents were
> > considered metadata about the book, the poem might also be found
> > within the book by asking a question of that metadata and then
> > using the result.
> Table of contents (and abstract) is often provided as metadata for
> non-fiction books. So it is possible to find for instance an article
> within a book even if there is no separate metadata record for it.
> Whether it is possible to provide a direct link to such article depends
> on the encoding of the (text) file. Rich encoding takes time, but serves
> the users well.

Perhaps one of the most important questions is how we handle resources that have not reached their ideal state yet...

On May 2nd, 5:07 PM, Juha wrote:

> Quoting John C Klensin <john-ietf@jck.com>:
> 
[skipping Ozymandias]

> > Whatever its other properties,
> > http://masinter.blogspot.com/2010/03/ozymandias-uri.html
> > is lousy as a persistent identifier.
> 
> URLs really should not be called identifiers. There is a fundamental
> difference between assigning a persistent identifier using a managed
> process (which includes creation of descriptive metadata about the
> resource) and getting a URL to a web page.
[...]
> Naming / identification has at times been discussed on this list at a
> very abstract level. For most resources that will be preserved for
> long term, things are not that complex. There are guidebooks for using
> ISTCs, ISBNs, ISSNs, NBNs and so on. In those namespaces at least it
> is not necessary to be familiar with Heidegger in order to be able to
> assign identifiers for things. On the other hand, it is these existing
> practices which are in conflict with some of the principles expressed
> in RFC 3986. If fragment were to identify something (instead of just
> pinpointing a location within the identified resource) the new URN
> syntax would not be OK for most identifiers used in libraries and
> publishing sector. The same applies for query.

Thanks, Juha, for mentioning the managed process again. If we require that urn:s (identifiers) be assigned according to a formal process, that implies that *all parts* of the string are created according to that process. So if we decide to allow queries and fragment *identifiers* in urn:s, any institution assigning identifiers needs to document the rules according to which those parts are created. Is this correct?

And if we say that fragment identifiers are not identifiers (in the above sense), then we should not allow them. I offer this just as a further argument for why RFC 3986 fragment identifiers are a problem in URNs...
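The two points above (every part of the string must come from the documented process, and undocumented fragments should not be allowed) can be illustrated with a short Python sketch. It splits a urn: string only at the RFC 3986 generic delimiters (the first "#" starts the fragment, then a "?" before it starts the query) and checks each present part against an institution's rules. The `AssignmentPolicy` class and its flags are hypothetical, standing in for whatever a namespace's documentation actually says; no NID-specific syntax rules are applied.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UrnParts:
    """Components of a urn: string; query/fragment are None when absent."""
    nid: str
    nss: str
    query: Optional[str]
    fragment: Optional[str]


def split_urn(urn: str) -> UrnParts:
    """Split at the RFC 3986 generic delimiters only; no NID-specific rules."""
    # The first '#' starts the fragment; a '?' before it starts the query.
    rest, hash_sign, fragment = urn.partition("#")
    rest, qmark, query = rest.partition("?")
    scheme, colon, body = rest.partition(":")
    if scheme.lower() != "urn" or not colon:
        raise ValueError("not a urn: string")
    nid, colon, nss = body.partition(":")
    if not nid or not colon or not nss:
        raise ValueError("missing NID or NSS")
    return UrnParts(nid, nss,
                    query if qmark else None,
                    fragment if hash_sign else None)


@dataclass
class AssignmentPolicy:
    """Hypothetical stand-in for an institution's documented assignment rules."""
    allows_query: bool = False
    allows_fragment: bool = False


def check_assigned(urn: str, policy: AssignmentPolicy) -> UrnParts:
    """Reject any part of the string that the documented process does not cover."""
    parts = split_urn(urn)
    if parts.query is not None and not policy.allows_query:
        raise ValueError("query is not covered by the assignment rules")
    if parts.fragment is not None and not policy.allows_fragment:
        raise ValueError("fragment is not covered by the assignment rules")
    return parts
```

For example, `check_assigned("urn:nbn:de:example-123#p5", AssignmentPolicy())` raises `ValueError`, because under the default (hypothetical) policy nobody has documented how such fragments are created.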

On April 29th, 10:46 PM, John C. Klensin wrote:

> For an http-style URL, the query is addressed to
> the store in which the object is located and may be used to
> select the object, to select within it, etc.  In a two (or more)
> fork environment, queries can, in principle, be addressed to
> information about the object (aka "metadata"), to the selection
> of the object or subsets of it, and so on.  They may specify if
> retrieval is actually wanted and, if so, in which fork.  For
> some types of objects (types presumably identified by NID) there
> may be one fork, two forks, or more forks and actual retrieval
> may be meaningful (or not) for each of them.  In principle,
> one could have an NID (or NID NSS pair) that did not identify an
> object at all but was a pure string for comparison purposes
> (that is allowed by 2141 as I read it).
>
> Because of those combinations, it is desirable to be able to
> identify where a query is intended to be processed and/or what
> sort of query it is on a basis that applies to all urn-method
> URNs and maybe to have abstractions about what happens when
> queries cannot be satisfied that goes somewhat beyond what 3986
> specifies (or allows other things to specify).  Because the
> query model of 3986 is, at least IMO, pretty closely tied to the
> interpretation of queries in http-style URLs, it is hard to make
> those distinctions except, perhaps, by kludge.
> 
> And, if we are really trying to construct identifiers that will
> be useful (or at least accurately interpretable) for centuries,
> even if the presumed associated retrieval methods go away,
> kludges that can be avoided are probably an extra-bad idea.

Until now I had been pretty certain that queries only make sense in the context of resolution services, and I therefore argued that we should disallow them in the urn: syntax and defer them to RFC 2483bis. But the above comment, together with Juha's hint that we might in future have urn:-based resolution (without relying on http:), has made me reconsider. One way to stay within the narrow query syntax of 3986 could be to specify a list of keywords (or, perhaps better, a prefix like "urnrs-", for "urn resolution service") that, when used in queries, are *only* to be interpreted by resolvers and MUST be ignored by other processors. Since creating the query part of the urn: is part of the managed process, that should be feasible. (I think Juha has mentioned similar thoughts earlier on this list, but I cannot find the reference right now.)
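A minimal Python sketch of the prefix idea, assuming the query is a "&"-separated list of key=value items (RFC 3986 itself does not mandate that structure) and using the "urnrs-" prefix suggested above. A resolver picks out the prefixed items; any other processor drops them, for example before comparing two URNs. The function names are hypothetical, and the comparison is plain string equality with no percent-decoding or case normalisation.

```python
from typing import List

# Prefix floated in this thread; it is not a registered or standardised keyword.
RESOLVER_PREFIX = "urnrs-"


def _items(query: str) -> List[str]:
    """Split a query into '&'-separated items (an assumed convention, not 3986)."""
    return [item for item in query.split("&") if item]


def _is_resolver_item(item: str) -> bool:
    """True if the key (text before any '=') carries the resolver prefix."""
    return item.partition("=")[0].startswith(RESOLVER_PREFIX)


def resolver_params(query: str) -> List[str]:
    """Items that only a resolution service is meant to interpret."""
    return [item for item in _items(query) if _is_resolver_item(item)]


def strip_resolver_params(urn: str) -> str:
    """The URN as a non-resolver processor sees it: urnrs- items ignored."""
    rest, hash_sign, fragment = urn.partition("#")
    base, _, query = rest.partition("?")
    kept = [item for item in _items(query) if not _is_resolver_item(item)]
    result = base + ("?" + "&".join(kept) if kept else "")
    return result + ("#" + fragment if hash_sign else "")


def same_identifier(a: str, b: str) -> bool:
    """Compare two URNs while ignoring resolver-only parameters."""
    return strip_resolver_params(a) == strip_resolver_params(b)
```

Because the prefixed items are created within the managed process, a processor that is not a resolver can discard them without losing any part of what the identifier identifies.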

Please let me know if I'm totally off on this.

Best,

Lars

*** Read. Listen. Know. Deutsche Nationalbibliothek ***
-- 
Dr. Lars G. Svensson
Deutsche Nationalbibliothek
Information Technology
Phone: +49-69-1525-1752
mailto:l.svensson@dnb.de 
http://www.dnb.de