Re: [urn] [apps-discuss] URNs are not URIs (another look at RFC 3986)

jehakala@mappi.helsinki.fi Mon, 05 May 2014 22:48 UTC

Date: Tue, 06 May 2014 01:47:01 +0300
From: jehakala@mappi.helsinki.fi
To: "Svensson, Lars" <L.Svensson@dnb.de>
Archived-At: http://mailarchive.ietf.org/arch/msg/urn/bwTAvU9U6S6Bu1t-UQ02FqBtfaw
Cc: julian.reschke@gmx.de, urn@ietf.org, Graham Klyne <GK@ninebynine.org>

Hello Lars,

Some comments below.

Quoting "Svensson, Lars" <L.Svensson@dnb.de>:

>
>>
>> In our slang, there is a host record describing the book, and component
>> part records describing the poem, article, image or any other thing in
>> the book. There is a bidirectional link between host and component part
>> records.
>
> Yes, that is the model we envision, although it will take some time  
> before we reach that stage. Traditionally, libraries have only  
> catalogued the physical item at hand, so that there usually would be  
> metadata available about the book (e. g. an anthology of poems by  
> different authors) but not about the individual parts (i. e.  
> individual poems or illustrations).

For digital resources this policy must change. No manifestation of a
digital resource (such as a PDF version of a doctoral dissertation) will
ever be persistent in the national library sense of the word. What we
need is a description of the work (which will never change) and, linked
to it, successive versions (manifestations) of the resource, with
preservation metadata that describes the differences between these
versions.

> So it might be more correct to rewrite the above statement as
> (emphasis added):
>
> [[
> In our slang, there is a host record describing the book, and there  
> *might be* component part records describing the poem, article,  
> image or any other thing in the book. *In that case*, there is a  
> bidirectional link between host and component part records.
> ]]
>
> We must keep in mind that cataloguing in libraries is extremely  
> heterogeneous and that often only very coarse data is available.  
> We're getting better, though.

Yes - we can't manage digital content properly without getting better.
Of course, most libraries do not have the burden of preserving digital
resources for the long term, but for those libraries that do, there is
no alternative. And any organization that intends to preserve digital
resources for the long term must have a fairly sophisticated data model
to support that activity. Persistent identifiers are just a small, but
essential, part of this.


>>
>> These component parts are sometimes components only in a logical sense.
>> Each track or article may be available as a separate file. But if a file
>> contains many component resources, the file syntax may reveal this
>> internal structure. For instance, when the National Library of Finland
>> digitizes serials we often create structured METS/ALTO XML files where
>> the encoding shows the logical structure of the issue.
>
> Yes. And we must be careful about when and how we use that as a basis
> for creating (persistent) identifiers. Do you suggest that we specify
> in some RFC that we use specific syntaxes (in this case METS/ALTO) to
> describe the (internal) structure of a resource? If so, I must say
> that I consider it a mistake to depend on a certain technology to
> represent things, considering that that technology might be obsolete
> in 500 years...

In this particular case I am talking about the internal structure of a
single version / manifestation of the resource and the identifiers
needed for that. The next manifestation may have the same logical
components but different encoding and identifiers (only the identifier
of the work itself never changes). When migration is applied as the
preservation policy, the aim is usually to preserve the intellectual
content. Trying to preserve the original look and feel is not likely
to succeed.



> Thanks, Juha, for mentioning the managed process again. If we require
> that urn:s (identifiers) be assigned according to a formal process,
> that implies that *all parts* of the string are created according to
> that process. So if we decide to allow queries and fragment
> *identifiers* in urn:s, any institution assigning identifiers needs
> to document the rules according to which those parts are created. Is
> this correct?

According to the current URN syntax I-D, the fragment and the query are
not part of the namespace specific string and therefore do not identify
anything. If we change our minds, we would need to decide what they
actually identify, and try to convince the people who maintain
traditional identifier systems that extending the scope of their
identifiers to fragment identification, and to whatever queries can be
applied to them, is OK. I don't believe that would be easy.
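A minimal Python sketch of the reading above, assuming the generic RFC 3986 delimiters ('?' starts the query, '#' starts the fragment): only the NID and NSS are treated as the identifier, and the query and fragment are split off. The names parse_urn and same_resource, and the "example" NID used in the usage lines, are hypothetical illustrations. The comparison is deliberately simplified (case-insensitive NID, exact NSS match) and omits the %-escape normalization that RFC 2141 lexical equivalence also requires.

```python
from typing import NamedTuple, Optional


class ParsedURN(NamedTuple):
    nid: str                 # namespace identifier (lowercased here)
    nss: str                 # namespace specific string: the part that identifies
    query: Optional[str]     # text after '?', if present; not part of the NSS
    fragment: Optional[str]  # text after '#', if present; not part of the NSS


def parse_urn(urn: str) -> ParsedURN:
    """Split 'urn:<NID>:<NSS>[?query][#fragment]' (assumed RFC 3986 delimiters)."""
    # The fragment is everything after the first '#'.
    rest, sep, fragment = urn.partition("#")
    frag = fragment if sep else None
    # The query is everything after the first '?' that precedes the fragment.
    rest, sep, query = rest.partition("?")
    q = query if sep else None
    scheme, sep_scheme, rest = rest.partition(":")
    if scheme.lower() != "urn" or not sep_scheme:
        raise ValueError("not a urn: string")
    # The NID ends at the next ':'; the NSS may itself contain ':'.
    nid, sep_nid, nss = rest.partition(":")
    if not nid or not sep_nid or not nss:
        raise ValueError("missing NID or NSS")
    return ParsedURN(nid.lower(), nss, q, frag)


def same_resource(a: str, b: str) -> bool:
    """Simplified check: same NID and NSS; query and fragment are ignored."""
    pa, pb = parse_urn(a), parse_urn(b)
    return (pa.nid, pa.nss) == (pb.nid, pb.nss)


if __name__ == "__main__":
    print(parse_urn("urn:example:thesis-42?format=pdf#page=10"))
    print(same_resource("urn:example:thesis-42#page=10", "URN:EXAMPLE:thesis-42"))
```

Under this reading, a fragment such as "#page=10" only points to a location inside whatever the NSS identifies, so two URNs that differ only in their fragment or query still name the same resource.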


>
> And if we say that fragment identifiers are not identifiers (in the
> above sense), then we should not allow them. This is just a further
> argument for why RFC 3986 FIs (fragment identifiers) are a problem in
> URNs...

Well, we do allow the use of URI fragments, but for us they just
indicate a location within the document identified by the namespace
specific string.

Juha

>
> On April 29th, 10:46 PM, John C. Klensin wrote:
>
>> For an http-style URL, the query is addressed to
>> the store in which the object is located and may be used to
>> select the object, to select within it, etc.  In a two (or more)
>> fork environment, queries can, in principle, be addressed to
>> information about the object (aka "metadata"), to the selection
>> of the object or subsets of it, and so on.  They may specify if
>> retrieval is actually wanted and, if so, in which fork.  For
>> some types of objects (types presumably identified by NID) there
>> may be one fork, two forks, or more forks and actual retrieval
>> may be meaningful (or not) for each of them.  In principle,
>> one could have an NID (or NID NSS pair) that did not identify an
>> object at all but was a pure string for comparison purposes
>> (that is allowed by 2141 as I read it).
>>
>> Because of those combinations, it is desirable to be able to
>> identify where a query is intended to be processed and/or what
>> sort of query it is on a basis that applies to all urn-method
>> URNs and maybe to have abstractions about what happens when
>> queries cannot be satisfied that go somewhat beyond what 3986
>> specifies (or allows other things to specify).  Because the
>> query model of 3986 is, at least IMO, pretty closely tied to the
>> interpretation of queries in http-style URLs, it is hard to make
>> those distinctions except, perhaps, by kludge.
>>
>> And, if we are really trying to construct identifiers that will
>> be useful (or at least accurately interpretable) for centuries,
>> even if the presumed associated retrieval methods go away,
>> kludges that can be avoided are probably an extra-bad idea.
>
> Until now I had been pretty certain that queries only make sense in  
> the context of resolution services and thus argued that we should  
> disallow them in the urn: syntax and defer them to RFC 2483bis, but  
> the above comment together with Juha's hint that we might in future  
> have urn:-based resolving (without relying on http:) has made me
> think about that again. One way to stay within the narrow query  
> syntax of 3986 could be to specify a list of keywords (or perhaps  
> better a prefix like "urnrs-" (urn resolution service)) that -- when  
> used in queries -- are *only* to be interpreted by resolvers and  
> MUST be ignored by other processors. Since creating the query part  
> of the urn: is part of the managed process, that should be feasible.  
> (I think Juha has mentioned similar thoughts earlier on this list,  
> but I cannot find the reference right now).
>
> Please let me know if I'm totally off on this.
>
> Best,
>
> Lars
>
> *** Read. Listen. Know. Deutsche Nationalbibliothek ***
> --
> Dr. Lars G. Svensson
> Deutsche Nationalbibliothek
> Information Technology
> Phone: +49-69-1525-1752
> mailto:l.svensson@dnb.de
> http://www.dnb.de
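Lars's idea of a reserved query prefix such as "urnrs-" (urn resolution service), whose parameters are interpreted only by resolvers and ignored by every other processor, can be sketched in Python as follows. The prefix is the one floated in the message; the function names split_query and query_for_non_resolver, the case-sensitive prefix match, and the "I2L" service value (borrowed from the RFC 2483 service names) in the usage lines are assumptions for illustration only.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical reserved prefix, as proposed above; matching is case-sensitive here.
RESOLVER_PREFIX = "urnrs-"


def split_query(query: str) -> tuple[list[tuple[str, str]], list[tuple[str, str]]]:
    """Return (resolver_params, other_params) parsed from a query string."""
    resolver: list[tuple[str, str]] = []
    other: list[tuple[str, str]] = []
    for name, value in parse_qsl(query, keep_blank_values=True):
        target = resolver if name.startswith(RESOLVER_PREFIX) else other
        target.append((name, value))
    return resolver, other


def query_for_non_resolver(query: str) -> str:
    """What a processor that is not a resolver acts on: urnrs-* parameters dropped."""
    _, other = split_query(query)
    return urlencode(other)


if __name__ == "__main__":
    print(split_query("urnrs-service=I2L&lang=fi"))
    print(query_for_non_resolver("urnrs-service=I2L&lang=fi"))
```

The prefix keeps everything inside the generic RFC 3986 query syntax, so no new delimiter is needed. One caveat of this sketch: parse_qsl decodes percent-escapes and '+', and urlencode re-encodes them, so the filtered query is not guaranteed to be byte-identical to the corresponding part of the original string.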