Re: [urn] [apps-discuss] URNs are not URIs (another look at RFC 3986)

jehakala@mappi.helsinki.fi Fri, 02 May 2014 15:06 UTC

Date: Fri, 02 May 2014 18:06:42 +0300
Message-ID: <20140502180642.Horde.k922N8-cIl2au4mAP9neJA2@webmail.helsinki.fi>
From: jehakala@mappi.helsinki.fi
To: John C Klensin <john-ietf@jck.com>
References: <C93A34DBE97565AD96CEC321@JcK-HP8200.jck.com> <534BED18.9090009@gmx.de> <3D39F1AA700A179F3C051DE2@JcK-HP8200.jck.com> <534D3410.50607@ninebynine.org> <54ecc96adba240159cf624c54c507136@BL2PR02MB307.namprd02.prod.outlook.com> <952E89C207E59D25CD5953D6@JCK-EEE10>
In-Reply-To: <952E89C207E59D25CD5953D6@JCK-EEE10>
Archived-At: http://mailarchive.ietf.org/arch/msg/urn/073NpXZ2r1j7_D3xueX2sa1HsB4
Cc: julian.reschke@gmx.de, urn@ietf.org, Graham Klyne <GK@ninebynine.org>, apps-discuss@ietf.org
Subject: Re: [urn] [apps-discuss] URNs are not URIs (another look at RFC 3986)

Hello,


Quoting John C Klensin <john-ietf@jck.com>:

> Hi.
> <snip> ... one very quick observation in response to part of
> Larry's note...
>
> --On Tuesday, 15 April, 2014 16:25 +0000 Larry Masinter
> <masinter@adobe.com> wrote:
>
>> The only thing that makes something a name 'persistent' is
>> the existence of a name resolution service or method which
>> persists. The syntax or namespace is irrelevant.  'persistent'
>> isn't binary, it's just "how long". Everything has a life-time.
>>
>> http://masinter.blogspot.com/2010/03/ozymandias-uri.html
>
> I think the above largely misses the point and that, in a sense,
> your blog posting illustrates the point although not in the way
> you probably intend.   "Persistent identifier" entered the IETF
> vocabulary a long time ago but may not be very good terminology.
>
> Some objects -- like stone statues if one doesn't care whether
> they remain intact and standing-- have very long life
> expectancies.  Others, like people, have shorter ones.

The library point of view on this is that Shelley's Ozymandias has an
extremely long life expectancy as a work. It will exist at least as
long as there is at least one copy left of it, in some physical
(printed or digital) form. In principle the work itself may survive
even longer, if one or more people have memorized it before the last
copy vanishes (which will inevitably happen).

Persistence of works, their manifestations and identifiers assigned to  
them is primarily an organizational issue. For instance, publications  
will survive a long time because national libraries look after them.  
Some of us are also busy harvesting web pages, using the same  
technologies as the Internet Archive. Likewise archives and museums  
will keep some materials for hundreds of years. This is the community  
which needs persistent identifiers and is already actively using them  
(and understands pretty well how to do it).

Now, in order to identify Ozymandias, the following steps are needed:

1. Give an identifier to the (immaterial) work, and provide metadata  
about the poem.

2. Give an identifier to a manifestation of the work. The identifier
system used can be ISBN, if Ozymandias is published in a collection
of Shelley's poems. Once the identifier has been assigned, provide
metadata about the book. If the resource is not a book, use another
identifier system such as NBN.

3. If the resource is digital, provide a link (using the existing
identifier) from the metadata record to the resource. Traditional
identifiers can be made actionable on the Internet as URNs, but other
PID systems such as DOI, Handle, ARK et cetera can also be used. As an
aside, the new features which URNbis has built are not in conflict
with other PIDs; on the contrary, they can use them as well. The only
slightly problematic case is ARK, since it has its own established
practice for using query.
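
To make the three steps concrete, here is a rough sketch of the
records involved. It is only an illustration: the class names, field
names and identifier values are made up for this example (the
identifiers use the "urn:example" namespace reserved for examples) and
do not come from any library system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Manifestation:
    """Step 2: a concrete publication, with its own identifier and metadata."""
    identifier: str                 # e.g. an ISBN or NBN, expressed as a URN
    metadata: Dict[str, str]
    pid_link: Optional[str] = None  # step 3: set only for digital resources


@dataclass
class Work:
    """Step 1: the immaterial work, with its own identifier and metadata."""
    identifier: str
    metadata: Dict[str, str]
    manifestations: List[Manifestation] = field(default_factory=list)


# All identifier values below are hypothetical placeholders.
poem = Work(
    identifier="urn:example:work:ozymandias",
    metadata={"title": "Ozymandias", "author": "Percy Bysshe Shelley"},
)

collection = Manifestation(
    identifier="urn:example:manifestation:collected-poems",
    metadata={"title": "Collected poems", "form": "digital"},
)
# Step 3: the metadata record links to the resource via the identifier
# itself, not via the current URL of the file.
collection.pid_link = collection.identifier

poem.manifestations.append(collection)
```

The point of the sketch is that the link stored in the record is the
persistent identifier; the current location is looked up elsewhere.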

The essential difference between this PID-driven approach and that of  
some people on this list is that for me, URLs like this

http://www.poemhunter.com/poem/ozymandias/

or even this

http://ozymandias.perm

are not identifiers, first and foremost because their assignment  
process is not managed.

One may argue that sometimes there is no proper identifier system to
use, but this is not true: Handles, DOIs and URNs can be applied to
anything out there. Second, there is the problem that URLs are not
persistent, since they are technology dependent. Even if they had the
same organizational support as the URN-based link supported by the
national library (which is very unlikely), it may become very
difficult to support these URLs long before reading the resources
requires digital archaeology.

Some people on this list have argued that
http://urn.fi/URN:ISBN:978-952-10-9658-7 or another PID with an
embedded resolver address is no different from e.g.
http://ozymandias.perm. But in this case the syntactical similarity is
deceptive; URNs and other PIDs must be resolved before the current URL
(or URLs) of the resource is found, so all kinds of interesting stuff
can be done in the resolution process before the result is passed to
HTTP. And once a Resolver Discovery Service is in place, it will be
possible to drop the resolver URL from the URN string, and make the
difference between URLs and URNs clearer.
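
A small sketch of why the similarity is only syntactic: the resolver
address and the URN can be separated mechanically, and the URN stands
on its own afterwards. The function names are invented for this
illustration, and the parsing is deliberately simplified (it only
splits the string into "urn", namespace identifier and
namespace-specific string; it does not validate against the full URN
syntax).

```python
from typing import Tuple
from urllib.parse import urlsplit


def split_embedded_urn(http_uri: str) -> Tuple[str, str]:
    """Split an HTTP URI with an embedded URN into (resolver base, URN)."""
    parts = urlsplit(http_uri)
    candidate = parts.path.lstrip("/")
    if not candidate.lower().startswith("urn:"):
        raise ValueError("no embedded URN in the path")
    resolver = f"{parts.scheme}://{parts.netloc}/"
    return resolver, candidate


def parse_urn(urn: str) -> Tuple[str, str]:
    """Return (namespace identifier, namespace-specific string)."""
    prefix, nid, nss = urn.split(":", 2)
    if prefix.lower() != "urn":
        raise ValueError("not a URN")
    # The "urn" prefix and the namespace identifier are case-insensitive,
    # so the namespace identifier is normalized to lower case here.
    return nid.lower(), nss


resolver, urn = split_embedded_urn("http://urn.fi/URN:ISBN:978-952-10-9658-7")
nid, nss = parse_urn(urn)
```

Here the resolver part is "http://urn.fi/" and the identifier part is
the ISBN-based URN; only the latter needs to persist. A plain URL such
as http://ozymandias.perm has no such separable identifier, and
split_embedded_urn() rejects it.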


> But URIs
> (with or within including URNs) aren't objects, they are object
> reference that, like most good references, involve some degree
> of abstraction.  Now, "Ozymandias" is a very long-persistent
> reference.  It isn't the object.  It is a somewhat ambiguous
> reference because it can refer to the statue, the poem, your
> blog posting, and probably several other things, but has a long
> duration, perhaps one that is long enough to survive the objects
> it references (just like titles of long-lost books).   But, as a
> reference, it is that persistent because it is not bound to a
> retrieval mechanism that has its own persistence properties.

References to immaterial objects (works and expressions, the latter
meaning for instance translations of works) are persistent, but they
do require metadata since otherwise they may be ambiguous. For
instance, we are talking about Shelley's poem, first published in The
Examiner, 11 January 1818, along with Hymn to intellectual beauty.
There is also a book by Thomas Monteleone which has the same name.

>
> Whatever its other properties,
> http://masinter.blogspot.com/2010/03/ozymandias-uri.html
> is lousy as a persistent identifier.

URLs really should not be called identifiers. There is a fundamental  
difference between assigning a persistent identifier using a managed  
process (which includes creation of descriptive metadata about the  
resource) and getting a URL to a web page.

One problem I and others have with the concept of URI is that it blurs
what should be a clear difference between locations and proper
identifiers. Anyone should be allowed to give the former, but
identifier assignment for resources that are to be kept for centuries
is not something just anyone is allowed to do. Of course even people
assigning proper identifiers such as ISBNs and ISSNs make mistakes,
and there are pressures to use these systems in a less than optimal
manner, but that is not the key issue here. What is important is that
any digital resource that will be preserved for future generations
will require a persistent identifier. Millions and millions of them
(Handles, DOIs, URNs, ARKs) have already been assigned by libraries,
publishers, universities, archives and museums. In order to optimize
the tools used for generating and resolving URNs and other PIDs, we
need some new functionalities such as the possibility of using
fragments and queries.


> It depends on the
> availability of "http" (or something called that) as a retrieval
> mechanism.  It depends on there being a DNS, on there being a
> TLD named "com" an SLD named "blogspot", and both of them having
> certain properties of which HTTP can take advantage.

It seems that the extent to which URLs are dependent on underlying  
technologies and the impact this will have on URLs in the long run  
(after, say, a few decades) is not properly understood yet even by  
those communities which must preserve things for extended periods. It  
might be useful to explain where the problems are, and make a  
guesstimate of when each of these issues might emerge.

> "Ozymandias", and even the potential URN
> urn:poems:Shelley/Ozymandias are more persistent because they
> represent a higher level of abstraction and are not tied to
> either a location or a retrieval/access method.  Use of a
> hypothetical ISPN (International Standard Poem Identifier) in
> the form urn:ispn:NNNN-NNNNNN-NNNNNNNNNNNNNNNNNNNN might or
> might not be more persistent: it would be an exact identifier of
> something and makes equivalence comparison more feasible and
> less ambiguous, but it is probably tied to a database to
> actually identify an object and such databases may be more or
> less available and persistent than access methods and/or the DNS
> (which is, after all, just another database).

In practice there will be an ISTC (International Standard Text Code)
assigned to the poem. An ISTC can be expressed as a URN
(urn:istc:xxxx-xxxx-xxxx-xxxx). Library databases all over the world
will have work metadata about the poem (once one library has
catalogued it, the record can be copied to the OPAC of every other
library in the world). From the work record there will be persistent
links to manifestation records, which in turn will contain the
locations of the manifestations: physical shelf locations for printed
material, and links using persistent identifiers to electronic
resources. When thousands of library OPACs have the same metadata
record, it is important that they contain only
persistent-identifier-based links; resolving these URNs / Handles /
etc. to URLs must be centralized. Then it is not necessary to modify
every OPAC when URLs change.
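
The benefit of centralized resolution can be sketched in a few lines.
This is a toy model, not a description of any real resolver: the
CentralResolver class, the identifier and the example.org URLs are all
invented for the illustration.

```python
from typing import Dict, List


class CentralResolver:
    """Toy resolver: maps a persistent identifier to its current URLs."""

    def __init__(self) -> None:
        self._locations: Dict[str, List[str]] = {}

    def register(self, pid: str, urls: List[str]) -> None:
        # Registering an existing PID again replaces its current locations.
        self._locations[pid] = list(urls)

    def resolve(self, pid: str) -> List[str]:
        return list(self._locations[pid])


resolver = CentralResolver()
pid = "urn:example:poem-collection-1"  # hypothetical identifier
resolver.register(pid, ["http://old.example.org/poems.pdf"])

# Many OPACs copy the same record; each stores only the PID, never the URL.
opac_records = [{"library": f"library-{i}", "link": pid} for i in range(3)]

# The file moves: one update at the resolver, no OPAC record is touched.
resolver.register(pid, ["http://new.example.org/poems.pdf"])

resolved = [resolver.resolve(record["link"]) for record in opac_records]
```

After the move every OPAC record still carries the same link, and all
of them resolve to the new location.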

Naming / identification has at times been discussed on this list at a
very abstract level. For most resources that will be preserved for the
long term, things are not that complex. There are guidebooks for using
ISTCs, ISBNs, ISSNs, NBNs and so on. In those namespaces at least it
is not necessary to be familiar with Heidegger in order to be able to
assign identifiers for things. On the other hand, it is these existing
practices which are in conflict with some of the principles expressed
in RFC 3986. If a fragment were to identify something (instead of just
pinpointing a location within the identified resource), the new URN
syntax would not be OK for most identifiers used in libraries and the
publishing sector. The same applies to query.

Juha

>
> Regardless of what one calls them, I suggest that
> access-method-dependent identifiers (http:...,
> NineTrackTape:??42.3744°?N,?71.1169°?W?123456 (where "?"
> represents one or more carefully-chosen delimiters that may not
> have RFC 3986 semantics), ...) are different kinds of creatures
> than either the objects to which they refer or names of those
> objects that are not, in the sense of the above, method or
> location-model dependent.
>
> More soon.
>
>     john