Re: [urn] [apps-discuss] URNs are not URIs (another look at RFC3986)

"Svensson, Lars" <L.Svensson@dnb.de> Tue, 03 June 2014 11:14 UTC

From: "Svensson, Lars" <L.Svensson@dnb.de>
To: "Henry S. Thompson" <ht@inf.ed.ac.uk>, John C Klensin <john-ietf@jck.com>
Date: Tue, 03 Jun 2014 11:14:16 +0000
Message-ID: <24637769D123E644A105A0AF0E1F92EFA444A682@dnbf-ex1.AD.DDB.DE>
Archived-At: http://mailarchive.ietf.org/arch/msg/urn/k9S_vRP9gHHEAANI-CjGqjOlLf8
Cc: "urn@ietf.org" <urn@ietf.org>
Subject: Re: [urn] [apps-discuss] URNs are not URIs (another look at RFC3986)

Continuing my way through the thread...

> 3986 says some things which seem very much in sympathy with URNBIS:
> 
>   1.2.2.  Separating Identification from Interaction [2]
> 
>    . . .
> 
>    A common misunderstanding of URIs is that they are only used to
>    refer to accessible resources.  The URI itself only provides
>    identification; access to the resource is neither guaranteed nor
>    implied by the presence of a URI.  Instead, any operation
>    associated with a URI reference is defined by the protocol element,
>    data format attribute, or natural language text in which it
>    appears.
> 
>    Given a URI, a system may attempt to perform a variety of
>    operations on the resource, as might be characterized by words such
>    as "access", "update", "replace", or "find attributes".  Such
>    operations are defined by the protocols that make use of URIs, not
>    by this specification.

If I understand correctly, in the URN world this corresponds to the distinction between, on the one hand, URN syntax (2141bis) together with the namespace-specific constraints laid down in the NID registrations (identification) and, on the other hand, URN resolution (2483bis). One might argue that wording like "such operations are defined by the *protocols* that make use of URIs" [emphasis added] is unfortunate, since URN resolution services may or may not be tied to any specific protocol.
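To illustrate the identification side only, here is a minimal Python sketch assuming the usual urn:<NID>:<NSS> layout. The names split_urn and URNParts are hypothetical; the split is purely syntactic, and nothing in it resolves anything (that would be the 2483bis side).

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class URNParts(NamedTuple):
    nid: str       # namespace identifier, compared case-insensitively
    nss: str       # namespace-specific string, left untouched
    fragment: str  # "" when the URN has no fragment


def split_urn(urn: str) -> URNParts:
    """Syntactic split of a URN into NID, NSS and fragment (no resolution)."""
    # Generic split into scheme, path, query and fragment; scheme is lowercased.
    parts = urlsplit(urn)
    if parts.scheme != "urn":
        raise ValueError(f"not a URN: {urn!r}")
    # Any ?query component is ignored in this sketch.
    nid, sep, nss = parts.path.partition(":")
    if not sep or not nid or not nss:
        raise ValueError(f"missing NID or NSS: {urn!r}")
    return URNParts(nid.lower(), nss, parts.fragment)
```

For example, split_urn("urn:example:w3cstaff#tbl") yields nid "example", nss "w3cstaff" and fragment "tbl": the fragment is separated off before the NSS is even looked at.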

[...]
 
>  3.5 Fragment
> 
>    . . .
> 
>    As with any URI, use of a fragment identifier component does not
>    imply that a retrieval action will take place.  A URI with a
>    fragment identifier may be used to refer to the secondary resource
>    without any implication that the primary resource is accessible or
>    will ever be accessed.
> 
> Nonetheless fragment (and query) seem to be at the heart of the
> perceived problems of URIs expressed in URNBIS [1]:
> 
>   8.  The role URI fragment and query could or should have in
>        identification is unclear and the statements in RFC 3986 are
>        definitely problematic from the points of view of existing
>        identifier systems and management of naming.
> 
>    Does fragment identify a location or a certain section of a resource?
>    In the evolving set of URN Internet standards, fragment will not be a
>    part of the Namespace Specific String.  Then fragment only indicates
>    a place / segment within the identified resource, but does not
>    identify it.  If fragment had a role in identification, fragments
>    would extend the scope of existing standard identifiers to component
>    parts of resources.  For instance, anyone could use URN based on ISBN
>    + fragment to identify chapters of electronic books.
> 
> There is certainly a fundamental tension with pretty fundamental IETF
> principles (beyond just what we find in 3986) implied here, if I
> understand it correctly.  Media types play a central role in the IETF
> architectural vision: given a character string and a media type, I
> know how to find out what I can do with the characters.  It doesn't
> matter how I got them and their type: from my local disk, via HTTP, in
> an email, or by carrier pigeon.  And it doesn't matter what name or
> names may have been involved in helping me get access to them.  One of
> the things I can do is use whatever fragment identifier syntax and
> semantics is provided by the definition of the media type in question
> to . . . identify 'fragments'.  And, crucially, those 'fragments'
> _need not_ be any locations or sections of the character string or its
> interpretation per the media type:
> 
>    The identified secondary resource may be some portion or subset of
>    the primary resource, some view on representations of the primary
>    resource, or _some other resource defined or described by those
>    representations_. [emphasis added] [3]
> 
> To make this absolutely concrete, given a text/html representation and a
> fragment identifier "foo", the secondary resource in question is
> e.g. a paragraph in the document resource represented by that html, in
> particular the paragraph corresponding to the markup including "<a
> name='foo'>" in the representation.  Or, given a text/turtle
> representation and a fragment "tbl", the secondary resource in
> question is e.g. the resource _described_ by the RDF node
> corresponding to a line consisting of "<#tbl>" in the representation.
> All of that follows straightforwardly from 3986 and the two respective
> media type registrations.

One big question is what happens when you retrieve a resource whose media type registration does not specify the meaning of fragment identifiers (e.g. application/json [5]). More on that below.
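To make the dependency on the media type concrete, here is a minimal Python sketch assuming a hypothetical registry, FRAGMENT_RULES, that maps a media type to a fragment interpreter. Only a simplified text/html rule is filled in (match an element whose id or name equals the fragment, roughly the <a name='foo'> case quoted above); application/json has no entry because its registration [5] defines no fragment semantics, so there is nothing to apply.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, Optional


class _AnchorFinder(HTMLParser):
    """Remembers the first start tag whose id or name equals the target."""

    def __init__(self, target_id: str) -> None:
        super().__init__()
        self.target_id = target_id
        self.found_tag: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if self.found_tag is None and any(
            key in ("id", "name") and value == self.target_id for key, value in attrs
        ):
            self.found_tag = tag


def html_fragment(body: str, fragment: str) -> Optional[str]:
    """Simplified text/html rule: return the tag name of the matching element."""
    finder = _AnchorFinder(fragment)
    finder.feed(body)
    finder.close()
    return finder.found_tag


# Hypothetical registry: media type -> fragment interpreter.
# application/json is absent on purpose: its registration says nothing
# about fragment identifiers.
FRAGMENT_RULES: Dict[str, Callable[[str, str], Optional[str]]] = {
    "text/html": html_fragment,
}


def interpret_fragment(media_type: str, body: str, fragment: str) -> Optional[str]:
    """Return what the fragment selects, or None if nothing is defined or found."""
    essence = media_type.split(";", 1)[0].strip().lower()  # drop parameters
    rule = FRAGMENT_RULES.get(essence)
    if rule is None:
        return None  # no registered fragment semantics for this media type
    return rule(body, fragment)
```

With this sketch, interpret_fragment("text/html", "<p><a name='foo'>x</a></p>", "foo") returns "a", while any call with "application/json" returns None regardless of the body.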

> 3986 is clear that those secondary resources are _identified_ by those
> fragment identifiers.  You appear to be proposing that per URNBIS and
> 2141bis e.g. urn:example:w3cstaff#tbl would _identify_ the same
> resource as urn:example:w3cstaff (although it might be defined to
> 'indicate' something different. . .).

This is indeed something I fail to grasp, too.

> It seems pretty clear to me that this is a divisive and confusing
> thing to do.  Let's take the e-book example (I'm using a DOI, because
> I happen to remember where to find the resolver for them, but the
> point would be the same for any URN->http:URI resolver today):  The
> following _both_ identify the main body of an editorial in the journal
> _Nature_ of 28 May 2014:
> 
>   doi:10.1038/509534a#article
>   http://dx.doi.org/10.1038/509534a#article
> 
> Am I right that your proposal would change this (assuming a URN
> instead of a DOI)?  I.e., that those two things would no longer
> identify the same secondary resource, despite both, minus the
> '#article', being usable to retrieve the same text/html representation
> of the same resource?

This question is, of course, one for John to answer; I can only say that I find this counter-intuitive, too. That, together with the media type problems connected with fragment identifiers, leads me to propose that we not allow fragment identifiers in URNs.
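If that proposal were adopted, a syntax checker would simply refuse any '#'. A minimal Python sketch; validate_urn_no_fragment is a hypothetical name, and the rule it encodes is only the proposal made here, not an existing specification.

```python
from urllib.parse import urlsplit


def validate_urn_no_fragment(urn: str) -> str:
    """Accept a URN only if it carries no fragment (hypothetical policy check)."""
    if urlsplit(urn).scheme != "urn":
        raise ValueError(f"not a URN: {urn!r}")
    if "#" in urn:
        # Under the proposal a '#' would always start a fragment, which is
        # not permitted, so the whole string is rejected (even "...#").
        raise ValueError(f"fragment not allowed in URN: {urn!r}")
    return urn
```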

[adding discussion about DOIs]

> >> Actually, they don't.  The DOI maps to a bunch of bibliographic info
> >> which you get if you fetch that URL and ask for JSON or bibtex.  (See
> >> JSON below.) ...
> 
> > Fetch?  You're right, of course, that I should have tried the official
> > resolution mechanism to see what I got, which I didn't.  How should I
> > have done so?
> 
> As far as I can tell you can't, since nobody implements the protocol
> defined by the handle system on which DOIs are based.
> 
> >> As a hack, albeit the one that people use 99% of the time, if the
> >> publisher provides a URL, the DOI web server responds with a redirect
> >> to the publisher's URL if you ask for html
> >
> > I didn't ask for anything -- I did a bare-naked
> >
> > GET http://dx.doi.org/10.1038/509534a HTTP/1.1
> > Host: dx.doi.org
> >
> > and followed the redirects.
> 
> Yes, that's the hack.  If your GET asks for HTML, which is of course the
> default, it returns the redirect, and any fragment name is applied to
> whatever the publisher returns as the redirected page.  If you ask for
> JSON, you get the bibliographic stuff:
> 
[...]
> 
> It's not surprising if you find this confusing -- I had to poke around for
> a long time before I could figure out what was actually going on.

The problem I'm having with this is that if we take the original URI 'http://dx.doi.org/10.1038/509534a#article', ask for 'application/json' and follow the redirects, we end up with the following:

svensson@F-NBW7-04660 ~
$ curl -i -H 'Accept: application/json' http://dx.doi.org/10.1038/509534a#article
HTTP/1.1 303 See Other
Server: Apache-Coyote/1.1
Vary: Accept
Location: http://data.crossref.org/10.1038%2F509534a
Expires: Tue, 03 Jun 2014 12:28:34 GMT
Content-Type: text/html;charset=utf-8
Content-Length: 170
Date: Tue, 03 Jun 2014 09:40:28 GMT

<HTML><HEAD><TITLE>Handle Redirect</TITLE></HEAD>
<BODY><A HREF="http://data.crossref.org/10.1038%2F509534a">http://data.crossref.org/10.1038%2F509534a</A></BODY></HTML> 

svensson@F-NBW7-04660 ~
$ curl -i -H 'Accept: application/json' http://data.crossref.org/10.1038%2F509534a#article
HTTP/1.1 200 OK
Access-Control-Allow-Headers: X-Requested-With
Access-Control-Allow-Origin: *
Content-Type: application/json
Content-Length: 589
Server: http-kit
Date: Tue, 03 Jun 2014 09:42:31 GMT
Connection: close

{"subtitle":[],"subject":["General"],"issued":{"date-parts":[[2014,5,28]]},"score":1.0,"prefix":"http:\/\/id.crossref.org\/prefix\/10.1038","container-title":"Nature","reference-count":0,"page":"534-534","deposited":{"date-parts":[[2014,5,28]],"timestamp":1401235200000},"issue":"7502","title":"Welcome, Scientific Data!","type":"journal-article","DOI":"10.1038\/509534a","ISSN":["0028-0836","1476-4687"],"URL":"http:\/\/dx.doi.org\/10.1038\/509534a","source":"CrossRef","publisher":"Nature Publishing Group","indexed":{"date-parts":[[2014,5,31]],"timestamp":1401579700372},"volume":"509"}

The JSON above contains no reference to the fragment 'article', and since the registration of 'application/json' [5] does not mention fragment identifiers, my question is whether the fragment has any meaning at all (apart from identifying a secondary resource, whatever that might be).
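Note also that neither server ever sees 'article': a client splits the fragment off before sending the request. A minimal Python sketch using urllib.request.Request, which is only constructed here, never sent. The helper redirect_target is hypothetical; it applies the rule in RFC 7231, section 7.1.2, that a Location value without a fragment inherits the fragment of the original reference, which is what the second curl call above does by hand.

```python
from urllib.parse import urldefrag
from urllib.request import Request

ORIGINAL = "http://dx.doi.org/10.1038/509534a#article"

# Build (but do not send) the first request from the curl transcript.
req = Request(ORIGINAL, headers={"Accept": "application/json"})
# req.host is "dx.doi.org" and req.selector is "/10.1038/509534a":
# the request target sent to the server carries no fragment, while
# req.fragment ("article") stays on the client side.


def redirect_target(request_uri: str, location: str) -> str:
    """Next URI to fetch after a 3xx, inheriting the fragment if Location has none.

    Assumes an absolute Location value, as in the 303 response above.
    """
    if "#" in location:
        return location  # Location brings its own fragment
    _, fragment = urldefrag(request_uri)
    return f"{location}#{fragment}" if fragment else location


# prints http://data.crossref.org/10.1038%2F509534a#article
print(redirect_target(ORIGINAL, "http://data.crossref.org/10.1038%2F509534a"))
```

So the fragment survives the redirect only on the client, and it is then up to the media type of whatever comes back (here application/json) to give it a meaning.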

> [1] http://tools.ietf.org/html/draft-ietf-urnbis-urns-are-not-uris-00
> [2] http://tools.ietf.org/html/rfc3986#section-1.2.2
> [3] http://tools.ietf.org/html/rfc3986#section-3.5
> [4] http://www.ietf.org/mail-archive/web/urn/current/msg02249.html
[5] http://tools.ietf.org/html/rfc4627

Best,

Lars

*** Read. Listen. Know. Deutsche Nationalbibliothek *** 
-- 
Dr. Lars G. Svensson
Deutsche Nationalbibliothek
Information Technology
Phone: +49-69-1525-1752
mailto:l.svensson@dnb.de 
http://www.dnb.de
