Re: [urn] Fragments Re: I-D Action: draft-ietf-urnbis-rfc2141bis-urn-02.txt

Juha Hakala <juha.hakala@helsinki.fi> Wed, 28 March 2012 06:29 UTC


Hello,

A few comments below.

Martin J. Dürst wrote:
> Hello Leslie,
> 
> On 2012/03/28 0:38, Leslie Daigle wrote:
> 
>> On 3/26/12 3:25 AM, "Martin J. Dürst" wrote:
> 
>>> Could this be relaxed to "responses for which there is a commitment that
>>> the fragments, if interpretable, are interpreted consistently"? This
>>> would not exclude media types where a fragment identifier doesn't make
>>> sense, but would, if implemented diligently, produce consistency across
>>> all those media types where fragment identifiers make sense.
>>
>> I'm really not sure what this means, in practice.
> 
> Okay. Assume as an example that the resource can be served as HTML, as 
> SVG, and as PDF, and that the last one doesn't allow fragment 
> identifiers (that could be wrong, now or in the future). Then what this 
> would mean is that one and the same fragment identifier would identify 
> one and the same thing (chapter, paragraph, figure, whatever) in both 
> the HTML and the SVG version. It's definitely possible to make this 
> work, although it's of course also possible to mess this up.

The only URN namespaces in which I consider it safe to use <fragment> 
are those that always assign separate identifiers to the different 
(HTML, SVG, PDF and so on) representations / versions / manifestations 
of the resource. Because each URN in these namespaces names exactly one 
manifestation, it is never necessary to investigate whether the 
<fragment> works correctly with different manifestations of the resource.

Of course, these identifier systems also require that a new identifier 
be assigned whenever the intellectual content of the resource changes 
(which may have an impact on fragments), even if the file format does 
not change.
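
To make the rule above concrete, here is a minimal Python sketch of the 
check I have in mind. It is only an illustration: the policy table, the 
function names and the identifier values are assumptions made up for 
this example, not anything defined in the URNbis drafts or in a 
namespace registration.

```python
# Hypothetical policy table: namespaces assumed to assign a separate
# identifier to every manifestation of a resource (illustrative only).
MANIFESTATION_SPECIFIC_NIDS = {"isbn", "nbn"}


def split_urn(urn):
    """Split 'urn:<nid>:<nss>[#<fragment>]' into (nid, nss, fragment)."""
    body, _, fragment = urn.partition("#")
    parts = body.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn" or not parts[1] or not parts[2]:
        raise ValueError(f"not a URN: {urn!r}")
    # The namespace identifier is case-insensitive, so normalise it.
    return parts[1].lower(), parts[2], fragment or None


def fragment_is_safe(urn):
    """True if the URN has no fragment, or its namespace is tied to a
    single manifestation according to the assumed policy table."""
    nid, _, fragment = split_urn(urn)
    if fragment is None:
        return True
    return nid in MANIFESTATION_SPECIFIC_NIDS


# Placeholder identifiers, not real ISBN or ISTC values.
print(fragment_is_safe("urn:isbn:0000000000#chapter2"))
print(fragment_is_safe("urn:istc:EXAMPLE-WORK#chapter2"))
```

With this table, a fragment on the ISBN-based URN is accepted, while the 
same fragment on the hypothetical work-level ISTC URN is rejected.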

As Leslie has pointed out, it is possible that these identifier systems 
(ISBNs, NBNs, etc.) can be misused. But IMHO we should not drop useful 
functionality from the URN docs only because some people (a minority of 
all users) knowingly or unknowingly abuse identifier systems like ISBN.

As regards identifiers which are not representation-specific: although 
it is possible that different manifestations of the resource have the 
same internal structure now, there is no way we can guarantee this 10 or 
100 years from now. Therefore, if we have an identifier which 
encompasses several or all manifestations of a resource, such as the 
International Standard Text Code (ISTC), I would never recommend the use 
of <fragment> with it.

You may wonder where the persistence comes from if the URNs are tied to 
a particular representation of a resource. The idea the national 
libraries are considering is that the most persistent thing related to 
the resource is its description as an immaterial work: a metadata 
record which will have its own URN. All the representations of the work 
will be linked to this record, so the user can pick, for instance, the 
oldest representation (most authentic, but hard to use without a 
specialised rendering application) or the newest one (easy to use, but 
with an altered look and feel).

It might be useful to discuss somewhere how national libraries and other 
institutions which are involved in the long-term (meaning centuries) 
preservation of digital resources intend to do this, and how we plan to 
use URNs in this work.

>> And, I find myself wondering: what's the _specified_ behaviour when a
>> fragment reference cannot be resolved (by the client)?
> 
> A fragment identifier identifies a secondary resource, so if that 
> doesn't work, all we know about is the resource. I don't think there's a 
> spec for this, but I'd assume that most clients would resolve to the 
> resource and then give up. And 'give up' would usually place the user at 
> the top of the document.

This sounds like a sensible way of solving the problems which occur when 
the client cannot find the correct place within the resource, or when it 
cannot interpret the <fragment>.
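
As a rough illustration of that fallback, here is a small Python sketch. 
As Martin says, there does not seem to be a spec for this, so the 
behaviour shown is only the assumed one; the function name and the 
anchor table are invented for the example.

```python
def scroll_target(anchors, fragment):
    """Return the offset to display for `fragment`.

    `anchors` maps fragment names to offsets within the retrieved
    resource. If the fragment is missing, empty, or unknown, the client
    gives up and shows the top of the document (offset 0).
    """
    if fragment and fragment in anchors:
        return anchors[fragment]
    return 0


# Hypothetical anchor table for one retrieved document.
anchors = {"chapter2": 1200, "fig1": 3400}
print(scroll_target(anchors, "chapter2"))
print(scroll_target(anchors, "no-such-anchor"))
```

The resource itself is still resolved and shown in both cases; only the 
position within it differs.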
> 
> 
> Actually, I think RFC 3986 is pretty close to what I'm suggesting (third 
> paragraph of http://tools.ietf.org/html/rfc3986#section-3.5):
> 
>                                                             If the
>    primary resource has multiple representations, as is often the case
>    for resources whose representation is selected based on attributes of
>    the retrieval request (a.k.a., content negotiation), then whatever is
>    identified by the fragment should be consistent across all of those
>    representations.  Each representation should either define the
>    fragment so that it corresponds to the same secondary resource,
>    regardless of how it is represented, or should leave the fragment
>    undefined (i.e., not found).
> 
> 
> So I think I agree with you the URN spec shouldn't mess around with 
> fragment identifiers, and that individual URN namespaces also should not 
> mess around with fragment identifiers. The only thing these specs may 
> want to do is to point to RFC 3986.

This is pretty much what we have been trying to do in the URNbis I-Ds. 
But given that URNs are persistent, we have to make it clear that it is 
only safe to use <fragment> if the URN is tied to a single 
representation / manifestation / etc. of the resource.
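
For URNs that are not tied to a single representation, the RFC 3986 rule 
quoted above could be checked roughly as in the following Python sketch. 
The data layout (media type mapped to a table of fragment targets) and 
all names are assumptions for illustration; a real resolver would have 
to extract this information from the representations themselves.

```python
def inconsistent_fragments(representations):
    """Return the fragments that identify different secondary resources
    in different representations.

    `representations` maps a media type to a {fragment: target} dict.
    A fragment absent from a dict is treated as undefined for that
    representation, which RFC 3986 section 3.5 permits.
    """
    seen = {}
    conflicts = set()
    for mapping in representations.values():
        for fragment, target in mapping.items():
            # Remember the first target seen; any later mismatch is a conflict.
            if seen.setdefault(fragment, target) != target:
                conflicts.add(fragment)
    return sorted(conflicts)


# Hypothetical resource served as HTML, SVG and PDF.
representations = {
    "text/html": {"fig1": "figure-1", "ch2": "chapter-2"},
    "image/svg+xml": {"fig1": "figure-1"},
    "application/pdf": {},  # fragments left undefined
}
print(inconsistent_fragments(representations))
```

Even when such a check passes today, it says nothing about manifestations 
added decades later, which is why I would still restrict <fragment> to 
representation-specific URNs.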

Juha
> 
> Regards,    Martin.

-- 

  Juha Hakala
  Senior advisor, standardisation and IT

  The National Library of Finland
  P.O.Box 15 (Unioninkatu 36, room 503), FIN-00014 Helsinki University
  Email juha.hakala@helsinki.fi, tel +358 50 382 7678