Re: [urn] I-D Action: draft-ietf-urnbis-rfc2141bis-urn-01.txt

Juha Hakala <juha.hakala@helsinki.fi> Tue, 15 November 2011 12:28 UTC

Message-ID: <4EC25AC6.9060404@helsinki.fi>
Date: Tue, 15 Nov 2011 14:27:50 +0200
From: Juha Hakala <juha.hakala@helsinki.fi>
To: Alfred Hoenes <ah@TR-Sys.de>
References: <201110312002.VAA11600@TR-Sys.de>
In-Reply-To: <201110312002.VAA11600@TR-Sys.de>
Cc: urn@ietf.org
Subject: Re: [urn] I-D Action: draft-ietf-urnbis-rfc2141bis-urn-01.txt

Hello Alfred; all,

Thank you for submitting a new, improved version of RFC2141bis.

I have a few comments to the draft. Most concern the text itself; some 
are more generic issues which are just implied in the document.

In chapter 1.2 (Background) we quote RFC 1737:

o  Persistence: It is intended that the lifetime of a URN be
    permanent.  That is, the URN will be globally unique forever, and
    may well be used as a reference to a resource well beyond the
    lifetime of the resource it identifies or of any naming authority
    involved in the assignment of its name.

Instances of digital resources have a short lifetime. After a few 
decades we may no longer have software capable of interpreting the bits. 
Metadata about the resource (including technical metadata which may help 
digital archaeologists) may however still be available. And if migration 
has been used as a preservation strategy, there will be other instances 
of the resource which are still accessible.

Therefore we may want to adjust the purpose / function of a URN (the 
first bullet point in 1.2) as expressed in RFC 2141 slightly, to say 
that URNs are used for recognition, for access to diverse metadata 
(formerly characteristics) about the resource, or for access to the 
resource itself or other resources related to it, such as preceding or 
later versions of the original resource.

This is still a simple use scenario, where I have (in library slang) 
different manifestations of a single expression of a work (for instance, 
a Finnish translation of Joyce's Ulysses in Word 97 and PDF/A). I may 
also have multiple expressions of a work, and 1-n versions of each one 
of these. URN resolution must support resource interlinking, since a 
user looking for e.g. the first Finnish translation of Ulysses may also 
be able to use a more modern Finnish translation or the original text.
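
As a rough illustration of the interlinking described above, here is a
small Python sketch that models one work with several expressions and
manifestations, and lets a lookup on one URN return the related ones.
All URNs, class names and the related() function are hypothetical;
nothing here is taken from the draft or from an existing resolver.

```python
from dataclasses import dataclass, field


@dataclass
class Manifestation:
    urn: str          # hypothetical URN of one concrete file
    file_format: str  # e.g. "Word 97" or "PDF/A"


@dataclass
class Expression:
    urn: str          # hypothetical URN of a translation or the original
    label: str
    manifestations: list = field(default_factory=list)


@dataclass
class Work:
    urn: str
    title: str
    expressions: list = field(default_factory=list)


def related(work, urn):
    """Return every other URN attached to the work (the work itself, its
    expressions and their manifestations); empty list if urn is unknown."""
    all_urns = [work.urn]
    for expr in work.expressions:
        all_urns.append(expr.urn)
        all_urns.extend(m.urn for m in expr.manifestations)
    if urn not in all_urns:
        return []
    return [u for u in all_urns if u != urn]


# Hypothetical identifiers, for illustration only.
ulysses = Work(
    urn="urn:example:work:ulysses",
    title="Ulysses",
    expressions=[
        Expression("urn:example:expr:ulysses-en", "original text"),
        Expression(
            "urn:example:expr:ulysses-fi-1",
            "first Finnish translation",
            [
                Manifestation("urn:example:man:ulysses-fi-1-doc", "Word 97"),
                Manifestation("urn:example:man:ulysses-fi-1-pdf", "PDF/A"),
            ],
        ),
        Expression("urn:example:expr:ulysses-fi-2", "later Finnish translation"),
    ],
)

print(related(ulysses, "urn:example:expr:ulysses-fi-1"))
```

A user who asks for the first Finnish translation would in this sketch
also be offered the later translation, the original text and both file
formats; how a real resolver ranks or filters these is left open.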

We have not restarted the discussion of what URNs can be applied to. In 
7.1 (bottom of page 16) we say that URNs serve as identifiers for 
concrete and abstract objects that have network accessible instances 
and/or metadata. In short, URNs must be actionable one way or another; 
resolution should provide some kind of result.

This specification is OK, especially if we keep in mind that the 
abstract object itself can be a metadata record. For instance, some 
national libraries routinely describe two variants of a printed book 
(paperback & hardcover, for instance) in the same metadata record. If 
the record has an NBN, one may argue that it identifies the record, not the 
books. From the URN resolution process point of view this makes sense, 
because the URN will resolve to the metadata record.

As the RFC2141bis already says, URNs may also be assigned to works, 
which are abstract objects having 0-n manifestations (there are plenty 
of works that have been lost, and many have reached us in truncated form).

In chapter 2, <query> is discussed at the bottom of page 9 and the top of 
page 10. We should refer here to RFC2483 (which specifies the resolution 
services) and use an example which is based on an existing service. The 
current example is a bit puzzling; for the time being it is not possible 
to specify the type of metadata wanted because no such service exists.

A note should be added, saying that the services nailed down in RFC2483 
are not sufficient. For instance, it is not possible to specify what 
type of metadata is needed (descriptive / administrative / structural) 
and in which format (there are plenty of formats for descriptive 
metadata, such as MARC21 or Dublin Core). RFC2483 refers to URC (Uniform 
Resource Characteristics) which was never implemented in practice. (As 
an aside, there are plenty of other reasons for updating that RFC.)
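
To make the gap concrete, here is a minimal Python sketch. The service
names are the ones listed in RFC2483; the ResolutionRequest class, its
two metadata fields and the helper function are hypothetical and exist
only to show what cannot be expressed today.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Service(Enum):
    """Resolution services named in RFC 2483."""
    I2L = "URI to URL"
    I2LS = "URI to URLs"
    I2R = "URI to resource"
    I2RS = "URI to resources"
    I2C = "URI to URC"
    I2CS = "URI to URCs"
    I2N = "URI to URN"
    I2NS = "URI to URNs"
    I_EQ_I = "URI = URI"


@dataclass
class ResolutionRequest:
    urn: str
    service: Service
    # Hypothetical extensions, NOT defined in RFC 2483:
    metadata_type: Optional[str] = None    # descriptive / administrative / structural
    metadata_format: Optional[str] = None  # e.g. "MARC21" or "Dublin Core"


def expressible_in_rfc2483(req):
    """True if the request needs nothing beyond an RFC 2483 service name."""
    return req.metadata_type is None and req.metadata_format is None


# "urn:example:record-1" is a made-up identifier.
plain = ResolutionRequest("urn:example:record-1", Service.I2C)
wanted = ResolutionRequest(
    "urn:example:record-1", Service.I2C,
    metadata_type="descriptive", metadata_format="Dublin Core",
)
print(expressible_in_rfc2483(plain), expressible_in_rfc2483(wanted))
```

The second request is the kind a library user would actually make, and
it has no counterpart in the current service list.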

On page 10, two options for supporting fragment identifiers are 
specified. I am not sure this dichotomy works. Method a) (fragment 
identifiers are assigned individually) is of course OK. But if fragment 
identifiers are generally applicable (method b), then there is no need 
to repeat the specification at the namespace level. Assuming that 
fragments can be used with PDF documents, then the same principles 
should apply across all namespaces which do approve fragment usage.

In chapter 2.3.3 (in the middle of page 14) we say:

"In a textual context for a URN, the NSS part ends when an octet/
character from the excluded character set (<excluded>) is
encountered.  The character from the excluded character set is NOT
part of the NSS."

This does not take into account the discussion we've had on the list.

First, whatever is said here applies to plain text only. In structured 
text, parsing URNs should be easy.

Second, most standard identifiers have well known syntax. ISSN has 8 
characters, ISBN either 10 or 13, and ISTC 16. Parsing urn:issn is easy; 
after 8 characters you are done, even if the next character is not from 
the excluded set. Any namespace specific rules for parsing and lexical 
equivalence must be expressed in the namespace registration.
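
A minimal Python sketch of the difference between the two rules follows.
It rests on several assumptions: the EXCLUDED set is a simplified
stand-in for the draft's <excluded> production, the fixed lengths are
the ones mentioned above with identifiers written without hyphens, and
the function name extract_urn is made up. A real implementation would
take these rules from the namespace registration and would probably
also validate check digits.

```python
import re

# Assumption: simplified stand-in for the draft's <excluded> character set.
EXCLUDED = set(' \t\r\n"<>[]\\^`{|}')

# Fixed NSS lengths as given in the message; longer candidates are tried first.
NSS_LENGTHS = {"issn": (8,), "isbn": (13, 10), "istc": (16,)}

URN_START = re.compile(r"urn:([A-Za-z0-9][A-Za-z0-9-]*):", re.IGNORECASE)


def extract_urn(text):
    """Return the first URN found in plain text, or None."""
    m = URN_START.search(text)
    if m is None:
        return None
    prefix = text[m.start():m.end()]
    nid = m.group(1).lower()
    rest = text[m.end():]
    # Generic rule: the NSS ends at the first excluded character.
    end = next((i for i, ch in enumerate(rest) if ch in EXCLUDED), len(rest))
    generic = rest[:end]
    # Namespace rule: if the namespace has a fixed length and enough
    # characters are present, cut there even if the next character is
    # not in the excluded set.
    for length in NSS_LENGTHS.get(nid, ()):
        if len(generic) >= length:
            return prefix + generic[:length]
    # Otherwise fall back to the generic rule.
    return prefix + generic if generic else None


print(extract_urn("See urn:issn:03029743, which is a series."))
print(extract_urn("cf. urn:example:abc def"))
```

In the first call the trailing comma is not in the excluded set, so the
generic rule alone would keep it; the fixed ISSN length removes it.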

In chapter 5 (top of page 15), examples should include <query> and 
<fragment>. As the former is not part of the URN, <query> must be 
ignored in the analysis, while <fragment> must not.
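
A sketch of what such an equivalence check could look like, in Python.
It assumes that "?" introduces <query> and "#" introduces <fragment> as
in RFC 3986, and it applies the RFC 2141 rules of case-insensitive
"urn:" prefix and NID plus case normalization of %-escapes. The function
names are made up, and namespace-specific equivalence rules are not
considered.

```python
import re


def normalize(urn):
    """Reduce a URN to a tuple suitable for comparison."""
    # Keep the fragment: it takes part in the comparison.
    body, _, fragment = urn.partition("#")
    # Drop the query: it is not part of the URN.
    body = body.split("?", 1)[0]
    scheme, nid, nss = body.split(":", 2)
    # Upper-case the hex digits of %-escapes so "%2f" equals "%2F".
    nss = re.sub(r"%[0-9A-Fa-f]{2}", lambda m: m.group(0).upper(), nss)
    return (scheme.lower(), nid.lower(), nss, fragment)


def equivalent(a, b):
    """True if the two URNs are equivalent under the rules sketched above."""
    return normalize(a) == normalize(b)


# Made-up identifiers in the "example" namespace.
print(equivalent("URN:Example:a%2fb?x=1", "urn:example:a%2Fb"))
print(equivalent("urn:example:a#p1", "urn:example:a#p2"))
```

The first pair differs only in case and in the query, the second pair
only in the fragment.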

Terminological comment

We speak (almost) interchangeably about objects which have instances, or 
resources which have versions. We may also refer either to resource 
characteristics or object metadata, or use the library-related concepts of 
work, expression and manifestation.

Consolidation of the terminology used would make the documents easier to 
understand. I suggest that we carry out such a task between the authors 
before the next versions of these I-Ds are published.

All the best,

Juha

Alfred Hoenes wrote:
> The IETF I-D Submission Tool <internet-drafts at ietf.org> wrote:
> 
>> A New Internet-Draft is available from the on-line Internet-Drafts
>> directories. This draft is a work item of the
>> Uniform Resource Names, Revised Working Group of the IETF.
>>
>>   Title      : Uniform Resource Name (URN) Syntax
>>   Author(s)  : Alfred Hoenes
>>   Filename   : draft-ietf-urnbis-rfc2141bis-urn-01.txt
>>   Pages      : 28
>>   Date       : 2011-10-31
>>
>>   Uniform Resource Names (URNs) are intended to serve as persistent,
>>   location-independent, resource identifiers.  This document serves as
>>   the foundation of the 'urn' URI Scheme according to RFC 3986 and sets
>>   forward the canonical syntax for URNs, which subdivides URNs into
>>   "namespaces".  A discussion of both existing legacy and new
>>   namespaces and requirements for URN presentation and transmission are
>>   presented.  Finally, there is a discussion of URN equivalence and how
>>   to determine it.  This document supersedes RFC 2141.
>>
>>    The requirements and procedures for URN Namespace registration
>>    documents are currently set forth in RFC 3406, which is also being
>>    updated by a companion, revised specification dubbed RFC 3406bis.
>>
>>
>> A URL for this Internet-Draft is:
>> http://www.ietf.org/internet-drafts/draft-ietf-urnbis-rfc2141bis-urn-01.txt
>>
>> Internet-Drafts are also available by anonymous FTP at:
>> ftp://ftp.ietf.org/internet-drafts/
>>
>> This Internet-Draft can be retrieved at:
>> ftp://ftp.ietf.org/internet-drafts/draft-ietf-urnbis-rfc2141bis-urn-01.txt
>> _______________________________________________
>> urn mailing list
>> urn@ietf.org
>> https://www.ietf.org/mailman/listinfo/urn
> 
> 
> [[ speaking as the document editor ]]
> 
> This draft version contains many updates,
> as outlined in the new Appendix D.5 of the draft.
> 
> Most importantly, based on the list discussion, the open issue
> regarding the NSS character repertoire is now regarded closed;
> there have been no concerns raised against now allowing "&" and "~"
> in the NSS syntax, and hence bringing it in alignment with RFC 3986.
> Hence, much of the material from s2.2 has been moved to a new
> Appendix (C), and Appendix B (previously: C) has been filled in now.
> Please note also that the previous Appendix A has been moved to the
> end of the memo and now has become Appendix E, which has caused
> some renumbering of the more persistent Appendices of the draft.
> 
> Due to time constraints and technical issues I had in the past with
> Internet/email access, the elaborations on the fragment identifier
> issues have not yet been fully aligned with the vast amount of list
> discussion we had in the past regarding this topic.  I regard this
> topic as not yet finally closed, and will bring my considerations
> to the list a.s.a.p.
> 
> So, in order to bring forward the discussion on the draft, please
> currently focus on the other open issues tagged in (editorial) Notes
> inside the draft, which have not received much comments so far.
> In particular, we should hopefully be able to close the NID syntax
> issues discussed in section 2.1 soon, with your help!
> 
> I plan to submit another revision of this draft during the IETF 82
> week, once draft submission is open again.
> 
> Kind regards,
>   Alfred.
> 

-- 

  Juha Hakala
  Senior advisor, standardisation and IT

  The National Library of Finland
  P.O.Box 15 (Unioninkatu 36, room 503), FIN-00014 Helsinki University
  Email juha.hakala@helsinki.fi, tel +358 50 382 7678