Re: [urn] I-D Action: draft-ietf-urnbis-rfc2141bis-urn-01.txt

Hello all,

Plenty of comments below (sorry).

Peter Saint-Andre wrote:

>> We have not restarted the discussion of what URNs can be applied to. In
>> 7.1 (bottom of the page 16) we say that URNs serve as identifiers for
>> concrete and abstract objects that have network accessible instances
>> and/or metadata. In short, URNs must be actionable one way or another;
>> resolution should provide some kind of result.
> 
> Really? As far as I can see, this idea is not present in RFC 2141. Has
> something changed since then that would compel us to say that URNs must
> be actionable in the way you describe?

I don't think that anything has changed (at least that wasn't my
intention). From my point of view, if a resource has a network accessible
instance (or manifestation, as we librarians call them) and/or metadata
related to these instances, then URLs will provide temporary access, and
URNs something more: persistent access to the resource or to surrogates
such as descriptive metadata. I used the term actionable to refer to
anything that URN resolution can deliver.
> 
>> This specification is OK, especially if we keep in mind that the
>> abstract object itself can be a metadata record. For instance, some
>> national libraries routinely describe two variants of a printed book
>> (paperback & hardcover, for instance) in the same metadata record. If
>> record has an NBN, one may argue that it identifies the record, not the
>> books. From the URN resolution process point of view this makes sense,
>> because the URN will resolve to the metadata record.
>>
>> As the RFC2141bis already says, URNs may also be assigned to works,
>> which are abstract objects having 0-n manifestations (there are plenty
>> of works that have been lost, and many have reached us in truncated form).
> 
> Juha, that is all quite interesting. Can you propose text changes that
> would incorporate those insights?

Yes. We might add something like this into 2141bis or elsewhere:

Resources identified with URNs may be abstract (e.g. works such as 
Shakespeare's Hamlet) or embodied in some physical form (PDF/A version 
of the Finnish translation of Hamlet). An abstract entity may have 0-n 
digital manifestations on the Internet (and other kinds of manifestations
in the physical world). Since any digital manifestation will be rendered
unusable in a few decades because file formats get outdated, all digital 
manifestations of a work should be linked to one another using their 
URNs. This will help the user to find a usable version, even if the 
identified manifestation can no longer be interpreted by the software at 
hand.

Resources may be simple (a single file or a fragment thereof) or complex 
(a set of inter-related files). For large sets surrogates may replace 
the delivery of the actual resources. Generally, the services available
will depend both on the identified resources and the target systems.
For works lacking physical representations on the Internet, descriptive
metadata and location information can be provided. A long-term
preservation system can supply technical metadata about the file format 
of the identified manifestation of the work. A publisher's digital asset 
management system may be able to supply rights metadata about the 
identified resource, even if the resource itself is not accessible. (In 
the future, URN resolution services should enable the users to pinpoint 
the kind of metadata they need.)
> 
>> In chapter 2 <query> is discussed in the bottom of page 9 & top of the
>> page 10. We should refer here to RFC2483 (which specifies the resolution
>> services) and use an example which is based on an existing service. The
>> current example is a bit puzzling; for the time being it is not possible
>> to specify the type of metadata wanted because no such service exists.
> 
> Could you clarify whether your statement refers to that particular
> example or to resolution services as such?

I refer to the example in 2141bis. But my comment also makes the point 
that we should revise the URN services in such a way that the user can 
specify the type of metadata desired (the point made above in
parentheses).
> 
>> A note should be added, saying that the services nailed down in RFC2483
>> are not sufficient. For instance, it is not possible to specify what
>> type of metadata is needed (descriptive / administrative / structural)
>> and in which format (there are plenty of formats for descriptive
>> metadata, such as MARC21 or Dublin Core). RFC2483 refers to URC (Uniform
>> Resource Characteristics) which was never implemented in practice. (As
>> an aside, there are plenty of other reasons for updating that RFC.)
> 
> Isn't that a matter for RFC 2483 (or 2483bis), not the core URN spec?
> There is no necessary connection between URNs and resolution, I think.

There is no need to add the note if revision of RFC 2483 is added to the 
charter of the URNBIS WG ;-). My intention is to complete the first 
version of 2483bis in January and send it as a private contribution.

URNs and resolution services are parts of the URN system. Without 
services URN assignment would not make sense. But technically the two 
are not interconnected since we can keep on adding new services without 
tweaking the URNs themselves. In the resolution process the service 
request must never be part of the URN itself (although we can use 
<query> to piggyback service parameters).

>> On page 10, two options for supporting fragment identifiers are
>> specified. I am not sure this dichotomy works. Method a) (fragment
>> identifiers are assigned individually) is of course OK. But if fragment
>> identifiers are generally applicable (method b), then there is no need
>> to repeat the specification at the namespace level. Assuming that
>> fragments can be used with PDF documents, then the same principles
>> should apply across all namespaces which do approve fragment usage.
> 
> Please clarify. Do you mean the syntax of fragment identifiers, or the
> semantics? I think the semantics differ based on the media type of the
> resource that might be retrieved, as we've discussed.

If a file format (for instance, PDF) has well understood rules for
URI fragment usage, then these rules should apply in all namespaces
where identifiers can be assigned to PDF fragments. All that the
namespace registration request needs to do is state that fragment
identification is allowed.
> 
> See Section 3.5 of RFC 3986:
> 
>    The fragment's format and resolution is therefore
>    dependent on the media type [RFC2046] of a potentially retrieved
>    representation, even though such a retrieval is only performed if the
>    URI is dereferenced.

As an aside, I don't know how far we can get with media types (RFC 2046)
only. Fragment usage is dependent on file formats. Rules for text/plain
are not the same as the rules for OOXML text documents. And the
identified resources may encompass multiple media types; for instance,
an e-book in EPUB 3.0 may contain the text of the book itself, a sound
recording of an interview with the author, a trailer of a film based on
the book, and so on. It remains to be seen how such bundles will be
identified (although my gut feeling is that each EPUB 3 book may consume
a whole set of identifiers, one for each of its constituent parts).

>> Second, most standard identifiers have well known syntax. ISSN has 8
>> characters, ISBN either 10 or 13, and ISTC 16. Parsing urn:issn is easy;
>> after 8 characters you are done, even if the next character is not from
>> the excluded set. Any namespace specific rules for parsing and lexical
>> equivalence must be expressed in the namespace registration.
> 
> How is this matter *not* addressed by Appendix C of RFC 3986?

Appendix C is based on the idea that some kind of delimiting characters
are used. I do admit that this will often be the case. But this does not
need to happen. For instance, somebody may append a home-made fragment to
a URN:ISBN. Such a fragment should not be there, but no method currently
described in Appendix C would enable us to drop it.

If there are well understood rules on what the NSS should look like in a
given namespace (examples of this include urn:issn or urn:isbn), then we
can parse namespace-specific strings in ways not foreseen in RFC 3986.
Not only do we always know where the string ends, we can also check if
the string is correct (if the identifier contains a check digit).
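
To illustrate the point, here is a minimal Python sketch (not taken from
any of the drafts) of namespace-specific parsing for urn:issn and
urn:isbn. The function names are hypothetical. The check digit rules are
the standard ISSN (modulus 11), ISBN-10 (modulus 11) and ISBN-13
(modulus 10) algorithms. I assume that hyphens in the NSS are optional
separators, which a real namespace registration might handle differently.

```python
import re


def split_issn(nss: str):
    """Return (issn, remainder): the ISSN ends after its 8 characters,
    whatever follows, so trailing text can be detected and dropped."""
    m = re.match(r"([0-9]{4}-?[0-9]{3}[0-9Xx])(.*)", nss, re.DOTALL)
    if m is None:
        return None
    return m.group(1), m.group(2)


def issn_is_valid(issn: str) -> bool:
    """Verify the modulus 11 check character of an ISSN."""
    chars = issn.replace("-", "").upper()
    if not re.fullmatch(r"[0-9]{7}[0-9X]", chars):
        return False
    # Weights 8..2 apply to the first seven digits.
    total = sum(int(c) * w for c, w in zip(chars[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return chars[7] == ("X" if check == 10 else str(check))


def isbn_is_valid(isbn: str) -> bool:
    """Verify the check digit of a 10- or 13-character ISBN."""
    chars = isbn.replace("-", "").upper()
    if re.fullmatch(r"[0-9]{9}[0-9X]", chars):
        # ISBN-10: weights 10..1, "X" stands for the value 10.
        total = sum(
            (10 if c == "X" else int(c)) * w
            for c, w in zip(chars, range(10, 0, -1))
        )
        return total % 11 == 0
    if re.fullmatch(r"[0-9]{13}", chars):
        # ISBN-13: weights alternate 1, 3, 1, 3, ...
        total = sum(
            int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(chars)
        )
        return total % 10 == 0
    return False
```

With this, split_issn("0378-5955#page=3") separates the ISSN from the
unwanted trailing text without relying on any delimiter, and
issn_is_valid() can then confirm that the identifier itself is correct.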

>> In chapter 5. (top of the page 15), examples should include <query> and
>> <fragment>. As the former is not part of the URN, <query> must be
>> ignored in the analysis, while <fragment> must not.
> 
> Do you mean that all of the examples should include those components, or
> that some examples should be added?

I mean that examples with query and fragment should be added.
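
As a rough idea of what such examples would have to demonstrate, here is
a Python sketch of the comparison as I described it above: <query> is
ignored and <fragment> is kept. It is an assumption-laden illustration,
not text for the draft. The function names and the "?s=I2L" query are
made up, and the only normalisation applied is the basic RFC 2141 rule
set (case-insensitive "urn" prefix and NID, case-insensitive hex digits
in %-escapes); namespace-specific rules are left out.

```python
import re


def normalize_for_equivalence(urn: str) -> str:
    """Reduce a URN reference to the form used for comparison."""
    # The fragment follows the query, so split it off first.
    body, hash_sign, fragment = urn.partition("#")
    # The query is not part of the URN: drop it.
    body = body.split("?", 1)[0]
    scheme, nid, nss = body.split(":", 2)
    if scheme.lower() != "urn":
        raise ValueError("not a URN: " + urn)
    # Hex digits in %-escapes are case-insensitive; the rest of the NSS
    # is compared as-is.
    nss = re.sub(r"%[0-9A-Fa-f]{2}", lambda m: m.group(0).lower(), nss)
    result = "urn:" + nid.lower() + ":" + nss
    if hash_sign:
        # The fragment takes part in the comparison.
        result += "#" + fragment
    return result


def urns_equivalent(a: str, b: str) -> bool:
    return normalize_for_equivalence(a) == normalize_for_equivalence(b)
```

Under these assumptions "URN:ISBN:978-0-306-40615-7?s=I2L" and
"urn:isbn:978-0-306-40615-7" compare as equivalent, while
"urn:isbn:978-0-306-40615-7#chapter2" and "urn:isbn:978-0-306-40615-7"
do not.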

>> Terminological comment
>>
>> We speak (almost) interchangeably about objects which have instances, or
>> resources which have versions. We may also refer either to resource
>> characteristics or object metadata, or use library related concepts of
>> work, expression and manifestation.
>>
>> Consolidation of the terminology used would make the documents easier to
>> understand. I suggest that we carry out such a task between the authors
>> before the next versions of these I-Ds are published.
> 
> That sounds like a good idea. Do you have a plan for releasing the next
> version of the specification?

No dates have been fixed yet. I should be able to send revised versions 
of 3187 and 3188 to Alfred Hönes quite soon (that being the first 
priority), but we have not discussed when he would be able to publish 
them and the next version of 2141bis.

At this stage it would be highly useful to get feedback from other
people on the list. I do recognize that we are just revising existing
documents, and the revisions we have made should not be controversial,
but actually confirming that the texts are fine would give us a more
solid basis to move on, and show the IETF that the work is relevant.
And it would be even better to receive criticism to help us improve
the documents.

Best regards,

Juha
> 
> Thanks!
> 
> Peter
> 
>> All the best,
>>
>> Juha
>>
>> Alfred Hönes wrote:
>>> The IETF I-D Submission Tool <internet-drafts at ietf.org> wrote:
>>>
>>>> A New Internet-Draft is available from the on-line Internet-Drafts
>>>> directories. This draft is a work item of the
>>>> Uniform Resource Names, Revised Working Group of the IETF.
>>>>
>>>>   Title      : Uniform Resource Name (URN) Syntax
>>>>   Author(s)  : Alfred Hoenes
>>>>   Filename   : draft-ietf-urnbis-rfc2141bis-urn-01.txt
>>>>   Pages      : 28
>>>>   Date       : 2011-10-31
>>>>
>>>>   Uniform Resource Names (URNs) are intended to serve as persistent,
>>>>   location-independent, resource identifiers.  This document serves as
>>>>   the foundation of the 'urn' URI Scheme according to RFC 3986 and sets
>>>>   forward the canonical syntax for URNs, which subdivides URNs into
>>>>   "namespaces".  A discussion of both existing legacy and new
>>>>   namespaces and requirements for URN presentation and transmission are
>>>>   presented.  Finally, there is a discussion of URN equivalence and how
>>>>   to determine it.  This document supersedes RFC 2141.
>>>>
>>>>    The requirements and procedures for URN Namespace registration
>>>>    documents are currently set forth in RFC 3406, which is also being
>>>>    updated by a companion, revised specification dubbed RFC 3406bis.
>>>>
>>>>
>>>> A URL for this Internet-Draft is:
>>>> http://www.ietf.org/internet-drafts/draft-ietf-urnbis-rfc2141bis-urn-01.txt
>>>>
>>>>
>>>> Internet-Drafts are also available by anonymous FTP at:
>>>> ftp://ftp.ietf.org/internet-drafts/
>>>>
>>>> This Internet-Draft can be retrieved at:
>>>> ftp://ftp.ietf.org/internet-drafts/draft-ietf-urnbis-rfc2141bis-urn-01.txt
>>>>
>>>> _______________________________________________
>>>> urn mailing list
>>>> urn@ietf.org
>>>> https://www.ietf.org/mailman/listinfo/urn
>>>
>>> [[ speaking as the document editor ]]
>>>
>>> This draft version contains many updates,
>>> as outlined in the new Appendix D.5 of the draft.
>>>
>>> Most importantly, based on the list discussion, the open issue
>>> regarding the NSS character repertoire is now regarded closed;
>>> there have been no concerns raised against now allowing "&" and "~"
>>> in the NSS syntax, and hence bringing it in alignment with RFC 3986.
>>> Hence, much of the material from s2.2 has been moved to a new
>>> Appendix (C), and Appendix B (previously: C) has been filled in now.
>>> Please note also that the previous Appendix A has been moved to the
>>> end of the memo and now has become Appendix E, which has caused
>>> some renumbering of the more persistent Appendices of the draft.
>>>
>>> Due to time constraints and technical issues I had in the past with
>>> Internet/email access, the elaborations on the fragment identifier
>>> issues have not yet been fully aligned with the vast amount of list
>>> discussion we had in the past regarding this topic.  I regard this
>>> topic as not yet finally closed, and will bring my considerations
>>> to the list a.s.a.p.
>>>
>>> So, in order to bring forward the discussion on the draft, please
>>> currently focus on the other open issues tagged in (editorial) Notes
>>> inside the draft, which have not received much comments so far.
>>> In particular, we should hopefully be able to close the NID syntax
>>> issues discussed in section 2.1 soon, with your help!
>>>
>>> I plan to submit another revision of this draft during the IETF 82
>>> week, once draft submission is open again.
>>>
>>> Kind regards,
>>>   Alfred.
>>>
> 

-- 

  Juha Hakala
  Senior advisor, standardisation and IT

  The National Library of Finland
  P.O.Box 15 (Unioninkatu 36, room 503), FIN-00014 Helsinki University
  Email juha.hakala@helsinki.fi, tel +358 50 382 7678