Re: [urn] Request for review and comments draft-ietf-urnbis-rfc2141bis-urn-16

John C Klensin <john-ietf@jck.com> Thu, 05 May 2016 19:17 UTC

Date: Thu, 05 May 2016 15:17:05 -0400
From: John C Klensin <john-ietf@jck.com>
To: "Hakala, Juha E" <juha.hakala@helsinki.fi>, urn@ietf.org
Message-ID: <82871E546D6FB8DD942E99B7@JcK-HP8200.jck.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Archived-At: <http://mailarchive.ietf.org/arch/msg/urn/r3MnqLMg10T6nRBCLDJ7MNonL9c>
Cc: jonathanmtclark@gmail.com
Subject: Re: [urn] Request for review and comments draft-ietf-urnbis-rfc2141bis-urn-16
Precedence: list

Disclaimer: everything I write in this note is a personal
opinion, with no connection to my editor role(s) except where
notes are explicitly identified as "Editor:".

--On Monday, April 25, 2016 10:18 +0000 "Hakala, Juha E"
<juha.hakala@helsinki.fi> wrote:

> Hello John; all, 
> 
> Further comments below. 
> 
>> -----Original Message-----
>> From: John C Klensin [mailto:john-ietf@jck.com]
>> Sent: 21. huhtikuuta 2016 20:30
>> To: Hakala, Juha E <juha.hakala@helsinki.fi>; urn@ietf.org
>> Subject: RE: [urn] Request for review and comments
>> draft-ietf-urnbis-rfc2141bis-urn-16
>> 
>> Juha,
>> 
>> Thanks for the careful reading.  A few comments inline below.
>> 
>> --On Thursday, April 21, 2016 06:57 +0000 "Hakala, Juha E"
>> <juha.hakala@helsinki.fi> wrote:
>> 
>> > Hello John; all,
>> > 
>> > this new version of rfc2141bis is an improvement, and I
>> > believe we are getting closer to the goal of this process.
>> > 
>> > I am fine with the decision to postpone work on
>> > r-component, although that means further delay of some
>> > development projects the library community has been talking
>> > about.
>> 
>> Speaking personally (not as editor), I'd like to understand
>> those projects, particularly the needs they are trying to
>> address, in more detail.  That should, however, probably be
>> off-list or delayed until we've gotten 2141bis and its
>> relatives finished.
> 
> In short, resolvers must be made a lot smarter. All PIDs are
> facing the same problem. Future resolvers must offer more
> options to the users instead of just supporting persistent
> linking to the identified resource. Improvements in systems
> with which libraries, publishers etc. are managing digital
> resources both enable and require such improvements in
> resolvers. Currently our (and for instance publishers')
> applications are not too smart, but things will change in a few
> years' time, and URN syntax should be ready to accommodate
> these new requirements when they materialize.  

This seems entirely reasonable, but it is hard to figure out how
to design things to accommodate future needs in any detail
(beyond leaving placeholders in syntax and documentation for
future extensions) unless one has much more clear distinctions
and understanding about what will be needed beyond "smarter" and
"more options".  In particular, it seems to me that trying to
make those provisions based on your comments requires that we
address the issue as to whether choices and functionality of
resolvers should be under the control of those who manage the
namespace (very much implied by 2141 and the way 2141bis is now
written) and the ideas of Sean and others about specifying the
resolver itself as part of URN syntax.  That decision may (or
might not, but I don't see a way to avoid it) get tied up with
the traditional information science (and epistemological)
question of authority.  More broadly and referring back to some
of my earlier comments, it is not obvious that trying to deal
with all future functional requirements and offerings by loading
them onto URNs or even URIs is the right answer.  Just as we
need an increasingly-powerful HTTP and HTTPS as well as the
associated URLs, there may be places where offering the user
additional options along the lines you outline may require
protocol mechanisms above and beyond what can be (or should
reasonably be attempted to be) accommodated in a URI of any sort.

> For instance, when a future user locates a relevant document,
> she should be able to easily retrieve rights metadata
> (copyright owner, license terms etc.) related to the document.
> It would be easy to list many other requirements and
> functionalities which have to do with metadata (descriptive /
> administrative / structural) about digital objects.  

Sure.  But that, again, raises the authority issue as well as,
at least IMO, making it clear that we don't fully understand the
boundary between pseudo-queries and r-components.   If the
document were a classical web object rather than something for
which we want a persistent (and perhaps a little bit more
abstract) identifier, then something like:


http://www.example.com/documentID?rights-metadata=copyright-owner

would be entirely reasonable.  I'd actually be surprised if
similar applications and syntax were not in use in the world
today.  Things do get more complex when the document identifier
is persistent and sufficiently abstract that one might need to
look in different ways or places to find the document and the
associated metadata, but I'm not sure we understand how to
generalize that yet, much less how to turn it into a URN spec.
The problem is further complicated the broader the range of
objects, pseudo-objects, and abstract designators (whether all
of them are "resources" or not) we expect to associate with URNs
because the right answers for some may not be the right answers
for all of them.

>> > There are some minor issues in the text that should be
>> > corrected.
>> > 
>> > In 2.3.1 the draft says:
>> > 
>> > "If a URN resolves to a URL, the q-component from the URN
>> > is copied verbatim to the query component of the URL."
>> > 
>> > However, we are no longer using URI query component syntax
>> > and our added "=" may cause problems. It is probably better
>> > to say something like:
>> > 
>> > "If a URN resolves to a URL, the q-component from the URN
>> > is copied to the query component of the URL without "=", so
>> > that the query syntax matches URI query component."
>> 
>> I actually checked that sentence when I rewrote things for the
>> "?=" convention and I believe it is correct.    The reason is
>> that, in the ABNF (and I hope consistently in the text), the
>> q-component is consistently the string without the introducer.
> 
> OK. I missed that. But it might be a good idea to minimize the
> chance that others make the same mistake.

On rereading, the subsequent sentence, which talked about
q-components "beginning with" the "=", could obviously increase
the risk of confusion.
Editor: text changed for -17 (given an obvious assumption, see
subsequent status note).

>> For example, Section 2 of 2141bis-16 includes
>> 
>>       rq-components =  ( "?="  q-component
>>                           [ "?+" r-component ] ) /
>>                        ( "?+" r-component
>>                           [ "?="  q-component ] )
>>       q-component   = pchar *( pchar / "/" / "?" )
>> 
>> That is consistent with the way <query> is defined in 3986,
>> i.e., from Section 3 of that document:
>> 
>>     URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
>> 
>> So copying the q-component (which does not include the "?=")
>> into the query (which does not contain the "?") actually is
>> correct.  I have doubts about whether "verbatim" is actually
>> helpful, but that is another issue.
>> 
>> If you and others think it would be helpful, it would be
>> fairly easy to incorporate a sentence into the paragraph
>> after the 2141bis syntax partially quoted above and/or after
>> the "copied"
>> sentence warning that "q-component", etc., refer to the
>> string and do not contain the introducing delimiter.  An even
>> better solution might be to incorporate an example if someone
>> wants to suggest one.
> 
> Such a sentence + an example would be useful. 
> 
> An example can be built by reusing the example further down
> in the text: 
> 
> urn:example:weather?=op=map&lat=39.56
>          &lon=-104.85
> 
> could be resolved to something like
> 
> https://weatherapp.org?op=map&lat=39.56
>          &lon=-104.85

Editor: Done
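
The "copied without the introducer" point can be illustrated with
a minimal sketch.  It assumes a simplified reading of the -16
syntax quoted above: the first "?=" or "?+" ends the name, the
other introducer (if present later) starts the second component,
and no pchar validation is done.  The function names split_urn
and to_url are hypothetical, as is the idea that the resolver
already knows the base URL.

```python
def split_urn(urn):
    """Split a URN into its name and optional r-, q- and f-components.

    Simplified sketch: component strings are returned WITHOUT their
    introducing delimiters ("?+", "?=", "#"); no pchar validation.
    """
    body, _, fragment = urn.partition("#")
    parts = {"name": body, "r": None, "q": None, "f": fragment or None}
    starts = [i for i in (body.find("?="), body.find("?+")) if i != -1]
    if not starts:
        return parts
    first = min(starts)
    parts["name"] = body[:first]
    # Decide which component comes first and which introducer may follow.
    first_key, other_key, other_intro = (
        ("q", "r", "?+") if body.startswith("?=", first) else ("r", "q", "?=")
    )
    value, sep, remainder = body[first + 2:].partition(other_intro)
    parts[first_key] = value
    if sep:
        parts[other_key] = remainder
    return parts


def to_url(base_url, urn):
    """Copy the q-component (which excludes "?=") into the URL query."""
    q_component = split_urn(urn)["q"]
    if q_component is None:
        return base_url
    return base_url + "?" + q_component


print(to_url("https://weatherapp.org",
             "urn:example:weather?=op=map&lat=39.56&lon=-104.85"))
```

With the weather example above, the q-component is the string
"op=map&lat=39.56&lon=-104.85", so prefixing it with "?" yields
the URL query directly.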

 
>> > In 2.3.3 there is a paragraph
>> > 
>> > "Clients SHOULD NOT pass f-components to resolution
>> > services unless those services also perform object
>> > retrieval and interpretation functions."
>> > 
>> > The purpose of this is to allow the resolution service to
>> > act as a "client" which retrieves the entire identified
>> > object, applies the fragment to it and sends the result to
>> > the original client. IMO this would often give no added
>> > value. If the only purpose of the fragment is to take the
>> > user to a certain point within the document, there is no
>> > way the resolver can do this on behalf of the client the
>> > customer is using. So even if the fragment were included in
>> > the URN, the resolver should just ignore it, and pass the
>> > entire document to the client which then applies the
>> > fragment to it.
>> 
>> Actually, the reason for it was a little different and that
>> is to not constrain implementation models.  In one resolution
>> model (using the URN-> URL case as the obvious example), the
>> one I think you have in mind, the approach is
>> 
>>     client parses URN into pieces and
>>     client sends NID and NSS
>>            -> URN-resolution-service
>>            <-   (resulting URL)
>>        -or-  sends NSS
>>            -> NID-specific resolution-service
>>            <-  (resulting URL)
>> 
>>     client adds whatever needs to be added to URL
>>     client -> URL-resolution-service
>>            <-  (resulting resource/ object)
>>     client evaluates fragment, if any, against object
>>     client returns object, or object subset, to user
>> 
>> The other model, which I think is worth preserving, is
>> 
>>     client does not parse URN but
>>         passes the whole thing
>>             -> URN processor
>>             <- (resulting resource / object and
>>                 fragment ID)
>>     client evaluates fragment, if any, against object
>>     client returns object, object subset, or other
>>         result to user.
>> 
>> There are several variations on those themes, but the basic
>> difference is that, in the first model, the client has to
>> break the URI into pieces and then manage the processing/
>> evaluation of the pieces.  In the second, most of that is
>> handed off to what looks to the client like a black box that
>> emits a resource or, in extreme cases, confirmation that the
>> desired action has been completed.
>> 
>> In addition to suggesting two different processing models,
>> either of which might be implemented in libraries or the like
>> for a broad range of URNs, the relationship between the two
>> also suggests a distinction that 2141bis does not make.  That
>> specification now describes two types of URNs, those that
>> resolve in some way (for some very broad definition of
>> "resolve") and those that do not resolve at all, with the
>> latter called "abstract designators".   It is also possible to
>> distinguish between namespaces of "objects" that are similar
>> to documents and "objects" that are really actions to be
>> performed.
> 
> I cannot see any immediate use for utilizing identifiers for
> actions like this, at least in library domain, but eventually
> such approach may become relevant. 

I think this is more about how things are implemented
than about any particular area of application.  I could try to
justify/ rationalize it further but let's just stick with the
belief that invalidating an otherwise-reasonable implementation
option by our choice of words does not seem wise.  It may or may
not be worth noting that, if anyone still believes in Object or
Method-oriented programming with high level objects, the very
same arguments that justify talking about generic URI parsers
equally justify creating a generic URN method that is called
upon to do _all_ URN processing and that might return either an
object or some success or failure indication structure.  
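
A rough sketch of the two processing models, purely for
illustration.  The dictionaries stand in for network services,
and every name here (URL_FOR_NAME, OBJECT_AT_URL, client_driven,
urn_processor, black_box) is hypothetical; "applying a fragment"
is reduced to a dictionary lookup.

```python
# Hypothetical in-memory stand-ins for the resolution services.
URL_FOR_NAME = {"urn:example:doc1": "https://docs.example/doc1"}
OBJECT_AT_URL = {"https://docs.example/doc1": {"intro": "Hello", "body": "Text"}}


def apply_fragment(obj, fragment):
    """Client-side evaluation of a fragment against a retrieved object."""
    return obj if fragment is None else obj.get(fragment)


def client_driven(urn):
    """Model 1: the client parses the URN and manages each step itself."""
    name, _, fragment = urn.partition("#")
    url = URL_FOR_NAME[name]      # ask the URN resolution service
    obj = OBJECT_AT_URL[url]      # dereference the resulting URL
    return apply_fragment(obj, fragment or None)


def urn_processor(urn):
    """Generic URN method: takes the whole URN, returns an object plus
    fragment ID, or a failure indication."""
    name, _, fragment = urn.partition("#")
    url = URL_FOR_NAME.get(name)
    if url is None:
        return {"ok": False, "object": None, "fragment": None}
    return {"ok": True, "object": OBJECT_AT_URL[url], "fragment": fragment or None}


def black_box(urn):
    """Model 2: the client does not parse the URN; it hands it off whole."""
    result = urn_processor(urn)
    if not result["ok"]:
        return None
    return apply_fragment(result["object"], result["fragment"])


print(client_driven("urn:example:doc1#intro"))
print(black_box("urn:example:doc1#intro"))
```

In both cases the fragment is evaluated by the client; the
difference is only in who takes the URN apart.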

>...
>> > In 4.2 there is a somewhat challenging (at least for a
>> > non-English speaker) sentence
>> > 
>> > "In part because of the separation of URN semantics from
>> > more general URI syntax
>> > [I-D.ietf-urnbis-semantics-clarif], generic URI processors
>> > need to pay special attention to the parsing and analysis
>> > rules of RFC 3986 and, in particular, must treat the URI
>> > as opaque unless the scheme and its requirements are
>> > recognized."
>>...
>> What that admittedly-difficult sentence is trying to say is
>> closer to:
>> 
>> 	'If one has a generic processor for URI, the rules of
>> 	3986 apply and must be followed.  Only if the processor
>> 	recognizes the "urn" scheme and understands the
>> 	applicability of this specification as a result it is
>> 	appropriate to try to apply its extended rules.'
>> 
>> Put that way, the statement is very nearly tautologically
>> trivial and should simply be dropped.  But I think we added
>> it to help define the "syntax only but, like other schemes,
>> we get to further restrict and interpret the syntax" boundary
>> between 2141bis and 3986.  Suggestions welcome.
> 
> I should not have used the term "identifier scheme". What I
> was aiming at was that parsing a URN is a double challenge for
> a generic URI processor. First, the processor must recognize
> the "urn" scheme. And next, the processor must know how to
> deal with identifier system (URN namespace). A parser capable
> of resolving urn:isbn's may not know itself how to deal with
> urn:issn's. In an ideal world equipped with resolver discovery
> service, urn:isbn resolver would know how to find the urn:issn
> resolver or any other urn namespace specific resolver out
> there. For the time being, finding the right resolver (there
> may be several for a single namespace) can be a challenge
> unless the resolver location is embedded in the URI with the
> URN.
 
> Parsing URNs is to some extent namespace specific. Some
> namespaces may not provide any services, some may have a whole
> set of them. "Scheme-appropriate processing" seems to relate
> to URNs in general. Recognizing that a string is a URN is the
> necessary first step, but from practical point of view it is
> more important to provide appropriate namespace specific
> parsing of URNs.    
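
The two-step recognition described above (scheme first, then
namespace) might be sketched as follows.  This is an assumption
about how a generic processor could be organized, not anything
the draft specifies; the handler table, the status strings, and
isbn_handler are all hypothetical.

```python
def isbn_handler(nss):
    """Hypothetical namespace-specific processing for urn:isbn."""
    return "lookup ISBN " + nss


# Only namespaces this processor actually understands are listed.
NAMESPACE_HANDLERS = {"isbn": isbn_handler}


def process_uri(uri):
    """Treat the URI as opaque unless the scheme and then the
    namespace are both recognized."""
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() != "urn":
        return ("opaque", uri)                 # not a URN: generic rules only
    nid, sep, nss = rest.partition(":")
    handler = NAMESPACE_HANDLERS.get(nid.lower())
    if not sep or handler is None:
        return ("urn-unknown-namespace", uri)  # a URN, but no handler known
    return ("handled", handler(nss))


print(process_uri("urn:isbn:0451450523"))
print(process_uri("urn:issn:0167-6423"))
```

A processor that knows urn:isbn but not urn:issn recognizes the
second string as a URN yet can do nothing namespace-specific
with it.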


Indeed.  That particular set of problems is at the core of the
reasons why we went down the path of trying to separate URNs
from (generic) URIs entirely.  Doing so would allow a clear
conceptual model in which a URN was a high-level and fairly
abstract identifier with ways of talking about not just objects
or classes of objects but "metadata" about each.  The WG (or at
least those who showed up, some of them outraged, at a meeting
or two) didn't like that idea, so we did the syntax-only
approach instead, treating ourselves to the difficulties about
what can and cannot be done in a generic way, discussions of the
True Meaning of URIs, and so on.

If people really want to revisit issues and decisions that were
apparently closed, perhaps we should come back to that one.
 
>> > Chapter 5 specifies three constraints: uniqueness,
>> > consistent assignment and assignment according to a common
>> > definition. I would like to add a fourth constraint,
>> > persistence. It is already there between the lines (URN is
>> > never reassigned to a different resource) but we could also
>> > make the point that sometimes URNs may even outlive
>> > identified resources.
>> 
>> Send text.  I can't remember whether it was an explicit
>> decision or not, but persistence may have been omitted from
>> Section 5 after it became necessary to add what is now
>> Section 1.1 to reflect the fact that we couldn't really agree
>> on a
>> universally-applicable definition for persistence.   Of
>> course, we could simply insert a persistence constraint to
>> Section 5 and point back to 1.1 for a [non-]definition.
> 
> What about this formulation to replace the current requirement
> 1 in chapter 5: 
> 
> The "uniqueness" and "persistence" constraints mean that once
> assigned, an identifier within the namespace is never
> altered and never reassigned to a different resource (for the
> kind of "resource" identified by URNs assigned within
> the namespace) even if the URN is found to have been issued in
> error. This holds true even if the identifier itself is
> deprecated or becomes obsolete.
> 
> Note that I am not comfortable with "identifier within the
> namespace is never assigned to more than one resource". In
> ISBN namespace, for instance, there are separate ISBNs for all
> three parts of The Lord of the Rings, but there is also a 4th
> ISBN which covers all three parts. So ISBN namespace does not
> meet the requirement "never assigned to more than one
> resource", since in the ISBN namespace the concept of book is
> complex and covers multivolume works as a single entity in
> addition to each volume.

Works for me.  Do others have comments before I put the editor
hat on and put this in?

>...
>> > Finally, I am OK with the idea of having two kinds of URNs:
>> > resource identifiers and abstract designators which do not
>> > and will not support resolution services. But I find it
>> > difficult to accept (as stated in Introduction) that these
>> > abstract designators, although persistent, do not identify
>> > a resource.
>> > What is the purpose of a URN that does not identify
>> > anything? It is definitely possible to identify something
>> > without being resolvable.
>> 
>> First of all, I agree that things are still a little bit
>> confused (see discussion above).  I am hoping that complete
>> clarification is not in the critical path. 
 
> I don't think so. 
 
>>   But the specific
>> answer to your question is that we've got several situations
>> (the XMPP one is the most-cited case) in which the presence
>> or absence of a URN, or the presence of a URN with a
>> particular set of values in the NSS, simply tells whatever
>> (e.g., client) is interpreting it to take or not take some
>> action or to behave in
>> one way or not another.   For some months, I referred to such
>> URNs as "indicators" (sometimes prefixed by "pure" or
>> "abstract") to distinguish them from "identifiers" in the
>> sense that your comments above implied.  There are some
>> disadvantages to the "indicator" terminology.  While I don't
>> remember an explicit discussion of that terminology, Peter
>> preferred "abstract designator", no one else seemed to
>> object, and I didn't think it was worth arguing about.  If
>> you see a useful way to clarify that situation, please
>> suggest it.
 
> I cannot... but I can live with this situation. This part of
> the spec may become a headache if and when it becomes
> necessary to translate and explain "abstract designator" in
> Finnish ;-). 

Or any number of other languages that do share many or most
concepts with English.  But difficulties in translating IETF
technical specifications to other languages without loss of
information are, unfortunately, nothing new.

thanks,
    john
