Re: [urn] Request for review and comments draft-ietf-urnbis-rfc2141bis-urn-16

"Hakala, Juha E" <juha.hakala@helsinki.fi> Mon, 25 April 2016 10:18 UTC

From: "Hakala, Juha E" <juha.hakala@helsinki.fi>
To: John C Klensin <john-ietf@jck.com>, "urn@ietf.org" <urn@ietf.org>
Thread-Topic: [urn] Request for review and comments draft-ietf-urnbis-rfc2141bis-urn-16
Thread-Index: AQHRmR6Q2fk1u9GVBEu61tt2o2725J+T7aPwgADG9ICABawhEA==
Date: Mon, 25 Apr 2016 10:18:21 +0000
Message-ID: <VI1PR07MB1727924C29031607F4776D57FA620@VI1PR07MB1727.eurprd07.prod.outlook.com>
References: <8E2E925C5A90E22F4B1BF36D@JcK-HP8200.jck.com> <VI1PR07MB172742D38D06B7523B514414FA6E0@VI1PR07MB1727.eurprd07.prod.outlook.com> <CE3CB3796DF8CBEFB802D9C6@JcK-HP8200.jck.com>
In-Reply-To: <CE3CB3796DF8CBEFB802D9C6@JcK-HP8200.jck.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/urn/GKqLj9s-B5GF3Gxtgi5U5gPGNYM>
Cc: "jonathanmtclark@gmail.com" <jonathanmtclark@gmail.com>
Subject: Re: [urn] Request for review and comments draft-ietf-urnbis-rfc2141bis-urn-16

Hello John; all, 

Further comments below. 

> -----Original Message-----
> From: John C Klensin [mailto:john-ietf@jck.com]
> Sent: 21 April 2016 20:30
> To: Hakala, Juha E <juha.hakala@helsinki.fi>; urn@ietf.org
> Subject: RE: [urn] Request for review and comments draft-ietf-urnbis-
> rfc2141bis-urn-16
> 
> Juha,
> 
> Thanks for the careful reading.  A few comments inline below.
> 
> --On Thursday, April 21, 2016 06:57 +0000 "Hakala, Juha E"
> <juha.hakala@helsinki.fi> wrote:
> 
> > Hello John; all,
> >
> > this new version of rfc2141bis is an improvement, and I believe we are
> > getting closer to the goal of this process.
> >
> > I am fine with the decision to postpone work on r-component, although
> > that means further delay of some development projects the library
> > community has been talking about.
> 
> Speaking personally (not as editor), I'd like to understand those projects,
> particularly the needs they are trying to address, in more detail.  That
> should, however, probably be off-list or delayed until we've gotten 2141bis
> and its relatives finished.

In short, resolvers must become a lot smarter. All PIDs face the same problem. Future resolvers must offer users more options than just persistent linking to the identified resource. Improvements in the systems with which libraries, publishers etc. manage digital resources both enable and require such improvements in resolvers. Currently our (and, for instance, publishers') applications are not too smart, but things will change in a few years' time, and URN syntax should be ready to accommodate these new requirements when they materialize.

For instance, when a future user locates a relevant document, she should be able to easily retrieve rights metadata (copyright owner, license terms etc.) related to the document. It would be easy to list many other requirements and functionalities which have to do with metadata (descriptive / administrative / structural) about digital objects.   

> 
> > There are some minor issues in the text that should be corrected.
> >
> > In 2.3.1 the draft says:
> >
> > "If a URN resolves to a URL, the q-component from the URN is copied
> > verbatim to the query component of the URL."
> >
> > However, we are no longer using URI query component syntax and our
> > added "=" may cause problems. It is probably better to say something
> > like:
> >
> > "If a URN resolves to a URL, the q-component from the URN is copied to
> > the query component of the URL without "=", so that
> > the query syntax matches URI query component."
> 
> I actually checked that sentence when I rewrote things for the
> "?=" convention and I believe it is correct.    The reason is
> that, in the ABNF (and I hope consistently in the text), the q-component is
> consistently the string without the introducer.

OK. I missed that. But it might be a good idea to minimize the chance that others make the same mistake.

> For example, Section 2 of 2141bis-16 includes
> 
>       rq-components =  ( "?="  q-component
>                           [ "?+" r-component ] ) /
>                        ( "?+" r-component
>                           [ "?="  q-component ] )
>       q-component   = pchar *( pchar / "/" / "?" )
> 
> That is consistent with the way <query> is defined in 3986, i.e., from Section
> 3 of that document:
> 
>     URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
> 
> So copying the q-component (which does not include the "?=") into the query
> (which does not contain the "?") actually is correct.  I have doubts about
> whether "verbatim" is actually helpful, but that is another issue.
> 
> If you and others think it would be helpful, it would be fairly easy to
> incorporate a sentence into the paragraph after the 2141bis syntax partially
> quoted above and/or after the "copied"
> sentence warning that "q-component", etc., refer to the string and do not
> contain the introducing delimiter.  An even better solution might be to
> incorporate an example if someone wants to suggest one.

Such a sentence plus an example would be useful.

An example can be built by reusing the example further down in the text:

urn:example:weather?=op=map&lat=39.56
         &lon=-104.85

could be resolved to something like

https://weatherapp.org?op=map&lat=39.56
         &lon=-104.85
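
As a rough illustration of the "q-component without the introducer" rule, here is a minimal Python sketch. The function names split_q_component and to_url are hypothetical, the weatherapp.org base URL is just the example above, and the sketch only handles a URN with a q-component and no r- or f-component. The wrapped example lines are joined into one string.

```python
def split_q_component(urn):
    """Split a URN at the "?=" introducer.

    Returns (assigned_name, q_component). The q-component is the
    string after "?=" and does not include the introducer itself.
    Simplified: r-components and f-components are not handled.
    """
    assigned_name, _, q_component = urn.partition("?=")
    return assigned_name, q_component


def to_url(base_url, q_component):
    """Copy the q-component into the query component of a URL."""
    return f"{base_url}?{q_component}" if q_component else base_url


urn = "urn:example:weather?=op=map&lat=39.56&lon=-104.85"
name, q = split_q_component(urn)
url = to_url("https://weatherapp.org", q)
print(url)  # https://weatherapp.org?op=map&lat=39.56&lon=-104.85
```

The "?=" is consumed by the split, and the "?" in the result is the ordinary URI query delimiter, so no stray "=" ends up in the URL.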
 
> 
> > In 2.3.3 there is a paragraph
> >
> > "Clients SHOULD NOT pass f-components to resolution services unless
> > those services also perform object retrieval and interpretation
> > functions."
> >
> > The purpose of this is to allow the resolution service to act as a
> > "client" which retrieves the entire identified object, applies the
> > fragment to it and sends the result to the original client. IMO this
> > would often give no added value. If the only purpose of the fragment
> > is to take the user to a certain point within the document, there is
> > no way the resolver can do this on behalf of the client the customer
> > is using. So even if the fragment were included in the URN, the
> > resolver should just ignore it, and pass the entire document to the
> > client which then applies the fragment to it.
> 
> Actually, the reason for it was a little different and that is to not constrain
> implementation models.  In one resolution model (using the URN-> URL
> case as the obvious example), the one I think you have in mind, the approach
> is
> 
>     client parses URN into pieces and
>     client sends NID and NSS
>            -> URN-resolution-service
>            <-   (resulting URL)
>        -or-  sends NSS
>            -> NID-specific resolution-service
>            <-  (resulting URL)
> 
>     client adds whatever needs to be added to URL
>     client -> URL-resolution-service
>            <-  (resulting resource/ object)
>     client evaluates fragment, if any, against object
>     client returns object, or object subset, to user
> 
> The other model, which I think is worth preserving, is
> 
>     client does not parse URN but
>         passes the whole thing
>             -> URN processor
>             <- (resulting resource / object and
>                 fragment ID)
>     client evaluates fragment, if any, against object
>     client returns object, object subset, or other
>         result to user.
> 
> There are several variations on those themes, but the basic difference is
> that, in the first model, the client has to break the URI into pieces and then
> manage the processing/ evaluation of the pieces.  In the second, most of
> that is handed off to what looks to the client like a black box that emits a
> resource or, in extreme cases, confirmation that the desired action has been
> completed.
> 
> In addition to suggesting two different processing models, either of which
> might be implemented in libraries or the like for a broad range of URNs, the
> relationship between the two also suggests a distinction that 2141bis does
> not make.  That specification now describes two types of URNs, those that
> resolve in some way (for some very broad definition of
> "resolve") and those that do not resolve at all, with the latter
> called "abstract designators".   It is also possible to
> distinguish between namespaces of "objects" that are similar to documents
> and "objects" that are really actions to be performed.

I cannot see any immediate use for identifiers for actions like this, at least in the library domain, but eventually such an approach may become relevant.

> The latter would strongly favor the second implementation model (as well
> as concepts closer to some views of what r-components
> should be about).   The distinction has not been made in 2141bis
> for a number of reasons, including its non-discussion in the WG and some
> concern, reflected in a different form in my recent note to Sean, as to
> whether such creatures should be considered as URNs or perhaps as URAs
> ("... resource actions"?) or individual URI schemes.
> 
> > On the other hand, clients usually do not send fragments to servers /
> > resolvers, so it might be non-trivial to change this behavior even if
> > the resolver could do something meaningful with fragments.
> 
> Yes.  Indeed, 3986 can be read as prohibiting anything but the
> client from doing anything at all with fragments.   On the other
> hand, interpreting 3986 that way may violate the principle that Internet
> standards specify what happens (and is visible "on the
> wire") rather than where and how it is done.
> 
> > The easiest solution is to just write
> >
> > Clients SHOULD NOT pass f-components to resolution services functions
> > even if those services also perform object retrieval and
> > interpretation functions."
> 
> > since then there is no pressure to change the normal behavior of
> > clients, or to add entirely new kind of functionality to resolvers,
> > which in the end of the day might not be that useful.
> 
> I can live with that although I believe the restriction is unnecessary and,
> in some edge cases, might be harmful.
> Need to hear from others about this.

Now that I understand the purpose of the text, I take back my suggestion. Let's keep the current formulation. 
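
For reference, the first (client-driven) model quoted above, in which the resolution service never sees the f-component, could be sketched roughly as follows in Python. Every name here is hypothetical, and the resolve, fetch and apply_fragment callables are stand-ins for whatever a real client would use; this is an illustration of the flow, not a proposal for the draft.

```python
def split_f_component(urn):
    """Separate the f-component (the part after "#") from the rest."""
    rest, _, f_component = urn.partition("#")
    return rest, f_component


def resolve_then_apply_fragment(urn, resolve, fetch, apply_fragment):
    """Client-side model: the resolver is never given the f-component."""
    urn_without_fragment, f_component = split_f_component(urn)
    url = resolve(urn_without_fragment)  # URN -> URL
    obj = fetch(url)                     # URL -> resource / object
    # The client, not the resolver, evaluates the fragment.
    return apply_fragment(obj, f_component) if f_component else obj


# Stand-in services, purely for illustration.
seen_by_resolver = []


def fake_resolve(urn):
    seen_by_resolver.append(urn)
    return "https://example.org/doc"


def fake_fetch(url):
    return {"intro": "Introduction", "sec2": "Section 2"}


def fake_apply(obj, fragment):
    return obj[fragment]


result = resolve_then_apply_fragment(
    "urn:example:doc#sec2", fake_resolve, fake_fetch, fake_apply
)
print(result)            # Section 2
print(seen_by_resolver)  # ['urn:example:doc']
```

In the second model the whole string, fragment included, would be handed to the URN processor, which is exactly the case the current SHOULD NOT wording leaves open.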
> 
> > In 4.2 there is a somewhat challenging (at least for a non-English
> > speaker) sentence
> >
> > "In part because of the separation of URN semantics from more
> > general    URI syntax [I-D.ietf-urnbis-semantics-clarif],
> > generic URI processors    need to pay special attention to the
> > parsing and analysis rules of    RFC 3986 and, in particular,
> > must treat the URI as opaque unless the    scheme and its
> > requirements are recognized."
> >
> > It might be better to say something like "... must treat the URN as
> > opaque unless the identifier scheme is recognized and its requirements
> > are known and met."
> 
> I wonder.  Noting that 3986 uses a definition of "identifier"
> that you and others have questioned and that it doesn't use the term
> "identifier scheme" at all, might this just add confusion?
> In 3986-speak, the scheme associated with (or part of) the identifier is well
> defined and, for the purposes of 2141bis, is always "urn" unless we are
> specifically talking about URIs, I'm not even sure how to read your modified
> sentence.
> 
> What that admittedly-difficult sentence is trying to say is closer to:
> 
> 	'If one has a generic processor for URI, the rules of
> 	3986 apply and must be followed.  Only if the processor
> 	recognizes the "urn" scheme and understands the
> 	applicability of this specification as a result it is
> 	appropriate to try to apply its extended rules.'
> 
> Put that way, the statement is very nearly tautologically trivial and should
> simply be dropped.  But I think we added it to help define the "syntax only
> but, like other schemes, we get to further restrict and interpret the syntax"
> boundary between 2141bis and 3986.  Suggestions welcome.

I should not have used the term "identifier scheme". What I was aiming at was that parsing a URN is a double challenge for a generic URI processor. First, the processor must recognize the "urn" scheme. Next, the processor must know how to deal with the identifier system (URN namespace). A parser capable of resolving urn:isbn's may not itself know how to deal with urn:issn's. In an ideal world equipped with a resolver discovery service, a urn:isbn resolver would know how to find the urn:issn resolver or any other URN namespace-specific resolver out there. For the time being, finding the right resolver (there may be several for a single namespace) can be a challenge unless the resolver location is embedded in the URI with the URN.

Parsing URNs is to some extent namespace-specific. Some namespaces may not provide any services; some may have a whole set of them. "Scheme-appropriate processing" seems to relate to URNs in general. Recognizing that a string is a URN is the necessary first step, but from a practical point of view it is more important to provide appropriate namespace-specific parsing of URNs.
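
To make the two steps concrete, here is a minimal Python sketch, assuming a hypothetical registry of namespace-specific handlers. The resolver URL is invented, the NSS is just a sample string, and the parsing is deliberately simplified (no q-, r- or f-components, no syntax validation).

```python
def parse_urn(urn):
    """Return (nid, nss) for a URN, or None if the scheme is not "urn"."""
    scheme, sep1, rest = urn.partition(":")
    if not sep1 or scheme.lower() != "urn":
        return None  # step 1 failed: not the "urn" scheme
    nid, sep2, nss = rest.partition(":")
    if not sep2 or not nid or not nss:
        return None  # malformed: no NID or no NSS
    return nid.lower(), nss  # the NID is case-insensitive


# Hypothetical registry: one handler per namespace this processor knows.
RESOLVERS = {
    "isbn": lambda nss: f"https://isbn-resolver.example/{nss}",
}


def resolve(urn):
    """Resolve a URN if its namespace is known; otherwise return None."""
    parsed = parse_urn(urn)
    if parsed is None:
        return None  # leave the string to generic URI processing
    nid, nss = parsed
    handler = RESOLVERS.get(nid)
    if handler is None:
        return None  # step 2 failed: unknown namespace, treat as opaque
    return handler(nss)


print(resolve("urn:isbn:0451450523"))  # https://isbn-resolver.example/0451450523
print(resolve("urn:issn:1234-5678"))   # None
```

The urn:issn case is recognized as a URN but still cannot be processed, which is the gap a resolver discovery service would have to fill.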

> 
> > Chapter 5 specifies three constraints: uniqueness, consistent
> > assignment and assignment according to a common definition. I would
> > like to add fourth constraint, persistence. It is already there
> > between the lines (URN is never reassigned to a different resource)
> > but we could also make the point that sometimes URNs may even outlive
> > identified resources.
> 
> Send text.  I can't remember whether it was an explicit decision or not, but
> persistence may have been omitted from Section 5 after it became
> necessary to add what is now Section 1.1 to reflect the fact that we couldn't
> really agree on a
> universally-applicable definition for persistence.   Of course,
> we could simply insert a persistence constraint to Section 5 and point back
> to 1.1 for a [non-]definition.

What about this formulation to replace the current requirement 1 in chapter 5: 

       The "uniqueness" and "persistence" constraints mean that, once assigned, an identifier
       within the namespace is never altered and never reassigned to a different resource
       (for the kind of "resource" identified by URNs assigned within the namespace), even if
       the URN is found to have been issued in error. This holds true even if the identifier
       itself is deprecated or becomes obsolete.

Note that I am not comfortable with "identifier within the namespace is never assigned to more than one resource". In the ISBN namespace, for instance, there are separate ISBNs for each of the three parts of The Lord of the Rings, but there is also a fourth ISBN which covers all three parts. So the ISBN namespace does not meet the requirement "never assigned to more than one resource", since in the ISBN namespace the concept of a book is complex and covers a multivolume work as a single entity as well as each individual volume.


> > There is a typo in 6.4.6: handilng should be handling.
> 
> Fixed in working draft for -17.  Thanks.
> 
> > Finally, I am OK with the idea of having two kinds of URNs:
> > resource identifiers and abstract designators which do not and will
> > not support resolution services. But I find it difficult to accept (as
> > stated in Introduction) that these abstract designators, although
> > persistent, do not identify a resource.
> > What is the purpose of a URN that does not identify anything?
> > It is definitely possible to identify something without being
> > resolvable.
> 
> First of all, I agree that things are still a little bit confused (see discussion
> above).  I am hoping that complete
> clarification is not in the critical path. 

I don't think so. 

> But the specific
> answer to your question is that we've got several situations (the XMPP one
> is the most-cited case) in which the presence or absence of a URN, or the
> presence of a URN with a particular set of values in the NSS, simply tells
> whatever (e.g., client) is interpreting it to take or not take some action or to
> behave in
> one way or not another.   For some months, I referred to such
> URNs as "indicators" (sometimes prefixed by "pure" or
> "abstract") to distinguish them from "identifiers" in the sense that your
> comments above implied.  There are some disadvantages to the "indicator"
> terminology.  While I don't remember an explicit discussion of that
> terminology, Peter preferred "abstract designator", no one else seemed to
> object, and I didn't think it was worth arguing about.  If you see a useful way
> to clarify that situation, please suggest it.

I cannot... but I can live with this situation. This part of the spec may become a headache if and when it becomes necessary to translate and explain "abstract designator" in Finnish ;-). 

Best regards, 

Juha

> 
> thanks again,
>     john
