Re: [urn] Suggested changes to rfc2141bis-14

"Hakala, Juha E" <juha.hakala@helsinki.fi> Tue, 24 November 2015 07:55 UTC

From: "Hakala, Juha E" <juha.hakala@helsinki.fi>
To: John C Klensin <john-ietf@jck.com>, "urn@ietf.org" <urn@ietf.org>
Date: Tue, 24 Nov 2015 07:54:32 +0000
Message-ID: <AMSPR07MB4540C7EF31B548ABA068AD1FA060@AMSPR07MB454.eurprd07.prod.outlook.com>
References: <AMSPR07MB45438F00E2C6E96184F112BFA1A0@AMSPR07MB454.eurprd07.prod.outlook.com> <137731ACF712EA751660E871@JcK-HP8200.jck.com>
In-Reply-To: <137731ACF712EA751660E871@JcK-HP8200.jck.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/urn/xxrYtpHwVoZf0rjAsEXi8R063SM>
Cc: Tobias Weigel <weigel@dkrz.de>
Subject: Re: [urn] Suggested changes to rfc2141bis-14

Hello John,

A few comments and answers below. 

> -----Original Message-----
> From: John C Klensin [mailto:john-ietf@jck.com]
> Sent: 21 November 2015 1:44
> To: Hakala, Juha E <juha.hakala@helsinki.fi>; urn@ietf.org
> Cc: Tobias Weigel <weigel@dkrz.de>
> Subject: Re: [urn] Suggested changes to rfc2141bis-14
> 
> (Note: Editor hat on.  I.e., as noted some of the comments below are headed
> directly to the -15 text unless I hear objections)

Good. 

> --On Friday, November 20, 2015 13:08 +0000 "Hakala, Juha E"
> <juha.hakala@helsinki.fi> wrote:
> 
> > Hello,
> >
> > Four minor change requests to the latest URN syntax
> > specification:
> >
> > 1. rfc2141bis-14 states in chapter 3.3.1 that "q-component SHALL NOT
> > be taken into account when resolving a URN to a URL".
> >
> > There are two problems with this:
> >
> > First, a URN can be resolved to 1-n URLs.
> 
> Or, more accurately, 0-n URLs.  I had already picked this up, although I like
> your reasoning and suggestion better.

0-n is fine with me. I was only thinking about URNs that are actionable. 
> 
> > Second, depending on what service is being requested, it may be
> > necessary to resolve a URN to different URLs. For instance, if a user
> > wants the identified resource, the URL will be the one belonging to a
> > digital asset management system holding the document. But if just
> > descriptive metadata about the resource is requested, the most
> > appropriate URL may be the one of the national bibliography.
> >
> > I suggest the following modification of 2141bis:
> >
> > "As described under Section 4, the q-component SHALL NOT be taken into
> > account when determining URN equivalence.
> >
> > Similarly, the q-component MUST NOT be used to communicate service
> > related requests and parameters to a resolution service.
> 
> I'm a little reluctant to make that a MUST NOT, which is a very strong
> requirement.  It might actually be appropriate when the URN can (and will)
> be resolved to exactly one URL (and not multiple URLs, selected URLs, or
> anything else), but, if that was the intended meaning, someone should speak
> up so I can try to make it a lot more clear.   

I'm a little reluctant to make decisions based on the assumption that a given URN can and will be resolved to one and only one URL. Even if at first there is just one URL to resolve to, there may eventually be several if the document is preserved for the long term. Documents are copied from institutional repositories to digital archives, along with their metadata.

I used MUST NOT to underline the difference between r- and q-components. If clients are allowed to send service-related requests intended for the resolvers in q-components, then resolvers must check q-components and determine whether they should do something with them (beyond just deciding the correct target system to which the q-component can be sent unaltered). This would complicate resolver implementation a lot.

> There is also the issue of
> whether a client "communicates" information to a resolution service that
> returns values so the client can then do something else versus communicating
> with a resolution service that retrieves objects (discussed at length a few
> weeks ago).
> Especially if we are not going to try to resolve that difference and create a
> requirement, I'd prefer "SHOULD NOT", perhaps restructuring the sentence to
> read
> 
> 	"Namespaces and associated information placement in
> 	syntax SHOULD be designed to avoid any need for a
> 	resolution service to consider the q-component.
> 	Namespace-specific and more generic resolution systems
> 	MUST NOT require that q-component information be passed
> 	to them for processing."
> 
> > A resolution service MAY parse the q-component in order to determine
> > an appropriate target system to supply the requested service. "
> 
> Here I get confused.  I have assumed that determining an appropriate [target]
> system to supply a requested service falls
> well within the scope of r-components.   Can you explain more
> about what you had in mind?

Let us assume that a user wants descriptive metadata about the identified resource. This can be requested with either an r-component or a q-component, and either of them may be constructed on the fly by the client the user is using.

If an r-component is used, it is obligatory to specify the service and any service-related parameters; in this case the service is (in RFC 2483 syntax) I2C (URI to URC) and the parameter is the appropriate metadata format. The client may also specify a target system (server) to which the request should be sent. In plain text, a typical URN resolution request could be something like "retrieve from the Library of Congress online catalogue a MARC record describing the resource with ISBN XXX". It is up to the resolver to translate the r-component containing this request into something (such as an SRU search URI) that a server in the LoC application can support. Eventually some systems may be able to deal with "native" r-components, but at first this will not be the case.

Note that "hard-coding" the target system in the r-component may be risky, because the preferred target system may no longer exist or it may not be accessible because of a paywall or technical issues. The resolver may be aware of 1-n other systems that can supply the requested service, so it must be able to override the client's preference when necessary.
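
The override described above could look roughly like the following Python sketch. It only illustrates the idea: the function name, the host names and the availability check are all hypothetical, and nothing in the draft prescribes this logic.

```python
from typing import Callable, Optional, Sequence


def choose_target(
    preferred: Optional[str],
    known_targets: Sequence[str],
    is_available: Callable[[str], bool],
) -> Optional[str]:
    """Pick a target system for a resolution request (illustrative only).

    `preferred` is the target named by the client, if any.
    `known_targets` are the systems the resolver itself knows about.
    `is_available` is a hypothetical check (reachable, not paywalled, ...).
    """
    # Honour the client's preference when it is usable.
    if preferred is not None and is_available(preferred):
        return preferred
    # Otherwise override it with the first usable system the resolver knows.
    for candidate in known_targets:
        if candidate != preferred and is_available(candidate):
            return candidate
    # Zero usable targets: the URN cannot be resolved for this service.
    return None


# Hypothetical usage: the client's preferred catalogue is unreachable.
reachable = {"catalogue-b.example"}
print(choose_target(
    "catalogue-a.example",
    ["catalogue-a.example", "catalogue-b.example"],
    lambda target: target in reachable,
))
```

Returning None when nothing is usable corresponds to the "0-n URLs" case discussed earlier.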

If a client relies on a q-component, it may construct an SRU query and pass it to the resolver, which then sends it unaltered to an appropriate target system. The target server may be hardcoded in the URN, as in this example:

urn:isbn:9789522222725?http://lx2.loc.gov:210/lcdb?version=1.1&operation=searchRetrieve&query="urn:isbn:9789522222725"&startRecord=1&maximumRecords=1&recordSchema=dc

In this case, URN resolution does not provide much added value. The q-component of the URN is just passed on to the SRU server of the Library of Congress online catalogue in order to retrieve a Dublin Core record for the identified book. Even in this case the resolver needs to pass the q-component on. But at least in principle the client could omit the server information and leave it to the resolver to use its own preferred server for the SRU protocol. Or, if the server address is known to be invalid, the resolver can replace it with something else.
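
For illustration, a resolver could separate the components along these lines. This is a sketch under assumptions taken only from the examples in this thread: "??" introduces an r-component, a single "?" introduces a q-component, and "#" introduces an f-component. The names UrnParts and split_urn are made up, and the sketch is not a validating parser for the draft's ABNF.

```python
from typing import NamedTuple, Optional


class UrnParts(NamedTuple):
    assigned_name: str            # the "urn:<NID>:<NSS>" part
    r_component: Optional[str]
    q_component: Optional[str]
    f_component: Optional[str]


def split_urn(urn: str) -> UrnParts:
    """Split a URN string into name, r-, q- and f-components (assumed syntax)."""
    # Everything after the first "#" is treated as the f-component.
    body, hash_sep, fragment = urn.partition("#")
    name, q_sep, rest = body.partition("?")
    r_component = q_component = None
    if q_sep:
        if rest.startswith("?"):
            # "??" opens an r-component; a later "?" opens the q-component.
            r_component, inner_sep, tail = rest[1:].partition("?")
            q_component = tail if inner_sep else None
        else:
            # A single "?" opens the q-component. Any further "?" characters
            # belong to it, so it can be passed on unaltered.
            q_component = rest
    return UrnParts(name, r_component, q_component,
                    fragment if hash_sep else None)


# The example URN from this message, written on one line.
example = ('urn:isbn:9789522222725?http://lx2.loc.gov:210/lcdb?version=1.1'
           '&operation=searchRetrieve&query="urn:isbn:9789522222725"'
           '&startRecord=1&maximumRecords=1&recordSchema=dc')
parts = split_urn(example)
print(parts.assigned_name)
print(parts.q_component)
```

The q-component is kept as an opaque string on purpose, so that the resolver can hand it to the target system without interpreting it.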

Current URN resolvers are not smart like this; none of them can deal with q- or r-components yet. This "intelligence" needs to be built into them, and we must be careful not to impose any artificial limitations on it, such as hardcoded server addresses which cannot be overridden.
> 
> 
> > 2. IMO there is a slight problem with this sentence from chapter 3.3.:
> >
> > "The q-component is intended for parameters to be transmitted to the
> > named resource and interpreted by that resource."
> >
> > It is not always the case that the result of the resolution is the
> > named resource. A more accurate wording might be the
> > following:
> >
> > The q-component is intended for parameters to be transmitted to either
> > the named resource or a system that can supply the requested service,
> > and interpreted by that resource or system.
> 
> Wfm.  Any objections?
> 
> > 3. In chapter 3.3.3, the draft currently says:
> >
> > "When a URN containing an f-component resolves to a URL, the
> > f-component from the URN is copied verbatim into the fragment of that
> > URL."
> >
> > and a bit later:
> >
> > "Similarly, the f-component MUST NOT be passed to resolution servers
> > when querying them for resource locations or metadata."
> >
> > The problem with the former sentence is that the URN does not always
> > resolve to the URL of the identified resource. If the result is an
> > information page describing the resource or a metadata record, adding
> > a fragment does not make sense. And IMO the latter sentence is
> > meaningless since resolution servers should a priori ignore
> > f-components no matter what service is being requested. Of course, web
> > browsers usually do not send fragments at all to HTTP servers, so at
> > least in the case of HTTP it would be a bad idea to use f-component in
> > the resolution process.
> >
> > We are better off saying for instance:
> >
> > "If a URN containing an f-component resolves to a URL of the named
> > resource, the f-component from the URN can be applied (usually by the
> > client) verbatim as the fragment of that URL.
> 
> Ok.  But see the comment below the next paragraph and the separate
> "metacomment" note.
> 
> > Clients SHOULD NOT pass f-components to resolution servers. If a URN
> > containing an f-component is received by a resolution server, the
> > server SHOULD ignore the f-component when processing the URN.
> 
> I know that just about everyone who thinks about this has a particular
> processing model and even a style of API in mind, but I think we need to be
> very careful to not build those models into the text unless (i) that is a
> requirement we want to make and (ii) we have clear consensus about it.  We
> have a convention (not always followed) that one should not say "SHOULD" (or
> "SHOULD NOT" without at least a general description of the
> exception case(s).    The above could be improved considerably
> by saying something like
> 
> 	"Clients SHOULD NOT pass f-components to resolution
> 	servers unless those servers also perform object
> 	retrieval and interpretation functions."
> 
> and then either hope that "when processing the URN" is interpreted very
> narrowly or put in some further qualifying language.  In that context, what do
> you (and others) think?

There may be cases in which fragments have a more important or different role from what RFC 3986 assigned to them. For complex objects such as research data sets, fragments may be essential for retrieving what is needed. But it is the target systems that are aware of this. I don't know whether there is a need to extend this kind of knowledge about fragment usage to URN resolvers as well. But your version of the text is OK, since it allows yet another possibility to make resolvers smarter.
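
On the client side, the proposed SHOULD NOT could be sketched as follows. The function name and the flag are hypothetical; how a client would learn that a resolver also performs object retrieval is left open here.

```python
def urn_for_resolver(urn: str, resolver_retrieves_objects: bool = False) -> str:
    """Return the URN string a client would send to a resolution server.

    Assumes the f-component is everything after the first "#".
    """
    if resolver_retrieves_objects:
        # The exception case in the proposed text: the server also performs
        # object retrieval and interpretation, so the f-component is kept.
        return urn
    # Default case: drop the f-component before contacting the resolver.
    return urn.partition("#")[0]


print(urn_for_resolver("urn:isbn:xxx#chapter2"))
```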

> 
> > 4. A bit later in the same chapter, there is a sentence
> >
> > "Thus, for URNs that resolve to URLs, the semantics of an f-component
> > are defined by the media type of those resources, not by the
> > namespace."
> >
> > A more accurate version:
> >
> > "Thus, for URNs that resolve to URLs of the named resources, the
> > semantics of an f-component are defined by the media type of those
> > resources, not by the namespace."
> >
> > Some URNs may resolve only to e.g. URLs of metadata records describing
> > the named resources, and in such case f-component is not applicable.
> 
> I see where you are going, but...
> 
> > Of course it might be possible that the media type of the metadata
> > record would allow use of f-component / fragment, but in that case the
> > appropriate f-component can only be applied to the URN of the metadata
> > record itself. Resolution processes will become increasingly complex,
> > and there is no way of knowing the media type of the thing resolution
> > process produces unless it is the named resource itself.
> 
> However, I don't think one can tell.   We've been thinking about
> things like (using the -14 r-component syntax):
> 
> 
> urn:isbn:xxx-yxx-...??something-that-specifies-metadata-not-book

Yes, an r-component is an obvious way of doing this. A q-component should be unproblematic as well, especially if well-known protocols are used.
> 
> but one could equally well have (although I hope we don't)
> 
>    urn:isbn-metadata:xxx-yxx-...

In practice, this namespace will not get registered, since having ISBN embedded in the NID would create confusion. Metadata record identifiers have nothing to do with ISBNs; they tend to be internal to a database, and where they are not, NBN may be used. So URNs for metadata records may be something like urn:nbn:xxx.
> 
> Now, it seems to me that, at least if the metadata are returned in a form that
> has a media type, it would be equally rational to put
>     #author
> 
> at the end of either of those URN-strings.  But I think your language prohibits
> that for one case and not the other.  Or maybe it prohibits the fragment for
> both.

Resources and their metadata tend to have different fragments, or the same fragment would give different results. It is OK to do this:

urn:isbn:xxx#chapter2 

if the e-book media type supports this. And it is OK to do this (if the metadata format allows it): 

urn:nbn:xxx#author

to find out who the author of the book is. But usually you cannot use metadata-related fragments for the resource itself, or resource-related fragments for metadata, since that would either make no sense at all or yield unpredictable results. There are no chapter 2's in metadata records.
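
A client applying the rule could do something like the sketch below. The function name, the flag and the example URLs are hypothetical, and the sketch assumes the client somehow knows whether the URL it received is that of the named resource itself.

```python
from urllib.parse import urldefrag


def apply_f_component(resolved_url: str, f_component: str,
                      is_named_resource: bool) -> str:
    """Apply a URN's f-component to a resolved URL, when that is meaningful."""
    if not f_component or not is_named_resource:
        # A metadata record or an information page: the fragment meant for
        # the resource itself does not apply, so the URL is left alone.
        return resolved_url
    # Drop any fragment already present, then add the f-component verbatim.
    base, _existing_fragment = urldefrag(resolved_url)
    return f"{base}#{f_component}"


# Hypothetical URLs for the e-book and for a metadata record describing it.
print(apply_f_component("https://repo.example/book.epub", "chapter2", True))
print(apply_f_component("https://catalogue.example/record/123", "chapter2", False))
```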

Meaningful use of fragments is difficult if the user has no idea which fragments are available in the identified resource. So it is possible that fragments will mostly be applied when people cite publications. This is safer than using URL + fragment, as long as the identifier is applied to a single manifestation of the resource only.

Best regards, 

Juha

> 
> So I think I need more comments on this from you and others before changing
> text.
> 
> 
> > Apart from these minor glitches the draft looks fine to me.
> 
> Encouraging.
> 
> best,
>     john
>
