Re: [urn] Alissa Cooper's Discuss on draft-ietf-urnbis-rfc2141bis-urn-21: (with DISCUSS and COMMENT)

John C Klensin <john-ietf@jck.com> Thu, 02 March 2017 02:18 UTC

Date: Wed, 01 Mar 2017 21:18:36 -0500
From: John C Klensin <john-ietf@jck.com>
To: Alissa Cooper <alissa@cooperw.in>, The IESG <iesg@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/urn/k9BBJZJJcga5Wx-yyTxyifdHd6o>
Cc: urn@ietf.org, urnbis-chairs@ietf.org, draft-ietf-urnbis-rfc2141bis-urn@ietf.org, barryleiba@computer.org
Subject: Re: [urn] Alissa Cooper's Discuss on draft-ietf-urnbis-rfc2141bis-urn-21: (with DISCUSS and COMMENT)


--On Wednesday, March 1, 2017 07:11 -0800 Alissa Cooper
<alissa@cooperw.in> wrote:

> Alissa Cooper has entered the following ballot position for
> draft-ietf-urnbis-rfc2141bis-urn-21: Discuss
>....

> -------- DISCUSS: --------
> 
> What is the motivation behind specifying the r-component
> syntax at this point and then recommending against its use
> until further standardization is complete? Why not specify the
> syntax when those future standards get written? The current
> approach just seems like an invitation for people to start
> including r-components in URNs without independent
> implementations understanding their semantics.

Commenting at this time on this one issue only and speaking for
myself (if it is relevant, Barry or Alexey will need to comment
on whether I've captured WG consensus).

One of the important properties of URNs (more or less since they
were first conceived of) is that they were location-independent
(and, for things more like books than web pages,
particular-copy-independent) abstract identifiers that would
"resolve" into something more location and/or copy-dependent.
While some early documents assumed that URNs would be resolved
to locator-type URIs, that is not and never has been a
requirement: URNs can resolve into other sorts of things or not
resolve at all.  

When the "opportunity" came up to revise URNs, there were
several different communities who wanted extended capabilities
beyond those that can be supported under a strict interpretation
of RFC 2141.   The library community was important among them
and needed the capability to both identify an object (or
object-abstraction) and pass information to it.  An initial view
in the WG was to simply add a 3986-like query component to the
URN syntax and be done with it.  However, discussion on that
point rapidly led to the realization that there were at least
two different types of qualifying information: information that
was needed to resolve or dereference the URN and information
that was needed to work with, interpret, or get information from
the object.  That, in turn, led to the conclusion that the
distinction between the r-component and q-component was
necessary, even (or especially) if the WG didn't feel ready to
precisely define the syntax and semantics of r-component (it may
be worth noting that there was one proposal to do just that, but
the WG decided to be conservative about untried solutions to a
problem that was not, itself, understood explicitly and in
detail).  It was, and is, still necessary to define basic
syntax for r-component, both to distinguish it from q-component
and to address a more basic problem.

That problem is that the predecessors of RFC 3986 established a
generic, and quite prescriptive, syntax for URLs.  RFC 3986
expanded that to cover all URIs, arguably including URNs without
fully understanding the implications of that expansion.  The
result is that almost anything one puts in a URI either means
something, is part of some specific component, or both.  

The WG was told that there are generic URI parsers that isolate
those components according to the rules of 3986 and that the WG
could not reasonably do anything that would foul up such
parsers.  The result is that, if one is going to separate the
"part of URN processing" material that the document calls
r-components from the "something the URN is expected to make
available to the resolution object or equivalent if there is
one" material the document calls q-components, then it is
necessary to assign distinguishing syntax to both of them so it
is clear how to tell them apart.   That can be done without
specifying parsing of the substring that the spec assigns to
r-components, and, after a good deal of discussion about
options, that is what the WG told the editors to do.  But it
cannot be done without defining that distinguishing syntax: if
all that were defined now was the q-component (or the RFC 3986
<query>), then it would be impossible to add the r-component
functionality later without complaints that some potential use
of q-component, or the functionality of some real or
hypothetical generic parser, was being messed up.
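
As a rough illustration of the parsing point above, the sketch
below assumes the delimiters that were eventually published in
RFC 8141 ("?+" before an r-component, "?=" before a q-component,
"#" before an f-component); this message does not spell them out.
The function name split_urn and the example URN are hypothetical.
It contrasts what a generic RFC 3986-style splitter sees (a single
undifferentiated <query>) with a URN-aware split that tells the
two kinds of qualifying information apart.

```python
from urllib.parse import urlsplit


def split_urn(urn):
    """Split a URN into (assigned_name, r_component, q_component, f_component).

    Assumption: "?+" introduces the r-component, "?=" the q-component,
    and "#" the f-component. Absent components come back as "".
    The contents of each component are NOT parsed, only isolated.
    """
    rest, _, f_component = urn.partition("#")
    rest, _, q_component = rest.partition("?=")
    assigned_name, _, r_component = rest.partition("?+")
    return assigned_name, r_component, q_component, f_component


# Hypothetical example URN carrying all three optional components.
example = "urn:example:foo?+res?=q=1#frag"

# A generic URI parser only knows about <query> and <fragment>,
# so r- and q-component arrive lumped together in the query.
generic = urlsplit(example)
print(generic.query)     # +res?=q=1
print(generic.fragment)  # frag

# The URN-aware split separates them using the leading "+" / "=".
print(split_urn(example))
```

Because both components still sit inside what a generic parser
treats as the query, such a parser is not disturbed, while
URN-aware software can still tell the two apart.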

The discussion above is the answer to another question, or maybe
another set of them.  The URNBIS effort was driven by demands
from several communities who had URNs in general use for
functionality beyond that allowed by RFC 2141, even though 2141
and its predecessors discussed some of it and reserved syntax
for future use.  At least one of those communities, whose
members include experts from several important repository
libraries, has national and international standards-defining
bodies of its own and a lead on the IETF, in terms of thinking
about persistent identifiers, of at least a few hundred years.
They could have simply gone off and defined and standardized
their own extended URN format.  Instead, they were, at least
IMO, far more adult and responsible than many of the SDOs we
deal with and preferred to bring requirements to us, ask us to
figure out what would work best with URNs, and then participated
in the process.   

Had the goal of the WG been simply to update and make minor
adjustments to RFC 2141 and possibly to combine the registration
issues of 3406 with it, the result would almost certainly have
been a shorter document than the sum of the page count of those
two.  When the requirement became to expand functionality in
ways that changed syntax while remaining compatible with prior
practice and 3986 syntax and, necessarily, to address some of
the subtle issues above, the explanations and the document got
appreciably longer.  

Could it be streamlined?  Yes, probably.   Some of the issues
you identified are the result of pulling text out (including
some internal syntax for r-components and a longer discussion of
f-components/fragments) over the last year.  But, speaking from
experience, it isn't easy -- virtually every non-trivial
document change seems to stimulate someone in the WG who wants
to revisit an old issue.  Unless there are substantive issues, I
recommend just letting things go with a plan to sort those
issues out when URNs are progressed to full standard.   With
luck and some effort on the IESG's part, perhaps we will, by
then, have a revised and clarified 3986bis to work with,
something that would almost certainly make this document far
simpler.  On the other hand, if you object to a normative
statement in the registration template because we made an
arbitrary decision to put it in an appendix when 3406/3406bis
was moved in, I think that template could be turned into a
section of the main document (or a subsection of IANA
Considerations) by applying a very small effort to the XML
source.

Again, this is my personal perspective, with which some people
in the WG might disagree.

best,
   john