Re: [urn] [apps-discuss] URNs are not URIs (another look at RFC 3986)

John C Klensin <john-ietf@jck.com> Tue, 29 April 2014 20:46 UTC

Date: Tue, 29 Apr 2014 16:46:07 -0400
From: John C Klensin <john-ietf@jck.com>
To: "Dale R. Worley" <worley@ariadne.com>
Message-ID: <71FCF3F23062A6AE7F8552C9@JcK-HP8200.jck.com>
In-Reply-To: <201404252110.s3PLAPM1031471@hobgoblin.ariadne.com>
References: <C93A34DBE97565AD96CEC321@JcK-HP8200.jck.com> <534BED18.9090009@gmx.de> <3D39F1AA700A179F3C051DE2@JcK-HP8200.jck.com> <534D3410.50607@ninebynine.org> <54ecc96adba240159cf624c54c507136@BL2PR02MB307.namprd02.prod.outlook.com> <952E89C207E59D25CD5953D6@JCK-EEE10> <358467E0-F2C0-4468-A099-BBAA4F5438D2@mnot.net> <FAB32F8D-4BE4-4E49-AE8E-022D322C3BCC@pobox.com> <11B3A42537CE2D687E206A34@JcK-HP8200.jck.com> <201404252110.s3PLAPM1031471@hobgoblin.ariadne.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/urn/n9GLpmZPbty-arhlRZH9QoO6Y6c
Cc: urn@ietf.org
Subject: Re: [urn] [apps-discuss] URNs are not URIs (another look at RFC 3986)

Hi.

I needed to take a few days off from this.  I'm going to respond
to a few points of Dale's note below, then send a separate one
suggesting a way forward.

By means of preface, while I may have caused, or at least
worsened, the situation by the way
draft-ietf-urnbis-urns-are-not-uris-00 is written, I don't
believe that the arguments about "what is a name", "what is
persistent", and so on are really productive.  I think that, if
the WG is to make progress, we need to focus on requirements and
how to move forward, not philosophy of naming, categories,
and/or other hair-splitting exercises.

I'm going to try to avoid terms like "name" or "persistent" in
this note and its successor.  I've tried to be careful about
other terminology but this is a mess, so my apologies if I screw
up.  In particular, I use the term "http-style URLs" below with
the intent of avoiding discussions of what a "locator" is or how
other types of URLs or URIs might behave were they to exist.

Disclaimer: nothing in this note or its successor makes any
statement or inference about what has or might get WG consensus.
That determination can be hard and is Not My Job.  However, it
is fairly easy to observe things like "no traction" and "clear
lack of consensus" and I have made some of those observations.


--On Friday, April 25, 2014 17:10 -0400 "Dale R. Worley"
<worley@ariadne.com> wrote:

> From the point of view of this analysis, the items on which I have
> a strong opinion are few, but I think that they bear heavily
> on the practical engineering of the situation, and so need to
> be resolved one way or the other.  The general idea is that
> being conservative regarding syntax will avoid large costs,
> because much existing software incorporates syntax
> assumptions.  But extending semantics is not likely to be so
> expensive, because little software depends on the semantic
> assumptions of a subset of UR* unless the software performs
> detailed processing on those particular UR*.

I would have said something a little different.  If 3986 had
adopted the "method:<stuff>" model of URIs, we wouldn't be
having this discussion.  If it had taken the position that,
while some characters were reserved, their interpretation
(other than possibly how to tell when the associated
information string ended) was strictly a function of the
method, we wouldn't be having this conversation either.  What
gets us into the current situation (more about that below and in
the note that follows) is a set of statements that amount to
"if the following character appears, what follows it has more or
less the following syntax, is interpreted as specified here, and
ends when the following condition is met".
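
For illustration only: RFC 3986, Appendix B, gives a regular
expression that splits any URI reference into scheme, authority,
path, query, and fragment purely by delimiter, which is the kind
of "if this character appears, what follows it ends when..." rule
described above.  A minimal Python sketch using that expression
(the helper name split_components is hypothetical) shows that a
urn-method string is split at "?" and "#" exactly as an
http-style URL is:

```python
import re

# Component-splitting regular expression from RFC 3986, Appendix B.
# Every group is optional, so it matches any string; it locates the
# generic delimiters and does not validate anything.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_components(uri: str) -> dict:
    """Split a string into the five generic RFC 3986 components."""
    m = URI_RE.match(uri)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),     # starts after the first "?", ends at "#"
        "fragment": m.group(9),  # everything after the first "#"
    }


if __name__ == "__main__":
    # The "urn" method gets exactly the same treatment as "http".
    print(split_components("urn:example:a?b#c"))
    print(split_components("http://example.com/a?b#c"))
```

Nothing in the expression depends on the scheme, which is the point:
the meaning of "?" and "#" is fixed before any method-specific code
runs.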

>> From: John C Klensin <john-ietf@jck.com>
> 
>> Perhaps
>> as a corollary to the latter, perhaps independently, there
>> have been efforts to create a Grand Unified Identifier syntax,
>> sometimes in the very restricted "scheme:<stuff>" form that
>> Phillip proposed, sometimes with a lot of reserved characters
>> and associated semantics, and sometimes in between.
> 
> I have a strong sense that it's valuable to have and maintain
> a syntax within which all the UR* fit, because processors of
> UR* enforce and implement such syntaxes.  Currently the
> umbrella syntax is RFC 3986. The cost of expanding the UR*
> syntax beyond what 3986 permits is going to be relatively
> high, and will show up in systems that try to allow all
> possible UR*s in certain situations.

With one qualification, I think I agree.  However, there are at
least three models of "a syntax within which all the UR* fit"
and they are not equivalent:

* If all UR* are simply required to obey
	"method:<stuff>" as a syntax, then there is no problem.
	There are not necessarily many advantages either
	other than the ability to recognize a UR* string and
	dispatch it to method-specific processing.  But that is
	actually a pretty good lever, given how little a UR*
	processor can actually do without sensitivity to the
	method.
* There is then the significantly more restrictive
	syntax and interpretation rules of 3986.  More
	commonality and potential for shared code, but also more
	restrictions.
* And there are a few attempts to redefine, or upgrade
	the definitions of, URLs, a process that may yield
	definitions that aren't completely compatible with 3986
	either.
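
A minimal sketch of the first model, assuming Python and a
hypothetical registry of per-method handlers: the only shared
processing is recognizing "method:<stuff>" (checked here against
the RFC 3986 scheme syntax) and passing <stuff>, uninterpreted, to
whatever code knows the method.  The handler names and return
strings are made up for the example.

```python
import re
from typing import Callable, Dict

# Scheme syntax from RFC 3986: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")

Handler = Callable[[str], str]


def dispatch(identifier: str, handlers: Dict[str, Handler]) -> str:
    """Recognize 'method:<stuff>' and hand <stuff> to a per-method handler.

    Nothing after the first colon is interpreted here; that is left
    entirely to the handler registered for the method.
    """
    method, sep, rest = identifier.partition(":")
    if not sep or not SCHEME_RE.fullmatch(method):
        raise ValueError(f"not of the form method:<stuff>: {identifier!r}")
    handler = handlers.get(method.lower())  # scheme names are case-insensitive
    if handler is None:
        raise LookupError(f"no handler registered for {method!r}")
    return handler(rest)


# Hypothetical handlers, for illustration only.
handlers = {
    "urn": lambda stuff: "urn handler got " + stuff,
    "http": lambda stuff: "http handler got " + stuff,
}
print(dispatch("URN:isbn:0451450523", handlers))
```

The design choice is that the shared code never looks past the
first colon; anything more (reserved characters, query, fragment)
moves toward the second, more restrictive model.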

While I think this WG needs to be aware of those possibilities
and issues, I don't think it should be its job to sort them out.

> Unfortunately, 3986 also specifies some semantics.  I believe
> that the semantics defined for '#' (fragment) is sufficiently
> precise that there is a risk that a considerable number of
> processors may be incompatible with any other use for '#'.
> That is, there will be a significant cost to changing its
> semantics.

Yes.  The WG more or less figured that out some months ago (or
longer).
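
One concrete way to see that cost, offered as an illustration
rather than a survey of implementations: Python's standard
urllib.parse module applies the generic query and fragment split
to any scheme, including "urn", so a "#" in a URN is detached
before any urn-specific code sees it.  The identifier below is
made up.

```python
from urllib.parse import urldefrag, urlsplit

# A made-up urn-method identifier containing both "?" and "#".
ident = "urn:example:report-42?part=2#sec3"

# urlsplit applies the generic query/fragment split regardless of scheme.
parts = urlsplit(ident)
print(parts.scheme, parts.path, parts.query, parts.fragment)

# urldefrag strips the fragment, exactly as it would for an http URL.
base, frag = urldefrag(ident)
print(base, frag)
```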

> My current opinion is that the semantics of '?' (query) is
> sufficiently broad that it can be used for any purpose that
> the UR* scheme wishes to define.

Yes.  But, based on discussions many months ago, there is a
question about what a query applies to (see below).  It can be
accommodated using the "?" syntax of 3986, but only by the use
of some reserved keywords or other instances of horrible
kludges.  One or two such kludges were proposed to the WG.  I
think it is accurate to report that they got no traction.

> URNs of the "urn" scheme are currently constrained to the
> subset of the syntax that is specified in RFC 2141.  2141 says
> that '#' and '?' are reserved for expansion, but unfortunately
> the BNF given does not specify their use.  This suggests that
> adding fragment and query to "urn" URNs will be expensive, as
> software is likely to be designed to the BNF.  However,
> defining additional URN schemes that do not have that
> restriction should not have such a cost.

I look at this a little differently, but we might have reached
the same conclusion (except that "URNs" that don't use the "urn"
method give me a bad headache, because of the ease with which
going down that path deteriorates into complete confusion in
which people have no idea what the others are talking about).
See the forthcoming note.
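
A minimal sketch, assuming a simplified transcription of the
RFC 2141 grammar (<URN> ::= "urn:" <NID> ":" <NSS>) rather than a
conformance-tested validator; the function name parse_rfc2141 is
hypothetical.  Software written directly to that BNF treats "?"
and "#" (which 2141 reserves and says SHOULD NOT appear
unencoded) as ordinary NSS characters with no query or fragment
structure, whereas a generic RFC 3986 parser would split the same
string at those characters.

```python
import re
from typing import Optional, Tuple

# Simplified transcription of the RFC 2141 grammar (illustrative only):
#   <URN> ::= "urn:" <NID> ":" <NSS>
#   NID: a letter or digit followed by up to 31 letters, digits, or hyphens
#   NSS: one or more letters, digits, "other" punctuation, the reserved
#        "/", "?", "#", or a %-escape of two hex digits
NID = r"[A-Za-z0-9][A-Za-z0-9-]{0,31}"
NSS_CHAR = r"(?:[A-Za-z0-9()+,\-.:=@;$_!*'/?#]|%[0-9A-Fa-f]{2})"
URN_RE = re.compile(rf"urn:({NID}):({NSS_CHAR}+)", re.IGNORECASE)


def parse_rfc2141(s: str) -> Optional[Tuple[str, str]]:
    """Return (NID, NSS) if s fits the simplified grammar, else None."""
    m = URN_RE.fullmatch(s)
    if m is None or m.group(1).lower() == "urn":  # the NID "urn" is reserved
        return None
    # "?" and "#" stay inside the NSS: the grammar assigns them no role.
    return m.group(1), m.group(2)


print(parse_rfc2141("urn:isbn:0451450523"))
print(parse_rfc2141("urn:example:report-42?part=2#sec3"))
```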

>> (i) There is a community, [...] that have found the syntax and
>> semantic constraints unreasonably restrictive given their
>> perceived needs.
> 
> Can you give us pointers to this?  In particular, regarding the
> syntax.  It seems to me that this is a deeply important point,
> but short of reading the entire URNBIS archive, I don't see any
> algorithmic way to obtain the supporting information for this
> assertion.  Surely, if there are large communities with this
> opinion, someone somewhere has made a clear statement and
> defense of this conclusion.

I'm not part of that community although I've observed it and
talked with some of its members.  Juha or others may be able to
conveniently supply references or even a reading list.  But, as
I understand it in its contemporary form, a key part of the
problem is that a lot of object identifiers are ultimately bound
to two-part (or more) things in the sense of the old Apple
two-fork HFS ("Mac OS Standard") file system and its
predecessors.  For an http-style URL, the query is addressed to
the store in which the object is located and may be used to
select the object, to select within it, etc.  In a two (or more)
fork environment, queries can, in principle, be addressed to
information about the object (aka "metadata"), to the selection
of the object or subsets of it, and so on.  They may specify if
retrieval is actually wanted and, if so, in which fork.  For
some types of objects (types presumably identified by NID) there
may be one fork, two forks, or more forks and actual retrieval
may be meaningful (or not) for each of them.  In principle,
one could have an NID (or NID NSS pair) that did not identify an
object at all but was a pure string for comparison purposes
(that is allowed by 2141 as I read it).  

Because of those combinations, it is desirable to be able to
identify where a query is intended to be processed and/or what
sort of query it is on a basis that applies to all urn-method
URNs and maybe to have abstractions about what happens when
queries cannot be satisfied that go somewhat beyond what 3986
specifies (or allows other things to specify).  Because the
query model of 3986 is, at least IMO, pretty closely tied to the
interpretation of queries in http-style URLs, it is hard to make
those distinctions except, perhaps, by kludge.
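
To make the distinction concrete, here is a hypothetical Python
sketch of the information a query would have to carry in the
multi-fork model: what it is addressed to (metadata about the
object, or a named fork) and an explicit outcome when it cannot be
satisfied, including the case of an identifier with nothing behind
it.  None of these names come from RFC 2141, RFC 3986, or any WG
document, and the sketch deliberately says nothing about how the
target would be encoded in a URN string, which is the open
question.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional, Tuple


class Target(Enum):
    METADATA = "metadata"  # a question about the object
    FORK = "fork"          # retrieval from one named fork of the object


class Outcome(Enum):
    OK = "ok"
    NOT_FOUND = "not-found"              # no such metadata field or fork
    NOT_RETRIEVABLE = "not-retrievable"  # nothing stands behind the identifier


@dataclass
class Query:
    target: Target
    key: str  # metadata field name, or fork name when target is FORK


@dataclass
class Resource:
    metadata: Dict[str, str] = field(default_factory=dict)
    forks: Dict[str, bytes] = field(default_factory=dict)  # zero, one, or more


def answer(resource: Optional[Resource], q: Query) -> Tuple[Outcome, object]:
    """Route a query by its declared target and report why it failed, if it did."""
    if resource is None:
        # For example, an identifier that exists only for comparison purposes.
        return Outcome.NOT_RETRIEVABLE, None
    table = resource.metadata if q.target is Target.METADATA else resource.forks
    if q.key not in table:
        return Outcome.NOT_FOUND, None
    return Outcome.OK, table[q.key]


# Hypothetical two-fork object, loosely modelled on HFS data/resource forks.
doc = Resource(metadata={"title": "Example"},
               forks={"data": b"body", "resource": b"icons"})
print(answer(doc, Query(Target.METADATA, "title")))
print(answer(doc, Query(Target.FORK, "thumbnail")))
print(answer(None, Query(Target.FORK, "data")))
```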

And, if we are really trying to construct identifiers that will
be useful (or at least accurately interpretable) for centuries,
even if the presumed associated retrieval methods go away,
kludges that can be avoided are probably an extra-bad idea.

So that is, at least from my perspective and very limited model,
the problem statement that requires something about the
foundations of what we are talking about to change.

Watch for the other note (on "moving forward").

      john