Re: [urn] [apps-discuss] URNs are not URIs (another look at RFC 3986)

John C Klensin <john-ietf@jck.com> Tue, 15 April 2014 18:41 UTC

Date: Tue, 15 Apr 2014 14:40:46 -0400
From: John C Klensin <john-ietf@jck.com>
To: Phillip Hallam-Baker <hallam@gmail.com>
Message-ID: <001976FFC9FE8FFCAA2E7990@JCK-EEE10>
In-Reply-To: <CAMm+Lwia99RdyO4RFScSwCaVHLsr_BRzmXK18eUoxGFti79Vog@mail.gmail.com>
References: <C93A34DBE97565AD96CEC321@JcK-HP8200.jck.com> <CAMm+Lwia99RdyO4RFScSwCaVHLsr_BRzmXK18eUoxGFti79Vog@mail.gmail.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/urn/DdhnDyXCwLeJlm2EPKiYyU1l1fA
Cc: urn@ietf.org, General discussion of application-layer protocols <apps-discuss@ietf.org>
Subject: Re: [urn] [apps-discuss] URNs are not URIs (another look at RFC 3986)
List-Id: Revisions to URN RFCs <urn.ietf.org>

Actually, we don't disagree on anything but language and tactics.

Specifically, the underlying problem is that there are semantic
(and maybe syntactic) constraints in RFC 3986 that make it, and its
definition of "URI", unsuitable for some plausible and important
classes of identifiers, especially those (or a subset of them)
whose "methods" are not tied to a particular retrieval or
processing protocol.  (Note that "sip:" and "mailto:" are really no
different from "http:" in that regard.)

One can avoid recognizing that there is a problem and try to force
all identifiers into the Procrustean bed defined by 3986.  That
isn't working well.  Its ultimate result is almost certain to be
the development of object identifier standards outside the IETF
and W3C, away from the need to adapt to that bed.  Indeed, that
has already occurred; note the Handle System and the closely
related DOIs.

Or one can recognize the problem and try to deal with it.  It
seems to me that there are several ways that problem can be
addressed, in more or less decreasing order of drasticness.

(1) Discard RFC 3986 entirely, adopt the syntax convention you
suggest (URI = label ':' anything), and leave each method
definition to its own devices.

(2) Revise and replace 3986 in a way that removes all of the
semantic definitions and constraints, leaving only the generic
syntax.  I think that the result would be workable but, unlike
the "every method defines its own syntax within 'anything'"
model above, I can't prove it.  And, having explored that
option, it turns out that removing the subtle semantic
constraints from 3986 is not easy.  (See the thread on the URN
mailing list for more about this.)

(3) Redefine 3986 as applying only to URLs.  If you have a URI
type (i.e., a method) and define it as a URL, you get to
incorporate 3986 by reference and use its syntax and semantics
without repeating them.  If you don't make that declaration, you
are on your own, much as in (1) above.

(4) Come up with a new typology of URIs and then remove one or
more of those types from the scope of 3986.  Of course, as Larry's
insistence that there is really no such thing as a persistent
identifier shows, coming up with such a typology and reaching
agreement about it is not straightforward.

(5) Remove URNs from the scope of 3986, leaving everything that
doesn't use the pseudo-method "urn:" within the scope of 3986
until and unless a case is made for removing other types too.
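To make the contrast between (1) and the current rules concrete,
here is a minimal Python sketch.  It is illustrative only: the
regular expressions and function names are hypothetical and appear
in no draft.  The permissive rule reads Phillip's "label ':'
anything" as one or more ASCII letters or digits (assuming 0 was
meant to be included) followed by any run of non-whitespace; the
strict rule is the RFC 3986 scheme production,
ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ).  The last function
models (5): every valid scheme except "urn" stays within 3986's
scope.

```python
import re
from typing import Optional

# Permissive rule (hypothetical reading of the quoted proposal):
#   label ':' anything
# label: one or more ASCII letters or digits; anything: non-whitespace.
PERMISSIVE = re.compile(r"[A-Za-z0-9]+:\S*")

# RFC 3986, section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
RFC3986_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*")


def permissive_match(s: str) -> bool:
    """True if the whole string fits label ':' anything."""
    return PERMISSIVE.fullmatch(s) is not None


def scheme_of(s: str) -> Optional[str]:
    """Return the lowercased RFC 3986 scheme, or None if there is none."""
    scheme, sep, _ = s.partition(":")
    if not sep or RFC3986_SCHEME.fullmatch(scheme) is None:
        return None
    return scheme.lower()  # schemes are case-insensitive


def governed_by_3986(s: str) -> bool:
    """Approach (5): any valid scheme except "urn" stays in scope.

    This checks scope only, not full RFC 3986 conformance.
    """
    scheme = scheme_of(s)
    return scheme is not None and scheme != "urn"


if __name__ == "__main__":
    for s in ("urn:isbn:0451450523", "mailto:john-ietf@jck.com",
              "9p:foo", "coap+tcp://example.com", "http://example.com/a b"):
        print(f"{s!r}: permissive={permissive_match(s)}, "
              f"scheme={scheme_of(s)}, in 3986 scope={governed_by_3986(s)}")
```

The two grammars disagree in both directions: "9p:foo" passes the
permissive rule but has no valid 3986 scheme (a scheme must start
with a letter), while "coap+tcp://example.com" has a valid 3986
scheme that the permissive label does not allow.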

Approach (5) is the one suggested in the draft, not because I
think it would be optimal in the best of all possible worlds,
but because it appears to be something that the URNBIS WG can do
and that would allow it to move forward.

FWIW, the practical difference among some of those approaches is
ultimately the old issue of inclusion-based versus
exclusion-based models.  RFC 3986
effectively says "all identifiers are URIs and should (or must)
conform to this model."  As soon as we get to "except those that
don't", one can either identify those that don't or those that
do.  Of course, if we were to shift to the much more generic
model of (1), the number of things that would need to be excluded
would presumably be far smaller.
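In code terms, the inclusion/exclusion distinction comes down to
what happens to a scheme nobody has listed.  A minimal sketch; the
scheme sets below are hypothetical, chosen only for illustration,
and correspond to no real registry:

```python
from typing import AbstractSet

# Hypothetical lists for illustration only; no such registries exist.
DECLARED_URLS = frozenset({"http", "https", "ftp"})  # opt-in, as in (3)
EXCLUDED = frozenset({"urn"})                         # opt-out, as in (5)


def in_scope_inclusion(scheme: str, declared: AbstractSet[str]) -> bool:
    """Inclusion-based: only schemes that have opted in are covered."""
    return scheme.lower() in declared


def in_scope_exclusion(scheme: str, excluded: AbstractSet[str]) -> bool:
    """Exclusion-based: every scheme is covered unless explicitly listed."""
    return scheme.lower() not in excluded


if __name__ == "__main__":
    for s in ("http", "urn", "example"):
        print(s, in_scope_inclusion(s, DECLARED_URLS),
              in_scope_exclusion(s, EXCLUDED))
```

An unlisted scheme such as "example" falls outside 3986 under the
inclusion model and inside it under the exclusion model; the two
models agree only on the schemes someone has bothered to list.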

best,
   john



--On Tuesday, 15 April, 2014 20:38 +0300 Phillip Hallam-Baker
<hallam@gmail.com> wrote:

> I disagree strongly.
> 
> A URI is any string that fits in the URI slot in existing
> protocols and will continue to be so regardless of decisions
> made here.
> 
> We should redefine the syntax of URIs to be a label followed
> by a colon followed by any sequence of non-whitespace
> characters.
> 
> URI = label ':' anything
> 
> label = [a-z, 1-9, A-Z] +
> anything = [not-whitespace]*
> 
> That should give the URN world more than enough scope. All
> they need to do is to make sure their URN encoding escapes
> significant whitespace, which is probably an essential success
> criterion in any case.
> 
> The syntax I give is pretty much the definition of a URI used
> in nearly all code that attempts to turn text into hyperlinks
> and is not limited to http and https.
> 
> On Mon, Apr 14, 2014 at 4:11 PM, John C Klensin
> <john-ietf@jck.com> wrote:
>> Hi.
>> 
>> It seems wise to call the attention of this broader group to
>> something that is going on in URNBIS (and more generally).
>> This message is a personal opinion.  It summarizes some
>> discussions in and around that WG but is not an attempt to
>> report on any sort of consensus.
>> 
>> RFC 3986 on Generic URI Syntax was an attempt to create a
>> general syntax (and, despite its title, partial semantics) for
>> an extremely general set of resource identifiers, using