Re: [urn] DOIs as an example of name/locator confusion

ht@inf.ed.ac.uk (Henry S. Thompson) Thu, 05 June 2014 20:15 UTC

To: John Levine <johnl@taugh.com>
References: <20140531180107.24859.qmail@joyce.lan>
From: ht@inf.ed.ac.uk
Date: Thu, 05 Jun 2014 21:15:15 +0100
In-Reply-To: <20140531180107.24859.qmail@joyce.lan> (John Levine's message of "31 May 2014 18\:01\:07 -0000")
Message-ID: <f5b4mzz865o.fsf@troutbeck.inf.ed.ac.uk>
Archived-At: http://mailarchive.ietf.org/arch/msg/urn/KNtbn99Yb4yQKJulkFl3mJi2huY
Cc: urn@ietf.org

[Responding only to the change of subject, not the substance :-]

RFC 3986 [0] distinguishes between a resource (what a URI *identifies*)
and representations (what you can *retrieve*).  In most cases,
although not absolutely necessarily, these are distinct.

I think it's useful to distinguish at least four possible empirical
properties of (what we currently call) a URI and its scheme (and
sub-scheme in the case of a URN):

 1) No public generic action is defined for instances of its scheme
    and/or sub-scheme, either in principle (i.e. by the
    scheme/sub-scheme definition/registration) or in practice (e.g. by
    a widely available app or browser plugin).  We say that such a URI
    (scheme/sub-scheme) is *not actionable*;

 2) Such an action _is_ defined for at least some instances of its
    scheme and/or sub-scheme, and it is one such instance. We say it
    is *actionable*;

 3) Not only is it actionable, but one available action ('retrieval',
    see above), if successful, is defined to yield a representation of
    the resource it identifies.  We say that such a URI is
    *retrieval-enabled*;

 4) Its scheme/sub-scheme is actionable, but at least some of the
    in-principle available actions fail (for non-contingent reasons).
    We say that it *does not support* those actions.

Examples:

 1) 'tel:' -- No actions defined [1]
 2) 'mailto:' -- 'construct' and 'send' actions defined, at least
                  implicitly, but no notion of retrieval [2]
 3) 'ftp:' -- Retrieval is spelled out in the spec [3]
 4) PUTting to 'http://www.w3.org/' -- the PUT action is not supported
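
The scheme-level part of this taxonomy can be written down as a small
lookup.  What follows is only a sketch: the names SchemeStatus,
SCHEME_STATUS and classify are made up for illustration, and the table
holds nothing beyond the three scheme examples given above.

```python
from enum import Enum


class SchemeStatus(Enum):
    """Scheme-level properties (1)-(3) from the list above."""
    NOT_ACTIONABLE = 1
    ACTIONABLE = 2
    RETRIEVAL_ENABLED = 3


# Illustrative table only: it records the examples from this message,
# it is not a registry of any kind.
SCHEME_STATUS = {
    "tel": SchemeStatus.NOT_ACTIONABLE,
    "mailto": SchemeStatus.ACTIONABLE,
    "ftp": SchemeStatus.RETRIEVAL_ENABLED,
}


def classify(uri):
    """Return the status recorded for the URI's scheme, or None if unknown."""
    # Scheme names are case-insensitive, so normalise before the lookup.
    scheme = uri.partition(":")[0].lower()
    return SCHEME_STATUS.get(scheme)
```

Property (4) is deliberately absent here, because it concerns an
individual URI and a particular action rather than a scheme as a whole.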

(4) implies that (2) and (3) are applicable in slightly different ways
to URI schemes/URN sub-schemes and to individual URIs (not (1),
because if a URI's scheme/sub-scheme is not actionable, it can't be
either):

 2) Even if a URI's scheme/sub-scheme _is_ actionable, a particular
    URI may not support any of the actions defined therein;

 3) Even if a URI's scheme/sub-scheme _is_ retrieval-enabled, a
    particular URI may not support the defined retrieval action.
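
The difference between what a scheme defines and what a particular URI
supports can be sketched as set subtraction.  Everything named here
(SCHEME_ACTIONS, UNSUPPORTED_BY_URI, supported_actions, and the action
labels themselves) is hypothetical, and the data only restates the
examples above.

```python
# Actions defined, in principle, at the scheme level (illustrative only).
SCHEME_ACTIONS = {
    "tel": set(),
    "mailto": {"construct", "send"},
    "ftp": {"retrieve"},
    "http": {"GET", "PUT"},
}

# Actions that fail for a specific URI although its scheme defines them.
UNSUPPORTED_BY_URI = {
    "http://www.w3.org/": {"PUT"},
}


def supported_actions(uri):
    """Scheme-defined actions minus those this particular URI rejects."""
    scheme = uri.partition(":")[0].lower()
    defined = SCHEME_ACTIONS.get(scheme, set())
    return defined - UNSUPPORTED_BY_URI.get(uri, set())
```

With this data, supported_actions("http://www.w3.org/") leaves only GET,
while any tel: URI yields the empty set because its scheme defines no
actions to begin with.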

In many cases, the available actions for an actionable URI are defined
in the definition/registration of its scheme/sub-scheme by reference
to a particular protocol.  

For 'mailto:' this is SMTP [2], for 'ftp:' it's FTP, for 'http:' it's
HTTP (although, interestingly, HTTP is not restricted to actions on
'http:' (and 'https:') URIs).

So where in this space does a name/locator distinction emerge?
Nowhere, as far as I can tell:

  "In practice, the line between 'locator' and 'name' has been
   difficult to draw: locators can be used as names, and names can be
   used as locators." [4]

All URIs are by definition names, not just because they are called
identifiers, but because the nature of what can be identified, and of
identification, is carefully spelled out [5].  Some URIs
(retrieval-enabled ones) are _also_ more-or-less useful as locators.

Examples:

 tel: URIs name "the canonical address-of-record or identifier for a
 termination point within a specific network" [1], but the scheme isn't
 actionable;

 mailto: URIs name mailable entities.  The scheme is actionable, but is
 not retrieval-enabled, and doesn't obviously 'locate' anything;

 ftp: URIs name network resources by locating them.  The scheme is
 retrieval-enabled.  It is probably the closest we still have to
 something that deserves to be called a locator in the
 ordinary-language sense of the word, i.e. an address.

 http: URIs name both network and non-network resources.  In the
 scheme's early days, most http: URIs were (intended to be)
 retrieval-enabled.  This is no longer true.  XML namespace names [6]
 are explicitly _not_ required to be retrieval-enabled, and many of
 them are not.  For principled reasons, many http: URIs used in
 Semantic Web documents are not retrieval-enabled (although they _are_
 actionable).

 doi: URIs likewise name both network and non-network resources.
 The original intention as stated [7] was for them to "[identify]
 entities of significance to the content industry" and to "reference a
 set of service descriptions".  At the very least they were intended
 to be actionable, and in practice they now are, perhaps not quite as
 originally intended.  Whether they are retrieval-enabled is less
 clear to me.

 urn:ietf:rfc URIs name IETF RFCs, and services exist which act as
 retrieval proxies, see for example
     http://wm-urn.org/urn:ietf:rfc:2119
 A plugin could be constructed to reproduce the simple
 transformation from urn:ietf:rfc:nnnn to
     http://tools.ietf.org/html/rfcnnnn
 (e.g. http://tools.ietf.org/html/rfc2119),
 but I don't _think_ the IETF has any published commitment to that
 mapping, nor is it obvious what the status of this statement [8]
 is:

  The canonical URI is of the form:
  http://www.rfc-editor.org/rfc/rfcXXXX.txt.
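
To show how little such a plugin would have to do, here is a minimal
sketch of the transformation.  The function name rfc_urn_to_url and its
template parameter are invented for illustration, and, as noted above,
neither URL pattern is a mapping the IETF is known to have committed to.

```python
import re

# 'urn' and the namespace identifier are case-insensitive, hence IGNORECASE.
_RFC_URN = re.compile(r"urn:ietf:rfc:(\d+)", re.IGNORECASE)


def rfc_urn_to_url(urn, template="http://tools.ietf.org/html/rfc{number}"):
    """Rewrite urn:ietf:rfc:nnnn as a URL by plain string substitution."""
    match = _RFC_URN.fullmatch(urn)
    if match is None:
        raise ValueError("not a urn:ietf:rfc URN: %r" % (urn,))
    # int() drops any leading zeros before the number goes into the URL.
    return template.format(number=int(match.group(1)))
```

Passing template="http://www.rfc-editor.org/rfc/rfc{number}.txt" would
give the form quoted from [8] instead.  Either way the URN only becomes
retrievable by way of a convention that lives outside the URN itself.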

The point, now perhaps overly belaboured, is that in practice there is
a _huge_ range of variation within a complex multi-dimensional space
as regards actionability, retrievability and comprehensibility (that
is, can you find out what a URI identifies if all you have is the URI,
and if so, how).  Suggesting that there is a straightforward
"name/locator" distinction ignores the richness of the actual
phenomena.

So, back to my core question: what are the requirements?  Saying "we
need names, not locators" is not a requirement until you tell me what
the operational distinction is that you are covertly appealing to.  If
you find the above terminology/taxonomy helpful in answering my
question, so much the better.

ht

[0] http://tools.ietf.org/html/rfc3986
[1] http://tools.ietf.org/html/rfc3966
[2] http://tools.ietf.org/html/rfc6068
[3] http://tools.ietf.org/html/rfc1738
[4] http://tools.ietf.org/html/bcp35
[5] http://tools.ietf.org/html/rfc3986#section-1.1
[6] http://www.w3.org/TR/REC-xml-names/
[7] http://tools.ietf.org/html/draft-paskin-doi-uri-04
[8] http://www.rfc-editor.org/pubprocess.html
-- 
       Henry S. Thompson, School of Informatics, University of Edinburgh
      10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
                Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                       URL: http://www.ltg.ed.ac.uk/~ht/
 [mail from me _always_ has a .sig like this -- mail without it is forged spam]