Re: [urn] [apps-discuss] URNs are not URIs (another look at RFC 3986)

ht@inf.ed.ac.uk (Henry S. Thompson) Thu, 29 May 2014 18:00 UTC

To: John C Klensin <john-ietf@jck.com>
References: <C93A34DBE97565AD96CEC321@JcK-HP8200.jck.com>
From: ht@inf.ed.ac.uk
Date: Thu, 29 May 2014 18:59:54 +0100
In-Reply-To: <C93A34DBE97565AD96CEC321@JcK-HP8200.jck.com> (John C. Klensin's message of "Mon\, 14 Apr 2014 09\:11\:18 -0400")
Message-ID: <f5b38fsh3dx.fsf@troutbeck.inf.ed.ac.uk>
Archived-At: http://mailarchive.ietf.org/arch/msg/urn/gy6a7uZDxip7BopNjvDzmbpGoyM
Cc: urn@ietf.org
Subject: Re: [urn] [apps-discuss] URNs are not URIs (another look at RFC 3986)

I have a long-standing interest in Web Architecture, having served on
the W3C TAG for 9 years, and have a sort of informal liaison role even
though I left the TAG a few months ago.  I've been following the
thread you started here with interest, and have spent some time
reading the backtrail, and your draft [1].

I find some things that I recognise, and some that I sympathise with,
but a lot that I just don't understand.  This is, I hope, the first of
a number of posts asking for help to be sure I
understand both your requirements and your explanations for how having
URNs governed _inter alia_ by 3986 frustrates those requirements.
-------
3986 says some things which seem very much in sympathy with URNBIS:

  1.2.2.  Separating Identification from Interaction [2]

   . . .

   A common misunderstanding of URIs is that they are only used to
   refer to accessible resources.  The URI itself only provides
   identification; access to the resource is neither guaranteed nor
   implied by the presence of a URI.  Instead, any operation
   associated with a URI reference is defined by the protocol element,
   data format attribute, or natural language text in which it
   appears.

   Given a URI, a system may attempt to perform a variety of
   operations on the resource, as might be characterized by words such
   as "access", "update", "replace", or "find attributes".  Such
   operations are defined by the protocols that make use of URIs, not
   by this specification.

  . . .

   Although many URI schemes are named after protocols, this does not
   imply that use of these URIs will result in access to the resource
   via the named protocol.  URIs are often used simply for the sake of
   identification.  Even when a URI is used to retrieve a
   representation of a resource, that access might be through
   gateways, proxies, caches, and name resolution services that are
   independent of the protocol associated with the scheme name.

  3.5.  Fragment

   . . .

   As with any URI, use of a fragment identifier component does not
   imply that a retrieval action will take place.  A URI with a
   fragment identifier may be used to refer to the secondary resource
   without any implication that the primary resource is accessible or
   will ever be accessed.

Nonetheless fragment (and query) seem to be at the heart of the
perceived problems of URIs expressed in URNBIS [1]:

  8.  The role URI fragment and query could or should have in
       identification is unclear and the statements in RFC 3986 are
       definitely problematic from the points of view of existing
       identifier systems and management of naming.

   Does fragment identify a location or a certain section of a resource?
   In the evolving set of URN Internet standards, fragment will not be a
   part of the Namespace Specific String.  Then fragment only indicates
   a place / segment within the identified resource, but does not
   identify it.  If fragment had a role in identification, fragments
   would extend the scope of existing standard identifiers to component
   parts of resources.  For instance, anyone could use URN based on ISBN
   + fragment to identify chapters of electronic books.

There is certainly a tension with some pretty fundamental IETF
principles (beyond just what we find in 3986) implied here, if I
understand it correctly.  Media types play a central role in the IETF
architectural vision: given a character string and a media type, I
know how to find out what I can do with the characters.  It doesn't
matter how I got them and their type: from my local disk, via HTTP, in
an email, or by carrier pigeon.  And it doesn't matter what name or
names may have been involved in helping me get access to them.  One of
the things I can do is use whatever fragment identifier syntax and
semantics is provided by the definition of the media type in question
to . . . identify 'fragments'.  And, crucially, those 'fragments'
_need not_ be any locations or sections of the character string or its
interpretation per the media type:

   The identified secondary resource may be some portion or subset of
   the primary resource, some view on representations of the primary
   resource, or _some other resource defined or described by those
   representations_. [emphasis added] [3]

To make this absolutely concrete, given a text/html representation and a
fragment identifier "foo", the secondary resource in question is
e.g. a paragraph in the document resource represented by that html, in
particular the paragraph corresponding to the markup including "<a
name='foo'>" in the representation.  Or, given a text/turtle
representation and a fragment "tbl", the secondary resource in
question is e.g. the resource _described_ by the RDF node
corresponding to a line consisting of "<#tbl>" in the representation.
All of that follows straightforwardly from 3986 and the two respective
media type registrations.
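
The text/html case can be sketched in a few lines of Python.  This is
an illustration only, under simplifying assumptions: the function name
html_fragment_target is made up for the example, and the matching rule
(an "id" on any element, or a "name" on an <a> element) is a
simplification of what the text/html registration actually says.  The
point is only that the media type, not the URI scheme, supplies the
rule for getting from "foo" to a secondary resource.

```python
from html.parser import HTMLParser


class _AnchorFinder(HTMLParser):
    """Records the first start tag whose anchor matches the fragment."""

    def __init__(self, fragment):
        super().__init__()
        self.fragment = fragment
        self.target = None  # (tag, attribute name) of the first match

    def handle_starttag(self, tag, attrs):
        if self.target is not None:
            return
        for name, value in attrs:
            # Simplified rule (an assumption for this sketch):
            # id on any element, or name on an <a> element.
            is_anchor = name == "id" or (name == "name" and tag == "a")
            if is_anchor and value == self.fragment:
                self.target = (tag, name)
                return


def html_fragment_target(markup, fragment):
    """Return (tag, attribute) for the element the fragment picks out,
    or None if the representation defines no such anchor."""
    finder = _AnchorFinder(fragment)
    finder.feed(markup)
    finder.close()
    return finder.target


sample = "<p>Intro</p><p><a name='foo'>Target paragraph</a></p>"
found = html_fragment_target(sample, "foo")
missing = html_fragment_target(sample, "bar")
```

Note that nothing in the sketch asks how the markup was obtained or
what name was used to obtain it.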

3986 is clear that those secondary resources are _identified_ by those
fragment identifiers.  You appear to be proposing that per URNBIS and
2141bis e.g. urn:example:w3cstaff#tbl would _identify_ the same
resource as urn:example:w3cstaff (although it might be defined to
'indicate' something different. . .).
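
At the level of syntax, at least, 3986 already keeps the two apart.
The following sketch uses the component-splitting regular expression
given in Appendix B of 3986; the helper name split_uri and the
dictionary layout are assumptions made for the example.  It shows only
how the string divides, and says nothing by itself about what either
string identifies.

```python
import re

# Regular expression from RFC 3986, Appendix B, for breaking a URI
# reference into its generic components.
URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri(uri):
    """Return the five generic components; absent ones are None."""
    m = URI_RE.match(uri)  # every part is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


with_fragment = split_uri("urn:example:w3cstaff#tbl")
without_fragment = split_uri("urn:example:w3cstaff")
```

Both strings yield the path "example:w3cstaff"; only the first carries
a fragment component, "tbl".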

It seems pretty clear to me that this is a divisive and confusing
thing to do.  Let's take the e-book example (I'm using a DOI, because
I happen to remember where to find the resolver for them, but the
point would be the same for any URN->http:URI resolver today):  The
following _both_ identify the main body of an editorial in the journal
_Nature_ of 28 May 2014:

  doi:10.1038/509534a#article
  http://dx.doi.org/10.1038/509534a#article

Am I right that your proposal would change this (assuming a URN
instead of a DOI)?  I.e., that those two things would no longer
identify the same secondary resource, despite both, minus the
'#article', being usable to retrieve the same text/html representation
of the same resource?
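
To spell out what "the same secondary resource" relies on today, here
is a sketch of a name-to-http mapping that carries the fragment across
unchanged.  The function to_resolver_url and its rule (strip "doi:",
prepend the resolver prefix, re-attach the fragment) are assumptions
for illustration, not a description of how the DOI resolver is
specified to behave.

```python
def to_resolver_url(doi_uri, resolver="http://dx.doi.org/"):
    """Map a doi: identifier to an http: URI, preserving any fragment.

    Hypothetical mapping rule, for illustration only.
    """
    # Per RFC 3986 the fragment starts at the first '#'.
    primary, _, fragment = doi_uri.partition("#")
    scheme, _, name = primary.partition(":")
    if scheme.lower() != "doi" or not name:
        raise ValueError("expected a doi: identifier, got %r" % doi_uri)
    url = resolver + name
    return url + "#" + fragment if fragment else url


mapped = to_resolver_url("doi:10.1038/509534a#article")
mapped_plain = to_resolver_url("doi:10.1038/509534a")
```

Under this mapping the fragment is opaque to the resolver step: it is
interpreted only once a representation and its media type are in hand.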

Have I misread your document?  If not, could you expand what I guess
must lie behind your desire to move in this apparently divisive
direction, namely

  "If fragment had a role in identification, fragments would extend
   the scope of existing standard identifiers to component parts of
   resources.  For instance, anyone could use URN based on ISBN
   + fragment to identify chapters of electronic books."

I _think_ this is intended to describe a _bad_ outcome from the
perspective of CiMo [4], one which is however understood to be
required by some combination of 3986 and 2141 as they stand, and thus
to amount to an argument why 3986+2141 won't meet your requirements.

Since it looks to me like a _good_ outcome, I'd be very grateful if
you could help me understand why it's bad from the CiMo perspective.

Thanks,

ht

[1] http://tools.ietf.org/html/draft-ietf-urnbis-urns-are-not-uris-00
[2] http://tools.ietf.org/html/rfc3986#section-1.2.2
[3] http://tools.ietf.org/html/rfc3986#section-3.5
[4] http://www.ietf.org/mail-archive/web/urn/current/msg02249.html
-- 
       Henry S. Thompson, School of Informatics, University of Edinburgh
      10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
                Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                       URL: http://www.ltg.ed.ac.uk/~ht/
 [mail from me _always_ has a .sig like this -- mail without it is forged spam]