Re: [urn] URNs are not URIs (another look at RFC 3986)
worley@ariadne.com (Dale R. Worley) Thu, 17 April 2014 19:49 UTC
Archived-At: http://mailarchive.ietf.org/arch/msg/urn/mI-S1gXGFEj4ukZKKPmRHR6UJs8
List-Id: Revisions to URN RFCs <urn.ietf.org>
On the other hand, if we are considering URNs as they are now constituted, then I can give some specific comments:

draft-ietf-urnbis-urns-are-not-uris-00 is written in an unusual format: it gives a good list of issues, gives a sketch of discussion and reasoning, and ends by stating a very specific solution. So I was rather confused as to exactly how the I-D was to be interpreted -- Is it the basis from which a discussion will be made? (If so, why does it specify the conclusion?) Is it the final document? (If so, why does it describe the issues, but not give the reasoning which leads to the stated solution?)

After seeing the ensuing discussion on the mailing list, it's clearer what the purpose of the I-D is, and I'd like to present my opinions again in a better-organized way.

First, let me enumerate some secondary issues that would clutter the discussion of the main points:

I conceive that URLs and URNs are two subsets of URIs. In principle, they may overlap, and also there are URIs that are neither URLs nor URNs (e.g., the "tag" scheme). In practice, I see neither of these facts as being significant, as there are no practical consequences of either fact, beyond that specific schemes may be defined to be URLs and/or URNs. (This is the "Contemporary View" of http://www.w3.org/TR/2001/NOTE-uri-clarification-20010921/, but I arrived at it before reading that document.)

Currently, all URNs are in the scheme "urn". But it has never been specified that all URNs must be of the "urn" scheme. The lack of clarity on this point may lead to people making the assumption that because a URI is a URN, it must be of the "urn" scheme.

I can see the philosophical distinction between URNs and URLs, but I don't see that that necessitates that they are handled differently at the protocol level. This is despite the fact that I'm a mathematician by training and have a strong sense that the proper structuring of systems is aided by having clearly organized concepts.
In particular, both URLs and URNs can be used for the operation of "designating a resource". Generic "designating a resource" can be useful in at least two situations: (1) when the use of the URI is opaque, and much of the processing in question can be done without consideration of what the resource is in particular, and (2) when the processor can resolve the URN into a concrete representation of the resource. In the latter case, the URN allows the processor to access an object in much the same way as a URL, and the URN acts as a subclass of URL. As RFC 2141 says, "The URN syntax has been defined so that URNs can be used in places where URLs are expected."

If we are seriously concerned with persistence of objects, we should standardize on the most-proven technology available, viz., translating the object into Sumerian, transcribing it into cuneiform on a clay tablet, firing the tablet, and then burying it in suitable dry soil. In particular, I haven't seen any reference to "Ozymandias" being made persistent in this way. However, that should be put into a different I-D, as it is out of scope for this one.

-----

Getting back to the main issues:

Looking at RFC 3986, I see that despite its title "Uniform Resource Identifier (URI): Generic Syntax", it does contain certain specifications of the generic semantics of URIs, and these specifications have some consequences for defining URNs. (The fact that these consequences exist is essentially admitting that the concerns of http://www.w3.org/DesignIssues/ModelConsequences must be met. Again, I had come to these conclusions before reading that document.)

1) Paths: the use of '/', '.', '..', and how relative URIs are resolved into absolute URIs

These rules describe how relative URIs are to be interpreted, and in doing so, specify that a URI is divided into "segments" by '/', and how a new absolute URI is assembled from the segments of a base URI and the relative URI. As a consequence of this process, the use of '.' and '..' as segments must be avoided in absolute URIs.

I don't see this as significantly restrictive of URNs -- If a URN scheme wants to take advantage of the relative URI mechanism, it can do so by conforming to the generic syntax, and if it does not want to use the relative URI mechanism, it can avoid using '/'.

2) Query: the use of '?'

A URI containing a query part is related in some manner to the URI created by deleting the query part, but the manner seems to be left entirely to the scheme definition:

    The query component contains non-hierarchical data that, along
    with data in the path component (Section 3.3), serves to identify
    a resource within the scope of the URI's scheme and naming
    authority (if any).

That appears to me to be a non-constraint in any practical sense.

3) Fragment: the use of '#'

The use of the fragment part has much more semantic content:

    The fragment identifier component of a URI allows indirect
    identification of a secondary resource by reference to a primary
    resource and additional identifying information. The identified
    secondary resource may be some portion or subset of the primary
    resource, some view on representations of the primary resource, or
    some other resource defined or described by those representations.

    The semantics of a fragment identifier are defined by the set of
    representations that might result from a retrieval action on the
    primary resource. The fragment's format and resolution is
    therefore dependent on the media type [RFC2046] of a potentially
    retrieved representation, even though such a retrieval is only
    performed if the URI is dereferenced. If no such representation
    exists, then the semantics of the fragment are considered unknown
    and are effectively unconstrained. Fragment identifier semantics
    are independent of the URI scheme and thus cannot be redefined by
    scheme specifications.

    Individual media types may define their own restrictions on or
    structures within the fragment identifier syntax for specifying
    different types of subsets, views, or external references that are
    identifiable as secondary resources by that media type. If the
    primary resource has multiple representations, as is often the
    case for resources whose representation is selected based on
    attributes of the retrieval request (a.k.a., content negotiation),
    then whatever is identified by the fragment should be consistent
    across all of those representations. Each representation should
    either define the fragment so that it corresponds to the same
    secondary resource, regardless of how it is represented, or should
    leave the fragment undefined (i.e., not found).

In short, the full process of dereferencing a URI must be factorable into three phases:

- dereference the URI with the fragment part removed to provide a set of representations
- select one of the representations
- from or based on the selected representation, derive the fragment part

and while the first phase is scheme-dependent, the third phase may only depend on the chosen representation and its media type.

The degree of constraint this places on URNs is not clear to me. If one wishes a URN-without-fragment to designate a resource whose representation is a media type whose fragment-access is already defined, the URN is constrained. (I know that fragment-access is defined for HTML documents; is it defined for any other media type?) If one is free to have one's URN-without-fragment designate a new media type, the semantics of the fragment part is nearly unlimited, as long as the base resource representation contains all the information needed for fragment resolution, as specified by its media type.
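
As an illustration only (it is not taken from RFC 3986 or from the draft under discussion), the three-phase factoring of fragment dereferencing might be sketched in Python roughly as below. The name "urn:example:doc1", the REPRESENTATIONS table, and the toy HTML handler are all hypothetical stand-ins; a real resolver would be scheme-specific, and a real HTML handler would use a proper parser.

```python
import re
from urllib.parse import urldefrag

# Hypothetical data: fragment-less URI -> {media type: representation}.
REPRESENTATIONS = {
    "urn:example:doc1": {
        "text/html": '<p id="intro">Hello</p><p id="end">Bye</p>',
        "text/plain": "Hello\nBye",
    },
}


def html_fragment(body, fragment):
    """Toy lookup: content of the element whose id equals the fragment."""
    pattern = r'<(\w+) id="%s">(.*?)</\1>' % re.escape(fragment)
    match = re.search(pattern, body)
    return match.group(2) if match else None


# Phase 3 is keyed by media type only, never by URI scheme.
FRAGMENT_HANDLERS = {"text/html": html_fragment}


def dereference(uri, media_type):
    base, fragment = urldefrag(uri)          # strip the '#fragment' part
    representations = REPRESENTATIONS[base]  # phase 1: scheme-dependent
    body = representations[media_type]       # phase 2: pick a representation
    if not fragment:
        return body
    handler = FRAGMENT_HANDLERS.get(media_type)
    # Phase 3: with no handler for the media type, the fragment's
    # semantics are unknown, modelled here as None.
    return handler(body, fragment) if handler else None


print(dereference("urn:example:doc1#intro", "text/html"))
```

Only the first phase consults anything scheme-specific; the fragment step sees just the selected representation and its media type, which is the constraint described above.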
4) Syntactic compatibility

A question which seems to me to be getting insufficient attention is that of syntactic compatibility of all URIs, that is, that all current and future URIs should conform to the currently set syntax for URIs. This is required for upward-compatibility with current systems that validate URIs for syntactic conformity.

In regard to this, it seems to me to be undesirable to decouple the syntax specification of URNs (or rather, the "urn" scheme) from RFC 3986 -- because we have a de-facto requirement that URNs remain within 3986, formally disconnecting the two will lose formal specification of this requirement.

However, there is a caveat (vide Phillip Hallam-Baker's remarks):

    [That] may not be backward-compatible with the specification, but
    it is backward-compatible with reality. -- Francois Audet

There is a very real question of the degree to which any existing system validates data that are considered to be "generic URIs". If systems in practice don't validate URIs beyond "having a scheme", then we are free to update the URI syntax (and consequently the URN syntax) very broadly.

Similarly, we have to worry about changes to the syntax of the "urn" scheme. RFC 2141 states that '/', '?', and '#' are "reserved for particular purposes". But in fact, they don't appear in the BNF, so the "urn" definition can't be expanded to include them without risking breaking any software that validates URNs against the BNF of 2141. Again, the degree to which this is a practical problem is not clear.

5) Equality testing

One feature of RFC 3986 that may have turned out to be a bad idea is that testing for equality of two URIs cannot be done without specific information regarding their scheme. 3986 does define that certain URIs (that is, certain character sequences that conform to the BNF) must be "the same" (and thus have equivalent functionality for all purposes). But each scheme is permitted to define equality in a coarser sense, that is, to combine those groups of equal URIs into larger groups.

This makes it impossible for a processor that does not have specific knowledge of the URI schemes involved to index something based on URIs and still handle all equal URIs in the same way. However, it's not clear how much that matters in practice -- most URI schemes have de-facto canonical forms.

It's also not clear that this situation can be avoided if we want to regularly incorporate pre-existing identifier systems as URI schemes, as other identifier systems frequently have their own rules for identifier equality.

Dale