Re: [urn] Thoughts on fragments, queries, and new URN namespaces
John C Klensin <john-ietf@jck.com> Sat, 15 June 2013 13:16 UTC
Date: Sat, 15 Jun 2013 09:15:53 -0400
From: John C Klensin <john-ietf@jck.com>
To: Keith Moore <moore@network-heretics.com>
Message-ID: <4A9225387F6E4CCB5BB1A018@JcK-HP8200.jck.com>
In-Reply-To: <51BB743B.2020007@network-heretics.com>
References: <93D12CA26D01683582E31B95@JcK-HP8200.jck.com> <51BA7AAB.4080301@network-heretics.com> <51BA9BCA.7080407@stpeter.im> <51BAA2B2.5010602@network-heretics.com> <B2CABDBAEC8551703DFD512F@JcK-HP8200.jck.com> <51BB743B.2020007@network-heretics.com>
Cc: urn@ietf.org
Keith,

It is starting to feel as if we are either reading different documents with the same names and identifiers or that we are somehow reading the same documents very differently. Key examples and some other discussion inline below.

Note to impatient readers: there is a proposal for specific text (a proposed new section of 2141bis and update to 1737) at the end of this over-long note.

--On Friday, June 14, 2013 15:51 -0400 Keith Moore <moore@network-heretics.com> wrote:

>...
> 1. RFC 2141 defined a type of identifier which it called
> Uniform Resource Names or URNs. It also defined a syntax for
> URNs which happens to begin with 'urn:', and rules for
> creating and assigning and not reassigning URNs. Granted,
> there had been discussions for years prior to that which used
> the term URN more loosely and/or which proposed different
> rules and different syntaxes, but 2141 represents a
> rough-consensus result of those long discussions aimed at
> understanding what URNs really should be.

My reading of draft-ietf-urnbis-rfc2141bis-urn-05 is entirely consistent with that. It excludes the other uses of the term "URN" in favor of talking about 2141-type URNs, contains explicit statements about minimal necessary changes from 2141, etc. Now, if I thought 3986 were as bogus as you apparently do, or even found it mildly distasteful (which I do), and were holding the pen on 2141bis rather than Peter, I'd probably try to structure the introductory paragraphs to sound less like the main reason for the document was to bring 2141 into conformance with 3986. But, even if Peter rephrased things that way, that wouldn't change the spec significantly in most areas, only the vocabulary used to describe it.

> 2. For various reasons, some people weren't happy with the URN
> syntax or framework that IETF decided on. One result was
> that the authors of RFC 3969 tried to broaden the definition
> of URNs in that document.
I assume you mean "3986" in the above and a few places below that you mention 3969. Whether that characterization is accurate or not (and it goes almost without saying that some people, maybe exclusively the same "some people" and maybe not, would disagree), 3986 has now stood as a full Internet Standard for more than eight years without any significant challenge to its validity as a specification or claim to be an Internet Standard. Given that, I think URNBis and rfc2141bis are obligated to be consistent with it, at least unless we want to make the claim that 2141 (and 2141bis) URNs are really not URIs. I don't see any basis in the URNBis charter for either challenging the "URNs are URIs" assumption or for simply ignoring an applicable Internet Standard that is much later than 2141.

> 3. Even if you accept (as RFC 3969 states) that the name URN
> applies to things other than those defined in 2141, the
> situation we were left with is both confusing and cumbersome
> to discuss. The identifiers defined in RFC 2141 have unique
> properties, by design, which do not necessarily apply to other
> persistent URI-like identifiers. And it's cumbersome to
> discuss RFC 2141 identifiers specifically (for the purpose of
> updating RFC 2141, or for any other purpose) while still being
> consistent with the language in RFC 3969. You end up either
> saying "for the purpose of this document, URN refers to the
> identifiers defined in RFC 2141, language in RFC 3969
> notwithstanding" (inviting confusion from those who miss that
> restriction), or you end up saying something like "URNs as
> defined in RFC 2141" every time you need to refer to that kind
> of identifier.

Yes, that is a bit of an editorial challenge. I don't see it as a serious substantive problem because I don't think anyone is claiming that draft-ietf-urnbis-rfc2141bis-urn should be anything but 2141bis... with adjustments made to conform 2141-style URNs to the requirements of 3986.
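For illustration only: under the generic syntax of RFC 3986, "?" and "#" act as query and fragment delimiters for every scheme, "urn:" included, so a generic parser will split them off regardless of what the URN specification says about them. The Python sketch below is not taken from either draft; it uses the standard library's urllib.parse.urlsplit, and the function name describe_urn and the example identifiers are hypothetical. Splitting the NID from the NSS on the first colon is a simplification, not a full RFC 2141 validator.

```python
from urllib.parse import urlsplit


def describe_urn(uri: str) -> dict:
    """Decompose a 'urn:' URI the way a generic RFC 3986 parser does,
    then split the path into NID and NSS on the first colon
    (a simplification for illustration, not full RFC 2141 validation)."""
    parts = urlsplit(uri)  # splits off '?query' and '#fragment'; lowercases the scheme
    if parts.scheme != "urn":
        raise ValueError(f"not a urn: URI: {uri!r}")
    nid, sep, nss = parts.path.partition(":")
    if not (sep and nid and nss):
        raise ValueError(f"missing NID or NSS: {uri!r}")
    return {
        "nid": nid,
        "nss": nss,
        # Empty components are reported as None for readability.
        "query": parts.query or None,
        "fragment": parts.fragment or None,
    }
```

For example, describe_urn("urn:example:foo?bar#baz") reports NID "example", NSS "foo", query "bar", and fragment "baz", while "URN:ISBN:0-395-36341-1" yields NID "ISBN" with no query or fragment. Whether such components are meaningful for a URN is the question under discussion; the parser only shows where they would land.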
I don't see that 3986 requires draft-ietf-urnbis-rfc2141bis-urn to adopt any fundamental definition for URNs other than that of 2141 and don't think the draft contains such a definitional change. So I'm not sure where we are disagreeing.

> Regardless of what RFC 3969 says, the identifiers most often
> associated with the term URN are undoubtedly those that begin
> with 'urn:' and have a syntax consistent with RFC 2141.
> Trying to say that there are URNs that don't begin with 'urn:'
> is like saying that there are other HTTP URLs that don't begin
> with 'http:'. Yes, it's true in a sense, but it's just silly
> and confusing and there's no good reason to define things that
> way.

But, as far as I can tell, draft-ietf-urnbis-rfc2141bis-urn-05 doesn't do that and in fact carefully avoids it. So the above is either an attack on 3986 (and hence out of scope for the WG) or just isn't relevant.

> Also, regardless of what RFC 3969 says, URNs as defined in RFC
> 2141 weren't designed to be used with fragment identifiers or
> query strings - or at least, we didn't manage to define how
> that would work, and the reason that we didn't define how that
> would work is because there wasn't an obvious interpretation
> that didn't kill the properties we wanted for URNs. So we
> declined to do that in RFC 2141, and in trying to generalize
> URI syntax, RFC 3969 didn't address those issues either.

And here we get to what I think are two of the core issues. Let's separate them into two questions:

(i) If there were no syntax restrictions imposed by 2141, 3986, or anything else, would fragment identifiers and/or queries be appropriate for URNs?

(ii) Is 3986 required to authorize fragments or queries in the URN syntax and, if so, is that a serious and problematic incompatible change?

Let me address the second here and the first below.

You read the statements in 2141 (or perhaps some oral tradition to which the rest of us don't have access) as prohibiting fragment identifiers and queries.
When I read what I hope are the same statements, I conclude that there are lots of ways to say "prohibit". One of them involves the term "excluded", which is exactly what 2141 says about a number of characters in its Section 2.4. But what it says about "?" and "#" is to identify them with purposes defined in RFC 1630 (no appeal to presumed 3986 revisionism is required) and to say that the URN-WG "has not yet debated the applicability and precise semantics of those purposes as applied to URNs". It then says "these characters are RESERVED for future developments".

To me, those are statements that imply that those future developments are anticipated, even though no timeframe is given for them (and they might not happen). When the spec explicitly says those sorts of things about future definition of semantics and future use, it seems to me that comments like "weren't designed to be used..." (with the implication of "designed to _not_ be used") are a real stretch. "didn't manage to define how that would work, and the reason that we didn't define how that would work is because there wasn't an obvious interpretation that didn't kill the properties we wanted for URNs" might be true, but there is no evidence at all for it in 2141. All I can get from 2141 is that there wasn't consensus on particular semantics but that there was no particular reason to expect that semantics and consensus would not emerge in the future. Again, if the intent had been to say "we concluded that this was impossible without violating fundamental URN design decisions" or just "really bad idea", how do you explain 2141 not just saying that rather than talking about future use? Your memory or that of others as to what you thought the intent was at the time notwithstanding, 2141 is now a 16-year-old spec. I think what it actually says has to be taken at face value, especially if it seems to contradict what you believe was the intent.

>...
> 4.
> URNs (and when I say URN, I always mean the 2141
> definition), above all else, are intended to be persistent.
> This is the fundamental property of URNs - not only that they
> are persistent, but that the presence of the 'urn:' tag is an
> indicator that the identifier may be interpreted as
> persistent, and also that the persistence of that identifier
> is an important property that should be maintained.
> Extending the concept of URNs in such a way that the resulting
> identifier is no longer persistent would break the essential
> and fundamental property of URNs.

I don't think anyone is disagreeing about that. At least I'm not, and I don't see anything in the current version of 2141bis that does either.

>...
> 6. Fragment identifiers as used in existing content-types were
> not designed to be persistent across changes to the document.
> For identifiers to have persistence there needs to be some
> discipline in assigning meaning to them initially and in not
> reusing them in ways inconsistent with their originally
> assigned meanings. If there is any such convention for
> fragment identifiers, I'm not aware of it, but it certainly
> isn't widely used. Given the existence and utility of other
> kinds of identifiers which are not persistent and should not
> be so, I also believe that persistent identifiers need to be
> readily distinguished from non-persistent identifiers.
> Again, I'm not aware of a convention for doing this, though I
> hypothesize that one could exist.

I think you are creating a strawman here and then demolishing it. There is no question, in my mind at least, that using, e.g., a character offset fragment identifier would be truly stupid in a context that requires persistence, URN or otherwise. But that doesn't imply that, for some URN types (namespaces), stable fragment identifiers cannot be properly identified and defined.
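The distinction drawn here, between fragment schemes that cannot survive changes (such as character offsets) and fragments a namespace could define as stable, can be sketched as a per-namespace rule table. This is a minimal Python sketch under loudly stated assumptions: the "book" NID, the "chapter-N" fragment form, the rule table, and the function name are all hypothetical, and the assumed convention is that paper and electronic forms of a book count as the same resource.

```python
import re

# Hypothetical fragment rules, keyed by NID. No such registrations exist;
# the entries only illustrate the kind of constraint a namespace
# definition could impose on fragment identifiers.
STABLE_FRAGMENT_RULES = {
    # Assumed convention: print and electronic forms of a book are "the
    # same", so chapter numbers survive the transformation while page
    # numbers and character offsets do not.
    "book": re.compile(r"chapter-[1-9][0-9]*"),
}


def fragment_is_stable(urn: str) -> bool:
    """True if the URN carries no fragment, or carries one that its
    namespace's (hypothetical) rules define as stable."""
    body, hash_sign, fragment = urn.partition("#")
    if not hash_sign:
        return True  # nothing beyond the URN itself to check
    parts = body.split(":", 2)  # expected: ["urn", NID, NSS]
    if len(parts) != 3 or parts[0].lower() != "urn":
        raise ValueError(f"not a urn: identifier: {urn!r}")
    rule = STABLE_FRAGMENT_RULES.get(parts[1].lower())
    # A namespace that defines no stable fragments accepts none.
    return rule is not None and rule.fullmatch(fragment) is not None
```

Under these assumptions "urn:book:xyz#chapter-3" passes, while "#page-42" or a character-offset fragment such as "#char=120" is rejected, and a namespace with no rule accepts no fragments at all. The point is only that the constraint lives in the namespace definition, not in the URN syntax.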
Second, even after rereading 2141 and 1737, I'm not sure how far one can go in the direction of an identifier that is "persistent across changes in a document", because even that statement takes one very far in the direction of needing a universal theory about what changes are still "the same document" and what changes make new documents. More important, as soon as you say "persistent identifier" and "changes to object" in the same context, you transport us all to the edge, not of a rathole, but of a bottomless pit.

"Persistence" of an identifier is clear when there is a single, unique object and the only thing that changes is its location. That is where one of the major threads that led to URNs started when web objects were considered: URLs were just not right when one considered content that might be relocated, unchanged, from one server (and DNS name) to another. But, as soon as one talks about two objects that are alleged to be identical, or about changes in one object, the "persistence" and object-binding validity of the identifier become fairly deep questions that have plagued archivists, classifiers, and philosophers for centuries... questions that ultimately have no clear answers except in the axioms and postulates of object-type-specific axiomatic systems. Another key reason why I shut down the original URI WG, in the hope of saving the work, was that several of the efforts that were underway appeared to put comprehensive solutions to the "can it change and be the same thing" and "are two objects actually identical" questions in the critical path of a protocol or identifier type. A WG that could safely be predicted to go on for years, and indeed centuries, and never converge was just not considered acceptable in the IETF of the time.

Note that the above has nothing to do with fragments or queries.
The problem exists in its full glory with URNs and URN-object bindings as soon as you say that something _is_ "designed to be persistent across changes to the document". Whether fragments make things any worse depends on what the namespace looks like and how things are designed.

Let's take the fairly familiar example of a book. The publisher, library, and archival communities have established conventions about when two copies of a book are "the same". If they are "the same", then we would expect either copy to represent a correct binding (and resolution response) to a given URN (or to instances of that URN that match the 2141 equivalence rules). It is important to understand that "the same" represents conventions and that pushing the limits too hard leads to confusion and/or high-minded arguments. For example, two different editions are almost always considered different books for identification or classification purposes. Two different printings in which a few obvious typographical errors are corrected in the later one are usually considered the same book. If one physical instance of the book is autographed by the author and another is not, they are the "same book" for many purposes but are certainly not "the same". The only possible definitive, convention- or axiom-free definition of "the same" requires agreeing that all objects are unique and that no URN can be satisfied by more than one volume-object. (Of course, that is pretty close to an axiomatic statement too, but of a different type than when we talk about multiple satisfying objects.) But "all physically distinct volumes are unique" would, as a rule, make book-URNs useless for most of the purposes to which one would like to put them.
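The "2141 equivalence rules" mentioned above are purely lexical: RFC 2141 Section 5 normalizes only the case of the "urn:" prefix, of the NID, and of the hex digits in %-escapes, and then compares the strings octet for octet. A minimal Python sketch of that check follows (the function names are illustrative); the test strings are the lexical-equivalence examples RFC 2141 itself gives. Note what it cannot do: it decides when two strings are the same URN, never when two copies of a book are "the same", which remains a namespace convention.

```python
import re

_PERCENT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def _normalize(urn: str) -> str:
    """Apply the RFC 2141 lexical normalizations: lowercase the 'urn'
    prefix, the NID, and the hex digits of %-escapes. Escapes are kept,
    never decoded, and the rest of the NSS stays case-sensitive."""
    parts = urn.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn":
        raise ValueError(f"not a urn: identifier: {urn!r}")
    _, nid, nss = parts
    nss = _PERCENT_ESCAPE.sub(lambda m: m.group(0).lower(), nss)
    return f"urn:{nid.lower()}:{nss}"


def lexically_equivalent(a: str, b: str) -> bool:
    """Compare two URNs after normalization, octet for octet."""
    return _normalize(a) == _normalize(b)
```

So "URN:foo:a123,456", "urn:foo:a123,456", and "urn:FOO:a123,456" are equivalent; "urn:foo:A123,456" is not; and "urn:foo:a123%2C456" matches "URN:FOO:a123%2c456" but not the unescaped form.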
Now, given that hypothetical book-URN (i.e., "urn:book:..."), whether a transformation of the content from bound paper form to electronic (and non-page-image) form is a change that still allows both forms to satisfy the same URN is another one of those philosophical questions that can be resolved only by convention. If the convention is that they are the same, then any fragment identifier that utilizes page numbers is obviously trash (and cannot be "persistent" across the two forms). But one that utilizes chapter numbers or even names doesn't make the URN any less persistent than it would be if fragment identifiers were not used or not allowed.

Suppose, instead, that the convention is established that a translation is still "the same book". That convention would make me, and probably others, very anxious, but I can find nothing specific in 2141 or even 1737 that prohibits it. The nervousness arises from the URN and namespace definition and not from the presence or absence of fragments. But it would _constrain_ the types of fragment identifiers that make any sense at all. For example, using the example above, chapter numbers would still be sensible as fragment identifiers (at least modulo a few i18n issues) but chapter names almost certainly would not.

(Aside for the record: whatever that hypothetical "book" URN might be and however it might be defined, draft-ietf-urnbis-rfc3187bis-isbn-urn is not it. The latter identifies, among other things, a particular set of conventions about uniqueness and who gets to determine it, together with the consequent rules about object-bindings. By doing so, it "solves" a lot of the more general issues described above but also covers over some differences that might be important for other purposes.)

> 7. Thus, the combination of a URN and a fragment identifier
> has no assurance of persistence. It follows that the
> combination of a URN and a fragment identifier cannot be a URN.

That does not follow at all.
If it does, it leads to the interesting conclusion that a URN cannot be persistent enough to be a URN unless it names only a single and unique object with no possibility of changes to the object itself. It does follow that fragment identifiers to be used with (or as part of) URNs have to be designed with far more care about the nature of the namespace, and about what that namespace is used to identify, than fragment identifiers for, e.g., URL-identified web pages have often been in the past.

> 8. One can argue that the persistence of an identifier
> consisting of a URN and a query string could actually be
> persistent, if the resource named by the URN were defined in
>...

I think that query strings associated with URNs are far more problematic than fragment identifiers because fragment identifiers (at least by historical convention) point to something _within_ an object or some subdivision of it. Queries, as your example (not quoted here) suggests (at least as I interpret it), can, in principle, be used to take the rest of the URI as input and return something completely different. Defining such a situation in a way that would assure persistence is hard at best. Extending your example a bit and combining it with my hypothetical "book" URN, one could imagine

   urn:book:....?reverse-citation-index

which would return all of the known books or articles that cite that book. Since a new citation could be added at any time, the practical persistency problems of that query are horrible even if the query could be well-defined (it can't, at least without a definition of the sources to be searched, but that is just a property of my choice of a simple, but sloppy, example).

> 9. At any rate, existing resources that accept query strings
> do not in general assure persistence of the results of such
> queries.
> Thus, in general, a combination of a URN used to
> name an existing resource, and a query string, provides no
> assurance of persistence, and the combination should not be
> considered a URN.

This is where I wish the IETF could ban assertions that are not backed up by citations or evidence that is visible to the community. Let me state that in a way that more closely aligns with the reality I've seen and that has been pointed out to me:

"Some existing resources that accept query strings (or fragment identifiers) do so in ways that are ill-considered and that do not assure persistence of the results. Other existing resources and uses do assure persistence and are fine. The question is whether it is appropriate to try to ban the latter because there are a certain number (even a large number) of bad examples, or whether we should try to define things so that the good cases are allowed and we are clearer about why the bad cases are bad."

> The above, I submit, is reality. (There are probably some
> other relevant and salient points which are also defensible as
> reality.)

Note that, while our realities may differ, part of mine is that I am extremely positive that, if the IETF says "don't do that, we think it is evil", we will mostly be ignored, and fragments and queries will persist (sic) in things that people will persist (sic) in calling URNs and describing with "urn:namespace:..." syntax. If we say "the syntax is valid but one must be really, really careful about how these things are used to ensure that persistence is maintained" and, ideally, explain why that is important, then we will affect the behavior of at least some of those who are trying to do The Right Thing. If we say "don't do it because we said so" and ban the syntax, it is nearly certain that we will be ignored by existing uses of fragments and/or queries and by lots of potential ones.
And saying "even though that thing conforms to the URI syntax and starts with 'urn:', it isn't a URN and you are forbidden to call it one" is even more useless. At least in my pragmatic, observational reality.

In the interest of even a weak approximation to brevity, I'll skip comments on your brainstorming for now.

I think this does suggest that 2141bis needs an additional section that says something like the following. I've written this first cut on the assumption that we will allow fragment identifiers and queries in the syntax, but I believe, for the reasons explained above, that most of the material is needed regardless. Even if we decide not to allow fragment identifiers and/or queries, the relevant text below could probably be usefully adapted into a "why not" explanation (rather than having to rely on an IETF assertion of authority). Some of the other material above may be useful; I'd be happy to see it in the document if Peter and the WG believe that would help. I believe that, if this type of material is added to 2141bis, it should be explicitly identified as updating RFC 1737 by clarifying issues associated with the "requirements" of that document.

"The notion of 'persistency' of a URN and its relationship to whatever resource it identifies is key to the nature of URNs as defined in this document and in the original functional specification [RFC1737]. That notion and the associated relationships are, however, somewhat elusive and are likely to depend, in practice, on conventions and the properties of particular namespaces. For example, if one can speak of replicated versions of a resource, transformation of a resource into a different form without affecting its content or nature, or even changes to a resource that don't alter what a URN identifies, one must either establish very specific conventions or move into the fundamental philosophical problem of when two objects can properly be considered "the same".
In more practical terms, if replicated objects are considered different for some purposes but the same for others, the "Global uniqueness" criterion of RFC 1737 Section 2 may easily be violated, so the conventions about identity and uniqueness are important parts of the namespace definition even though, in practice, they may be better articulated for some namespaces than for others.

"It is important to note that universal conventions are almost certainly impossible: there is no reason to assume that the conventions that apply for one namespace will apply to another.

"These issues are a large part of what makes fragment identifiers and queries problematic for many URN namespaces and create a requirement for very careful and namespace-sensitive definitions in the namespaces where they are allowed. A badly-designed fragment identifier may be inconsistent with the stability and persistence of a putative URN if replication or any changes at all to the named object are allowed. A badly-designed query string may require reference to information or resolution of objects outside the namespace, thereby undermining multiple key URN properties as identified in this document, RFC 1737, and elsewhere. In addition, if either were to follow trends common in contemporary usage of queries and sometimes fragments in URLs, the requirements of Section 3 of RFC 1737, especially those for Human transcribability and Simple comparison, could easily be violated.

"To the extent feasible, definitions of particular URN namespaces should be clear about the relationships between the URN, the namespace, and the underlying objects; about the implications of replication and various changes to the named resources; and, if fragment identifiers or queries are allowed, how they should be constructed and constrained to preserve identifier persistency and to meet the other requirements of this specification and RFC 1737."
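The persistence failure attributed above to badly designed queries, and the reverse-citation-index example earlier in this note, can be shown in a few lines. This is a toy Python sketch, not a resolver design: the "book" NID, the identifiers, and the in-memory ToyCitationIndex (standing in for information outside the namespace) are all hypothetical. The same identifier string resolves to different sets as citations are added.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyCitationIndex:
    """Hypothetical external data consulted by a
    '?reverse-citation-index' query. Nothing here comes from a real
    namespace or resolver."""
    citing: dict[str, set[str]] = field(default_factory=dict)

    def record(self, citing_urn: str, cited_urn: str) -> None:
        self.citing.setdefault(cited_urn, set()).add(citing_urn)

    def resolve(self, urn_with_query: str) -> frozenset[str]:
        base, _, query = urn_with_query.partition("?")
        if query != "reverse-citation-index":
            raise ValueError(f"unsupported query: {query!r}")
        return frozenset(self.citing.get(base, set()))


index = ToyCitationIndex()
query = "urn:book:xyz?reverse-citation-index"
index.record("urn:book:aaa", "urn:book:xyz")
before = index.resolve(query)
index.record("urn:book:bbb", "urn:book:xyz")  # a new citation appears later
after = index.resolve(query)
# Same identifier string, different result: the query is not persistent.
assert before != after
```

Nothing about the identifier changed between the two calls; only data outside the namespace did, which is why such a query undermines persistence no matter how carefully the base URN is managed.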
That is obviously just a first cut, but maybe it will help us understand at least some of where we disagree, what the problems actually are, what problems can and cannot be solved (especially in a general, rather than per-namespace, way), and how to move forward.

best,
   john