Re: [urn] Tomorrow's "URNs are not URIs" topic

Julian Reschke <julian.reschke@gmx.de> Fri, 25 July 2014 14:34 UTC

Message-ID: <53D26AD2.8050302@gmx.de>
Date: Fri, 25 Jul 2014 16:33:54 +0200
From: Julian Reschke <julian.reschke@gmx.de>
To: John C Klensin <john-ietf@jck.com>, "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, urn@ietf.org
References: <C7BA827407347B467A013330@JCK-EEE10>
In-Reply-To: <C7BA827407347B467A013330@JCK-EEE10>
Archived-At: http://mailarchive.ietf.org/arch/msg/urn/oazEr_R-JXYL5_t95cyc82MomzA
Subject: Re: [urn] Tomorrow's "URNs are not URIs" topic

Hi there,

due to time constraints I'll have to focus on a few specific lines below...

On 2014-07-25 13:34, John C Klensin wrote:
> Independent of the problematic restrictions (see above and the
> slides), 3986 contains a lot of "stuff" that has no
> applicability to 2141 URNs and that could easily be banned from
> expanded URNs without any negative effect.  Take the dot-removal
> stuff.  It would be fairly simple to just ban those from URNs as
> we expand them; as far as I can tell the very restricted forms
> allowed by 2141 ban them now.  After several readings, it is not
> clear whether, once one expands beyond 2141, such a ban would
> actually be conformant to 3986.
> ...

Does dot-removal play any role outside of resolving against a base URI 
and as part of a totally optional comparison function?

> All the machinery for relative URIs (really relative URLs, IMO)
> is another example of something that is unneeded for URNs,
> could cause confusion if used, but does not appear to be ban-able
> on a per-method basis.

"Unneeded" is not a reason for a split. "Could cause confusion" is more 
severe, but you really need to be more concrete about it.

Generic URI processing code *does* handle relative resolution in a 
generic way. Is this a problem? And what is the intended way to solve this?
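To make the relative-resolution question concrete, here is a minimal Python sketch of strict reference resolution as described in RFC 3986 Section 5.2 (with remove_dot_segments from Section 5.2.4), using the Appendix B regular expression to split references. It is an illustration under those assumptions, not a complete or validated implementation; the function names and the urn:example:foo base are made up for the example.

```python
import re

# Regular expression from RFC 3986, Appendix B (scheme=2, authority=4, path=5, query=7, fragment=9).
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split(ref):
    """Return (scheme, authority, path, query, fragment); absent components are None."""
    m = URI_RE.match(ref)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


def remove_dot_segments(path):
    """Remove "." and ".." segments as in RFC 3986, Section 5.2.4."""
    out = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../") or path == "/..":
            path = "/" + path[4:]
            if out:
                out.pop()  # drop the last output segment
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to the output.
            end = path.find("/", 1 if path.startswith("/") else 0)
            if end == -1:
                end = len(path)
            out.append(path[:end])
            path = path[end:]
    return "".join(out)


def merge(base_authority, base_path, ref_path):
    """Merge a relative-path reference with the base path (Section 5.2.3)."""
    if base_authority is not None and base_path == "":
        return "/" + ref_path
    return base_path[: base_path.rfind("/") + 1] + ref_path


def recompose(scheme, authority, path, query, fragment):
    """Reassemble components into a URI string (Section 5.3)."""
    out = "" if scheme is None else scheme + ":"
    if authority is not None:
        out += "//" + authority
    out += path
    if query is not None:
        out += "?" + query
    if fragment is not None:
        out += "#" + fragment
    return out


def resolve(base, ref):
    """Strict resolution of ref against an absolute base URI (Section 5.2.2)."""
    b_scheme, b_auth, b_path, b_query, _ = split(base)
    r_scheme, r_auth, r_path, r_query, r_frag = split(ref)
    if r_scheme is not None:
        target = (r_scheme, r_auth, remove_dot_segments(r_path), r_query)
    elif r_auth is not None:
        target = (b_scheme, r_auth, remove_dot_segments(r_path), r_query)
    elif r_path == "":
        target = (b_scheme, b_auth, b_path, b_query if r_query is None else r_query)
    elif r_path.startswith("/"):
        target = (b_scheme, b_auth, remove_dot_segments(r_path), r_query)
    else:
        merged = merge(b_auth, b_path, r_path)
        target = (b_scheme, b_auth, remove_dot_segments(merged), r_query)
    return recompose(*target, r_frag)


print(resolve("http://a/b/c/d;p?q", "../g"))  # http://a/b/g
print(resolve("urn:example:foo", "#sec2"))    # urn:example:foo#sec2
print(resolve("urn:example:foo", "bar"))      # urn:bar
```

The first call reproduces one of the normal examples in RFC 3986 Section 5.4.1. The last one shows what a generic resolver produces for a relative path against a URN base (urn:bar), while a fragment-only reference such as #sec2 stays on the same URN.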

If you say "URNs are not URIs" this implies that (1) you can't put a URN 
where a URI is expected anymore, and (2) you can't use generic URI 
processing code. Is this really really the intent?

> At a different level, the syntax production for <path> (in
> Section 3.3)  appears to use an RHS matching rule and/or an
> implicit name-component match to link it to the production for
> <hier-part> at the beginning of Section 3. I've seen syntactic
> metalanguages that allow things like that, but ABNF isn't one of
> them (nor, AFAIK, is there anything in its family that does).

If you believe that there's a problem in the ABNF, by all means submit 
an erratum. That being said, I don't understand what the problem is but 
maybe you can explain it today in more detail.

> There is no production, or set of productions, that I can find
> that links through from <URI> (beginning of Section 3) to
> <path>.  How confusing that is depends on experience and
> perspective, and it may be worse for people with lots of
> experience reading ABNF and metalanguages of the same general
> style and easier for people who just mentally shrug and move on.
> But it is not correct.

Why is it not correct?

>>> To review an example in advance of the slides, 3986 gives a
>>> very specific interpretation to what people --and many
>>> existing URL applications-- consider a sequence of queries,
>>> e.g., ?x abc?y def?g=http://foo.example.com/?
>>>
>>> 3986 allows the sequence but treats it as a single query
>>> string.
>
>> Well, yes. That's not because RFC 3986 didn't understand URNs
>> or whatever, but because there are virtually no such strings
>> around.
>> Everybody on the Web writes queries like the above as
>> something like:
>>
>> ?x=abc&y=def&g=http://foo.example.com/
>
> And they often take, and treat, it as an ordered sequence.  The
> issue arises if one has an unordered sequence of, e.g.,
> name-value pairs that one wants to express as or embed in
> queries because queries are the only tool that 3986 provides.
> Out of curiosity, if "Everyone on the Web..." does it another way
> in http URLs, is that one of those factual errors?

I didn't get what you think is a problem here.

> Yes, and I've said things that agree with that statement several
> times.  On the other hand,
>
> (i) Section 6.2 discusses a comparison ladder and appears to
> say, approximately, "apply as many of these operations, in
> order, as you like, with the understanding that going further
> up the ladder will cost more processing time but eliminate more
> false negatives" (see Section 6.2 for the exact statement).
> Noting that several of the rungs on that ladder are not
> applicable for URNs (but would be harmless if mechanically
> applied), I can find nowhere in Section 6 that says "compare any
> way you like" or "additional types of comparisons are
> encouraged, or even allowed, on a per-method basis" except for
> Section 6.2.3, which does not appear to me to be applicable to
> URNs.

"Other scheme-specific normalizations are possible." - 
http://greenbytes.de/tech/webdav/rfc3986.html#rfc.section.6.2.3.p.4

Why is that not applicable???
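The lower rungs of the Section 6.2 ladder referred to in the quote are mechanical. A minimal Python sketch (function names are made up for this example) of simple string comparison (6.2.1) and part of syntax-based normalization (case of the scheme, 6.2.2.1; percent-encoding, 6.2.2.2) could look like this; host lowercasing and dot-segment removal are deliberately left out.

```python
import re

# Unreserved characters (RFC 3986, Section 2.3).
UNRESERVED = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")
PCT_ENCODED = re.compile(r"%([0-9A-Fa-f]{2})")


def _normalize_pct(match):
    """Decode percent-encoded unreserved characters; uppercase the hex digits of the rest."""
    char = chr(int(match.group(1), 16))
    return char if char in UNRESERVED else "%" + match.group(1).upper()


def syntax_normalize(uri):
    """Partial syntax-based normalization (Sections 6.2.2.1 and 6.2.2.2).

    Assumes an absolute URI; only the scheme is lowercased (URNs have no host),
    and dot-segment removal is not applied.
    """
    scheme, _, rest = uri.partition(":")
    return scheme.lower() + ":" + PCT_ENCODED.sub(_normalize_pct, rest)


def equivalent(a, b, normalize=None):
    """Simple string comparison (Section 6.2.1), optionally after normalization."""
    if normalize is not None:
        a, b = normalize(a), normalize(b)
    return a == b


print(equivalent("Urn:foo:%7Ebar", "urn:foo:~bar"))                    # False
print(equivalent("Urn:foo:%7Ebar", "urn:foo:~bar", syntax_normalize))  # True
print(equivalent("Urn:foo:bar?a=b?c=d", "Urn:foo:bar?c=d?a=b", syntax_normalize))  # False
```

Neither rung reorders anything inside the query; any such step would be one of the scheme-specific normalizations that 6.2.3 mentions.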

> (ii) It appears to me that the statements, e.g., in Section 3.4,
> make the query component atomic as far as 3986-based equivalence
> checking is concerned and do not make allowance for per-method
> parsing, much less reordering in some method-dependent canonical
> way, of that atomic unit in making comparisons.  I could find
> nothing in Section 6 that contradicts that and I've now read it
> carefully three times in the last 48 hours.   If you can, please
> point me to it because I'm sincerely wondering what I missed.
> If not, can we pull back a bit from statements like "no basis in
> facts"?

I agree with Martin. Nothing in Section 6 disallows scheme-specific 
additional re-ordering.

>> For the schemes where queries are mostly used (http: and
>> https:), queries are mostly interpreted by software on the
>> server, and I can tell you that most decent Web frameworks
>> (starting with Ruby on Rails, which I know best) do exactly
>> the same with
>>
>> ?x=abc&y=def&g=http://foo.example.com/
>> and
>> ?x=abc&g=http://foo.example.com/&y=def
>>
>> and all the other permutation variants.
>
> Here we get into a gray area, at least IMO, because 3986 clearly
> allows a method-specific implementation to do whatever it wants
> to interpret a query string, including treating its components
> any way it likes.  But that is not part of the 3986-level
> parsing or comparison procedure, it is something that 3986
> allows in the process of accessing or considering the object.

It is simply out of scope of the generic syntax and semantics.
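To illustrate the kind of interpretation Martin describes, which happens above the generic syntax: Python's `urllib.parse.parse_qs` (standing in here for a Web framework; the example.com URIs are made up) reads the `&`-separated pairs into a dict, so the two orderings quoted above compare equal once parsed even though the query strings themselves differ.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical http URIs carrying the two query orderings quoted above.
a = "http://example.com/search?x=abc&y=def&g=http://foo.example.com/"
b = "http://example.com/search?x=abc&g=http://foo.example.com/&y=def"

# urlsplit() extracts the query as one string: everything after "?" up to "#".
qa, qb = urlsplit(a).query, urlsplit(b).query
print(qa == qb)                      # False: as strings, the queries differ
print(parse_qs(qa) == parse_qs(qb))  # True: as parsed name/value pairs, they match
```

One design consequence: `parse_qs` keeps repeated names as ordered lists, so `x=1&x=2` and `x=2&x=1` would still compare unequal with this approach.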

> The first sentence of Section 3.4 appears to say that, although
> I'm not positive that is what it means.   That takes us back to
> a discussion that has been going on, IMO, unproductively, for
> more than 20 years.  If one believes that URNs are either bogus
> or really no different from http-URLs (and I note that all of
> your examples are based on the latter), then, yes, all of this
> works out except, maybe, the option Section 6 gives for
> determining equality by fetching the resources and comparing
> them rather than syntax.  If one believes that URNs are a
> different sort of more abstract naming critter, then it becomes
> less clear what that first sentence means (if anything).

Again, I have no idea what the concrete problem here is.

>> There's no need to change. It's already mostly done that way
>> where it matters.
>
> And, clearly, one option for the URN WG is to say "mostly, where
> it matters, no one pays much attention to what 3986 says in
> detail, so we should follow common practice and ignore it".   I

Is there something specific in 3986 that you think is "commonly 
ignored"? If yes, what?

> However, if one wanted to think about this a different way (I
> think it would drag things out and I don't particularly want to
> do the work) the comparison part of the problem could be
> addressed by
> (i) Modifying Section 3.4 to explicitly allow particular methods
> (or just URNs) to impose requirements that subdivide the query
> as long as the "from '?' to '#' or end" rule is not violated and

There's nothing that needs to be modified here.

> (ii) adding a 6.2.5 that applied to those method(s) only.

6.2.3 already allows URI-scheme-specific comparisons. I really don't see 
a problem here.

>> Slide 4: "Effectively imposed retroactive requirements on
>> URNs": This isn't true. RFC 3986 didn't impose anything on
>> URNs. It was written carefully to make sure that URNs fit into
>> the overall picture.
>
> I really don't want to have the discussion that leads off
> because it involves opening up old arguments and wounds but,
> since you have said similar things several times and challenged
> me (and others) to refute it...   The problem in the last
> sentence is that there is disagreement about what URNs are and
> hence what overall picture they fit into.   At least one of the
> listed authors of 3986 is on record as not believing that URNs
> are legitimate, that the distinctions some people are trying to
> make about them make any sense, and, indeed, that there is
> anything one can do with a URN that cannot be done with an HTTP
> URL.  URNs that fit into _that_ overall picture could be pretty
> easy and totally irrelevant to URNs as seen by those who think
> they are different and that the differences are important.

I happen to agree mostly with that author, but then it's totally unclear 
how this affects the technical argument. Can we please stick to what RFC 
3986 says, as opposed to what some of the authors said somewhere else?

>> Slide 12: "If WHATWG / W3C succeed in killing 3986": W3C
>> doesn't want to till RFC 3986. WHATWG will only succeed to the
>> extent that the IETF allows it.
>
> No, they will succeed to the extent that the marketplace accepts
> what they are doing and/or the browser vendors accept and
> implement what they are saying and not more openly developed
> consensus standards from other areas.  That is more likely to be
> the case if W3C endorses and publishes what they produce with
> minor or no modifications and that seems to be on track.   At

The IAB and the IESG met with Philippe Le Hegaret on Monday to discuss 
this, and I have a different recollection.

That being said: pulling the WhatWG issue into this discussion is a 
distraction. The reason Anne van Kesteren and others work on their own 
spec (IMHO) has absolutely nothing to do with URNs.

>> The best way to think about
>> that spec is to see it as a detailed implementation spec to
>> avoid or reduce browser divergence. It contains all the stuff
>> a browser implementer needs to know, but virtually nothing
>> else.
>
> While you can look at it that way, statements made by several of
> the leaders of that group over the last couple of years suggest
> much broader intent and applications as well as a view that the
> world revolves around the web and that either nothing else
> counts or everything else should get in line.  The recent flap
> about deprecating the IANA Charset registry is, IMO, symptomatic
> of that pattern.

I agree that what they do is problematic. But can we please focus on 
URNs and RFC 3986?

> I think your comment above reflects the same problem.  If one
> can say "URL" and mean "HTTP[S] URLs in a Web Browser context"
> but then claim it really applies to all URIs including URNs we
> are, at best, in very bad shape wrt communicating with each
> other and probably in worse shape to consider the needs of
> communities whose needs are very different from those of the
> HTTP-URL-in-browser community.

Do you have a proposal how to fix this problem?

>> Slide 13: "Query is always part of equality comparison –
>> Fragments apparently never": Where did you get that from? I'd
>> guess that all the examples in RFC 3986 work that way, but
>> where does the spec *mandate* it? I'd guess nowhere.
>
> What do you think it says?  See the comments about the
> comparison ladder above.

I agree with Martin. Please be more specific.

>> Slide 13: "Urn:foo:bar?a=b?c=d and Urn:foo:bar?c=d?a=b Never
>> compare equal": Wrong, it's up to who is comparing them to
>> decide.
>
> Not as I read 3986.  See above, note particularly the 3986 text
> about the balance between false positives and false negatives,
> and explain to me where you get "up to who is comparing them to
> decide".   Again, if it is the consensus of the community that
> 3986 provisions are meaningless in practice, we don't need the
> "URNs are not" draft, we need some notes (independent of this
> WG) to move 3986 to Historic as no longer of
> protocol-specification interest.

Again, Martin is right. Software that is aware of URN-specific rules can 
normalize query strings in any way that is defined for that scheme.
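A sketch of what such scheme-aware normalization could look like for the Slide 13 example, assuming a purely hypothetical URN rule that the query is a sequence of `?`-separated parameters whose order does not matter. Nothing in RFC 2141 or RFC 3986 defines this rule, and `urn_normalize` is a made-up name; the point is only that an order-insensitive comparison can sit on top of the generic parse.

```python
import re

# RFC 3986 Appendix B regular expression (scheme=2, "//"+authority=3, path=5, query=7, fragment=9).
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def urn_normalize(uri):
    """Normalize under a HYPOTHETICAL URN rule: '?'-separated query parameters are unordered."""
    m = URI_RE.match(uri)
    scheme, authority, path, query, fragment = (
        m.group(2), m.group(3), m.group(5), m.group(7), m.group(9))
    if scheme is None:
        raise ValueError("expected an absolute URI")
    # Generic step: the scheme is case-insensitive.
    out = scheme.lower() + ":" + (authority or "") + path
    if query is not None:
        # Hypothetical scheme-specific step: sort the '?'-separated parameters.
        out += "?" + "?".join(sorted(query.split("?")))
    if fragment is not None:
        out += "#" + fragment
    return out


a = "Urn:foo:bar?a=b?c=d"
b = "Urn:foo:bar?c=d?a=b"
print(a == b)                                # False (simple string comparison)
print(urn_normalize(a) == urn_normalize(b))  # True under the hypothetical rule
print(urn_normalize(b))                      # urn:foo:bar?a=b?c=d
```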

>> Slide 16: "If we want the third, not with 3986." ("the third"
>> is "As an additional parameter (if we allow it)"): Why not
>> with 3986? I don't think there is anything in RFC 3986 that
>> would disallow this.
>
> 3986 says, pretty clearly I think, that the path ends with a
> "?", "#", or the end of the string.   Queries are all of the
> text between "?" and either "#" or the end of the string and
> cannot follow fragments, and fragments end with the end of the
> string.   So I don't see where else 3986 would allow putting an
> additional parameter.

If it hurts, don't do it.
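The component boundaries in the quoted paragraph can be checked against the regular expression in RFC 3986 Appendix B. In this minimal Python sketch (the sample URN and the `components` helper are made up), everything between the first `?` and the `#` lands in the single query component, and a `?` after the `#` is just part of the fragment.

```python
import re

# Regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def components(uri):
    """Split a URI reference into its five generic components (None if absent)."""
    m = URI_RE.match(uri)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


print(components("urn:foo:bar?a=b?c=d#sec?x=y"))
# {'scheme': 'urn', 'authority': None, 'path': 'foo:bar', 'query': 'a=b?c=d', 'fragment': 'sec?x=y'}
```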

> ...

Best regards, Julian