Re: [URN] Re: URI documents

Patrik Faltstrom <paf@swip.net> Sun, 28 December 1997 12:44 UTC

Date: Sun, 28 Dec 1997 13:43:21 +0100
From: Patrik Faltstrom <paf@swip.net>
X-Sender: paf@nix
To: Larry Masinter <masinter@parc.xerox.com>
cc: "Roy T. Fielding" <fielding@kiwi.ics.uci.edu>, harald.t.alvestrand@uninett.no, moore@cs.utk.edu, uri@bunyip.com, urn-ietf@bunyip.com
Subject: Re: [URN] Re: URI documents
In-Reply-To: <34A611C2.B197EF17@parc.xerox.com>
Message-ID: <Pine.GSO.3.96.971228131924.3210G-100000@nix>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset="US-ASCII"
Sender: owner-urn-ietf@Bunyip.Com
Precedence: bulk
Reply-To: Patrik Faltstrom <paf@swip.net>
Errors-To: owner-urn-ietf@Bunyip.Com

On Sun, 28 Dec 1997, Larry Masinter wrote:

> > What some of us
> > ask for are your document divided in three so it is crystal clear what is
> > a definition for URIs, what is URLs and what is URNs. 
> 
> The proposal (c) divides (b) into two, not into three. I'm
> guessing you really meant "two".

Well, we do have a document which talks about the syntax for URNs as well.
I.e. yes, it is true that (c) splits the document on the table in two, but
the result will be three documents.

> In an earlier message, it sounds like you are saying that you have
> reviewed (b) (draft-fielding-uri-syntax) and (c) (Leslie's two-documents),
> and that "(c) is the only working solution". Do you mean to say
> that you also have found (a) and (d) unacceptable?

Yes.

What I mean is that I think draft-fielding-uri-syntax should be split
into two. I base that on having read the original document and then
Leslie's suggestion on how the split should be done.

> Do you find it less, or more confusing, to have both: 
>
> >    Many URL schemes have been defined.  The scheme defines the
> >    space of the URL, and thus may further restrict the syntax and
> >    semantics of identifiers using that scheme.
> 
> in the URL document, and also
> 
> >   Many URI schemes have been defined.  The scheme defines the
> >   namespace of the URI, and thus may further restrict the syntax and
> >   semantics of identifiers using that scheme.
> 
> in the URI document?

If we first decide that cutting the text into two documents makes the final
result better, we can then work on the details. I think there are things
in Leslie's cut which did not make 100% sense, mostly (I guess) because
text should be rewritten (like the above) when we have two documents.

As it is now, I only argue that the document about URI syntax should be
separate from the one about URL syntax.

I think Leslie's suggestion showed that two documents are better. If we go
down that path, I can come with particular suggestions for what should be
improved in them.

I think for example that some "may" in the URL document should be "must".

> > This is needed because the document currently under the name of a URI
> > syntax document talk so much about URLs, and use a terminology that is
> > only valid for URLs, that confusion occurs regarding, if nothing else, the
> > difference between a URL and a URN. It does not help that the document
> > have "may" all over the place.
> 
> Can you say what the 'confusion' is? There's a section 1.2 URI, URL, and URN,
> which attempts to discuss the difference.  Is this confusion also in
> place for (d) or (a)?

The confusion arises because so many parts describe things that are (today)
URL-specific as URI things, but with a "may". One example is
relative URLs, which I think should be described as relative URLs, and not
relative URIs. The same goes for fragments, and for the details on how to
construct and parse query, username etc. constructions. It sounds as if
these things -- even though they are preceded by a "may" -- should apply
to all URIs, and more specifically to the _design_ of a URZ, URB, URX or
whatever.
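
To make the relative-reference point concrete, here is a minimal sketch
using Python's standard urllib.parse module. The base addresses and the
"urn:example" name are hypothetical and chosen only for illustration. It
shows only that one common library treats relative resolution as something
that applies to hierarchical schemes like http, and not to urn.

```python
from urllib.parse import urljoin

# Hypothetical base identifiers, used only for illustration.
url_base = "http://www.example.com/docs/index.html"
urn_base = "urn:example:report-1997"

# For an http URL the relative reference is merged with the base path.
resolved_url = urljoin(url_base, "chapter2.html")

# "urn" is not among the schemes urllib.parse treats as hierarchical,
# so the relative reference is returned unchanged instead of merged.
resolved_urn = urljoin(urn_base, "chapter2.html")

print(resolved_url)
print(resolved_urn)
```

In this sketch the relative reference only gets a meaning in the URL case,
which is why I would describe it as a URL feature.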

> > I think it is definitely better if we have documents about URIs, URNs and
> > URLs, so the number of "may" can be limited to a minimum when we talk
> > about so important things as grammars and what characters are allowed, how
> > encoding is done and how to handle/accept things like fragments, queries
> > and relative addressing.
> 
> There are no fewer "may"s in the combined (c) than there are in (b).

Well, I think there could be fewer. I might be wrong. I would
like to say that _IF_ certain functionality should be able to be applied
to a URL scheme, it _MUST_ syntactically be written in a certain way. That
rule might not be possible to create if we also include URNs -- because
the URN namespace itself might have rules and constructions which make
that rule inapplicable.

Also, because a URN and a URL are different things (as a URN can be used
in a number of ways, N2Ls, N2L, N2C,...) they are also used differently --
and certain operations one can apply to a URL cannot be applied to a URN
and vice versa.

> As far as I can tell, there is no proposal to have a different
> set of allowed characters in "URI" than in "URL", so I'm not sure 
> what you mean by "what characters are allowed". Also, I don't see
> any proposals to have a different mechanism for encoding for URNs
> and URLs. Are you suggesting there might be such a thing?

This is from a discussion I had with the Handle people, who didn't
understand why, when talking about URNs, we said that the character set
in use should be UTF-8 encoded UNICODE 2.0, when so many different
character sets work when using HTTP URLs. Well, this is because when
getting a URL, you normally (there are exceptions of course) get it in an
HTML document as a reference. That reference is then passed back, as-is,
to the same server that passed the reference to the client,
so no one has to parse the stream of bytes passed back and forth over the
wire. The URL, if displayed on the screen in the client's browser, might
look funny, or like garbage, but it will work, as long as the client
doesn't change the stream of bytes.

But, when talking about URNs, the URN will be inside some document, say an
HTML one. That URN will _NOT_ be passed to the same HTTP server, but to
some resolver (in the case of a N2L resolution) which must understand what
characters are represented in the namespace-specific string, so that a
search can be done, which in turn will result in the URL which is sent back
to the client. That URL is then what the browser in this example sends back
to the HTTP server to get the next HTML page.

As you can see in this example, when using URNs we have a third party
involved -- or at least some function which acts as a resolver, which in
this simple example turns the URN into a URL which is then used as normal.

Because of that, it is definitely needed when talking about URNs to
agree on what character set and encoding are used, as the parties involved
have to be able to parse the characters (not the bytes) sent in the URN.
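
A minimal sketch of why the resolver needs an agreed encoding, assuming a
hypothetical N2L resolver that looks names up by their characters. The
table, the resolve() helper and the example.com URL are all made up for
illustration; only the percent-encoding calls come from Python's standard
urllib.parse module.

```python
from urllib.parse import quote, unquote_to_bytes

# The same name, percent-encoded from two different character sets.
name = "F\u00e4ltstr\u00f6m"
utf8_form = quote(name.encode("utf-8"))      # F%C3%A4ltstr%C3%B6m
latin1_form = quote(name.encode("latin-1"))  # F%E4ltstr%F6m

# Hypothetical resolver database, keyed by characters, not bytes.
RESOLVER_TABLE = {name: "http://www.example.com/people/paf.html"}


def resolve(escaped_nss, charset="utf-8"):
    """Hypothetical N2L lookup: decode the escaped string, then search."""
    raw = unquote_to_bytes(escaped_nss)
    try:
        key = raw.decode(charset)
    except UnicodeDecodeError:
        # The bytes are not valid in the agreed charset: no lookup possible.
        return None
    return RESOLVER_TABLE.get(key)


print(resolve(utf8_form))    # found: the sender used the agreed encoding
print(resolve(latin1_form))  # None: same characters, different bytes
```

An HTTP server that just gets its own bytes back never runs into this,
because it does not need to know which characters the bytes stand for.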

> I agree that the query forms are unlikely to apply to URNs, that relative
> addressing is problematic, and that fragments are controversial, but your
> message indicated that you believe there are more extensive differences
> that are not just restrictions, which would be a much more serious issue.

All of the issues you raise are things that we know how to handle in the
case of a URL, and character set issues are things we know about when
talking about URNs. We do not, though, know in detail how things like
relative addressing, fragments, query forms etc. should, if ever, be
applied to URNs, so I would rather have one document for URIs, one for
URLs and a third for URNs.

It just makes so much more sense -- especially when URLs and URNs
are so different regarding usage. I.e. many things are more mature when
talking about URLs. I do not say that the URN people (including me) should
invent their own way of handling things, different from URLs. That would
be stupid. Totally stupid. I just say that some things might NOT apply to
URNs, only to URLs. Who knows if they will apply to URXs? So, let us write
about URLs in one document and URNs in another.

I also think it is a good idea to talk about things like how to implement
a parser for URLs that also handles URNs and URXs, like what Roy has done,
but that is something different from talking about the syntax for each one
of the constructs. That is a BCP which might also be needed.

   Patrik