Re: New URC Specification is ready....

"Ronald E. Daniel" <rdaniel@acl.lanl.gov> Wed, 06 July 1994 22:23 UTC

Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa12453; 6 Jul 94 18:23 EDT
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa12449; 6 Jul 94 18:23 EDT
Received: from mocha.bunyip.com by CNRI.Reston.VA.US id aa21922; 6 Jul 94 18:23 EDT
Received: by mocha.bunyip.com (5.65a/IDA-1.4.2b/CC-Guru-2b) id AA25618 on Wed, 6 Jul 94 13:31:34 -0400
Received: from acl.lanl.gov by mocha.bunyip.com with SMTP (5.65a/IDA-1.4.2b/CC-Guru-2b) id AA25606 (mail destined for /usr/lib/sendmail -odq -oi -furi-request uri-out) on Wed, 6 Jul 94 13:31:16 -0400
Received: from idaknow.acl.lanl.gov (idaknow.acl.lanl.gov [128.165.161.102]) by acl.lanl.gov (8.6.8.1/8.6.4) with ESMTP id LAA19294; Wed, 6 Jul 1994 11:31:10 -0600
Sender: ietf-archive-request@IETF.CNRI.Reston.VA.US
From: "Ronald E. Daniel" <rdaniel@acl.lanl.gov>
Received: (rdaniel@localhost) by idaknow.acl.lanl.gov (8.6.8.1/8.6.4) id LAA02219; Wed, 6 Jul 1994 11:31:09 -0600
Date: Wed, 06 Jul 1994 11:31:09 -0600
Message-Id: <199407061731.LAA02219@idaknow.acl.lanl.gov>
To: ccoprmm@oit.gatech.edu
Subject: Re: New URC Specification is ready....
Cc: pays@faugeres.inria.fr, rdaniel@lanl.gov, uri@bunyip.com

Hi Michael,

Really good job on the URC spec. There are, of course, a few nits to pick;
please don't take the size of my response to mean that I think poorly
of your work. Overall it looks really good.

General comments:

I agree with pays@faugeres.inria.fr that the whois++ stuff should not be
in the final version of this document; it belongs in a separate document.

I also agree with him(?) in not liking the precedence rules approach.
My reasons are given below.

I agree with you and disagree with him(?) about preferring a single value
for each attribute.  Your example of Author(s): is a perfect illustration.


Now, on to the specific comments.

> 2: Design Goals
> ===============
...
>  o Simplicity: A URC specification must be simple enough for practically
>    anyone to understand or to encode. This allows users to encode and
>    maintain a given URC without the need for esoteric computer science
>    knowledge. 

I agree that this is a *very* worthwhile goal. I'm not sure that the
precedence rules you define later meet it. Right now they are
not too bad. However, I can certainly think of other elements that
could be very useful and would require changing the proposed elements
to have precedence rules. (An example? Abstract. Currently it
is a text field. However, since the best abstract for a picture is
a thumbnail, the best abstract for a movie is a trailer, etc., it is
not too hard to imagine making Abstract: into something with precedence
so that the abstract can have a content-type, length, URLs, etc.) As
the system is used over the next 15 years there will no doubt be lots
of things added, and a desire to change the operation of particular
elements. I don't think implicit precedence rules are going to age
gracefully.

>  o Compatibility: Since URCs will be utilized by vastly different systems
>    on vastly different networks it must be encoded in such a way as to
>    allow very complex systems to communicate complex information
>    via very simple gateways and access methods. 

Perhaps a mention of base-64 encoding, quoting rules, etc. is needed.
Yuck. Let's see what comes out of the URL discussion and whether it can
be adopted here.

>  o Use of existing and developing technology: In order to be able to
>    implement something soon, an encoding specification should allow
>    existing systems to be easily retrofitted to use URCs. The use of
>    existing systems that already support objects similar to URCs is
>    encouraged. 

A nice goal. I certainly don't think we should have gratuitous differences
with existing and developing systems (such as whois++). However, I think
there are fundamental requirements of the system that must be addressed, and
that whois++ does not currently handle.  (Here we go again :-) 
The fundamental requirement that does not seem to have received adequate
consideration is, in a word, security. There has been no consideration of
how to prevent me from issuing URNs that apparently originate from any
publisher I choose. There has been no discussion of how to prevent me from
forging authorship of resources. There has been only the most trivial
discussion of ensuring the integrity of the resources I retrieve. An MD5
field in the URC is nice, but if I can get into the URC info I can provide
an MD5 that is accurate for what you get when you access my URL. Too bad that
what is hiding at my URL is not what the original author intended.
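To make the MD5 point concrete, here is a minimal Python sketch. It assumes, for illustration only, that URC info is held as a dict and that the MD5 field carries a hex digest. The URLs are placeholders. The check only proves that the fetched bytes match whatever checksum the URC carries, so a forger who can write the URC passes it.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest, as a URC MD5 field might carry it (format assumed)."""
    return hashlib.md5(data).hexdigest()


def verify_against_urc(urc: dict, fetched: bytes) -> bool:
    """Accept fetched bytes if they match the MD5 listed in the URC."""
    return urc.get("MD5") == md5_hex(fetched)


original = b"The document the author actually wrote.\n"
forged = b"Something the author never wrote.\n"

# Honest URC: checksum computed over the author's bytes.
honest_urc = {"URL": "http://example.org/doc.html", "MD5": md5_hex(original)}

# Tampered URC: the forger points the URL at their own copy and
# simply recomputes the MD5 over it.
forged_urc = {"URL": "http://forger.example/doc.html", "MD5": md5_hex(forged)}

print(verify_against_urc(honest_urc, original))  # True
print(verify_against_urc(forged_urc, forged))    # True: the forgery passes
print(verify_against_urc(honest_urc, forged))    # False
```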

I *strongly* believe we need to give security serious consideration now,
rather than try to hack it in later.

> It then becomes a simple
> exercise of selecting the equality character and specifying some method of
> encoding special situations such as character quoting and line continuation.

Nicely put. As I mentioned earlier, some discussion of character quoting
would not be amiss. Perhaps just a statement that you are monitoring the
character handling discussions in other working groups and will adopt one
of the resulting approaches?
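For what it's worth, a parser for the simple case is short. This Python sketch makes two assumptions, because the draft only says "normal rules":

 o the first ':' on a line separates name from value;
 o a line that starts with whitespace continues the previous value (RFC 822-style folding).

Quoting is deliberately left out until the other working groups settle on an approach.

```python
def parse_urc(text: str) -> list[tuple[str, str]]:
    """Split URC text into (attribute, value) pairs.

    Assumptions, not from the draft: the first ':' separates name from
    value, and a line that starts with whitespace continues the previous
    value (RFC 822-style folding). Quoting is not handled.
    """
    pairs: list[tuple[str, str]] = []
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        if raw[0] in " \t":
            if not pairs:
                raise ValueError("continuation line before any attribute")
            name, value = pairs[-1]
            # Unfold: join the continuation onto the previous value.
            pairs[-1] = (name, (value + " " + raw.strip()).strip())
            continue
        name, sep, value = raw.partition(":")
        if not sep:
            raise ValueError(f"no ':' in line {raw!r}")
        pairs.append((name.strip(), value.strip()))
    return pairs


sample = (
    "URL:http://www.gatech.edu/ietf/urc.encoding.html\n"
    "Abstract:\n"
    "        This document explores the various flight patterns\n"
    " and speeds of unladen swallows.\n"
)
for name, value in parse_urc(sample):
    print(f"{name} = {value}")
```

Splitting on the first colon only is what lets values such as URLs and URNs keep their own colons.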

> Experimental attribute_names should be encoded with the
> [X-attribute_name] notation.

This reminded me of a purported bug in Mosaic: case sensitivity, where
X-foo is treated differently from x-foo. Should we specify right now that
attribute names are case insensitive? What does MIME say?
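RFC 822 treats header field names as case-insensitive, and MIME headers follow RFC 822. If we adopt the same rule, a client can fold names before comparing them. Here is a minimal Python sketch. The registry contents are just some attribute names mentioned in this thread, not an agreed list.

```python
# Illustrative registry: attribute names mentioned in this thread, stored
# case-folded. A real registry would be maintained by the working group.
WELL_KNOWN = {"urn", "url", "author", "ttl", "abstract", "version"}


def canonical(name: str) -> str:
    """Fold an attribute name so that comparisons ignore case."""
    return name.strip().casefold()


def classify(name: str) -> str:
    """Label a name as experimental (X-), well-known, or unknown."""
    key = canonical(name)
    if key.startswith("x-"):
        return "experimental"
    if key in WELL_KNOWN:
        return "well-known"
    return "unknown"


print(canonical("X-foo") == canonical("x-foo"))  # True
print(classify("X-LIFN"))   # experimental
print(classify("AUTHOR"))   # well-known
print(classify("Colour"))   # unknown
```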

> There are no attribute/value pairs that are required to be a part of a URC. 

Not even a URN? Seems mighty handy to require this so that when people do
searches on Author, URL, ... they end up with something they can do more
with.

> It is intended that any additions or subtractions from this list will be
> handled by the Uniform Resource Identifier Working Group. It is also
> intended that this list should be extended since the full usefulness of
> URCs is beyond the scope of these pairs listed. 

We should also start a registry for the current types so that people
don't have to wade through the mailing list archives to find out
what the current set of well-known attributes is.

>  o URL:
>    This pair must conform to the current Uniform Resource Locator
>    specification as defined in the URL Internet-Draft[Berners-Lee 94-1].
>
>    Example:
> 
>    URL:http://www.gatech.edu/ietf/urc.encoding.html

What? No angle brackets? Aren't these responses text/plain?   :-) 
More seriously, should we get a MIME Content-type for the URC
responses? Might make dealing with them easier.

>    Since many cultures have different ways of writing names
>    there are no requirements on how a name should be written. Thus it is
>    encouraged that users encode names in the most common format i.e.
>    first, middle and last in English societies.

Nicely put. 

>  o TTL:
>    This pair encodes a Time To Live measured in seconds. Infinity is
>    denoted by the '+' character. This element references the attribute/value 
>    pair directly preceding it (see section 4) and is meant as a caching aid.
> 
>    Example:
> 
>    TTL:86400

This works well for the resources identified by a URN or URL. However, the
URC information for a resource is something that will also be cached in
order to avoid unnecessary expense in the URN->URL resolution. We need a
means of specifying the TTL for URC info, which will not be the same as the
time for the resource itself. Perhaps the TTL associated with
the URN tells the time to live for the URC info, while TTLs associated
with URLs tell the time to cache particular resources. e.g.:

  URN:IANA:foo:bar:123434523
  TTL:36000                       // Cache URC info for 10 hours
  URL:http://www.bar.foo/huh.html
  TTL:+                           // Cache the html until LRU rules kick it out.

BTW - what are the units in the TTL field? Seconds?  Microseconds?
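Here is a minimal Python sketch of that proposal. It assumes pairs are already parsed into (name, value) tuples. It follows the draft in two respects:

 o TTL values are seconds, with '+' meaning infinity;
 o each TTL attaches to the pair directly before it.

The URN-versus-URL split is my proposal above, not part of the draft.

```python
import math


def ttl_seconds(value: str) -> float:
    """Parse a TTL value: seconds, or '+' for infinity (as in the draft)."""
    value = value.strip()
    return math.inf if value == "+" else float(int(value))


def attach_ttls(pairs: list[tuple[str, str]]):
    """Return (URC-info TTL, {URL: resource TTL}) for a parsed URC.

    Each TTL applies to the pair directly before it. Following the
    proposal above: after a URN it is the lifetime of the URC info,
    after a URL the lifetime of the cached resource. TTLs after any
    other pair are ignored in this sketch.
    """
    urc_ttl = None
    resource_ttls: dict[str, float] = {}
    prev = None
    for name, value in pairs:
        if name.upper() == "TTL":
            if prev is None:
                raise ValueError("TTL with no preceding pair")
            seconds = ttl_seconds(value)
            if prev[0].upper() == "URN":
                urc_ttl = seconds
            elif prev[0].upper() == "URL":
                resource_ttls[prev[1]] = seconds
            continue  # a TTL never becomes the "preceding" pair
        prev = (name, value)
    return urc_ttl, resource_ttls


example = [
    ("URN", "IANA:foo:bar:123434523"),
    ("TTL", "36000"),  # cache URC info for 10 hours
    ("URL", "http://www.bar.foo/huh.html"),
    ("TTL", "+"),      # keep the HTML until LRU evicts it
]
print(attach_ttls(example))
# (36000.0, {'http://www.bar.foo/huh.html': inf})
```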

>  o Abstract:
>    This pair encodes a short abstract about the given resource. Any
>    characters are allowed. Line continuation follows normal rules.
> 
>    Example:
> 
>    Abstract: 
>            This document explores the various flight patterns and speeds of
>     unladen African and European swallows. A companion document concerning
>     the relative velocities of swallows laden with coconuts is available.

Pretty soon we could imagine using thumbnails as abstracts of images,
trailers as abstracts of movies, etc. How about we change Abstract:
to be the URN for an abstracted version of the resource, and add
Text-Abstract: as an ASCII description of the resource? (Of course, this
doesn't solve the problem of knowing what language the text description
was written in.)

>  o Version header field:
>    In order to give some ability to utilize different version schemes it is
>    recommended that the Version field be given the idea of schemas so
>    that machine based algorithms can be used to differentiate resources.
>    For this specification only one schema is given but more can be
>    developed.
> 
>     o Schema 1: decimal This schema specifies the use of the
>       standard decimal type of version enumeration. For example,
>       this is version 1.0 of this document. At the author's or publisher's
>       whim, it can change to version 1.1 or even 2.0.
> 
>       Example:
> 
>       Version:decimal:1.0

Really good plan to specify the schema. The whole issue of versioning could
stand a good deal of serious thought. We may want Supersedes: and
Superseded-By: fields to hold URNs, but that is a back-burner item that
should go through an X- phase first.
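Here is a minimal Python sketch of how a client might use the schema to compare versions. The draft doesn't say whether decimal:1.10 is newer than decimal:1.9. This sketch assumes dot-separated integer components, which is exactly the kind of detail the schema definition would need to pin down.

```python
def parse_version(value: str) -> tuple[str, tuple[int, ...]]:
    """Split 'decimal:1.0' into its schema name and a comparable key.

    Assumption: the decimal schema compares dot-separated parts as
    integers, so 1.10 sorts after 1.9. The draft does not settle this.
    """
    schema, sep, number = value.partition(":")
    if not sep:
        raise ValueError(f"missing schema in {value!r}")
    if schema.casefold() != "decimal":
        raise ValueError(f"unknown version schema {schema!r}")
    return schema.casefold(), tuple(int(part) for part in number.split("."))


def newer(a: str, b: str) -> bool:
    """True if version a is newer than b; both must use the same schema."""
    schema_a, key_a = parse_version(a)
    schema_b, key_b = parse_version(b)
    if schema_a != schema_b:
        raise ValueError("versions use different schemas")
    return key_a > key_b


print(newer("decimal:1.1", "decimal:1.0"))   # True
print(newer("decimal:2.0", "decimal:1.1"))   # True
print(newer("decimal:1.10", "decimal:1.9"))  # True under this sketch's rule
```

Rejecting unknown schemas outright means a client never silently compares versions it does not understand.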

> ... there must be some structure to a URC. The easiest and most elegant
                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
> method is simply to introduce a set of precedence rules onto the above set of
> attribute/value pairs. 

Well, this is a point where reasonable people may disagree. As I mentioned
earlier, I am concerned about how precedence rules might have to change
in the future. Explicit delimiters do not have that problem. Some
people worry that explicit delimiters add complexity; that has not been
my (admittedly limited) experience.

> As above, this set of precedence rules is extensible by the IETF URI Working
> Group. 

We probably need a protocol version identifier in the URC so that the client
will know which precedence rules and attribute definitions to use.

>  o URNs have precedence over all other pairs, except for LIFNs.

OK. 

>  o LIFNs are equal to URNs in precedence only.
> 
>    An LIFN has many of the same characteristics of a URN. While there
>    is no current specification of exactly what a LIFN is or does this paper
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>    will attempt to place them in the structure of a URC. This is definitely
>    up for discussion.

Well, I'm just an Okie, but if we don't know what a LIFN is or what it
is supposed to do, it seems kind of strange to be specifying a standard
field that clients have to interpret consistently. How about we let
people play around with X-LIFN, and assume that it has no precedence
rule?

>  o URLs have precedence over all other pairs except for URNs
>    and LIFNs until the occurrence of a new URL.

I can go with this. 

>  o TTLs have no precedence over any other attribute/value pair
>    and therefore describe any element directly before it.

I am not comfortable with this for two reasons. The first is the need to
be able to put a TTL on the URC info. The second is that I think you are
being too restrictive with the placement of the TTL field. It, along with
things like Content-type and Content-length, should describe the
*section* it appears in. If we can elaborate on the notion of
sections, we may be able to come up with a scheme that generalizes
over many extensions to the set of known elements. For example,
divide the fields into "section-starting fields" and filler fields.
Section-starting fields, in order of priority, are URN and URL. Sections
can nest, but cannot partially overlap. Therefore, in the example below,
the second URL: field closes the URL:bar subsection, and the second URN:
field closes both the first URN section and the URL:baz subsection inside it.
  URN:bla
  Author:foo
  URL:bar
  Content-type:text/html
  URL:baz
  Content-type:text/plain
  URN:bletch
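Here is a minimal Python sketch of that nesting rule. It assumes the (name, value) pairs are already parsed. Under this rule, TTL, Content-type, and the like are ordinary filler fields that describe the innermost open section. The names and data layout are illustrative only.

```python
def build_sections(pairs: list[tuple[str, str]]) -> list[dict]:
    """Group parsed pairs into URN sections that contain URL subsections.

    URN and URL start sections, with URN outranking URL. A new URN closes
    the open URN section and any URL subsection inside it; a new URL closes
    only the open URL subsection. Every other pair is a filler field for
    the innermost open section. This sketch requires a URN before any URL.
    """
    sections: list[dict] = []
    urn_sec = None
    url_sec = None
    for name, value in pairs:
        key = name.upper()
        if key == "URN":
            urn_sec = {"URN": value, "fields": [], "urls": []}
            url_sec = None  # the new URN also closes any open URL subsection
            sections.append(urn_sec)
        elif key == "URL":
            if urn_sec is None:
                raise ValueError("URL outside any URN section")
            url_sec = {"URL": value, "fields": []}
            urn_sec["urls"].append(url_sec)
        else:
            target = url_sec if url_sec is not None else urn_sec
            if target is None:
                raise ValueError(f"{name} outside any section")
            target["fields"].append((name, value))
    return sections


example = [
    ("URN", "bla"), ("Author", "foo"),
    ("URL", "bar"), ("Content-type", "text/html"),
    ("URL", "baz"), ("Content-type", "text/plain"),
    ("URN", "bletch"),
]
for sec in build_sections(example):
    print(sec["URN"], sec["fields"], [(u["URL"], u["fields"]) for u in sec["urls"]])
```

New elements then only need to be classed as section-starting or filler. They don't each need a precedence rule of their own.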
  

>    Example: 
> 
>    URN:IANA:626:oit.5674
>    TTL: +
> 
>    In this example the first TTL is not needed since a URN has an infinite
>    time to live. This one is simply used as an illustration.

I would recommend you take out the TTL: + field in the example above. The
later TTL field shows a legitimate use; this one is not a good use
unless that position is adopted for the TTL of the URC info.

> 4.1 Other possible elements and precedence rules
> ++++++++++++++++++++++++++++++++++++++++++++++++
...
>  o Collection:

I suppose we can go around and around on this again, but probably
the best thing to do is use X-Collection for a while, as well as
X-References and X-Related, and see which one(s) get enough
use to be promoted to full status.

>  o Authoritative:
> 
>    This pair will give the location of the authoritative URC server for the
>    given URN. This will serve as a pointer of last resort for the URC of
>    the given URN. This would require some method of being able to
>    identify a given URC database server.

I don't see this as being necessary or desirable. First of all, to
ever see it you must already have contacted *some* sort of URC service.
Second, my worry is that people will use this as a quick hack rather
than a last resort and won't do a proper job on the distributed aspects
of the URC service.

> Possible future precedence rules:
> 
>  o Multiple URNs in the same URC denote simple relationship
> 
>    This is simply used as a method for the URC server to return
>    additional URNs that it thinks may be of value to the client. This is
>    useful if the server can do link prediction. If a client already has a
>    URC for a given URN cached, then it does not have to make a network
>    call for that related resource.
> 
>    Example: 
> 
>    URN:IANA:626:oit.5674
>    URL:http://www.gatech.edu/iiir/urc2.paper.html
>    URN:IANA:1:ietf-uri-002
>    URL:http://cnri.reston.va.us/internet-drafts/draft-ietf-uri-urn2urc.txt
> 
>    This simply is the server's way of telling the client if the user is
>    interested in this resource that he/she may also be interested in the
>    other one. 

Hmm, having multiple URNs in a URC is certainly something that will
happen because of the query language access that is needed for the URC
service. However, giving that an implicit "Prefetch-URC" meaning doesn't
seem like a good idea. An explicit X-prefetch field is OK.
I would rather leave it up to the browser to do implicit prefetches of the
URC info, presumably by linearly scanning the document the user is
reading and doing the prefetches from another thread. That really looks
like a browser-side efficiency hack to me, not an implicit meaning we want
to assign to a very common search result.
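Here is a minimal Python sketch of the browser-side alternative. `resolve` stands in for whatever URN-to-URC lookup we end up with and is hypothetical. The URN pattern is a rough illustrative regex, not a proposed syntax.

```python
import re
import threading

# Rough, illustrative pattern: "URN:" followed by anything that is not
# whitespace, a quote, or an angle bracket.
URN_PATTERN = re.compile(r'URN:[^\s<>"]+')


def start_prefetch(document: str, resolve, cache: dict, lock: threading.Lock):
    """Prefetch URC info for each URN in `document` on a background thread.

    `resolve` (URN -> URC info) is a hypothetical lookup function. `cache`
    is shared with the rest of the browser and guarded by `lock`.
    """
    def worker():
        for match in URN_PATTERN.finditer(document):  # linear scan, reading order
            urn = match.group(0).rstrip(".,;)")  # drop trailing punctuation
            with lock:
                if urn in cache:
                    continue
            info = resolve(urn)  # slow network call, made outside the lock
            with lock:
                cache.setdefault(urn, info)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


calls = []


def fake_resolve(urn):
    calls.append(urn)
    return {"URN": urn}


cache, lock = {}, threading.Lock()
page = ("See URN:IANA:626:oit.5674, then URN:IANA:1:ietf-uri-002 "
        "and URN:IANA:626:oit.5674 again.")
start_prefetch(page, fake_resolve, cache, lock).join()  # join only for the demo
print(sorted(cache))
print(len(calls))
```

Nothing here depends on what the server chose to put in a search result. The browser decides what is worth prefetching.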

>  o Relationship operations denoted by special attribute/value pairs 
> 
>    Attribute/value pairs could be specified that allow different types of
>    precedence rules to apply in different instances. A Block: pair could
>    specify a set of values that describe a specific URL or URN without
>    interacting with the given external precedence rules. These block
>    pairs would have numbers assigned to denote block nesting.
> 
>    Example: 
> 
>    URN:IANA:626:oit.5674
>    Authoritative:URL:whois://whois.gatech.edu:7070/template=urc
>    Block:1
>    URN:IANA:626:oit.5600
>    URN:IANA:626:oit.5601
>    URN:IANA:626:oit.5602
>    Block:1
> 
>    This illustrates that the URNs in block number 1 also have the
>    given authoritative site as their authoritative URC server.

If you are going to do this, why not just use something like braces or
parentheses, have general grouping, and do away with the precedence rules?
Block seems to combine the worst of both worlds: you don't have a
flat structure and you still have fragile precedence rules. Also,
BLOCK and ENDBLOCK would let you do away with the digits, and should
be easier to parse.
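To back up the parsing claim, here is a minimal Python sketch of BLOCK/ENDBLOCK handling with a stack. It assumes the delimiters appear on lines by themselves. Nesting depth comes from the stack, so no block numbers are needed.

```python
def parse_blocks(lines: list[str]) -> list:
    """Build a nested list from BLOCK / ENDBLOCK delimiter lines.

    A stack tracks the open blocks, so nesting needs no numbers. An extra
    ENDBLOCK is caught when it appears; a missing one is caught at the end.
    """
    root: list = []
    stack = [root]
    for line in lines:
        token = line.strip()
        if not token:
            continue
        if token.upper() == "BLOCK":
            child: list = []
            stack[-1].append(child)  # nest inside the innermost open block
            stack.append(child)
        elif token.upper() == "ENDBLOCK":
            if len(stack) == 1:
                raise ValueError("ENDBLOCK without a matching BLOCK")
            stack.pop()
        else:
            stack[-1].append(token)
    if len(stack) != 1:
        raise ValueError("unterminated BLOCK")
    return root


urc = """URN:IANA:626:oit.5674
Authoritative:URL:whois://whois.gatech.edu:7070/template=urc
BLOCK
URN:IANA:626:oit.5600
URN:IANA:626:oit.5601
URN:IANA:626:oit.5602
ENDBLOCK""".splitlines()
print(parse_blocks(urc))
```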

> Currently there are several
> systems that could be retrofitted to handle URCs. One of the best suited
> services is the draft whois++ [Deutsch 94]. Whois++ is an extension to the
> trivial WHOIS service which allows servers to make more structured
> information available. Additions to the trivial WHOIS protocol allow for
> communication between whois++ servers so that information can be shared
> across collections of servers.
>
> The two primary advantages to using whois++ are that the data is structured
> in the same template format as URCs and that the distributed nature allows a
> search to start local and expand globally as required. 

Whois++ probably makes a good basis for the system, but there are a few
places where (I think) retrofitting is needed. The first is security;
the current draft I saw was not adequate in this area. Second, when I
tried to find the Internet-Draft on the architecture of the Whois++
indexing a few days ago, it seemed to have been withdrawn or never
issued. Without seeing that, I don't know how easy it is to
change some of the "forward knowledge" that is used for directing queries
to remote servers. For a global system, I am concerned about centroids and
would like to see something that caches "authoritative" servers for
publishers in one area, and casual URC info for various URNs in
another area.


> Below are several
> sessions between some client and a whois++ server in which example URCs
> are given:

Nice examples. It would help to see something about how to locate a server
that can answer a query. There is Mitra's suggestion of adding
.uri.int to the publisher ID. I am working on a scheme more like DNS and
hope to have something to post to the URI list in a few weeks.
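For comparison, here is one possible reading of Mitra's suggestion as a Python sketch. It assumes a URN:<naming-authority>:<publisher-id>:... layout. That layout matches the examples in the draft but is not settled. The sketch only builds the host name; an actual client would then look it up in DNS.

```python
def urc_server_host(urn: str, suffix: str = "uri.int") -> str:
    """Build a candidate URC server host name from a URN (hypothetical).

    Assumes the layout URN:<naming-authority>:<publisher-id>:<rest> and
    appends .uri.int to the publisher ID. Only the name is built here;
    a client would then resolve it through ordinary DNS.
    """
    parts = urn.split(":")
    if len(parts) < 4 or parts[0].upper() != "URN" or not parts[2]:
        raise ValueError(f"unexpected URN layout: {urn!r}")
    return f"{parts[2].lower()}.{suffix}"


print(urc_server_host("URN:IANA:626:oit.5674"))   # 626.uri.int
print(urc_server_host("URN:IANA:1:ietf-uri-002"))  # 1.uri.int
```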

Good work, Michael!


Ron Daniel Jr.                email: rdaniel@acl.lanl.gov
Advanced Computing Lab        voice: (505) 665-0139
MS B-287  TA-3  Bldg. 2011      fax: (505) 665-4939
Los Alamos National Lab        http://www.acl.lanl.gov/~rdaniel/Home.html
Los Alamos, NM,  87545    tautology: "Conformity is very popular"