Re: New URC Specification is ready....

Michael Mealling <ccoprmm@oit.gatech.edu> Sat, 09 July 1994 02:10 UTC

Sender: ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Michael Mealling <ccoprmm@oit.gatech.edu>
Message-Id: <199407081922.AA13130@oit.gatech.edu>
Subject: Re: New URC Specification is ready....
To: "Ronald E. Daniel" <rdaniel@acl.lanl.gov>
Date: Fri, 08 Jul 1994 15:22:46 -0400
Cc: ccoprmm@oit.gatech.edu, pays@faugeres.inria.fr, rdaniel@lanl.gov, uri@bunyip.com
In-Reply-To: <199407061731.LAA02219@idaknow.acl.lanl.gov> from "Ronald E. Daniel" at Jul 6, 94 11:31:09 am
X-Mailer: ELM [version 2.4 PL23]
Mime-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Content-Length: 23079

Ronald E. Daniel said this:
> General comments:
> 
> I agree with pays@faugeres.inria.fr that the whois++ stuff should not be
> in the final version of this document, it should be in a separate document.

I agree. Next iteration will reflect that.

> > 2: Design Goals
> > ===============
> ...
> >  o Simplicity: A URC specification must be simple enough for practically
> >    anyone to understand or to encode. This allows users to encode and
> >    maintain a given URC without the need for esoteric computer science
> >    knowledge. 
> 
> I agree that this is a *very* worthwhile goal. I don't know that the
> precedence rules you define later meet this goal. Right now they are
> not too bad. However, I can certainly think of other elements that
> could be very useful and would require that proposed elements be
> changed to have precedence rules. (An example? Abstract. Currently it
> is a text field. However, since the best abstract for a picture is
> a thumbnail, the best abstract for a movie is a trailer, etc. it is
> not too hard to imagine making Abstract: into something with precedence
> so that the abstract can have a content-type, length, URLs, etc.) As
> the system is used over the next 15 years there will no doubt be lots
> of things added, and a desire to change the operation of particular
> elements. I don't think implicit precedence rules are going to age
> gracefully.

I definitely agree. The examples I give for possible URC elements are
not complete by any stretch of the imagination. It's the concept I want
to put forward. I want to see the URI-WG take these and give each one
the same attention that is given to URNs and URLs. For example, there are
these things called SOAPS that encode how much trusted people on the
net 'trust or rate' a given resource. It's a fairly complicated thing.
I would expect the Abstract element to possibly get that complex. For
example, why not have schemes for Abstracts? Like this:

Title:Terminator II
Abstract:MIME:
 Content Type:video/mpeg
 Content Length: 2435934
 
 039870439r85j7w3094587rj0w39487r5jower9ifpoie...bla..bla..radix64.encoded
 terminator.II.trailer.video....
URL:bla

> >  o Compatibility: Since URCs will be utilized by vastly different systems
> >    on vastly different networks it must be encoded in such a way as to
> >    allow very complex systems to communicate complex information
> >    via very simple gateways and access methods. 
> 
> Perhaps a mention of base-64 encoding, quoting rules, etc. is necessary?
> Yuck. See what comes out of the URL discussion and see if it can be
> adopted.

Definitely. I wanted to stay away from issues like that and character sets.

> >  o Use of existing and developing technology: In order to be able to
> >    implement something soon, an encoding specification should allow
> >    existing systems to be easily retrofitted to use URCs. The use of
> >    existing systems that already support objects similar to URCs is
> >    encouraged. 
> 
> A nice goal. I certainly don't think we should have gratuitous differences
> with existing and developing systems (such as whois++). However, I think
> there are fundamental requirements of the system that must be addressed, and
> that whois++ does not currently handle.  (Here we go again :-) 
> The fundamental requirement that does not seem to have received adequate
> consideration is, in a word, security. There has been no consideration of
> how to prevent me from issuing URNs that apparently originate from any
> publisher I choose. There has been no discussion of how to prevent me from
> forging authorship of resources. There has been only the most trivial
> discussion of ensuring the integrity of the resources I retrieve. An MD5
> field in the URC is nice, but if I can get into the URC info I can provide
> an MD5 that is accurate for what you get when you access my URL. Too bad that
> what is hiding at my URL is not what the original author intended.
> 
> I *strongly* believe we need to give security serious consideration now,
> rather than try to hack it in later.

I agree. Work is being done on security issues in whois++. whois++ is still
evolving WRT certain issues like security and such.
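
To make the MD5 concern concrete, here is a minimal Python sketch. The
function name and the idea of a hex-encoded MD5 value carried in the URC
are assumptions for illustration only; nothing here comes from the draft.
It shows that a matching checksum only proves that the retrieved bytes
agree with the URC, not that the URC itself is genuine.

```python
import hashlib


def md5_matches(resource_bytes, urc_md5_hex):
    """Return True if the resource's MD5 digest equals the URC's value."""
    digest = hashlib.md5(resource_bytes).hexdigest()
    return digest == urc_md5_hex.strip().lower()


original = b"what the author published"
forged = b"what is hiding at the attacker's URL"

# An honest URC carries the digest of the original resource.
honest_md5 = hashlib.md5(original).hexdigest()
# Someone who can rewrite the URC simply recomputes the digest.
forged_md5 = hashlib.md5(forged).hexdigest()

print(md5_matches(forged, honest_md5))  # forged bytes, honest URC
print(md5_matches(forged, forged_md5))  # forged bytes, forged URC
```

The first check fails and the second passes, so the checksum is only as
trustworthy as whatever protects the URC from being rewritten.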

> > It then becomes a simple
> > exercise of selecting the equality character and specifying some method of
> > encoding special situations such as character quoting and line continuation.
> 
> Nicely put. As I mentioned earlier, some discussion of character quoting
> would not be amiss. Perhaps just a statement that you are monitoring the
> character handling discussions in other working groups and will adopt one
> of the resulting approaches?

Nice way to put that. I will add that virtually verbatim if you don't mind. ;-)
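
As a rough illustration of how little machinery the attribute/value
encoding needs, here is a minimal Python sketch. It assumes ':' is the
equality character and that a line starting with whitespace continues
the previous value, as in RFC822 headers. Character quoting is left out
entirely, since that is still open. The function name is hypothetical.

```python
def parse_pairs(text):
    """Split URC text into (attribute, value) pairs, in order."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0] in " \t" and pairs:
            # Continuation line: append it to the previous pair's value.
            name, value = pairs[-1]
            pairs[-1] = (name, (value + " " + line.strip()).strip())
            continue
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError("not an attribute/value pair: %r" % line)
        pairs.append((name.strip(), value.strip()))
    return pairs


example = (
    "Title:Moby Dick\n"
    "Abstract:\n"
    " A whale\n"
    " of a tale.\n"
    "URL:http://www.gatech.edu/ietf/urc.encoding.html\n"
)
for pair in parse_pairs(example):
    print(pair)
```

Only the first ':' on a line is treated as the separator, so values such
as URLs keep their own colons.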

> > Experimental attribute_names should be encoded with the
> > [X-attribute_name] notation.
> 
> This reminded me of a purported bug in Mosaic - case sensitivity where
> X-foo is different than x-foo. Should we specify right now that
> attribute names are case insensitive? What does MIME say?

Good point. I'll add that since I think x-foo and X-foo should be equal. I
will check MIME and see what they say first. If there is a difference I'll
follow the MIME standard.
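
A minimal Python sketch of what case-insensitive attribute names would
look like, assuming that is the rule that ends up being adopted. Both
helper names are hypothetical.

```python
def normalize_name(name):
    """Fold an attribute name to one canonical form for comparison."""
    return name.strip().lower()


def is_experimental(name):
    """True for names using the X- prefix, in either case."""
    return normalize_name(name).startswith("x-")


print(normalize_name("X-foo") == normalize_name("x-foo"))
print(is_experimental("X-LIFN"), is_experimental("URL"))
```

Folding the name once at parse time means every later comparison can be
a plain string equality test.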

> > There are no attribute/value pairs that are required to be a part of a URC. 
> 
> Not even a URN? Seems mighty handy to require this so that when people do
> searches on Author, URL, ... they end up with something they can do more
> with.

Yep. Not even a URN. Not every net resource will have a URN. Take the
current weather map, for example: the resource may have a URN but each
individual map may not. That would be an example of a URC that has no URN.
What about a resource that is no longer on the net? It may have a URN but
no URL. And if it doesn't have a URN AND isn't on the net, it has no URN
and no URL.

It's possible but may not be useful. This statement allows you to let things
grow where you never thought possible.


> > It is intended that any additions or subtractions from this list will be
> > handled by the Uniform Resource Identifier Working Group. It is also
> > intended that this list should be extended since the full usefulness of
> > URCs is beyond the scope of these pairs listed. 
> 
> We should also start a registry for the current types so that people
> don't have to wade through the archives of the mailing list to find out
> what is the current set of well-known attributes.

True. I'll let this grow out of the Working Group. I don't have any
experience in handling Type Registries. Co-authors! I'm looking for 
Co-Authors!!!

> >  o URL:
> >    This pair must conform to the current Uniform Resource Locator
> >    specification as defined in the URL Internet-Draft[Berners-Lee 94-1].
> >
> >    Example:
> > 
> >    URL:http://www.gatech.edu/ietf/urc.encoding.html
> 
> What? No angle brackets? Aren't these responses text/plain?   :-) 
> More seriously, should we get a MIME Content-type for the URC
> responses? Might make dealing with them easier.

Yes. Once the Working Group adopts it and it goes to RFC status. Until then
we stick with X-URC.

> >    Since many cultures have different ways of writing names
> >    there are no requirements on how a name should be written. Thus it is
> >    encouraged that users encode names in the most common format i.e.
> >    first, middle and last in English societies.
> 
> Nicely put. 


Thank you. 

> >  o TTL:
> >    This pair encodes a Time To Live measured in seconds. Infinity is
> >    denoted by the '+' character. This element references the attribute/value 
> >    pair directly preceding it (see section 4) and is meant as a caching aid.
> > 
> >    Example:
> > 
> >    TTL:86400
> 
> This works well for the resources identified by a URN or URL. However, the
> URC information for a resource is something that will also be cached in
> order to avoid unnecessary expense in the URN->URL resolution. We need a
> means of specifying the TTL for URC info, which will not be the same as the
> time for the resource itself. Perhaps the TTL associated with
> the URN tells the time to live for the URC info, while TTLs associated
> with URLs tell the time to cache particular resources. e.g.:
> 
>   URN:IANA:foo:bar:123434523
>   TTL:36000                       // Cache URC info for 10 hours
>   URL:http://www.bar.foo/huh.html
>   TTL:+                           // Cache the html until LRU rules kick it out.
> 
> BTW - what are the units in the TTL field? Seconds?  Microseconds?

Hmmm...How about putting the TTL first instead of after the URN, like this:

TTL:36000
URN:IANAL:foo:bar:1928340978
TTL: +
URL:bla
TTL: + (good point)

Since nothing precedes it, it will characterize the whole thing. I like this.
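
A minimal Python sketch of that rule, assuming the pairs have already
been split into (name, value) tuples: a TTL describes the pair directly
before it, and a TTL with nothing before it describes the whole URC.
'+' is read as infinity. The function name and the returned layout are
hypothetical.

```python
import math


def attach_ttls(pairs):
    """Return (urc_ttl, elements); each element is [name, value, ttl]."""
    urc_ttl = None
    elements = []
    for name, value in pairs:
        if name.lower() != "ttl":
            elements.append([name, value, None])
            continue
        token = value.strip()
        ttl = math.inf if token == "+" else int(token)
        if elements:
            elements[-1][2] = ttl  # describes the preceding pair
        else:
            urc_ttl = ttl  # nothing precedes it: whole-URC TTL
    return urc_ttl, elements


pairs = [
    ("TTL", "36000"),
    ("URN", "IANA:foo:bar:123434523"),
    ("TTL", "+"),
    ("URL", "http://www.bar.foo/huh.html"),
    ("TTL", "+"),
]
urc_ttl, elements = attach_ttls(pairs)
print(urc_ttl)
print(elements)
```

Elements with no TTL of their own keep None, which leaves it open
whether they should inherit the whole-URC value.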

> >  o Abstract:
> >    This pair encodes a short abstract about the given resource. Any
> >    characters are allowed. Line continuation follows normal rules.
> > 
> >    Example:
> > 
> >    Abstract: 
> >            This document explores the various flight patterns and speeds of
> >     unladen African and European swallows. A companion document concerning
> >     the relative velocities of swallows laden with coconuts is available.
> 
> Pretty soon we could imagine using thumbnails as abstracts of images,
> trailers as abstracts for movies, etc. How about we change Abstract:
> to be the URN for an abstracted version of the resource, and Text-Abstract
> to be an ASCII description of the resource. (Of course, this doesn't
> solve the problem of knowing what language the text description was
> written in).

See my example above. I'm not exactly sure this is the right way to do it but
it seems to follow the standard method of doing things with URIs which is:

id:scheme:data
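
A minimal Python sketch of splitting a line on that pattern. The
function name is hypothetical, and the sketch assumes the caller already
knows which attributes carry a scheme, because a free-text value that
happens to contain a ':' would otherwise be split as well.

```python
def split_scheme(line):
    """Split 'id:scheme:data' into (id, scheme, data).

    The scheme is None when the value contains no second ':'.
    """
    name, sep, rest = line.partition(":")
    if not sep:
        raise ValueError("not an attribute/value pair: %r" % line)
    scheme, sep2, data = rest.partition(":")
    if not sep2:
        return name.strip(), None, rest.strip()
    return name.strip(), scheme.strip(), data.strip()


print(split_scheme("Version:decimal:1.0"))
print(split_scheme("URL:http://www.gatech.edu/ietf/urc.encoding.html"))
print(split_scheme("Title:Terminator II"))
```

Read this way, a URL line already fits the pattern, with the access
method in the scheme position.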

> >  o Version header field:
> >    In order to give some ability to utilize different version schemes it is
> >    recommended that the Version field be given the idea of schemas so
> >    that machine based algorithms can be used to differentiate resources.
> >    For this specification only one schema is given but more can be
> >    developed.
> > 
> >     o Schema 1: decimal This schema specifies the use of the
> >       standard decimal type of version enumeration. For example,
> >       this is version 1.0 of this document. At the authors or publishers
> >       whim it can change to version 1.1 or even 2.0.
> > 
> >       Example:
> > 
> >       Version:decimal:1.0
> 
> Really good plan to specify the scheme. The whole issue of versioning could
> stand a good deal of serious thought. We may want Supercedes: and
> Superseded-By: fields to hold URNs, but that is a back-burner thing that
> should go through an X- phase first.

Good point. I like Supercedes. I know of a lot of people who could use that 
now.


> > ... there must be some structure to a URC. The easiest and most elegant
> > method is simply to introduce a set of precedence rules onto the above set of
> > attribute/value pairs. 
> 
> Well, this is a point where reasonable people may disagree. As I mentioned
> earlier, I am concerned about how precedence rules might have to change
> in the future. Explicit delimiters do not have that problem. Some
> people are concerned about complexity with explicit delimiters, that
> has not been my (admittedly limited) experience.

I agree with you in part. The precedence rules are simple and should
stay that way. That doesn't keep us from developing other methods of
encoding certain types of meta-info, each of which can itself become
another attribute/value pair. For example, some want to use SGML as
an encoding method. URCs could accommodate this by simply allowing
an attribute/value pair for it. For example,

URN:bla
Meta:V1.5:MIME
 <Author>Herman Melville</Author>
 <Title>Moby Dick</Title>
URL:bla
Content_Type:text/html

This solves two VERY important things:

1. Allows us to get a simple, extremely implementable system that we can
   completely roll out in less than a year.

2. Gives us an extremely upgradeable system that we can use as a bedrock 
   for building other more complicated systems on without breaking 
   the underlying systems.

A complete description language/grammar that will allow us all the complexity
that we ALL want will not come into existence anytime soon.  A simple,
extensible URC can at least give us something to build on in the future.



> >  o LIFNs are equal to URNs in precedence only.
> > 
> >    An LIFN has many of the same characteristics of a URN. While there
> >    is no current specification of exactly what a LIFN is or does this paper
>         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >    will attempt to place them in the structure of a URC. This is definitely
> >    up for discussion.
> 
> Well, I'm just an Okie, but if we don't know what a LIFN is or what it
> is supposed to do, it seems kind of strange to be specifying a standard
> field that clients have to interpret consistently. How about we let
> people play around with X-LIFN, and assume that it has no precedence
> rule?

Sounds good to me. At the last IETF there was a lot of noise about
URNs not solving the problems that folx wanted solved. They specifically wanted
LIFNs so I was trying to provide a framework that they could work
within. I can leave it with X-LIFN. I still want to see their work 
continue, though.

> >  o TTLs have no precedence over any other attribute/value pair
> >    and therefore describe any element directly before it.
> 
> I am not comfortable with this for two reasons. The first is the need to
> be able to put a TTL on the URC info. The second is that I think you are
> being too restrictive with the placement of the TTL field. It, along with
> things like Content-type and Content-length, should talk about the
> *section* they are currently in. If we can elaborate on the notion of
> sections we may be able to come up with a scheme that will generalize
> over many extensions to the set of known elements. For example,
> divide the fields into "Section-starting fields" and filler fields.
> Section starting fields, in order of priority, are URN and URL. Sections
> can nest, but cannot partially overlap. Therefore, the second URN: field
> in the example below closes both the first URN section and the URL subsection.
>   URN:bla
>   Author:foo
>   URL:bar
>   Content-type:text/html
>   URL:baz
>   Content-type:text/plain
>   URN:bletch

I think we are talking about the same thing here. I agree that TTL is
a bit hairy and needs some work. See my discussion on Block below.
I think there needs to be a sense of TTL throughout the entire URC but
I do think certain fields may have their own TTL encoding.
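
A minimal Python sketch of the section idea as described in the quoted
text, assuming URN and URL are the only section-starting fields and that
every other field attaches to the innermost open section. The function
name and the dictionary layout are hypothetical.

```python
def build_sections(pairs):
    """Group (name, value) pairs into URN sections with URL subsections."""
    sections = []
    urn = None  # currently open URN section
    url = None  # currently open URL subsection, if any

    def open_urn(value):
        section = {"urn": value, "fields": [], "urls": []}
        sections.append(section)
        return section

    for name, value in pairs:
        key = name.lower()
        if key == "urn":
            # A new URN closes the open URN section and any URL subsection.
            urn = open_urn(value)
            url = None
        elif key == "url":
            if urn is None:
                urn = open_urn(None)  # URL seen before any URN
            url = {"url": value, "fields": []}
            urn["urls"].append(url)
        else:
            if urn is None:
                urn = open_urn(None)  # filler field before any URN
            target = url if url is not None else urn
            target["fields"].append((name, value))
    return sections


pairs = [
    ("URN", "bla"), ("Author", "foo"),
    ("URL", "bar"), ("Content-type", "text/html"),
    ("URL", "baz"), ("Content-type", "text/plain"),
    ("URN", "bletch"),
]
for section in build_sections(pairs):
    print(section)
```

With this reading, a TTL or Content-type field needs no rule of its own:
it simply lands in whichever section is open when it appears.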

> >    Example: 
> > 
> >    URN:IANA:626:oit.5674
> >    TTL: +
> > 
> >    In this example the first TTL is not needed since a URN has an infinite
> >    time to live. This one is simply used as an illustration.
> 
> I would recommend you take out the TTL: + field in the example above. The
> later TTL field showed a legitimate use of it, this is not a good use
> unless such a positioning is adopted as the TTL for the URC info.

Point taken. See my earlier comments on the 'whole URC TTL' above.

> >  o Collection:
> 
> I suppose we can go around and around on this again, but probably
> the best thing to do is use X-Collection for awhile, as well as
> X-References and X-Related and see which one(s) receive enough
> use to promote to full status.

This whole section isn't meant to be a part of the total specification. 
I agree it should be X-Collection to begin with, if it's even adopted.
It's just there for discussion.

> >  o Authoritative:
> > 
> >    This pair will give the location of the authoritative URC server for the
> >    given URN. This will serve as a pointer of last resort for the URC of
> >    the given URN. This would require some method of being able to
> >    identify a given URC database server.
> 
> I don't see this as being necessary or desirable. First of all, to
> ever see it you have to have contacted *some* sort of URC service. My worry
> is that people will use this as a quick hack rather than a last resort
> and won't do a proper job on the distributed aspects of the URC
> service.

Again, discussion only. It can go away and no one would notice....


> > Possible future precedence rules:
> > 
> >  o Multiple URNs in the same URC denote simple relationship
> > 
> >    This is simply used as a method for the URC server to return
> >    additional URNs that it thinks may be of value to the client. This is
> >    useful if the server can do link prediction. If a client can already have a
> >    URC for a given URN cached then it does not have to do a network call for
> >    that related resource.
> > 
> >    Example: 
> > 
> >    URN:IANA:626:oit.5674
> >    URL:http://www.gatech.edu/iiir/urc2.paper.html
> >    URN:IANA:1:ietf-uri-002
> >    URL:http://cnri.reston.va.us/internet-drafts/draft-ietf-uri-urn2urc.txt
> > 
> >    This simply is the server's way of telling the client if the user is
> >    interested in this resource that he/she may also be interested in the
> >    other one. 
> 
> Hmm, having multiple URNs in a URC is certainly something that will
> happen because of the query language access that is needed for the URC
> service. However, having that mean an implicit "Prefetch-URC" doesn't
> seem a good idea. An explicit X-prefetch field is OK. 

This was only a suggestion for possible meaning for multiple URNs. You
can take it or leave it.

> I would rather leave it up to the browser to do implicit prefetches of the
> URC info, presumably by linearly scanning the current document the user
> is reading doing the prefetches from another thread. Really looks like
> a browser-side efficiency hack to me, not an implicit meaning we want
> to assign to a very common search result. 

True. Some have complained about multiple URNs. I simply pointed out that
we were just not agreeing on what our file delimiter was. I say you
can have multiple URNs per URC. They don't. I divide my URCs at the
end-of-file marker. They can divide their URCs at the end-of-URN marker.
"You say potato. I say potato" (kind of loses in the translation ;-)
> 
> >  o Relationship operations denoted by special attribute/value pairs 
> > 
> >    Attribute/value pairs could be specified that allow different types of
> >    precedence rules to apply in different instances. A Block: pair could
> >    specify a set of values that describe a specific URL or URN without
> >    interacting with the given external precedence rules. These block
> >    pairs would have numbers assigned to denote block nesting.
> > 
> >    Example: 
> > 
> >    URN:IANA:626:oit.5674
> >    Authoritative:URL:whois://whois.gatech.edu:7070/template=urc
> >    Block:1
> >    URN:IANA:626:oit.5600
> >    URN:IANA:626:oit.5601
> >    URN:IANA:626:oit.5602
> >    Block:1
> > 
> >    This illustrates that the URNs in block number 1 also have the
> >    given authoritative site as their authoritative URC server.
> 
> If you are going to do this, why not just use something like braces or
> parenthesis, have general grouping, and do away with the precedence rules?
> Block seems to combine the worst of both worlds - you don't have a
> flat structure and you still have fragile precedence rules. Also,
> BLOCK and ENDBLOCK will let you do away with the digits, and should
> be easier to parse.

hmmm......

As long as it a) doesn't make it any harder to parse with some standard
RFC822-derived header parsing routines and b) doesn't make it any
harder for a secretary to enter data, then I don't really care one
way or another. For most things I would rather see issues like this
solved with simple precedence (or section, as you called it) rules than
with icky things like Block: #. I threw that in there at the last minute and
wasn't all that convinced whether I liked it or not. This could use
some more discussion.

> > Currently there are several
> > systems that could be retrofitted to handle URCs. One of the best suited
> > services is the draft whois++ [Deutsch 94]. Whois++ is an extension to the
> > trivial WHOIS service which allows servers to make more structured
> > information available. Additions to the trivial WHOIS protocol allow for
> > communication between whois++ servers so that information can be shared
> > across collections of servers.
> >
> > The two primary advantages to using whois++ are that the data is structured
> > in the same template format as URCs and that the distributed nature allows a
> > search to start local and expand globally as required. 
> 
> Whois++ probably makes a good basis for the system, but there are a few
> places where (I think) retrofitting is needed. The first is in security,
> the current draft I saw was not adequate in this area. Second, I was
> trying to find the Internet draft on the architecture of the Whois++
> indexing a few days ago, it seemed to have been withdrawn or not provided
> in the first place. Without seeing that, I don't know how easy it is to
> change some of the "forward knowledge" that is used for directing queries
> to remote servers. For a global system, I am concerned about centroids and
> would like to see something that cached "authoritative" servers for
> publishers in one area, and cached casual URC info for various URNs in
> another area.

The security stuff was in one of the earlier drafts that have since expired.
It was taken out due to the decision to break the whois++ specification out
into several different drafts. Currently there are a few small changes to 
be made to the current base architecture after which work will continue on
security, record maintenance, etc. 

Some of us are also working on ways of identifying and caching 'authoritative'
PubId information. What we may end up with are several "Well Known" whois++
template sets. One for PubIds so that a client can find the 'authoritative'
server for a given URN. One for general, distributed URC info. One for
Publisher type info, etc.  

> > Below are several
> > sessions between some client and a whois++ server in which example URCs
> > are given:
> 
> Nice examples. It would be nice to see something about how a server that
> can answer a query is located. There is Mitra's suggestion of adding
> .uri.int to the publisher ID. I am working on a scheme more like DNS and
> hope to have something to post to the URI list in a few weeks.

A few of us are working on a method that uses a different special template for 
Publisher Ids that are taken from the URN. This template would allow
a local server to find out where the starting point for a given 
ID-space is.  That last sentence is as far as we have gotten so I don't
know if we'll have anything ready by Toronto.....



> Good work Michael!

Thanks. You did some good work in that response. Sorry I took so long
in responding.....

-MM



-- 
------------------------------------------------------------------------------
<HR><A HREF="http://www.gatech.edu/michael.html">
<ADDRESS>Michael Mealling</ADDRESS>
<ADDRESS>michael.mealling@oit.gatech.edu</ADDRESS></A>