Re: URCs and URL resolution

Paul Francis <francis@cactus.slab.ntt.jp> Thu, 29 September 1994 05:40 UTC

Date: Thu, 29 Sep 1994 13:10:03 -0000
Sender: ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Paul Francis <francis@cactus.slab.ntt.jp>
Message-Id: <9409290410.AA03599@cactus.slab.ntt.jp>
To: Michael.Mealling@oit.gatech.edu, wade@cs.utk.edu
Subject: Re: URCs and URL resolution
Cc: moore@cs.utk.edu, sgreen@cs.utk.edu, uri@bunyip.com

>  > 
>  > The other--for URL resolution would use a very lightweight 
>  > RPC or RPC-like mechanism. 
>  
>  I've been toying with a lightweight version of whois++ that is basically
>  a stripped version of the protocol that runs under udp. It has no support
>  for the HOLD constraint and doesn't support centroids or fancy return
>  types. So basically your UDP packet would contain:
>  
>  template=URC;URN=bla:match=exact
>  

I see there is a lot of talk on this list about using whois++ as
the service for both "URC" and "URN" based searches.  (To be clear,
what I mean by a URN-based search is that the query contains the
URN, and the answer contains the URC/URL.  By a URC-based search I
mean the more general case, where the query contains some descriptive
information such as keywords, and the answer contains the URC/URN.)

I want to insert a word of caution concerning easy assumptions
about what whois++ can do and when it will be able to do it.

First, concerning the "easy" problem---URN-based searches (well,
easy compared to URC-based searches).  As near as I can tell from
my readings of the whois++ doc draft-ietf-wnils-whois-03.txt,
whois++ forward knowledge is currently unable to encode the
information needed to process the URN-based search given in
the example of Michael's paper draft-ietf-mealling-urc-spec-00.txt:

	% 220 Enter search string or type 'help' for help.
	template=urn;URN=IANA:623:oit:cs:ftp-and-telnet

This query contains URN=<string>.  As near as I can tell, whois++
forward knowledge works on a per-attribute basis, not a
part-of-an-attribute basis.  However, for the search of the
above example to scale, the (hierarchical) URN attribute must
itself be parsed (i.e., the IANA server sends the query to
the 623 server, that server sends it to the oit server, etc.).
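
A minimal sketch of the hop-by-hop parsing described above, assuming
a toy in-memory tree in place of real servers.  The TREE contents,
the resolve() helper, and the placeholder URL are all hypothetical;
only the URN string comes from the example.

```python
# Hypothetical delegation tree: each level stands in for a server that
# knows only its own children. The leaf URL is a made-up placeholder.
TREE = {
    "IANA": {
        "623": {
            "oit": {
                "cs": {
                    "ftp-and-telnet": "ftp://example.invalid/placeholder",
                },
            },
        },
    },
}


def resolve(urn, tree=TREE):
    """Walk the tree one URN component at a time.

    Returns (hops, answer). answer is None when a component is unknown;
    for a URN that stops short of a leaf, answer is the remaining subtree.
    """
    node = tree
    hops = []
    for label in urn.split(":"):
        if not isinstance(node, dict) or label not in node:
            return hops, None
        hops.append(label)
        node = node[label]
    return hops, node


hops, answer = resolve("IANA:623:oit:cs:ftp-and-telnet")
print(" -> ".join(hops), "=>", answer)
```

Each loop iteration corresponds to one referral, so the number of
servers touched grows with the depth of the URN, not with the total
number of URNs.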

One fix to this, I suppose, is to have multiple attributes
URNL0, URNL1, URNL2, ..., where:

	URNL0 = IANA
	URNL1 = 623
	URNL2 = oit
	etc.

I don't know the pros and cons of this approach versus some other
approach (offhand it looks ugly), but the fact that I can't find
any specific information about how whois++ would work as a
hierarchical URN lookup service makes me think that whois++ has
a long way to go.  

Of course, strictly speaking, the above "fix" requires no "changes"
to whois++ at all---it just requires that the right attributes be
defined.  But defining the right attributes is the hard part of the
problem, since that is where the real functionality comes from.  So,
the simple fact that whois++ is capable of encoding the right
information (because it is a general type-value kind of system)
doesn't mean the problem is solved or even easily solved.
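
The URNL0, URNL1, ... split can be sketched as follows.  This is only
an illustration: the function names are hypothetical, and joining
"URNLn=value" terms with ';' is an assumed encoding modelled on the
query line quoted earlier, not something taken from the whois++ draft.

```python
def urn_to_level_attributes(urn):
    """Split 'IANA:623:oit' into [('URNL0', 'IANA'), ('URNL1', '623'), ...]."""
    return [(f"URNL{i}", part) for i, part in enumerate(urn.split(":"))]


def build_query(urn, template="urn"):
    """Build a query line with one attribute per URN level.

    The 'URNLn=value' terms joined by ';' are an assumed encoding.
    """
    terms = [f"template={template}"]
    terms += [f"{name}={value}" for name, value in urn_to_level_attributes(urn)]
    return ";".join(terms)


print(build_query("IANA:623:oit:cs:ftp-and-telnet"))
```

With one attribute per level, forward knowledge could in principle be
kept per attribute, but which server holds which level still has to
be specified somewhere.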



The use of whois++ for a more general URC-based lookup is much
more vague.  I have yet to see a cogent argument (or much of any
argument) as to how centroids are expected to scale in terms of
BOTH the number of queries AND the amount of memory needed.  
If the hierarchy is deep, then the amount of information in
the index servers at the upper parts of the hierarchy is
reasonable---basically the size of the vocabulary times a
relatively small factor (the number of index servers at
the next level down).  But the number of resources that
each index server represents is enormous, meaning
that almost all queries would go to almost all index servers, so
the scheme scales badly in the number of queries.

If, on the other hand, the hierarchy is shallow, then the
amount of information represented by each index server can be
made smaller, so in theory the queries can be spread to only
a few of them, but since the amount of memory needed by the servers
at the top scales according to the number of index servers at
the next level down, the amount of memory (and updating of said
memory) at the top must be large.
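
The memory side of this trade-off can be put in rough numbers.  The
sketch below assumes a uniform tree and the worst case in which every
vocabulary word appears under every child, so the top server holds
about (vocabulary size) x (number of children) entries.  The function
names and the figures in the loop are made up for illustration.

```python
def fanout_for_depth(leaf_servers, depth):
    """Smallest uniform fanout f such that f**depth >= leaf_servers."""
    fanout = 1
    while fanout ** depth < leaf_servers:
        fanout += 1
    return fanout


def top_level_entries(vocabulary_size, leaf_servers, depth):
    """Worst-case count of (word, child) entries at the top index server."""
    return vocabulary_size * fanout_for_depth(leaf_servers, depth)


# Illustrative figures only: 100,000 words, 10,000 leaf servers.
for depth in (1, 2, 4):
    print(depth, fanout_for_depth(10_000, depth),
          top_level_entries(100_000, 10_000, depth))
```

A shallow tree pushes the fanout, and with it the top-level memory
and update load, up; a deep tree keeps both small but leaves each
child standing for a much larger share of the resources.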

In addition, the performance of centroids (for instance, in terms
of percentage of queries that go to machines that don't have
a resource that satisfies the query) is highly dependent on what
information goes where.  It may be possible to fine tune the 
hierarchy to a high degree, and it may even be possible to find
an acceptable middle ground between number of queries and amount
of memory.  But I haven't seen any good description of how this
might be done, or even of what the characteristics of centroids
are in general.  (I've seen "anecdotal" examples, but not a general
description of characteristics.)
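
As a toy illustration of queries reaching machines that hold no
matching resource, the sketch below treats a centroid as simply the
set of all words a server's records use.  That is a simplification
of the real mechanism, and the server names and records are made up.

```python
def centroid(records):
    """Union of all words used by a server's records (simplified centroid)."""
    words = set()
    for record in records:
        words |= record
    return words


def route(query, servers):
    """Return (servers contacted, servers with a real match) for an AND query."""
    contacted = [name for name, recs in servers.items()
                 if query <= centroid(recs)]
    hits = [name for name in contacted
            if any(query <= rec for rec in servers[name])]
    return contacted, hits


# Made-up records: each record is a set of keywords.
SERVERS = {
    "a": [{"whois", "protocol"}, {"urn", "naming"}],
    "b": [{"urn", "protocol"}],
    "c": [{"gopher", "menu"}],
}

contacted, hits = route({"urn", "protocol"}, SERVERS)
print("contacted:", contacted, "real hits:", hits)
```

Server "a" is contacted because its centroid contains both words,
even though no single record on it does.  How often this happens
depends entirely on which records are placed on which server.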

Well, this message seems to be more about whois++ than UR*, but
I guess most of the whois++ readership is on this list too,
so I won't cross post it.

PF