Comments on the centroids paper

Rickard Schoultz <schoultz@admin.kth.se> Mon, 30 August 1993 11:34 UTC

Message-Id: <9308301114.AA23958@othello.admin.kth.se>
To: ietf-wnils@aggie.ucdavis.edu
Subject: Comments on the centroids paper
Date: Mon, 30 Aug 1993 13:14:23 +0200
Sender: ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Rickard Schoultz <schoultz@admin.kth.se>

Folks,
 Breaking the Silence Of The List, I have some proposals, comments and
questions on the Centroids paper that I wish to bring up for discussion.
Comments are very welcome...

Consider this tree of whois++ servers:
                         ___ 
                        |   |
                        | A |
                        |___|
                        /   \
                    ___/     \___ 
                   |   |     |   |
                   | B |     | C |
                   |___|     |___|
                   /   \         \
               ___/     \___      \___ 
              |   |     |   |     |   |
              | D |     | E |     | F |
              |___|     |___|     |___|

(A polls centroids from B and C. B polls from D and E. C polls from F.)

1. Handle uniqueness:
 What happens if server C starts polling E?  A query posed to server
A will then get two identical matches, one from B and one from C.
 One way to solve this is to let server A remove duplicate answers, but
that requires the record handles presented by B and C to be identical.
Bringing this up with Chris, we found that we can do this by having the
handle composed of the server handle and the record handle (e.g.
"serverhandle/recordhandle").
 But should the origin server handle really be easily deducible from
that handle by a server several "hops" away? This cuts both ways. On
one hand, a remote server can use the extracted server handle to pose
subsequent related queries to that server directly for faster lookup.
On the other hand, such direct access will bypass any caching servers
in between, and might thus add to the total network load and decrease
total lookup speed.
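
 A minimal sketch of the duplicate removal described in point 1,
assuming answers arrive as (handle, record) pairs and that server
handles never contain "/". The function names are hypothetical and not
taken from any whois++ document.

```python
def global_handle(server_handle, record_handle):
    """Compose the "serverhandle/recordhandle" form discussed above."""
    return f"{server_handle}/{record_handle}"


def origin_server(handle):
    """Extract the origin server handle (assumes no "/" in server handles)."""
    return handle.split("/", 1)[0]


def merge_answers(*answer_sets):
    """Merge answers from several polled servers, keeping the first copy
    of each globally unique handle and preserving arrival order."""
    seen = set()
    merged = []
    for answers in answer_sets:
        for handle, record in answers:
            if handle not in seen:
                seen.add(handle)
                merged.append((handle, record))
    return merged


# Server E is reachable through both B and C, so its record shows up twice.
from_b = [(global_handle("E", "rec1"), "x"), (global_handle("D", "rec9"), "y")]
from_c = [(global_handle("E", "rec1"), "x"), (global_handle("F", "rec2"), "z")]
merged = merge_answers(from_b, from_c)
```

 Here server A ends up with one copy of E's record, and origin_server()
shows how easily a distant server could go straight to E.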

2. Strong namespaces:
 Suppose all index servers in the layout above have polled the servers
below for "description" fields in "services" templates. 
 If there is a query posed to server A that asks for "mail" with no
field specification, should the index server then change the query to
"description=mail"? 
 If server A passes the query unmodified, it might match a *lot* more
things than intended, thus garbling the namespace that A was supposed
to hold. Maybe this should be mentioned, or even restricted, in some
document.
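
 To make the first alternative in point 2 concrete, here is a sketch of
an index server narrowing unqualified terms to the field it has
actually polled. The query representation, a list of (field, value)
pairs with None for "no field given", is only an assumption for
illustration.

```python
def restrict_query(terms, indexed_field="description"):
    """Rewrite terms that carry no field specification so that they only
    match the field this index server holds centroids for. Terms that
    already name a field are passed through unchanged."""
    return [
        (field if field is not None else indexed_field, value)
        for field, value in terms
    ]


# "mail" with no field specification becomes "description=mail".
rewritten = restrict_query([(None, "mail")])
```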

3. Trace information for servers:
 The administrator of server D finds whois++ networking really cool and
decides to start polling server A to get fast lookups for some data.
Server A is an open server that doesn't put any restrictions on who is
to be allowed to poll it.
 This will affect FORWARDED-QUERY, SERVERS-TO-ASK and DATA-CHANGED which
might all get caught in a loop if there is no trace information. I
suggest we put that information inside these command records with a
common field name like: 

FORWARDED-QUERY ...
  Path: handle1,handle2,handle3
...
END FORWARDED-QUERY

 There could be more information in the query path, such as time
stamps, but I don't know if it is a good idea, at least not for the
moment, to add more computation to this time-critical command.
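
 A sketch of how a server might use such a Path field to break loops,
assuming the comma-separated list of server handles shown above. The
function name and the choice to return None on a loop are hypothetical.

```python
def next_path(path, my_handle):
    """Return the Path value to send onwards, or None if this server has
    already handled the command and forwarding would create a loop."""
    hops = [h.strip() for h in path.split(",") if h.strip()]
    if my_handle in hops:
        return None  # we are already in the trace: drop, do not forward
    return ",".join(hops + [my_handle])


# D polls A, so a command that started at D can come back around to D.
onward = next_path("A,B", "D")
looped = next_path("D,A,B", "D")
```

 The check is a single list scan per hop, which keeps the extra work in
this command small.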
 
4. TTL for responses:
 A response to a forwarded-query is typically (always?) sent to another
server, which in turn could cache this response for some time. Wouldn't
it be appropriate to send a TTL along with the response, set by the
server that originated it?  This imposes some encapsulation on the
response (FORWARDED-RESPONSE?), but I don't have a good idea how this
nested record could be composed in a nice way. Any takers?
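
 Leaving the record format open, the caching side of point 4 could look
roughly like the sketch below. It assumes the TTL arrives as a number
of seconds alongside the response; the class and its methods are
hypothetical.

```python
import time


class ResponseCache:
    """Cache of forwarded-query responses that honours a per-response TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # query -> (expiry time, response)

    def put(self, query, response, ttl_seconds):
        self._entries[query] = (self._clock() + ttl_seconds, response)

    def get(self, query):
        """Return the cached response, or None if absent or expired."""
        entry = self._entries.get(query)
        if entry is None:
            return None
        expires_at, response = entry
        if self._clock() >= expires_at:
            del self._entries[query]
            return None
        return response
```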

5. Search scope:
 I think the user has to be able to specify the scope of the search:
whether it should be local-only or not, and maybe even a maximum
hop-count (although I can't find any use for one). This would mean
creating a constraint SCOPE={local|normal|from-top} or something
similar.

6. What is a namespace?:
 I am a bit confused about what I'm asking here, so please bear with me.
Whois++ gives us the ability to have more than one carved-in-stone
namespace.  But on what level are those namespaces partitioned? As I see
it, it may be on I) templates, II) strict top-down server trees, or
III) matches to specific fields.  Now if we pose a query at a random
place in a tree and don't get the matches we wanted, we might want to
expand our search to servers "above" our query point.  The queried
server knows what servers are above from the POLLED-BY information. But
how do we know that we are in the same namespace when querying those
servers?  Should there be a way to identify namespaces? (How would one
set up multiple namespaces on one server?)

7. Minor detail - port and handle in SERVERS-TO-ASK:
 The SERVERS-TO-ASK command is to return the host name or IP address of
another whois++ server to query. It needs to return a port number too,
since not all whois++ servers will be on the same port.  There is also
a need for a server handle for each of the servers to ask, in case the
resource has moved after the index server polled it.

-Rickard

--
Rickard Schoultz			schoultz@admin.kth.se
KTH/SUNET				+46-8-790 90 88   (voice)
S-100 44 Stockholm (SWEDEN)	    	+46-8-10 25 10    (fax)
