Re: Comments on the centroids paper

Chris Weider <> Mon, 13 September 1993 20:14 UTC

Date: Mon, 13 Sep 1993 15:13:59 -0400
Sender: ietf-archive-request@IETF.CNRI.Reston.VA.US

Hi Rickard! How's life at KTH? :^)
I finally got a bit of time to get caught up on e-mail....

> Breaking the Silence Of The List, I have some proposals, comments and
>questions on the Centroids paper that I wish to bring up for discussion.
>Comments are very welcome...
>Consider this tree of whois++ servers:
>                         ___ 
>                        |   |
>                        | A |
>                        |___|
>                        /   \
>                    ___/     \___ 
>                   |   |     |   |
>                   | B |     | C |
>                   |___|     |___|
>                   /   \         \
>               ___/     \___      \___ 
>              |   |     |   |     |   |
>              | D |     | E |     | F |
>              |___|     |___|     |___|
>(A polls centroids from B and C. B polls from D and E. C polls from F.)

One general comment I wanted to make... I'm not really sure how much caching
is going to buy us once WHOIS++ has several hundred gigabytes of information
in it. My intuition says (and here I freely admit that without evidence this
is not very persuasive) that the queries will be so varied as to make caching 
useless. But we shall see.

>1. Handle uniqueness:
> What happens if server C starts polling E?  A query posed to server
>A will then get two identical matches from B and C. 
> One way to solve this is to let server A remove duplicate answers, but
>that means that the record handles presented by B and C must be identical.
>Bringing this up with Chris, we found that we can do this by having the
>handle composed of the server handle and the record handle (eg.
> But should the origin server handle really be easily deducible from
>that handle by a server several "hops" away? This behaviour will have
>two sides. On one hand, a remote server can use that extracted server
>handle to pose subsequent related queries to that server directly for
>faster lookup. On the other hand, this means that this direct access
>will bypass some caching server, and might thus add to the total network
>load and decrease total lookup speed.

Handle uniqueness buys us so much (the handle can thus start acting something
like a URN) that I think we need to take the risk of bypassing a relevant
cache sometimes. Also, although the handle is unique and happens to be composed
of two semantically useful parts, that doesn't mean that a server has to make
use of that knowledge... 
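
A minimal sketch of the duplicate removal Rickard describes for server A,
assuming each match carries the origin server handle and the record handle
as two separate fields. The Match type, its field names, and merge_matches
are hypothetical names for illustration; nothing here is taken from the
protocol itself, and no textual handle format is implied.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Match(NamedTuple):
    server_handle: str  # origin server of the record (assumed field)
    record_handle: str  # handle of the record on that server (assumed field)
    data: str


def merge_matches(*responses: Iterable[Match]) -> List[Match]:
    """Merge responses from several polled servers, dropping duplicates.

    Two matches count as the same record when both the origin server
    handle and the record handle agree.
    """
    seen: Set[Tuple[str, str]] = set()
    merged: List[Match] = []
    for response in responses:
        for match in response:
            key = (match.server_handle, match.record_handle)
            if key not in seen:
                seen.add(key)
                merged.append(match)
    return merged
```

With this, a record held by E reaches A once even if both B and C have
polled E, while a record on F that happens to share the same record
handle is kept as a distinct match.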

>2. Strong namespaces:
> Suppose all index servers in the layout above have polled the servers
>below for "description" fields in "services" templates. 
> If there is a query posed to server A that asks for "mail" with no
>field specification, should the index server then change the query to
> If server A passes the query unmodified, it might match a *lot* more
>things than intended, thus making the namespace that A was supposed to
>hold garbled. Maybe this should be mentioned or even restricted in some

This is an excellent point. We should discuss this in a follow-up document.

>3. Trace information for servers:
> The administrator of server D finds whois++ networking really cool and
>decides to start polling server A to get fast lookups for some data.
>Server A is an open server that doesn't put any restrictions on who is
>to be allowed to poll it.
>might all get caught in a loop if there is no trace information. I
>suggest we put that information inside these command records with a
>common field name like: 
>  Path: handle1,handle2,handle3
> There could be some more information in the query path like time
>stamps, but I don't know if it is a good idea - at least not for the
>moment - to put in needs for more computation in this time critical

Tracing and loop control are still an open problem. In the past, 'routing
protocols' have gotten around this by forcing a hierarchical tree structure
on the underlying mesh (like the spanning trees in early routing protocols).
I'd like to find a general solution that allows us to use the full power of the
mesh, but I haven't had any time to work on it. Suggestions are encouraged :^)
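
For discussion, here is a rough sketch of how the "Path:" field from
Rickard's proposal could stop a loop. It is not a settled design and it
does not address the general mesh problem; it only assumes that the field
value is a comma-separated list of server handles, as in his example. The
function names parse_path and next_path are made up for illustration.

```python
from typing import List, Optional


def parse_path(value: str) -> List[str]:
    """Split a 'handle1,handle2,handle3' value into a list of handles."""
    return [h.strip() for h in value.split(",") if h.strip()]


def next_path(path_value: str, own_handle: str) -> Optional[str]:
    """Return the Path value to forward, or None if a loop is detected.

    A server that finds its own handle already in the path has seen
    this query before and should not forward it again.
    """
    visited = parse_path(path_value)
    if own_handle in visited:
        return None
    return ",".join(visited + [own_handle])
```

In the scenario where D polls A, a query that has travelled A, B, D and
comes back to A would yield None at A and be dropped there.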

>4. TTL for responses:
> A response to a forwarded-query is typically (always?) sent to another
>server, which in turn could cache this response for some time. Wouldn't
>it be appropriate with a TTL together with the response from the server
>that gave the originating response?  This imposes some encapsulation on
>the response (FORWARDED-RESPONSE?), but I don't have a good idea how
>this nested record could be composed in a nice way. Any taker?

The TTL could be set by the receiving server, independent of the sending server.
I'm not sure this should be part of the protocol. 
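
A small sketch of what a receiver-chosen TTL might look like, assuming the
caching server simply applies its own local lifetime to every response it
stores. The ResponseCache class and its parameters are hypothetical; the
clock is injectable only so the behaviour can be checked without waiting.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ResponseCache:
    """Cache of query responses with a TTL picked by the receiving server."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds  # local policy, not sent on the wire
        self.clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def put(self, query: str, response: str) -> None:
        """Store a response together with its local expiry time."""
        self._entries[query] = (self.clock() + self.ttl_seconds, response)

    def get(self, query: str) -> Optional[str]:
        """Return the cached response, or None if missing or expired."""
        entry = self._entries.get(query)
        if entry is None:
            return None
        expires_at, response = entry
        if self.clock() >= expires_at:
            del self._entries[query]  # expired: forget it
            return None
        return response
```

Nothing in this sketch requires the sending server to state a TTL, so it
needs no encapsulation of the response.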

>5. Search scope:
> I think the user has to be able to specify the scope of the search, if
>it should be local-only or not, and maybe even a maximum hop-count (of
>which I can't find any use). This would mean the creation of the
>constraint SCOPE={local|normal|from-top} or something similar.

Right now the scope is constrained by the location in the mesh where one 
originally issues the query. This problem nests with the loop control 
problem, so that if we get a good solution for one, we should be able to solve 
the other.....

>6. What is a namespace?:
> I am a bit confused about what I'm asking here, so please bear with me.
>Whois++ gives us the ability to have more than one carved-in-stone
>namespace.  But on what level are those namespaces partitioned? As I see
>it, it may be on I) templates, II) Strict from top to down server trees,
>III) matches to specific fields.  Now if we pose a query to a random
>place in a tree and we didn't get the matches we wanted, we might want
>to expand our search to servers "above" our query point.  The queried
>server knows what servers are above from the POLLED-BY information. But
>how do we know that we are in the same namespace when querying those
>servers?  Should there be a way to identify namespaces? (How would one
>set up multiple namespaces on one server?)

I would handle this by allowing the user to look at the 'polled for' fields.
Let's say that A polls B for the 'profession' attribute. B should be able to
cache that query in a POLLED-FOR database, and could give it to the user
when she wishes to expand her search. Alternatively, the server abstract could
be handed down to the pollee. That would require some more negotiation in the
POLL protocol, but could be easily extended.
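
To make the POLLED-FOR idea concrete, here is a minimal sketch, assuming
the polled server simply remembers which attribute each poller asked for.
The class and method names are invented for illustration, and real polls
would presumably also carry template information that is left out here.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class PolledForDatabase:
    """Remembers, on a polled server, what each polling server asked for."""

    def __init__(self) -> None:
        # attribute name -> handles of the servers that polled for it
        self._pollers: DefaultDict[str, Set[str]] = defaultdict(set)

    def record_poll(self, poller_handle: str, attribute: str) -> None:
        """Note that poller_handle polled this server for attribute."""
        self._pollers[attribute].add(poller_handle)

    def servers_covering(self, attribute: str) -> List[str]:
        """Servers 'above' that indexed this attribute, in sorted order."""
        return sorted(self._pollers.get(attribute, set()))
```

So if A polls B for 'profession', B can later tell a user who wants to
widen a 'profession' search that A is a sensible place to go next, and
that it has nothing to suggest for attributes nobody polled for.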

>7. Minor detail - port and handle in SERVERS-TO-ASK
> The SERVERS-TO-ASK command is to return host name or IP address of
>another whois++ server to query. There needs to be a port number for
>this too, since not all whois++ servers will be on the same port.  There is
>also a need for a server handle for each of the servers to ask if the
>resource has moved after the index server polled the resource.

Agreed. We'll add it.
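
As a rough illustration of what the extended referral would carry, here
is a sketch assuming three pieces of information per server: host, port,
and server handle. The ServerToAsk type and the is_expected_server check
are hypothetical, and no wire format is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerToAsk:
    host: str           # host name or IP address
    port: int           # port the whois++ server listens on
    server_handle: str  # handle the index server recorded when it polled


def is_expected_server(referral: ServerToAsk, reported_handle: str) -> bool:
    """Check that the server reached is the one the index server polled.

    A mismatch suggests the resource has moved since the poll.
    """
    return referral.server_handle == reported_handle
```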

>Rickard Schoultz
>KTH/SUNET				+46-8-790 90 88   (voice)
>S-100 44 Stockholm (SWEDEN)	    	+46-8-10 25 10    (fax)

Rickard, will you be in Houston? I'll be in early... We have enough 
implementation experience (I believe) to make a meeting on holes and future
directions useful, which could be started in WNILS and continued in bar BOFS
for the rest of the week. In particular, this index service will need a lot
of research. I see this as the most exciting part of the protocol, and hope
to make it useful.