LDAP shall speak unto LDAP + first cut at best-first searching
Simon E Spero <ses@tipper.oit.unc.edu> Sat, 27 November 1993 23:24 UTC
Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa03270;
27 Nov 93 18:24 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa03266;
27 Nov 93 18:24 EST
Received: from haig.cs.ucl.ac.uk by CNRI.Reston.VA.US id aa11799;
27 Nov 93 18:24 EST
Received: from bells.cs.ucl.ac.uk by haig.cs.ucl.ac.uk with local SMTP
id <g.04372-0@haig.cs.ucl.ac.uk>; Sat, 27 Nov 1993 22:59:10 +0000
Received: from tipper.oit.unc.edu by bells.cs.ucl.ac.uk with Internet SMTP
id <g.14327-0@bells.cs.ucl.ac.uk>; Sat, 27 Nov 1993 22:59:03 +0000
Received: from localhost.oit.unc.edu by tipper.oit.unc.edu (SMI4.1/FvK 1.02)
id AA01189; Sat, 27 Nov 93 17:58:59 EST
Message-Id: <9311272258.AA01189@tipper.oit.unc.edu>
To: ldap@umich.edu, osi-ds@cs.ucl.ac.uk
Subject: LDAP shall speak unto LDAP + first cut at best-first searching
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Sat, 27 Nov 93 17:58:59 -0500
Sender: ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Simon E Spero <ses@tipper.oit.unc.edu>
After some rather severe patching and tweaking on both sides, the umich server and my client are now speak yer actual ldap to each other. Thanks to Tim for fixing things so quickly. Now that I have the protocol layer I've been working on a general search harness I can plug various strategies into. As I've been filling in the blanks, I've come upon what seems to be a nice search strategy based on best first search for solving the soft pages problem (name->url mapping). Given a few assumptions it should return near-optimal results. I'm still a bit fuzzy as to how long the search will take, especially when the object being searched for does not exist. There may a slight problem in the handling of servers which poll themselves which I can't quite spot at the moment. I need more sleep. Simon First cut at best-first searching. Definitions: Softpages problem The task of locating the most appropriate copy of an object (images, software, etc). Centroid A centroid is an object that represents a collection of other objects. In this case we shall use the term to describe a record containing a list of the attribute-value pairs present on a directory server. Topological hierachy In the modified [C]LDAP a server can be a member of serveral different hiearchies. The topological hierachy is contructed by placing links between servers which are at opposite ends of networked connections. This may or may not be precisely the same as the underlying physical network, but is intenged to be a fairly close approximation. Sites with multiple connections may have multiple parents. In the topological hiearchy, parent-child relation- ships are determined by which server polls the other. Since connectivity is symetric, two servers may poll each other without Oedipal complexities. Cost(h1,h2,k) The time taken to initiate a transfer and recieve 'k' bytes from 'h1' to 'h2' --------------- Alogrithm: part 1 - centroid weightings. Each term in a centroid is given a weight. For the topological hiearchy, we shall define this weight to be a measure of the Cost of accessing items referenced by that term from this server. If there are several different costs associated with a term, the weighting used should be the lowest value. The pollee should also identify the topological parents which it does not poll [This isn't quite right - ses] When a poller retrieves a topological centroid and adds it to its database of forwarding information it should increment the weights by the Cost of accessing the polled server. Intuitively this value is valid because we are measuring the cost of accessing the document via the network link corresponding to this topological link. The polled server should maintain an upwards link weighted by the cost of accessing the poller. part 2 - search strategy We shall assume that the server returns a list of either records or referrals, each labelled with its cost. The client should increment the costs returned by the cost of accessing the server. Client-Side ----------- let Q be a queue of records and servers, ordered in ascending order of cost. Initialise R, the set of already queried servers, to NIL Initialise Q to the default server While Q is not empty let item = head(Q) if item is a record, return the record else remove item from Q add item to R add the results of querying 'item' to Q, removing all items already in R. fi endwhile Return not-found -------- server side. Return all matches to the query, together with the server's topological parents. ----------- Although the algorithm is fairly simple to understand when a match exists, the case where the resource does not exist is slightly more complicated. In this case, the queue will only contain servers topological parents. It might seem that this would lead to an explosive fan-out; however, centroids come to the rescue here, restricting the search to only upwards links. These will eventually converge upon a backbone, and die off pretty rapidly (we can consider a backbone to poll all its links - much of the internal topology could be collapsed into a large cloud). At least, that's what I think will happen - I'll try and borrow a graph theorist on Monday and use a real brain. If my analysis is wrong (and I think I might be handwaving over something important), the algorithm can be saved by adding a time/number of queries limit, and then failing over to a simple direct lookup, which may not yield the best result, but will at least give a result.
- LDAP shall speak unto LDAP + first cut at best-fi… Simon E Spero
- Re: LDAP shall speak unto LDAP + first cut at bes… Tim Howes
- Re: LDAP shall speak unto LDAP + first cut at bes… Simon E Spero
- Re: LDAP shall speak unto LDAP + first cut at bes… Tim Howes