Re: root knowledge
yeongw@spartacus.psi.com Tue, 12 May 1992 16:29 UTC
Received: from nri.nri.reston.va.us by ietf.NRI.Reston.VA.US id aa24206; 12 May 92 12:29 EDT
Received: from nri.reston.va.us by NRI.Reston.VA.US id aa17772; 12 May 92 12:35 EDT
Received: from bells.cs.ucl.ac.uk by NRI.Reston.VA.US id aa17750; 12 May 92 12:35 EDT
Received: from spartacus.psi.com by bells.cs.ucl.ac.uk with Internet SMTP id <g.23173-0@bells.cs.ucl.ac.uk>; Tue, 12 May 1992 15:59:07 +0100
Received: from localhost by spartacus.psi.com (5.61/1.3-PSI/PSINet) id AA00492; Tue, 12 May 92 10:58:52 -0400
Message-Id: <9205121458.AA00492@spartacus.psi.com>
To: osi-ds@cs.ucl.ac.uk
Subject: Re: root knowledge
Cc: yeongw@psi.com
Reply-To: osi-ds@cs.ucl.ac.uk
In-Reply-To: Your message of Tue, 12 May 92 12:40:29 -0000. <199205121240.AA16195@mitsou.inria.fr>
Date: Tue, 12 May 1992 10:58:50 -0400
From: yeongw@spartacus.psi.com

This message actually has nothing to do with the management and distribution of root knowledge. But Christian made a statement that I absolutely feel obliged to comment on.

> The fact is that we cannot perform distributed data base access intelligently
> and resort a mixture of crude hierarchies and brute force replications.
> Hierarchies and caching are OK for a "read only" service, e.g. DNS. They make a stinking
> white page service.

I couldn't agree more. In fact, I would state it even more strongly by saying that a hierarchy is not even "OK" for "read only" services. I will now don a few flame-retardant suits :-) and say

    The single biggest problem with X.500 is that it structures data as a tree.

At the risk of splitting very fine hairs, I'll say that the above refers not to the way information is modeled (as a tree), but to the way information is distributed in the 'database' (in a tree-like way). Because the DIB is structured as a DIT for the purpose of determining the boundaries between the information held by different DSAs, it is very difficult to represent many-to-many relationships in a way that makes searching the database on different relationships easy (or even doable, in the case of a large data set).

Basically, if you embed one relationship into the hierarchy used for distributing information across DSAs, searching on that one relationship is going to yield adequate/reasonable/good performance. But once you want to search on any relationship other than the one embedded in the DIT structure, search performance becomes 'interesting' :-).

The workaround that we (the Internet X.500 community) have been adopting is to store reasonably autonomous subsets of information in single DSAs; for example, entire organizational subtrees get stored in one DSA. Then we play indexing games to make searches based on different criteria (the "search indexes" :-)) perform reasonably within that one DSA, and hope that most searches don't extend beyond the boundaries of the information stored in that one DSA. Carrying this to an extreme, you end up with Christian's scenario of having to store the population of France in a single DSA and play the aforementioned indexing games to make searching on anything but the relationship embedded in the actual DIT 'reasonable'.

Don't get me wrong: playing indexing games is not only "all right", but should be encouraged. It solves half the problem, the half having to do with making sure that searches that don't cross DSA boundaries have reasonable performance. However, we still need to work around the other half of the problem: what to do with searches (in general, Directory operations) that have to span multiple DSAs.

[Also, to be a little irreverent before the almighty X.500 altar :-), I should point out that any number of database system vendors would be most happy to sell us products that could run circles around our X.500 implementations if all we wanted was high-performance searching on a centralized database (i.e., one held in a single "DSA"). X.500's strength is in the infrastructure it provides for distributing information, not in its ability to provide for implementations that have blindingly fast operations.]

At this point, I'll get up on one of my favorite soapboxes :-) and suggest that a real Directory needs more than just the geographical hierarchy (the "White Pages" namespace) in order to be useful. In addition to the geographical hierarchy, there is a need to represent the information from other, non-geographical namespaces in alternate DIT hierarchies. And, of course, the relationships between the information in the various hierarchies should also be represented, by means of pointers.
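
To make the distribution argument concrete, here is a minimal toy sketch in Python, assuming a much-simplified model: each DSA masters exactly one subtree of the DIT, keeps its own per-DSA attribute index (the "indexing games"), and names are tuples of RDN strings. The DSA names, the people, and the attributes "domain" and "domainContact" are hypothetical illustrations, not part of X.500 or of any existing implementation; chaining, referrals, replication and knowledge references are all ignored.

```python
# Toy model (hypothetical; not any real X.500 implementation or API):
# the DIT is partitioned among DSAs by subtree, each DSA indexes its own
# entries, and a separate domain hierarchy points into the white pages.
from dataclasses import dataclass, field

DN = tuple  # a distinguished name modeled as a tuple of RDN strings, root first


@dataclass
class DSA:
    """A toy DSA holding exactly one naming context (a subtree of the DIT)."""
    name: str
    context: DN
    entries: dict = field(default_factory=dict)  # DN -> {attribute: value}
    index: dict = field(default_factory=dict)    # (attribute, value) -> set of DNs

    def add(self, dn, attrs):
        self.entries[dn] = attrs
        for attr, value in attrs.items():
            # The "indexing game": equality lookups inside this DSA avoid a scan.
            self.index.setdefault((attr, value), set()).add(dn)

    def overlaps(self, base):
        # True if the subtree rooted at `base` intersects this DSA's context.
        n = min(len(base), len(self.context))
        return base[:n] == self.context[:n]


def owner(dsas, dn):
    """Route a name to the DSA with the longest naming context above it."""
    holders = [d for d in dsas if dn[: len(d.context)] == d.context]
    return max(holders, key=lambda d: len(d.context))


def read(dsas, dn):
    """Read one entry by name: routing on the DN reaches a single DSA."""
    dsa = owner(dsas, dn)
    return dsa.entries[dn], dsa.name


def search(dsas, base, attr, value):
    """Subtree equality search; returns (matching DNs, DSAs that were contacted)."""
    contacted = [d for d in dsas if d.overlaps(base)]
    hits = sorted(
        dn
        for d in contacted
        for dn in d.index.get((attr, value), ())
        if dn[: len(base)] == base
    )
    return hits, sorted(d.name for d in contacted)


# Geographical ("White Pages") hierarchy split across three hypothetical DSAs.
white_pages = [
    DSA("dsa-inria", ("c=FR", "o=INRIA")),
    DSA("dsa-cnrs", ("c=FR", "o=CNRS")),
    DSA("dsa-psi", ("c=US", "o=PSI")),
]
for dn, mail in [
    (("c=FR", "o=INRIA", "cn=Alice"), "alice@inria.fr"),
    (("c=FR", "o=CNRS", "cn=Bob"), "bob@cnrs.fr"),
    (("c=US", "o=PSI", "cn=Carol"), "carol@psi.com"),
]:
    owner(white_pages, dn).add(dn, {"objectClass": "person", "mail": mail})

# Along the embedded relationship: only the DSAs under c=FR are contacted.
print(search(white_pages, ("c=FR",), "objectClass", "person"))
# Along any other relationship: every DSA must be asked, indexes or not.
print(search(white_pages, (), "mail", "carol@psi.com"))

# Alternate hierarchy for the domain namespace: its entry carries domain
# information plus a pointer (a DN) into the geographical hierarchy.
domains = [DSA("dsa-domains", ("dc=com",))]
owner(domains, ("dc=com", "dc=psi")).add(
    ("dc=com", "dc=psi"),
    {"domain": "psi.com", "domainContact": ("c=US", "o=PSI", "cn=Carol")},
)
domain_entry, _ = read(domains, ("dc=com", "dc=psi"))
print(read(white_pages, domain_entry["domainContact"]))
```

In this model the search along the geographical relationship contacts only the two DSAs under c=FR, while the search on mail has to contact all three DSAs even though each one answers from its own index; following the pointer from the domain entry lands in exactly one DSA. The per-DSA index only addresses the within-DSA half of the problem.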

For starters, in the Internet, I think we need to get the domain namespace and the IP address space into the Directory. (The IP address space is a "namespace" of sorts. And yes, I do mean network addresses in general, not just IP addresses, but the reality right now is that the pressing need is for IP address representation.) There are a number of relationships (network <--> network contact, domain <--> domain contact, and domain(s) <--> network(s), to name just three pairs) that are best represented, from both modeling and performance standpoints, by pointers between hierarchies rather than as explicit entries shoehorned into the existing geographical namespace.

Two things, though (which I have to mention because I've been misunderstood before :-():

(a) I am not advocating constructing a separate hierarchy of aliases/pointers/what-have-you for every possible criterion a Directory user could base an operation on. Doing so is even less practical than trying to index every possible attribute in an entry within a DSA's 'database system'. Notice in the above that I tied the creation of a hierarchy to the existence of an (autonomous) namespace. Although there are certainly good cases for exceptions (an organizational role 'index' hierarchy, for example), and I'll admit that I haven't thought this through completely (can't think this through really: need to deploy and play), I'll state very strongly that I think alternate hierarchies should only be created to represent information from a separate 'namespace', and should actually contain useful information (i.e., shouldn't just be a tree of pointers). Of course, I'll probably end up eating these words later :-) :-).

(b) Specifically in reference to putting the DNS into the DIB, all the DNS proponents are invited to note that I did not mention "domain name <--> IP address" as a relationship that needs to be represented above. Not that it isn't a useful relationship, of course, just that the Internet already has a perfectly good way of representing this relationship in the DNS itself.

The point is this: I'm not interested in playing the "my protocol is better than your protocol" game (especially with you DNS people, since you have a working system and we Directory folks don't :-) :-)). DNS information needs to go into the DIB so that *other* relationships can be represented. The fact that the domain <--> IP address relationship happens to "fall out" is, from my point of view, a bonus, not the motivation for the effort. [And I do have an answer to the "why not move the White Pages information to the DNS, instead of moving the DNS information into X.500" question too, but this message is too long already ...]

Wengyik