Re: root knowledge

Andrew Waugh <A.Waugh@mel.dit.csiro.au> Wed, 13 May 1992 02:23 UTC

Received: from nri.nri.reston.va.us by ietf.NRI.Reston.VA.US id aa26270; 12 May 92 22:23 EDT
Received: from nri.reston.va.us by NRI.Reston.VA.US id aa20034; 12 May 92 22:29 EDT
Received: from bells.cs.ucl.ac.uk by NRI.Reston.VA.US id aa20030; 12 May 92 22:29 EDT
Received: from shark.mel.dit.CSIRO.AU by bells.cs.ucl.ac.uk with Internet SMTP id <g.03108-0@bells.cs.ucl.ac.uk>; Wed, 13 May 1992 02:16:12 +0100
Received: from squid.mel.dit.CSIRO.AU by shark.mel.dit.csiro.au with SMTP id AA11506 (5.65c/IDA-1.4.4/DIT-1.3 for <osi-ds@cs.ucl.ac.uk>); Wed, 13 May 1992 11:15:52 +1000
Received: by squid.mel.dit.CSIRO.AU (4.1/SMI-4.0) id AA12015; Wed, 13 May 92 11:15:51 EST
Message-Id: <9205130115.AA12015@squid.mel.dit.CSIRO.AU>
To: osi-ds@cs.ucl.ac.uk
Cc: yeongw@psi.com
Subject: Re: root knowledge
In-Reply-To: Your message of "Tue, 12 May 92 10:58:50 -0400." <9205121458.AA00492@spartacus.psi.com>
Date: Wed, 13 May 92 11:15:51 +1000
From: Andrew Waugh <A.Waugh@mel.dit.csiro.au>

>[..] to be a little irreverent before the almighty X.500 altar :-) [...]

Burn the heretic! Who's got the firelighters? :-)

>This message actually has nothing to do with the management and
>distribution of root knowledge. But Christian made a statement that
>I absolutely feel obliged to comment on.
>
>> The fact is that we cannot perform distributed data base access intelligently
>> and resort to a mixture of crude hierarchies and brute force replications.
>> Hierarchies and caching are OK for a "read only" service, e.g. DNS. They make
>> a stinking white page service.
>
>I couldn't agree more. In fact I would state it even more strongly by
>saying that a hierarchy is not even "OK" for "read only" services.
>
>I will now don a few flame retardant suits :-) and say
>
>	The single biggest problem with X.500 is that
>	it structures data as a tree
> [...]
>At this point, I'll get up on one of my favorite soapboxes :-) and
>suggest that a real Directory needs more than just the geographical
>hierarchy (the "White Pages" namespaces) in order to be useful.
>In addition to the geographical hierarchy, there is a need
>to represent the information from other, non-geographical
>namespaces in alternate DIT hierarchies. And, of course, the
>relationships between the information in the various hierarchies
>should also be represented, by means of pointers.

Yes, of course. But, to be fair, I doubt that the hierarchical database
was chosen lightly or without due thought for the consequences.

X.500 was designed as a directory which, potentially, could store billions
of entries in millions of DSAs scattered throughout the world. This leads
to what I call the "negative" problem: how do you authoritatively say
"The data you requested does not exist" without visiting each of these
DSAs?

The solution adopted by X.500 was the DIT hierarchy and the hierarchical
distribution of this DIT amongst DSAs with appropriate linkages. Given
a distinguished name, the navigation algorithm can navigate directly to the
DSAs which should hold the information and determine authoritatively
whether the information exists or not.
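To make the "negative" problem concrete, here is a minimal Python sketch of
hierarchical navigation. It assumes a much simplified knowledge model: each
DSA masters exactly one naming context, knows its child contexts only through
specific subordinate references, and there is no replication, aliasing or
chaining. The class DSA, the function resolve() and every name in the example
are hypothetical. The point it illustrates is that a lookup only visits the
DSAs on the path from the root to the name, so "does not exist" can be
answered authoritatively without contacting any other DSA (gb-dsa is never
visited below).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

# A distinguished name as a sequence of RDNs, most significant first.
DN = Tuple[str, ...]


@dataclass
class DSA:
    name: str
    context: DN                                    # root of the naming context this DSA masters
    entries: Set[DN] = field(default_factory=set)  # entries mastered locally
    # Specific subordinate references: child naming context -> DSA that masters it.
    subordinates: Dict[DN, "DSA"] = field(default_factory=dict)


def resolve(root: DSA, dn: DN) -> Tuple[bool, List[str]]:
    """Follow subordinate references from the root towards dn.

    Returns (exists, names of the DSAs visited). The answer is authoritative
    because the last DSA visited masters the naming context containing dn.
    """
    dsa = root
    visited = [dsa.name]
    while True:
        next_dsa: Optional[DSA] = None
        for ctx, sub in dsa.subordinates.items():
            if dn[: len(ctx)] == ctx:  # dn lies inside this child context
                next_dsa = sub
                break
        if next_dsa is None:
            # No deeper context covers dn, so this DSA is the authority for it.
            return dn in dsa.entries, visited
        dsa = next_dsa
        visited.append(dsa.name)


# Hypothetical DIT: a root DSA, two country DSAs and one organisational DSA.
root = DSA("root-dsa", ())
au = DSA("au-dsa", ("c=AU",), {("c=AU",)})
gb = DSA("gb-dsa", ("c=GB",), {("c=GB",)})
org = DSA("org-dsa", ("c=AU", "o=Example"),
          {("c=AU", "o=Example"), ("c=AU", "o=Example", "cn=Alice")})
root.subordinates[au.context] = au
root.subordinates[gb.context] = gb
au.subordinates[org.context] = org

print(resolve(root, ("c=AU", "o=Example", "cn=Alice")))
# (True, ['root-dsa', 'au-dsa', 'org-dsa'])
print(resolve(root, ("c=AU", "o=Example", "cn=Bob")))
# (False, ['root-dsa', 'au-dsa', 'org-dsa'])
```

Note that a name under a missing organisation, such as ("c=AU", "o=Missing"),
is answered negatively by au-dsa alone, since no subordinate reference covers it.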

A secondary reason for choosing a hierarchically arranged DIT is that it
allows countries and organisations to own the DSAs which master their own data.

The downside of this approach, as you have noted, is that information which
is not arranged geographically (or, more accurately, organisationally) is
difficult to store efficiently in an X.500 system.

I found it interesting that you refer to the Internet domain name space and
address space as an example of this problem, because the DNS uses exactly the
same solution as X.500 (or should I say X.500 uses the same solution as
DNS :-). The DNS has a hierarchical tree, and the information is stored in a
hierarchically arranged system of servers. The DNS has, of course, exactly the
same problem.

The example I always use to illustrate this problem is storing a library
catalogue in an X.500 system.

It would be trivial to store the catalogue for a single library within X.500:
under the library's organisational entry would be an entry for each book. It
is much more difficult to store an equivalent catalogue covering several
separate libraries. You could, hypothetically, place a subtree 'Australian
Library Catalogue' underneath the Australian entry, containing an entry for
each book held in any library in Australia. But how do you distribute this
subtree across DSAs run by the individual Australian libraries? The problem,
of course, is that we are attempting to store data in X.500 which is not
distributed organisationally.

'Archie' addresses a similar problem. Files are held in geographically
distributed file servers, but, logically, the files are not organised in
a geographical way. Archie 'solves' this by periodically polling each
archive site and using the retrieved information to build a list of
files and the sites which hold each one. This is actually more useful for
users, as they merely have to query Archie instead of explicitly searching
individual file servers.
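A rough Python sketch of the Archie-style approach: poll every site once, then
invert the listings into an index from file name to the sites holding that
file. The list_files callback is a hypothetical stand-in for an actual
anonymous ftp directory listing, and the host and file names are made up;
Archie's real polling and storage are not modelled here.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set


def build_index(
    sites: Iterable[str],
    list_files: Callable[[str], Iterable[str]],
) -> Dict[str, List[str]]:
    """Poll every site once and invert the listings: file name -> sites holding it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for site in sites:
        for path in list_files(site):
            filename = path.rsplit("/", 1)[-1]  # index on the bare file name
            index[filename].add(site)
    return {name: sorted(hosts) for name, hosts in index.items()}


# Hypothetical listings standing in for a real poll of anonymous ftp sites.
LISTINGS = {
    "ftp.site-a.example": ["/pub/net/pkg-1.0.tar.Z", "/pub/README"],
    "ftp.site-b.example": ["/mirror/pkg-1.0.tar.Z"],
    "ftp.site-c.example": ["/pub/README"],
}

index = build_index(LISTINGS, lambda site: LISTINGS[site])
print(index["pkg-1.0.tar.Z"])
# ['ftp.site-a.example', 'ftp.site-b.example']
```

The user then queries one index instead of every server, at the cost of the
index being only as fresh as the last poll.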

Locally, I have recently suggested using X.500 as a means of providing
better access to the Archie data. Currently the problem with finding a
particular software system is that you need to know the name of the file to
ftp. It would be simple to construct an X.500 system which stores information
about the software systems available for anonymous ftp. Users could easily
search this system for 'freely available osi implementations' (:-) using
already available DUAs. Because of the non-geographical nature
of the data, however, this solution would still rely on an Archie style
polling of ftp servers to actually build the X.500 database.

Another possible 'solution' is based on the idea that:
>The workaround that we (the Internet X.500 community) have been
>adopting is to store reasonably autonomous subsets of information
>in single DSAs. For example, entire organizational subtrees get
>stored in one DSA.

Remember that a 'DSA' is a logical construct. A 'DSA' is simply an
entity which provides access to the data to DUAs and DSAs using the X.500
protocols. Internally, the data could be stored and replicated using
quite different methods.

An organisation could store all of its information in one 'DSA'. This
DSA could actually be constructed out of many geographically distributed
entities. Internally, the data storage and data exchange need not resemble
X.500 at all and could be deliberately designed to operate with maximum speed.

Turning to countries and organisations, you will need a country DSA holding
at least a specific subordinate reference to each organisation in that
country. (All right, you could have a couple of country DSAs each
with a subset of the organisations, but we are talking about a couple, not
ten or twenty, and certainly not hundreds.) Otherwise, even performing the
navigation algorithm at the country level will be horrifically slow.

The incremental cost of storing the full organisational entry in the country
DSA, rather than just the specific subordinate reference, will be relatively
small. Because the benefit (being able to search on non-RDN information in the
organisational entry) will be so great, I couldn't imagine a real commercial
X.500 system not doing this.
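The trade-off can be illustrated with a small Python sketch, assuming a
country DSA modelled as a map from each organisation's RDN to either its
stored attributes or None (a bare subordinate reference). A search on a
non-RDN attribute such as businessCategory can then be answered entirely from
stored entries, whereas with references only, every organisation's DSA would
have to be contacted. The function one_level_search(), the organisations and
the attribute values are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# RDN -> attributes when the organisational entry is stored in the country
# DSA, or None when only a specific subordinate reference is held.
CountryDSA = Dict[str, Optional[Dict[str, str]]]


def one_level_search(dsa: CountryDSA, attr: str, value: str) -> Tuple[List[str], List[str]]:
    """Evaluate attr == value against the organisations under a country.

    Returns (matches answered locally, RDNs whose own DSAs must still be contacted).
    """
    matches: List[str] = []
    must_chase: List[str] = []
    for rdn, attrs in dsa.items():
        if attrs is None:
            must_chase.append(rdn)  # only the organisation's DSA can evaluate the filter
        elif attrs.get(attr) == value:
            matches.append(rdn)
    return matches, must_chase


# Hypothetical organisations and attribute values, for illustration only.
with_entries: CountryDSA = {
    "o=Example Research": {"businessCategory": "research", "l": "Melbourne"},
    "o=Example Bank": {"businessCategory": "finance", "l": "Sydney"},
}
refs_only: CountryDSA = {rdn: None for rdn in with_entries}

print(one_level_search(with_entries, "businessCategory", "research"))
# (['o=Example Research'], [])
print(one_level_search(refs_only, "businessCategory", "research"))
# ([], ['o=Example Research', 'o=Example Bank'])
```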

One final point. I hope I haven't given the impression that I think that X.500
is the be-all and end-all of distributed databases. It is not. It does have
serious problems when used for certain applications. However, many of these
problems are caused by the limitations imposed by the highly distributed
requirements. When designing an application we must be aware of the strengths
and weaknesses of X.500 and choose the right tool for the job. In many cases
this will not be X.500. In many cases, though, it will be, and there are ways
around some of the problems.

andrew waugh