Re: proposal for subjects in subject tree

Anders Gillner <awg@sunet.se> Wed, 16 December 1992 18:04 UTC

Message-Id: <199212161650.AA23080@sunic.sunet.se>
To: bajan@bunyip.com
Cc: eurogopher@ebone.net, gvl@unt.edu, iafa@bunyip.com, jkrey@isi.edu, uri@bunyip.com, Anders Gillner <awg@sunet.se>, nir@bunyip.com
Subject: Re: proposal for subjects in subject tree
Date: Wed, 16 Dec 1992 17:50:00 +0100
Sender: ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Anders Gillner <awg@sunet.se>

Alan,

First: I loved it! (Probably because I am a "non-techie"
and tend to think along the same lines :-)) I have not included
your text in full, partly because this is long enough as it is, partly
because I haven't commented on all of it.

>As a minimum I see the following points as creating an essential
>foundation if we in the two communities are to work together and provide
>the general public with systems that meet their needs for this decade and
>beyond.

right

>First and foremost the stereotypes have to be broken down. While this
>smells a bit like social engineering, it is a problem that needs to be
>recognized and addressed. Without trying to sound silly here, it is still
>partially true to say that the techies (and many people in general) think
>of your average librarian as a little lady with a _really_ tight hairdo,
>pursed lips and cat-eyed glasses sitting around in dimly lit rooms
>reeking of moldy paper.  Similarly, the "Poindexter" (sorry to
>non-Americans here for the culture-specific reference :-),
>pocket-protector-and-sliderule image comes to mind when the general
>public thinks of techies. These stereotypes get in the way of
>understanding what each of us does and what knowledge, expertize and
>experience each group brings to the table.

Even if it has a certain touch of social engineering, you are definitely
right, and I spend a lot of time on this. A friend of mine used to say "That
is a natural conflict you know, like that between church musicians and
priests". We cannot afford that type of conflict in our field
any more; things are moving too fast. And of course, this type of conflict
is one of society's enemies as a whole. In my country, if I have got
things right, there's also a similar conflict between librarians and those
who are responsible for archiving old printed material, and they
will also be on the net in the near future, so we see another similar
problem coming up......
 

>Contacts between the two communities have to be formalized, strengthened
>and, perhaps most importantly at this juncture _funded_. It is nice when
>the two groups have conferences in the same city at the same time, and as
>welcomed as that kind of "cross-pollenation" is it is simply no longer
>good enough as it depends on the motivated interest of individuals not
>institutional will. The organizations in both communities need to
>establish liaison groups, lines of communication (like common mailing
>lists) and ultimately, face-to-face gatherings in which both feel welcomed
>and accepted.

I thought that was one of the reasons for forming the CNI body, and I
think that we need a similar organisation on the European level, maybe
a more formalized one, but no more bureau"cracy" please! We have
already had talks about forming a new body in the Nordic countries.
When it comes to funding, your ideas are as good as mine.....

>We all have to realize that neither group has the financial means nor
>expertize to do this on their own. We really do _need_ one another.

right



>Point 2: Getting it right vs. Getting it now
>--------------------------------------------

>We techies need to realize that the community that the library serves are
>hungry for something right now, not tomorrow or the day after. The users
>have an idea of what's out there and they want to know how to find what
>they are looking for. This of course is also true of the Internet
>environment.  However, the traditional Internet user was a lot more
>likely to put up with the indignities of the FTP client interface and the
>word of mouth method of information dissemination than your average high
>school student is today.

>We have to understand that for the users that don't have access to _any_
>of this information now, something is better than nothing even if it
>means that the information may be out of date when it gets to them.

>I find myself now coming down somewhere in between the two camps: give
>the users the data and make it as accurate as you can at the time, but
>make sure that they realize that this information may be out of date and
>at best they must think of it as a hint. In the meantime, the long
>process of designing technical solutions to providing a higher quality of
>service can be worked on. It doesn't help your student looking for a
>particular piece of information on the network to say "We'd love to help
>you with that, but we'd like for you to get it right all the time... and
>by the way, this will be ready in 1999".

That is exactly what happened with some of the OSI stuff. "We'll get
you connected to the US in two years!" "Sure, what's wrong with the
connection we have today?" "It is running a non-standard protocol!"
"Go fly a kite, Sir!". As we don't want that to happen again, I guess
we have to follow the Internet way, letting things grow out of people's
need for them. And, as you say, while people are
reasonably satisfied with that, we have the time to work out something
more generally adoptable. I have one concern though: we have to do
it fairly quickly, because of the organic growth of services. When we are
ready with the new all_purpose_fetch_anything_on_the_net_+_my_slippers
protocol, what will have happened on the net...........?
Maybe, by that time, somebody in Greece, Taiwan or Minnesota will have
written a small AI routine that runs around networks on its own,
finding information according to rules like "new lines of reasoning
in philosophical research" and keeping the database at home updated as it
roams around. Only a theoretical example, but that is what could possibly
happen when big, time-consuming projects are launched.

>The problems that we are attempting to tackle are enormous in scope and
>while I have no doubt that they will ultimately be solved, it will take a
>lot of talk, work and time.

see above...

>Point 3: The Spectrum of Quality
>--------------------------------

>Librarians have long realized that there is no chance that they could
>catalogue every piece of information that is made available to them.
>There are possibly millions of publications each year which are never
>proffered for cataloging and even amongst those which are, there are a
>significant number which are not considered "worthy" of the time
>consuming and expensive process.

I remember a couple of years ago, searching for a book which I hoped
could be reference material for an article I had to write. This book
turned out to be a dissertation, so I went to the Royal Library in
Stockholm only to hear one of the librarians say "We have 15,000
unsorted foreign dissertations in the basement, wanna go searching
yourself?". I think that most people believe that everything is still
sorted, ordered and possible to find.

>Whether we like it or not (and I do) we "ain't seen nothin' yet". It is
>not necessary or even desirable to try and catalogue every (or maybe not
>even a large percentage) of the information which is, and will be,
>published (made available for public or semi-public access) over the next
>say, ten years. However, I have seen comments in the past suggesting that
>certain members of the library _and_ technical communities believe that
>there is some element of "control" still possible in the process. I would
>like to put their mind at ease: We are currently looking at an Internet
>which claims on the order of 10 million users and is growing
>exponentially and I have seen estimates up to 300 Million for 1995. We can
>view each and every one of these people as potential "publishers" of
>information. There is no way that we can even begin to think about trying
>to control this process. Why would we want to?

There are different types of information, and different reasons why we
want to structure it. Scientific information is structured and catalogued
by the libraries so that the scientific community can find what it needs in
order to bring science forward, but not only for that. Different
periodicals rank differently in the world of scientific merit, and will
continue to do so in the future. This type of information will still have
to be catalogued, and the libraries, or their
successors, will have to have some "control" over that type of
information. How do you rank an article published on the net, without
review, in terms of academic merit? What scientific value has
a discussion like the one which went on in a newsgroup on the net after
the "discovery" of "cold fusion"?


>Point 4: Fostering the User Communities
>---------------------------------------
>Stating the obvious: Communities spontaneously arise from every domain of
>human activity, whether this is the "community" of your close friends,
>your local neighbourhood, your football team or your nation. In the most
>general sense these communities are bound together by common interests and
>goals. We see this right here in the mailing list or newsgroup you are
>reading this on: your interests lie in the domain of "Information on the
>Network". We are all members of innumerable communities simultaneously,
>each meeting different needs.

>I believe that this will be a very powerful and useful force as we start
>thinking about the kind of services we will be providing into the
>foreseeable future. Those of us interested in for example, tropical fish
>will naturally gravitate to other people with the same interest. It is
>our job as the architects of the new information infrastructure (what I
>call the "infostructure") to allow our users to locate other members of
>their communities and the information which is important to that
>community. It also means that we need to create systems which allow one
>to "import" the view of that community on a wholesale level such that
>each user on entering the system need not have to rebuild the structure
>that defines that community.

As I said, the Biology "Subject Tree" is a good example of just that.
It covers more than biology; it's there for biologists,
and that is the difference between a "Subject Tree" in the library
sense of the word and the pragmatic view. We have to get different
kinds of users interested in using the services, getting them to
realize that the services are a benefit to them. My first "Subject
Tree" was created with that in mind. I had the idea of sending what
you might have called "gravitational waves" out on the net, to get
new groups of users interested in the medium. Seems to have worked;
our librarians for instance are enthusiastic.....

>Point 5: World Views
>--------------------
>In the past I have said that I have become less and less concerned about
>the particular scheme used to catalogue information resources.  As far as
>I can tell since Linnaeus (and of course before, but he probably was the
>first to use what we now term as the "modern scientific method"), most
>classification schemes have been based on the hierarchical model. It is
>so common that most of us don't think about it: when we go to the
>supermarket and we are looking for ice cream we don't go looking in the
>personal hygiene section.... we go into frozen foods. Nature also seems
>to work in this way (or at least the way the human brain perceives
>"nature"): we have the idea of "dog" and when we recognize those qualities
>of "dogness" in an animal we have just encountered we tend to think
>"related to dog" or "subdog" :-) Of course we also recognize that there
>are some objects that don't always fit as nicely as our intuition would
>have us believe: when we see the panda "bear" we think "bear" not
>"raccoon" which the taxonomists would tell us is the correct answer. [Ask
>any taxonomist about the "guinea pig" (or "cavy" to our British friends)
>and watch them run away screaming ;-), but I digress....].

Hierarchical systems make you miss a lot, yes. They make you miss the 10%
"catness" in the dog, and probably the 50% dinosaur in birds... and how
many times have you not been running around a supermarket to find something
that you just couldn't classify in any of the categories that
you find in a normal supermarket.......?

>[I can hear you saying "but what's the point?" :-] The point here is that
>I find it difficult to imagine a situation where some group could
>construct a classification scheme in which an adequate (and in many cases
>a one-to-one) mapping could not be performed between it any any other
>scheme. However, this is not my field of expertize and I would welcome
>other insights into this. However I would ask that you take a look at the
>bigger picture before turning up the heat. We are working in the most
>malleable and versatile environment ever created.  It is trivial for a
>computer to transform the number 203.3.4 (BTW, I have no idea if this is
>a valid number or what if anything it refers to) in the Dewey Decimal
>system to whatever text description or other taxonomical system you are
>working in provided the appropriate infostructure exists. These
>transformations should occur at the level of the client/server or
>client/system not at the user/client boundary. 

We wouldn't have the problem with different classification
schemes if there had not been a classification problem. It probably
takes an expert in a certain field to come up with a proper scheme
for that subject, so in the gopher case, let's leave the scheme for
the "sub-topification" (nice word, made it myself :-)) to the experts.
So what happens if we apply Reinhard's idea about having subject
maintainers for the subject, or maybe interest groups, where we can have
them, leaving the subject tree idea and talking about interest groups
instead?

>It is often useful to adopt a scheme already in existence however, I
>contend that it is not worth the effort to radically warp a pre-existing
>system to your own use if it does not naturally do so. It is important
>that your primary user base be comfortable with the system you use. If
>other communities feel that it would be useful for them to have access to
>your system, but they work under a different scheme then it is in their
>interest to perform the conversion.


If we leave the classification to the providers, it would be
possible for them to classify a document in a couple of different
schemes, making it possible to search the data with a couple of
search systems. (The text below is seen more or less from a gopher
perspective.) It would be nice to have a system where you could
have a menu like this, for example:




			Information Search


	1. Search for document classified in a certain library system/
		(You have to know the coding for each system you search)
-------------------------------------------------------------
(below that the menu:)

		1. LCC<?>
		2. Dewey<?>
		3. Unesco<?>
		4. OCLC document system<?>
		.
		5. Search any combination of these<?>
		6. Search all of them<?>
------------------------------------------------------------
	
	2. Do a keyword (one or many) search in menus and titles<?>
		(Veronica)
	3. Search the subject database(free text) on a certain subject/
		(WAIS)
---------------------------------------------------------------
(below that a menu:)

		1. Archeology<?>
		2. Arts <?>
		.
		.

		etc...
--------------------------------------------------------------
Seems a little odd maybe, but possibly, people with a small piece
of info don't take the time to fill in a library classification
in a template, but that info should still be searchable via
Veronica and WAIS or WWW or whatever. (Of course none of the existing
systems will win; we have to combine them in one way or another.) If we
combine this with a facility where you can read an abstract, and get
the option of getting the whole document by answering a question
from the gopher, then we need:

1.	Abstracts, classified in any/or many (:-)) supported library systems,
	including some dialogue feature (easy to do with hypertext I presume)

2.	Documents, each linked to its specific abstract by some means.
	(see above)

3.	A search-system searching inside the abstracts for
	library classifications.

4.	A search-system searching in document names and menus.
	(Veronica or Veronica_look_alike)

5.	A search-system for freetext-search.
	(WAIS, SR, or some cloned library system or whatever) 

6.	Interest group Maintainers to glue this together.

7.	Information providers to do the basic work.

8.	We also need to keep the geographical structure,
	(maybe it should be based more on language than
	on geography).
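
As a rough illustration of items 1 to 5 above, here is a minimal Python
sketch, not a description of how gopher, Veronica or WAIS actually work.
The Abstract record, the three search functions and the sample data are
all hypothetical, and the classification codes are made up rather than
real LCC or Dewey numbers.

```python
from dataclasses import dataclass, field


# Hypothetical record: an abstract that may carry classifications in
# several library schemes, plus a link to the full document (items 1, 2).
@dataclass
class Abstract:
    title: str
    text: str
    document_link: str  # placeholder locator, not a real address
    classifications: dict = field(default_factory=dict)  # scheme -> code


def search_by_classification(abstracts, code, schemes=None):
    """Item 3: match a code in the chosen schemes (None means all schemes)."""
    hits = []
    for a in abstracts:
        for scheme, value in a.classifications.items():
            if (schemes is None or scheme in schemes) and value == code:
                hits.append(a)
                break
    return hits


def search_titles(abstracts, keywords):
    """Item 4: every keyword must occur in the title (a Veronica look-alike)."""
    words = [k.lower() for k in keywords]
    return [a for a in abstracts if all(w in a.title.lower() for w in words)]


def search_free_text(abstracts, phrase):
    """Item 5: naive substring match over the abstract text (stand-in for WAIS)."""
    p = phrase.lower()
    return [a for a in abstracts if p in a.text.lower()]


# Illustrative data only: the codes below are invented for the example.
ABSTRACTS = [
    Abstract("Bronze Age burial sites", "Survey of excavations in Scandinavia.",
             "doc-001", {"LCC": "XX100", "Dewey": "000.1"}),
    Abstract("Notes on church music", "A short note without any classification.",
             "doc-002"),
]
```

In this sketch the second record has no library classification at all,
yet search_titles and search_free_text can still find it, which is the
behaviour asked for above for people with a small piece of info.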

The End (at last!:-))


regards/awg