RE: "so-called" keyword and layer 3

John C Klensin <klensin@jck.com> Wed, 05 December 2001 06:39 UTC

Date: Wed, 05 Dec 2001 01:36:38 -0500
From: John C Klensin <klensin@jck.com>
Subject: RE: "so-called" keyword and layer 3
In-reply-to: <7FC3066C236FD511BC5900508BAC86FE4364AC@trestles.internal.realnames.com>
To: Nicolas Popp <nico@realnames.com>
Cc: ietf-irnss@lists.elistx.com
Message-id: <12699928.1007516198@localhost>
References: <7FC3066C236FD511BC5900508BAC86FE4364AC@trestles.internal.realnames.com>

Just for calibration, and for whatever it is worth, I agree with
Nico's analysis, although not necessarily his conclusion (see
below).

Building on my earlier note, which I'm not going to repeat, two
observations:

* The white pages/yellow pages analogy was in the document
because it seemed useful.  One of my inferences from Nico's note
is that it isn't as useful as I thought and that it might be
confusing things a bit.  Analogies are like that.

* I have been trying to push keyword systems out to sublayer
three of the model for two reasons, which should probably be in
the "dns search" document.  Next time.

(i) I'm worried about scaling with them, and especially about
creating yet another situation in which someone has to decide who
is entitled to ("has rights to", "has the best claim on", "most
closely matches") some word or string.  In a way, that is
another kind of economic constraint, but, if we can meet the
technical and end-user requirements without having to implicitly
write ICANN, WIPO, or the equivalent into the protocol, I think
that is desirable.   I believe that the "no overseer" requirement
is more easily satisfied with keywords at sublayer three than at
two.

(ii) Many years ago, I spent time working on information
retrieval and automatic indexing systems.   The experience left
me quite afraid of the "is that a string, or something else, and,
if so, what" meta-question.   End users don't understand it (but,
worse, often think they do) and the systems turn out to be harder
to implement in an intelligent and consistent way than one would
like.

So, sublayer two is defined with the first facet as
"name-string".  It isn't "name-string, or unordered list of
keywords, or 'phrase with proximity weighting among words', or
any of the other options".

If a "keyword system" is structured as 
 (a)	{common name, country, language, service type},
then it meets my criteria for a sublayer two system (although I'm
still concerned about scaling).  It is only when we have
 (b)	{{descriptive-keyword1, descriptive-keyword2,...},
	country, language, service type}
or
 (c)	{{common-name, descriptive-keyword1,
	descriptive-keyword2,...}, country, language, service
	type}
that I start getting anxious and pushing toward sublayer three.
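
A minimal sketch of the three shapes, assuming Python and
hypothetical names (Facets, NameQuery, KeywordQuery and
suggested_sublayer are illustrative only; none of them come from
the "dns search" document).  It only encodes the distinction drawn
above: a single name-string as the first facet fits sublayer two,
while a set of descriptive keywords pushes toward sublayer three.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Facets:
    """Additional facets shared by shapes (a), (b) and (c)."""
    country: str
    language: str
    service_type: str


@dataclass(frozen=True)
class NameQuery:
    """Shape (a): one name-string plus additional facets."""
    common_name: str
    facets: Facets


@dataclass(frozen=True)
class KeywordQuery:
    """Shapes (b) and (c): an unordered keyword set plus facets.

    common_name is None for shape (b) and set for shape (c).
    """
    keywords: FrozenSet[str]
    facets: Facets
    common_name: Optional[str] = None


def suggested_sublayer(query: object) -> int:
    """Return the sublayer this note argues the query shape belongs in."""
    if isinstance(query, NameQuery):
        # First facet is a plain name-string: sublayer two.
        return 2
    if isinstance(query, KeywordQuery):
        # First facet is (or includes) a set of descriptive keywords.
        return 3
    raise TypeError("unknown query shape")
```

Because the classes are frozen, a NameQuery is hashable and can
serve directly as a dictionary key, which is one simple way to
model a lookup on the full {common name, country, language,
service type} tuple.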

But most of the keyword systems I've seen described seem to be
more like (b) or (c) than like (a).  And, in case it hasn't been
clear, I have been using terminology such as "so-called" because,
in the traditional terminology of information retrieval (at least
as I was taught it), only (b) and (c) contain "keyword systems".
Even
they are not strictly keyword systems, because the additional
facets are arguably something else (e.g., the geographical
location facet described in "dns search" would presumably be a
set of numeric values from a continuous scale).  And (a) isn't a
keyword system at all, but a "name and additional facets" system,
or, if one prefers, an aliasing system (it was presumably no
accident that, historically, the company Nico works for is called
"RealNames", not "Real Keywords").

If it is useful from a marketing standpoint to call these things
"keywords" and "keyword systems", so be it.  But, here, a little
bit of precision about terminology may be helpful.

    john

--On Tuesday, 04 December, 2001 09:56 -0800 Nicolas Popp
<nico@realnames.com> wrote:

> 
> James. 
> 
> I think you are missing Mr Ko's point.
> 
> John asserts that Keyword Systems are best viewed as layer 3
> services. At the same time, all the Keyword System implementers
> are saying that this is not a faithful reflection of reality. In
> fact, what does "real-world deployment" experience tell us?
> It tells us the following:
> 
> 1.	None of the deployed Keyword Systems (AOL, Netpia, CNNIC,
> TWNIC, 3721, RealNames...) have been deployed as yellow page
> services (layer 3). The breadth of their data and registered
> metadata (today) actually makes them inferior directory
> solutions to the Yahoo or the Looksmart of the world (or layer
> 4 services like search engines). As for local information,
> besides country and language, they don't host any local
> information like that which John describes for layer 3 services.
> 
> 2.	On the other hand, ALL deployed Keyword Systems have been
> deployed for direct navigation (AOL, Netpia, CNNIC, TWNIC,
> 3721, RealNames...). In the context of direct navigation, a
> Keyword System unambiguously looks like a white page service
> with a strong uniqueness requirement (each tuple {common name,
> country, language, service type} is unique). That, on the other
> hand, does look like a layer 2 service to me from what I read in
> John's paper.
> 
> Now, you can argue that this is a choice of business model.
> However, that would be missing the larger picture, and the larger
> picture is: "how do you solve the chicken and egg problem of
> deploying a large-scale, high quality directory service on the
> network"?
> 
> The answer from all the Keyword Systems has been the same:
> first you lay an "egg". Laying an egg means you restrict the
> scope of the facets and build a layer two system with a focus on
> one differentiated application that destination directory
> services cannot compete with and that users want (direct
> navigation in all scripts with simple names that have no
> syntax). Once you have enough data, deployment and adoption,
> you build the "chicken" (you add more metadata and build a
> layer 3 service).
> 
> What I am saying is that a layer 2 service must come first and
> then can grow into differentiated directory services. Besides
> business models, that's why all Keyword Systems are layer 2
> systems. They have been trying to bootstrap layer two for 4
> years now. They are indeed layer two services. They could not
> be anything else.
> 
> -Nico