Keywords, direct navigation, and search layer 2 (was: RE: "so-called" keyword and layer 3)

John C Klensin <> Thu, 06 December 2001 17:18 UTC

Date: Thu, 06 Dec 2001 12:15:01 -0500
From: John C Klensin <>
Subject: Keywords, direct navigation, and search layer 2 (was: RE: "so-called" keyword and layer 3)
In-reply-to: <>
To: Yves Arrouye <>
Message-id: <122895213.1007640901@P2>
References: <7FC3066C236FD511BC5900508BAC86FE4D7823@trestles.inte>

--On Thursday, 06 December, 2001 06:44 -0800 Yves Arrouye
<> wrote:

>> (i) I'm worried about scaling with them, and especially about
>> creating yet another situation in which someone has to decide
>> who is entitled ("has rights to", "has the best claim on",
>> "most closely matches") some word or string.   In a way, that
>> is another kind of economic constraint, but, if we can meet
>> the technical and end-user requirements without having to
>> implicitly write ICANN, WIPO, or the equivalent into the
>> protocol, I think that is desirable.   I believe that the "no
>> overseer" requirement is more easily satisfied with keywords
>> at sublayer three than at two.
> I am worried about having categories in the layer 2 for that
> reason. To me, industry categories implicitly mean WIPO. As a
> matter of fact, the authors of the SLS document explicitly
> refer to the Nice agreement:

Oh, yes, indeed.  "dns search" explicitly refers to the Nice

> I know that the introduction of category helps widen the space
> of potential common names for a given service type, but
> basically, it means IP lawyers still decide who can get what.

Oh, no.  That is the beauty of this model.  Let me try to
review it, since I gather "dns search" isn't clear enough:

In most countries, trademark "rights", and the resultant
trademark lawyers, are a fact of life.  No amount of wishing
will cause them to disappear.  And, for organizations doing
business internationally, WIPO is a fact of life for similar
reasons.  The idea is not to undo either, but to try to avoid
Internet-specific regulation or rules, or things that force
the Internet to be treated differently than anything else.

(I'm going to ignore "well-known marks" in what follows --
they are a mess in and of themselves, but really don't change
anything but arguments about scope.)

When one registers a trademark in any country which is part
of the WIPO treaties (most of them) and, I've been told, in
most of the others, one has to identify the types of
businesses or services to which it applies.  Ultimately, that
is done by selecting business name categories from the
nationally-accepted list.  The Nice treaty list isn't
definitive -- some countries don't use it in explicit form --
but it is the only internationally-agreed-upon list, and it is
pretty representative of the genre.  That is a reality of
trademark registration; it has nothing to do with the

It is important to note that, at least in most countries, no
one tells a potential registrant what categories they can
list.  A longer list expands the scope of coverage and widens
the range of possible challenges (those tradeoffs are
ordinary business decisions) and, in some areas, may increase
fees.  A too-broad list may also lead to later challenges on
the grounds that the name isn't being used "that way", but
that is dealt with in established ways, too.  And again, this
way of doing things predates the Internet by many years and
really has nothing to do with it.

If one trademark holder tries to challenge another over the
name, or a trademark holder challenges a use of a similar
name as infringing, the first question is, more or less, "are
they in the same industry sector" and, again, these lists are
very important and we are stuck with them.

Application of this model in the faceted system differs from
the current DNS model in one absolutely essential way: WIPO's
role is passive with regard to any registration or set of
registrations.  They establish a list of category values -- a
restricted vocabulary if you want to look at it from an
information retrieval point of view -- or, more specifically,
we adopt a list they have already established.  And after
that, they are out of the picture: no dispute resolution
mechanisms rooted in specific names, no reports about how
little green rocks are actually apples (ok, they haven't done
that, but it has felt that way).  

Instead, someone comes to a database provider and says "I
want to be listed this way, with a name string, a country, an
industry code, etc".  And they can say that several times if
they want to, varying the value for any of those facets.  If
someone doesn't like it, they mount a challenge using
_conventional_ mechanisms: the database providers are no more
part of the problem than a newspaper would be if it ran an
advertisement for a company under a name that was later

In case it isn't clear, WIPO's category-value-list specification
role is strictly limited to that particular facet, too.  The
other facets are Not Their Problem and should not be formally
visible to them.

Yes, WIPO staff (very senior staff) has reviewed, and agreed
with, the analysis that underlies the above, although they
(obviously) haven't looked at my summary of it in this note.

>> If a "keyword system" is structured as
>>  (a)	{common name, country, language, service type},
>> then it meets my criteria for a sublayer two system (although
> I see keyword systems, from a direct navigation standpoint,
> as (a). The common name is the key (along with country,
> language, and service type) used to get an object descriptor
> that contains a set of facets. The fact that this descriptor
> may also hold other facets in order to help additional
> applications on top of the layer 2 lookup system, is not
> really relevant to the fact that it does behave properly with
> a set of key facets. We tried to explain that, and the
> importance of having these facets that are part of a key, in
> draft-arrouye-kls-00.txt. The fact that an implementation may
> want to add non-key facets to directly support higher level
> services is more of an implementation and ease of subscription
> issue than an acknowledgement that this implementation does
> not want to participate in a layer 2 lookup system.

Ok.  draft-arrouye-kls-00.txt is not as clear as it could be
on this point, just as, obviously now,
draft-klensin-dns-search-02.txt isn't clear enough.  I'd
welcome text from you, and will look at your draft again and
try to supply text.  To see if we finally understand each
other, let me try to restate your comment above into the
language of "dns search" (and the relevant information
retrieval and classificatory systems literature as I
understand it):

  The database(s) for search layer 2 are going to contain a
  full set of facets.  One can "leave one out" by asserting
  that any searches that involve that facet should always
  match, i.e., by giving it a "matches everything" value.  I
  wouldn't recommend it, but that is a business decision
  which we don't need to resolve and the marketplace will
  figure out who is right.   And, since uniqueness is not
  required in the database itself, I'm not going to use the
  word "key": the database is not intrinsically relational in
  normal form (although one might implement it that way);
  keys are a function of search and retrieval strategies.

  A search in that search layer can specify values for any
  combination of facets that the searcher, or search-vendor,
  finds appropriate.  Leaving one out is equivalent to "match
  anything that happens to be there".  And the question of
  how much fuzziness to permit is also a function of the
  search mechanism.  (I would love to see a search product as
  general as what I'm saying here implies, but I don't expect
  to see one out of the laboratory, nor would I expect it to
  work at scale or to be economically viable.  But I could be
  wrong.)
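
  The "leave one out" behavior on both sides can be sketched as
  follows.  This is only an illustration, not part of "dns
  search": the ANY sentinel, the function names, and the sample
  entries are all assumptions made up for the example.  It shows
  that an omitted query facet matches whatever is stored, that a
  stored "matches everything" value matches any query, and that
  the database does not require uniqueness.

```python
# Hypothetical sentinel standing in for a "matches everything" facet value.
ANY = "*"


def facet_matches(entry: dict, facet: str, wanted: str) -> bool:
    """True if the entry's stored value for `facet` satisfies `wanted`.

    A facet missing from the entry is treated like ANY (an assumption).
    """
    stored = entry.get(facet, ANY)
    return stored == ANY or stored == wanted


def search(database: list, query: dict) -> list:
    """Return every entry matching all facets named in `query`.

    Facets absent from `query` are simply not tested, which is the
    "match anything that happens to be there" case.
    """
    return [
        entry
        for entry in database
        if all(facet_matches(entry, f, v) for f, v in query.items())
    ]


# Illustrative data only; the same name string is listed three times.
database = [
    {"name-string": "Joe's Pizza", "geographic-location": "US",
     "language": "en", "industry-code": "restaurants"},
    {"name-string": "Joe's Pizza", "geographic-location": "FR",
     "language": "fr", "industry-code": "restaurants"},
    {"name-string": "Joe's Pizza", "geographic-location": ANY,
     "language": "en", "industry-code": "software"},
]

by_name = search(database, {"name-string": "Joe's Pizza"})
in_france = search(database, {"name-string": "Joe's Pizza",
                              "geographic-location": "FR"})
```

  Here the name-only query returns all three entries, and adding
  the country narrows it to the French entry plus the one that
  declared "matches everything" for its location.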

  To invent a plausible notation for talking about this (but
  just a notation, not a norm), we might talk about a search
  as specifying (since we have readers of this list who may
  not be familiar with ABNF, I'm going to use "pure" BNF for
  the notational syntax):

     <search> ::= "{" <facet-tuple-list> <referral-range> "}"
     <facet-tuple-list> ::= <facet-tuple> [<facet-tuple-list>]
     <facet-tuple> ::= "{" <facet-name> <facet-value>
     	 <distance-indicator> "}"

  The (vague) semantics for the things that may not be
  obvious, are: 

   "distance-indicator" specifies the degree of fuzziness

   "referral-range" specifies how far to go down a referral
   (or "search the next database") chain if a match is not
   found in the initial database search.   I'd guess it would
   be expressed in a hop-count TTL, but haven't worked all of
   the cases through. 

   In both cases, I'm not sure that the value can really be
   expressed as an integer or real scalar, but let's try to
   keep at least the example simple.
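
  To make those two knobs concrete, here is a minimal sketch
  under several assumptions: the distance-indicator is taken to
  be an integer bound on Levenshtein edit distance (a stand-in
  chosen for the example, not something the notation
  prescribes), the referral-range is a plain hop count, and the
  referral chain is just an ordered list of databases.  All
  class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FacetTuple:
    name: str
    value: str
    distance: int  # distance-indicator: 0 means exact match


@dataclass(frozen=True)
class Search:
    tuples: tuple
    referral_range: int  # hop count: 0 means "no referrals"


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, used only as a stand-in fuzziness measure."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def matches(entry: dict, search: Search) -> bool:
    """Every facet tuple must be within its allowed distance."""
    return all(
        t.name in entry
        and edit_distance(entry[t.name], t.value) <= t.distance
        for t in search.tuples
    )


def run(search: Search, chain: list) -> list:
    """Search chain[0]; if nothing matches, try up to
    `referral_range` further databases in order."""
    for database in chain[: search.referral_range + 1]:
        hits = [entry for entry in database if matches(entry, search)]
        if hits:
            return hits
    return []
```

  With a referral-range of 0 only the first database is ever
  consulted, so a miss there is simply "not found".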

Now, in that language, your "direct navigation" keyword
lookup process (and key), as I now understand it, might be
expressed as:

 {{ name-string "common name" 0 } 
  { geographic-location "country" 0 }
  { language "language-id" 0 }
  { industry-code "service type" 0 }
  0 }

Where the first four zero values indicate "exact match"
(i.e., no distance permitted between the search value and the
database value) and the last one would indicate "no
referrals" (i.e., if it isn't found in your database, quit
and return "not found").  The latter is, I believe, necessary
to enforce/preserve your particular brand of keyword-based
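
As a small illustration of that lookup, the sketch below builds
the same search and renders it in the notation used above.  The
function names are hypothetical and the rendering is only meant
to mirror the example; nothing here is a defined wire format.

```python
def render_search(facets, referral_range):
    """Render (facet-name, facet-value, distance) triples plus a
    referral range in the brace notation used in this note."""
    tuples = " ".join(
        f'{{ {name} "{value}" {distance} }}'
        for name, value, distance in facets
    )
    return f"{{{tuples} {referral_range} }}"


def direct_navigation(common_name, country, language, service_type):
    """Exact match on all four facets (distance 0), no referrals."""
    facets = [
        ("name-string", common_name, 0),
        ("geographic-location", country, 0),
        ("language", language, 0),
        ("industry-code", service_type, 0),
    ]
    return render_search(facets, referral_range=0)


rendered = direct_navigation("Acme", "US", "en", "restaurants")
```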

Aside: Where your system and this may get into trouble is that
  this model assumes that the geographic-location, language,
  and industry-code facets will have values based on
  established, consensus-standardized, lists from which the
  database entries are merely choices.  Search vendors don't
  get to make up either the facet names or the names of the
  category values.  Without that constraint, we have a bigger
  mess than the LDAP one, with everyone essentially selecting
  their own schema and values.  In particular, if your
  "service type" isn't isomorphic with the WIPO/Nice list (or
  whatever else is chosen), you will need a mapping function,
  s.t. that <facet-tuple> element becomes 

		{ industry-code MapToNice ("service type") 0 }

  I don't see a problem with doing that, and suspect your
  going through your service-types and checking them against
  the Nice list might be intellectually interesting (and that
  it would ultimately provide value to your customers).
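
  A mapping function of that kind could be as simple as a
  lookup table.  The sketch below is illustrative only: the
  vendor "service type" names are invented, and the class
  numbers are my reading of the Nice class headings and should
  be checked against whichever edition of the list is actually
  adopted.

```python
# Hypothetical vendor service types mapped to Nice class numbers.
# Both columns are illustrative assumptions, not an agreed table.
NICE_CLASS_BY_SERVICE_TYPE = {
    "advertising": 35,
    "banking": 36,
    "education": 41,
}


def map_to_nice(service_type: str) -> int:
    """Translate a vendor service type into a Nice class number."""
    key = service_type.strip().lower()
    try:
        return NICE_CLASS_BY_SERVICE_TYPE[key]
    except KeyError:
        raise ValueError(
            f"no Nice class mapped for service type {service_type!r}"
        ) from None


def industry_code_tuple(service_type: str, distance: int = 0):
    """Build the (facet-name, facet-value, distance) triple for the
    industry-code facet, after mapping the vendor's own value."""
    return ("industry-code", map_to_nice(service_type), distance)
```

  Raising on an unmapped service type, rather than guessing,
  keeps vendor-invented category values from leaking into the
  shared facet.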

Now, I hope obviously, the important business issue here is
whether generalizing that to the point that users can specify
the additional facets, or different degrees of fuzziness, or
different referral properties, makes sense.  UIs are hard,
users like things simple, and general solutions don't have a
good history in the marketplace, often for precisely those
reasons.  So permitting only an exact-match {common name,
country, language, service type} mechanism to be exposed may
make much more sense than doing something more general.  But
the general _model_ is important, if only because I can prove
it is scalable while I believe that, in the last analysis,
your system is still subject to what we have come to call the
"Joe's Pizza" problem -- far less quickly than "" or
"" would be, but the potential is there.