RE: Transport requirements for DNS-like protocols

John C Klensin <klensin@jck.com> Mon, 01 July 2002 01:14 UTC

Date: Sun, 30 Jun 2002 21:14:02 -0400
From: John C Klensin <klensin@jck.com>
Subject: RE: Transport requirements for DNS-like protocols
In-reply-to: <17823168.1025345475@localhost>
To: =?ISO-8859-1?Q?Patrik_F=E4ltstr=F6m?= <paf@cisco.com>, Nicolas Popp <nico@realnames.com>, 'Michael Mealling' <michael@neonym.net>
Cc: Rob Austein <sra@hactrn.net>, ietf-irnss@lists.elistx.com
Message-id: <2626548.1025471624@KLENSIN-TP>
References: <17823168.1025345475@localhost>

--On Saturday, June 29, 2002 10:11 AM +0200 Patrik Fältström
<paf@cisco.com> wrote:

> --On 2002-06-28 10.24 -0700 Nicolas Popp <nico@realnames.com>
> wrote:
> 
>> As soon as you do fuzzy matching that forces you to retrieve
>> multiple records and rank them, the operational complexity is
>> increased ten-fold (and your query response time becomes way
>> more unpredictable unless you do a few "right things").
> 
> Fuzzy-matching is most efficiently done by calculating a
> hash on the search string (something like soundex) and then
> doing exact matching in the database.
> 
> So, fuzzy-matching is for me just another version of
> "preparation" of the search string.

Patrik,

In a number of areas, matching by distance function --i.e.,
knowing all of the things that might match and determining which
one(s) are closest-- has turned out to be much more useful than
matching on a canonical form.  In one of the classic examples,
the first-generation theory of how to do OCR was to try to
standardize ("prepare" in your terminology) characters,
font-independent, down to a common abstraction.  Nice idea, but
it basically didn't work.  Instead, we now assume (with English)
that a given character has to match one of 62, and make a
tentative decision based on similarity functions.  Then we
repeat the process, looking up word-candidates in a dictionary
to see which candidates can be excluded because they are
uncommon in, or absent from, the language.
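
As an illustration of matching by distance function, the sketch below
scores every known candidate against the query and keeps the closest
ones. It is a minimal sketch under stated assumptions: Levenshtein edit
distance stands in for "similarity function", and the names
edit_distance, closest, and the sample candidate list are hypothetical
and do not come from this thread.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row (assumed metric)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


def closest(query: str, candidates: list[str], limit: int = 3):
    """Return up to `limit` (distance, candidate) pairs, nearest first."""
    scored = sorted(
        (edit_distance(query.lower(), c.lower()), c) for c in candidates
    )
    return scored[:limit]


# Hypothetical candidate set; a real service would hold far more names.
names = ["klensin", "klein", "mealling", "popp"]
best = closest("klensen", names, limit=2)
```

Note that every candidate is examined for each query, which is the
retrieve-and-rank cost raised earlier in the thread.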

Sonex/soundex matches are fuzzy matching, but they are not
fuzzy search; I think that fuzzy search is going to be needed
here.
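
One way to picture the difference, as a sketch only: a Soundex key is
computed once per string and looked up exactly, so a query whose key
differs finds nothing, however close it is. The code below assumes the
common American Soundex rules; the index layout, the lookup helper, and
the sample names are hypothetical. The standard library's
difflib.get_close_matches stands in for a search over all candidates.

```python
import difflib
from collections import defaultdict

# Letter-to-digit table for American Soundex (assumed variant).
CODES = {
    c: d
    for d, letters in {
        "1": "bfpv", "2": "cgjkqsxz", "3": "dt",
        "4": "l", "5": "mn", "6": "r",
    }.items()
    for c in letters
}


def soundex(name: str) -> str:
    """Four-character Soundex key: first letter plus three digits."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    out = letters[0].upper()
    last = CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":
            continue              # h and w do not break a run of codes
        code = CODES.get(c, "")   # vowels map to "" and reset the run
        if code and code != last:
            out += code
        last = code
    return (out + "000")[:4]


# Hypothetical data: an exact-match index keyed on the Soundex value.
names = ["klensin", "klein", "mealling", "popp"]
index = defaultdict(list)
for n in names:
    index[soundex(n)].append(n)


def lookup(query: str) -> list[str]:
    """Canonical-form matching: hash the query, then match exactly."""
    return index.get(soundex(query), [])


same_key = lookup("klensen")   # shares a key with "klensin"
missed = lookup("clensin")     # first letter differs, so the key differs
searched = difflib.get_close_matches("clensin", names, n=1)
```

Here lookup("clensin") comes back empty even though the query is one
letter away from a stored name, while the search over all candidates
still finds it.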

So, I hope you are right -- it would be a lot easier.  But...

     john