RE: Transport requirements for DNS-like protocols

John C Klensin <> Mon, 01 July 2002 01:14 UTC

Date: Sun, 30 Jun 2002 21:14:02 -0400
From: John C Klensin <>
Subject: RE: Transport requirements for DNS-like protocols
In-reply-to: <17823168.1025345475@localhost>
To: =?ISO-8859-1?Q?Patrik_F=E4ltstr=F6m?= <>, Nicolas Popp <>, 'Michael Mealling' <>
Cc: Rob Austein <>,
Message-id: <2626548.1025471624@KLENSIN-TP>
MIME-version: 1.0
X-Mailer: Mulberry/2.2.1 (Win32)
Content-type: text/plain; charset=iso-8859-1
Content-transfer-encoding: QUOTED-PRINTABLE
Content-disposition: inline
References: <17823168.1025345475@localhost>

--On Saturday, June 29, 2002 10:11 AM +0200 Patrik Fältström wrote:

> --On 2002-06-28 10.24 -0700 Nicolas Popp wrote:
>> As soon as you do fuzzy matching that forces you to retrieve
>> multiple records and rank them, the operational complexity is
>> increased ten-fold (and your query response time becomes way
>> more unpredictable unless you do a few "right things").
> Doing fuzzy-matching is most efficiently done by doing a
> calculation of a hash on the search string (something like
> soundex) and then exact matching in the database.
> So, fuzzy-matching is for me just another version of
> "preparation" of the search string.


In a number of areas, matching by distance function --i.e.,
knowing all of the things that might match and determining which
one(s) are closest-- has turned out to be much more useful than
matching on a canonical form.  In one of the classic examples,
the first-generation theory of how to do OCR was to try to
standardize ("prepare" in your terminology) characters,
font-independent, down to a common abstraction.  Nice idea, but
it basically didn't work.  Instead, we now assume (with English)
that a given character has to match one of 62, and make a
tentative decision based on similarity functions.  Then we
repeat the process, looking up word-candidates in a dictionary
to see which candidates can be excluded because they are
uncommon in, or absent from, the language.
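
The two-pass idea above (score every candidate by similarity, then prune with a dictionary) can be sketched roughly as follows. This is only an illustration: plain Levenshtein edit distance stands in for whatever similarity function a real system would use, and the names `edit_distance`, `closest`, and `DICTIONARY` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def closest(query: str, candidates: list[str], limit: int = 3) -> list[tuple[int, str]]:
    """Score every candidate against the query and keep the nearest ones."""
    scored = sorted((edit_distance(query, c), c) for c in candidates)
    return scored[:limit]


# Hypothetical word list standing in for the dictionary pass: only words
# that exist in the language are offered as candidates at all.
DICTIONARY = ["matching", "marching", "watching", "mashing"]

if __name__ == "__main__":
    for distance, word in closest("mathing", DICTIONARY):
        print(distance, word)
```

Note that every candidate has to be examined (or at least bounded) before the nearest ones are known, which is where the operational cost comes from.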

Sonex/soundex matches are fuzzy matching, but they are not
fuzzy search; I think that fuzzy search is going to be needed.
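
One way to read that distinction, as a rough sketch: a soundex key is computed once per string and then looked up exactly, while a search compares the query against the stored records themselves. The code below assumes the common American Soundex rules; the record list and the helper names `lookup_by_key` and `search_by_similarity` are hypothetical, and the standard library's `difflib` similarity ratio is swapped in for a real distance function.

```python
import difflib
from collections import defaultdict

# American Soundex digit assignments (vowels, H, W, and Y have no digit).
_CODES = {ch: digit
          for digit, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                                 "4": "L", "5": "MN", "6": "R"}.items()
          for ch in letters}


def soundex(name: str) -> str:
    """Return the four-character Soundex key, or '' for an empty name."""
    name = "".join(ch for ch in name.upper() if ch.isalpha())
    if not name:
        return ""
    key = name[0]
    last = _CODES.get(name[0], "")
    for ch in name[1:]:
        if ch in "HW":
            continue              # H and W do not separate equal codes
        code = _CODES.get(ch, "")
        if code and code != last:
            key += code
        last = code               # a vowel resets the previous code
    return (key + "000")[:4]


NAMES = ["Klensin", "Mealling", "Popp"]   # hypothetical records

# "Preparation" approach: hash each record once, then do exact lookups.
index = defaultdict(list)
for record in NAMES:
    index[soundex(record)].append(record)


def lookup_by_key(query: str) -> list[str]:
    """Exact match on the prepared (hashed) form of the query."""
    return index.get(soundex(query), [])


def search_by_similarity(query: str, cutoff: float = 0.6) -> list[str]:
    """Compare the query against every record and keep the close ones."""
    return difflib.get_close_matches(query, NAMES, n=3, cutoff=cutoff)
```

With these records, `lookup_by_key("Klensen")` finds "Klensin" because both hash to the same key, but `lookup_by_key("Clensin")` finds nothing, since Soundex keeps the first letter verbatim; `search_by_similarity("Clensin")` still returns "Klensin".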

So, I hope you are right -- it would be a lot easier.  But...