Re: [MORG] Review: draft-ietf-morg-fuzzy-search-02

Barry Leiba <barryleiba.mailing.lists@gmail.com> Fri, 30 July 2010 11:53 UTC

In-Reply-To: <A199F09978CEADEC3697D0DA@dhcp-63f1.meeting.ietf.org>
References: <A199F09978CEADEC3697D0DA@dhcp-63f1.meeting.ietf.org>
Date: Fri, 30 Jul 2010 07:53:39 -0400
Message-ID: <AANLkTikpxj788+WaFstG_-9+Uu4rmuhzQSfFqUnR_psg@mail.gmail.com>
From: Barry Leiba <barryleiba.mailing.lists@gmail.com>
To: Cyrus Daboo <cyrus@daboo.name>
Content-Type: text/plain; charset="ISO-8859-1"
Cc: morg@ietf.org
Subject: Re: [MORG] Review: draft-ietf-morg-fuzzy-search-02

> 4) Given RFC5255's comparator stuff, is the active comparator expected to
> have any influence on the fuzzy behavior? I am guessing not, but it might be
> worth a mention.

I suppose ascii-numeric could have an effect.
< fuzzy header "special-field" "50" > would fuzz differently
depending upon whether you used ascii-casemap or ascii-numeric.  On
the other hand, I imagine that any fuzzy string search would ignore
case anyway, so specifying ascii-casemap would be unnecessary.

But we already say the server can do anything it wants, so do we
really need to mention this?  I suppose we could add something to that
paragraph, like "including ignoring the comparator".
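
To make the comparator point concrete, here is a rough sketch of how the
same "50" could fuzz differently under the two comparators.  The draft
does not define any fuzzy algorithm, so the tolerance, the edit-distance
rule and every function name below are assumptions made up for
illustration only.

```python
def ascii_casemap_key(s: str) -> str:
    # ascii-casemap: fold only ASCII a-z to upper case before comparing.
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in s)


def ascii_numeric_key(s: str) -> float:
    # ascii-numeric: leading ASCII digits are read as a number; a string
    # with no leading digit is treated as positive infinity.
    digits = ""
    for c in s:
        if "0" <= c <= "9":
            digits += c
        else:
            break
    return float(int(digits)) if digits else float("inf")


def edit_distance(a: str, b: str) -> int:
    # Plain Levenshtein distance, one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def fuzzy_casemap(a: str, b: str, max_edits: int = 1) -> bool:
    # Hypothetical string fuzz: match if within max_edits after case folding.
    return edit_distance(ascii_casemap_key(a), ascii_casemap_key(b)) <= max_edits


def fuzzy_numeric(a: str, b: str, tolerance: float = 5) -> bool:
    # Hypothetical numeric fuzz: match if the values are within tolerance.
    x, y = ascii_numeric_key(a), ascii_numeric_key(b)
    if x == float("inf") or y == float("inf"):
        return x == y
    return abs(x - y) <= tolerance


print(fuzzy_numeric("50", "49"), fuzzy_casemap("50", "49"))
print(fuzzy_numeric("50", "500"), fuzzy_casemap("50", "500"))
```

Under these made-up rules "49" is close to "50" numerically but not as a
string, while "500" is close as a string but not numerically.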

> 6) Do clients need to care that the FUZZY search key is not distributive,
> e.g.:
>
> FUZZY OR SUBJECT x SUBJECT y != OR FUZZY SUBJECT x FUZZY SUBJECT y

I disagree with your example; I think the two are the same, because
FUZZY applies to "OR", and so to the whole OR clause.
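
A small sketch of that reading, with search keys modelled as nested
tuples.  The tuple representation and the push_fuzzy helper are
hypothetical, and this only illustrates the interpretation argued here,
not anything the draft specifies.

```python
def push_fuzzy(key, fuzzy=False):
    """Push a FUZZY modifier down onto the leaf search keys it covers."""
    op = key[0]
    if op == "FUZZY":
        # FUZZY takes one search key; everything under it becomes fuzzy.
        return push_fuzzy(key[1], True)
    if op == "OR":
        return ("OR", push_fuzzy(key[1], fuzzy), push_fuzzy(key[2], fuzzy))
    # Leaf key such as ("SUBJECT", "x").
    return ("FUZZY", key) if fuzzy else key


# FUZZY OR SUBJECT x SUBJECT y
lhs = ("FUZZY", ("OR", ("SUBJECT", "x"), ("SUBJECT", "y")))
# OR FUZZY SUBJECT x FUZZY SUBJECT y
rhs = ("OR", ("FUZZY", ("SUBJECT", "x")), ("FUZZY", ("SUBJECT", "y")))

print(push_fuzzy(lhs) == push_fuzzy(rhs))
```

If FUZZY covers the whole OR clause, both forms normalize to the same
tree, so they would select the same messages.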

> Isn't the typical client use case that it wants just the top
> few rankings? If so, we need to specify how the PARTIAL search result code
> (defined in RFC5267) interacts with RELEVANCY. i.e.:
>
> A02 SEARCH RETURN (RELEVANCY PARTIAL 1:10) FUZZY TEXT "Helo"
>
> means return the top 10 relevant results. So the combination of RELEVANCY
> and PARTIAL would require the server to order the result set before applying
> the partial range matching.

I'm not sure I like that requirement.  Maybe you have to use SORT if
you want that.  What happens in the general case of PARTIAL?  Might a
client that wants the *newest* 10 results be unhappy if it got the
*oldest* 10?
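
To show why the ordering matters, here is a toy sketch in which the same
PARTIAL range gives different answers depending on the order the result
set is in when the range is applied.  The UIDs, the scores and the
partial helper are all invented for the example.

```python
def partial(results, first, last):
    """Return the 1-based, inclusive range first:last of an ordered result list."""
    return results[first - 1:last]


# (uid, relevancy score) pairs; made-up values.
matches = [(101, 20), (102, 95), (103, 60), (104, 85), (105, 40)]

by_uid = sorted(matches)  # oldest first
by_relevancy = sorted(matches, key=lambda m: m[1], reverse=True)  # best first

print(partial(by_uid, 1, 2))        # the two oldest matches
print(partial(by_relevancy, 1, 2))  # the two most relevant matches
```

The range itself says nothing about which order it is applied to, which
is the same problem the "newest 10" client would run into.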

Barry