Re: [MORG] Review of draft-ietf-morg-fuzzy-search-02.txt

Barry Leiba <barryleiba.mailing.lists@gmail.com> Mon, 23 August 2010 18:57 UTC

In-Reply-To: <1282328966.6489.20.camel@kurkku.sapo.corppt.com>
References: <4C5021F0.5020002@isode.com> <AANLkTik3ayOVth5v5gVowi8ybtj=k99n=evgt7YZQYzw@mail.gmail.com> <4C5BDDD3.10405@isode.com> <1282328966.6489.20.camel@kurkku.sapo.corppt.com>
Date: Mon, 23 Aug 2010 13:58:21 -0500
Message-ID: <AANLkTi=L+xekVz67V4v84-7Gf-x4MGjAvXmAJFnxPw_k@mail.gmail.com>
From: Barry Leiba <barryleiba.mailing.lists@gmail.com>
To: Timo Sirainen <tss@iki.fi>
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: quoted-printable
Cc: morg@ietf.org
Subject: Re: [MORG] Review of draft-ietf-morg-fuzzy-search-02.txt
Reply-To: barryleiba@computer.org

> Mentioning buffer overflows sounds like it could be a bit too
> implementation-specific.

I agree.  Moreover, I don't think it's the place of the IETF to tell
programmers how to program, and buffer overflow bugs are well enough
known that programmers should already know that they need to watch for
them.  I really don't think this opens things up for any more or
different buffer overflow bugs than any other spec does.

> Implementation of this extension might enable denial-of-service attacks
> if the implementation isn't careful to prevent them. Fuzzy search
> engines are often complex with non-obvious disk space, memory and/or CPU
> usage patterns. Implementors should test at least the behavior of large
> messages that contain very long words and/or unique random strings. Also,
> very long search keys might cause excessive memory or CPU usage.
>
> Invalid input may also be problematic. For example, if the search engine
> takes a UTF-8 stream as input, it might fail more or less badly when
> illegal UTF-8 sequences are fed to it from a message whose character set
> was claimed to be UTF-8. This could be avoided by validating all the
> input and replacing illegal UTF-8 sequences with the Unicode replacement
> character (U+FFFD).
>
> Search relevancy rankings might be susceptible to "poisoning" by smart
> attackers using certain keywords or hidden markup (e.g. HTML) in their
> messages to boost the rankings. This can't be fully prevented by
> servers, so clients should prepare for it by at least allowing the user
> to see all the search results, rather than hiding results below a
> certain score.

I'm happy with that text.
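
For what it's worth, the validation step in the second quoted paragraph
could look roughly like the sketch below. This is only an illustration,
not anything the draft requires: the function name sanitize_utf8 is made
up, and it assumes the indexer receives the raw message bytes before
they reach the search engine. It relies on Python's standard "replace"
error handler, which substitutes U+FFFD for malformed input.

```python
def sanitize_utf8(raw: bytes) -> str:
    """Decode bytes claimed to be UTF-8, never raising on bad input.

    Hypothetical helper: illegal sequences are replaced with the Unicode
    replacement character (U+FFFD) instead of being passed through.
    """
    # errors="replace" is the standard codec error handler that inserts
    # U+FFFD wherever the input is not well-formed UTF-8.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # 0xFF can never appear in well-formed UTF-8.
    print(repr(sanitize_utf8(b"abc\xffdef")))
    # Well-formed input is returned unchanged.
    print(repr(sanitize_utf8("caf\u00e9".encode("utf-8"))))
```

The point is just that the search engine only ever sees well-formed
text, whatever the message claimed about its character set.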

Barry