Re: [MORG] Review of draft-ietf-morg-fuzzy-search-02.txt

Timo Sirainen <tss@iki.fi> Fri, 20 August 2010 18:28 UTC

From: Timo Sirainen <tss@iki.fi>
To: Alexey Melnikov <alexey.melnikov@isode.com>
Date: Fri, 20 Aug 2010 19:29:26 +0100
Cc: morg@ietf.org, barryleiba@computer.org

On Fri, 2010-08-06 at 12:02 +0200, Alexey Melnikov wrote:

> 1) This might be too implementation specific, but should we point out 
> possible buffer overflows and other nastiness with fuzzy indexing 
> systems that might be used at the backend?

Mentioning buffer overflows sounds like it could be a bit too
implementation-specific.

> 2) It might be worth mentioning that fuzzy search might enable 
> denial-of-service attacks on the IMAP server. Implementations of this 
> extension are likely to consume more disk space, memory and/or CPU.

How about this? It also addresses Cyrus's point about poisoning:

Implementations of this extension might enable denial-of-service
attacks if they aren't careful to prevent them. Fuzzy search engines
are often complex, with non-obvious disk space, memory and/or CPU
usage patterns. Implementors should at least test the behavior with
large messages that contain very long words and/or unique random
strings. Very long search keys might also cause excessive memory or
CPU usage.
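A minimal Python sketch of the kind of limits the paragraph above has
in mind, assuming a simple whitespace tokenizer in front of the index.
All names and limit values (MAX_WORD_LEN, MAX_TOKENS_PER_MESSAGE,
MAX_SEARCH_KEY_LEN, tokens_for_index, search_key_ok) are hypothetical
and not part of the draft; real values depend on the backend.

```python
import re
from itertools import islice

# Hypothetical limits, chosen only for illustration.
MAX_WORD_LEN = 64                 # longer words are truncated before indexing
MAX_TOKENS_PER_MESSAGE = 100_000  # cap on words indexed from one message
MAX_SEARCH_KEY_LEN = 1024         # longer fuzzy search keys are refused

_WORD = re.compile(r"\S+")

def tokens_for_index(text: str) -> list[str]:
    """Return at most MAX_TOKENS_PER_MESSAGE words of at most MAX_WORD_LEN chars."""
    # finditer yields matches lazily, so a huge message is never split
    # into one enormous list up front; slicing by offsets avoids copying
    # the full text of a very long word.
    words = (
        text[m.start():min(m.end(), m.start() + MAX_WORD_LEN)]
        for m in _WORD.finditer(text)
    )
    return list(islice(words, MAX_TOKENS_PER_MESSAGE))

def search_key_ok(key: str) -> bool:
    """Refuse overly long search keys before they reach the fuzzy engine."""
    return len(key) <= MAX_SEARCH_KEY_LEN
```

Bounding both word length and word count keeps a message full of long
unique random strings from growing the index without limit.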

Invalid input may also be problematic. For example, if the search
engine takes a UTF-8 stream as input, it might fail, more or less
badly, when fed illegal UTF-8 sequences from a message whose character
set was claimed to be UTF-8. This could be avoided by validating all
input and replacing illegal UTF-8 sequences with the Unicode
replacement character (U+FFFD).
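In Python, the validation and replacement described above can be
sketched with the built-in decoder's "replace" error handler. The
function names sanitize_utf8 and clean_utf8_bytes are hypothetical; the
second one is for an engine that wants a UTF-8 byte stream rather than
a string.

```python
def sanitize_utf8(raw: bytes) -> str:
    """Decode bytes claimed to be UTF-8, replacing invalid sequences with U+FFFD."""
    # Python's UTF-8 decoder also rejects overlong forms and encoded
    # surrogates, so errors="replace" turns those into U+FFFD as well.
    return raw.decode("utf-8", errors="replace")

def clean_utf8_bytes(raw: bytes) -> bytes:
    """Return a byte stream that is guaranteed to be valid UTF-8."""
    return sanitize_utf8(raw).encode("utf-8")
```

For instance, sanitize_utf8(b"ab\xffcd") returns "ab\ufffdcd", and the
overlong sequence b"\xc0\xaf" does not decode to "/".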

Search relevancy rankings might be susceptible to "poisoning" by smart
attackers who use certain keywords or hidden markup (e.g., HTML) in
their messages to boost their rankings. Servers can't fully prevent
this, so clients should prepare for it by at least allowing the user
to see all search results, rather than hiding results below a certain
score.
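A minimal client-side sketch of that advice, assuming the client has
already parsed (UID, relevancy score) pairs from the server's response.
The Hit type and order_results function are hypothetical: results are
ordered best-first, but nothing is dropped for having a low score.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    uid: int        # message UID matched by the fuzzy search
    relevancy: int  # server-assigned score (hypothetical field name)

def order_results(hits: list[Hit]) -> list[Hit]:
    """Order hits best-first without discarding any of them.

    Scores may be inflated by "poisoned" messages, so a low score is not
    proof that a message is irrelevant; the user can still see it.
    """
    # Descending score, ties broken by ascending UID for a stable display.
    return sorted(hits, key=lambda h: (-h.relevancy, h.uid))
```

Sorting instead of filtering lets the client still present the most
likely matches first while leaving every result reachable.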