Re: [MORG] Review of draft-ietf-morg-fuzzy-search-02.txt

Alexey Melnikov <alexey.melnikov@isode.com> Tue, 24 August 2010 09:50 UTC

To: barryleiba@computer.org, Timo Sirainen <tss@iki.fi>
Cc: morg@ietf.org

Barry Leiba wrote:

>>Mentioning buffer overflows sounds like it could be a bit too
>>implementation-specific.
>>
>I agree.  Moreover, I don't think it's the place of the IETF to tell
>programmers how to program, and buffer overflow bugs are well enough
>known that programmers should already know that they need to watch for
>them.
>
One would think so. I actually think that educating people is a good 
thing. However, I am not going to insist on discussing this.

>I really don't think this opens things up for any more or
>different buffer overflow bugs than any other spec does.
>
That part I agree with.

>>Implementation of this extension might enable denial-of-service attacks
>>if the implementation isn't careful to prevent them. Fuzzy search
>>engines are often complex, with non-obvious disk space, memory and/or CPU
>>usage patterns. Implementors should at least test the behavior with large
>>messages that contain very long words and/or unique random strings. Also,
>>very long search keys might cause excessive memory or CPU usage.
>>
>>Invalid input may also be problematic. For example, if the search engine
>>takes a UTF-8 stream as input, it might fail more or less badly when
>>illegal UTF-8 sequences are fed to it from a message whose character set
>>was claimed to be UTF-8. This could be avoided by validating all the
>>input and replacing illegal UTF-8 sequences with the Unicode replacement
>>character (U+FFFD).
>>
>>Search relevancy rankings might be susceptible to "poisoning" by smart
>>attackers using certain keywords or hidden markup (e.g. HTML) in their
>>messages to boost the rankings. This can't be fully prevented by
>>servers, so clients should prepare for it by at least allowing the user to
>>see all the search results, rather than hiding results below a certain
>>score.
>>
>I'm happy with that text.
>
I am too. I am not convinced that mentioning the Unicode replacement 
character (U+FFFD) is necessary, but I suppose it is OK.
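
For illustration only, the validation step described in the proposed text 
could look roughly like the sketch below. It assumes the indexer receives 
the raw bytes of a message part whose charset was claimed to be UTF-8. The 
function names sanitize_utf8 and is_valid_utf8 are hypothetical and do not 
come from the draft; the only library behavior relied on is Python's 
standard bytes.decode with the "strict" and "replace" error handlers.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw is well-formed UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def sanitize_utf8(raw: bytes) -> str:
    """Decode bytes claimed to be UTF-8 (hypothetical helper).

    Illegal sequences are replaced with U+FFFD instead of raising,
    so the search engine only ever sees valid Unicode text.
    """
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    claimed_utf8 = b"fuzzy \xff search"   # 0xFF can never appear in UTF-8
    print(is_valid_utf8(claimed_utf8))     # validity check before indexing
    print(repr(sanitize_utf8(claimed_utf8)))
```

Whether a server substitutes U+FFFD or simply refuses to index such a part 
is an implementation choice; the sketch only shows the substitution option 
named in the text.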