Re: [MORG] Review of draft-ietf-morg-fuzzy-search-02.txt

Alexey Melnikov <alexey.melnikov@isode.com> Thu, 26 August 2010 09:54 UTC

Message-ID: <4C7639F9.6090106@isode.com>
Date: Thu, 26 Aug 2010 10:55:05 +0100
From: Alexey Melnikov <alexey.melnikov@isode.com>
To: Timo Sirainen <tss@iki.fi>
Cc: morg@ietf.org, barryleiba@computer.org
Subject: Re: [MORG] Review of draft-ietf-morg-fuzzy-search-02.txt

Timo Sirainen wrote:

>On Tue, 2010-08-24 at 10:51 +0100, Alexey Melnikov wrote:
>
>>>>Invalid input may also be problematic. For example if the search engine
>>>>takes UTF-8 stream as input, it might fail more or less badly when
>>>>illegal UTF-8 sequences are fed to it from a message whose character set
>>>>was claimed to be UTF-8. This could be avoided by validating all the
>>>>input and replacing illegal UTF-8 sequences with the Unicode replacement
>>>>character (U+FFFD).
>>>>
>>I am too. I am not convinced that mentioning the Unicode replacement
>>character (U+FFFD) is necessary, but I suppose it is OK.
>>
>So just "This could be avoided by validating all the input."
>
Yes. Or alternatively something like:

This could be avoided by validating all the input and, for example,
replacing illegal UTF-8 sequences with the Unicode replacement
character (U+FFFD).

I.e. I want to be clear that replacing with U+FFFD is not the only
solution.
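
To illustrate the point (a rough sketch only, not proposed text for the
draft; the function names below are made up for this example), here are
two ways an implementation might validate input claimed to be UTF-8
before handing it to the search engine. The first replaces illegal
sequences with U+FFFD, the second just detects them so that the caller
can choose some other handling:

```python
def sanitize_utf8(raw: bytes) -> str:
    """Decode bytes claimed to be UTF-8, substituting U+FFFD for
    every illegal sequence instead of failing.

    Hypothetical helper name; uses Python's built-in "replace"
    error handler.
    """
    return raw.decode("utf-8", errors="replace")


def is_valid_utf8(raw: bytes) -> bool:
    """Report whether the bytes are well-formed UTF-8.

    Hypothetical helper name. A caller could use this to skip
    indexing the part, or to fall back to another charset,
    rather than substituting U+FFFD.
    """
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    claimed_utf8 = b"abc\xffdef"  # 0xFF can never appear in UTF-8
    print(is_valid_utf8(claimed_utf8))
    print(repr(sanitize_utf8(claimed_utf8)))
```

Either way the search engine only ever sees well-formed input; which
handling is appropriate is left to the implementation.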

>would be better? Or just remove the whole sentence.
>  
>