Re: [Extra] IMAP4rev2 body search

Brandon Long <blong@google.com> Wed, 22 January 2020 19:01 UTC

MIME-Version: 1.0
References: <9d918580-70be-4e9b-90ef-372523afe359@gulbrandsen.priv.no>
In-Reply-To: <9d918580-70be-4e9b-90ef-372523afe359@gulbrandsen.priv.no>
From: Brandon Long <blong@google.com>
Date: Wed, 22 Jan 2020 11:01:14 -0800
Message-ID: <CABa8R6uyPitB24WYknQg1htC9Zm-Cw5ocx8sNjvCdoQk6k2_zQ@mail.gmail.com>
To: Arnt Gulbrandsen <arnt@gulbrandsen.priv.no>
Cc: extra@ietf.org
Content-Type: multipart/alternative; boundary="000000000000de869f059cbf26c8"
Archived-At: <https://mailarchive.ietf.org/arch/msg/extra/c8bQIvahnDvOgBohTKsmQ2hpQVc>
Subject: Re: [Extra] IMAP4rev2 body search
List-Id: Email mailstore and eXtensions To Revise or Amend <extra.ietf.org>

On Fri, Jan 17, 2020 at 6:56 AM Arnt Gulbrandsen <arnt@gulbrandsen.priv.no>
wrote:

> Hi,
>
> a discussion the other day^Wweek jars in my mind.
>
> I think we should underspecify the BODY search key more clearly. All it now
> says is "matches if contains", which is IMO correct (it matches the running
> code) but underspecified and vague. I think we ought to eliminate the
> vagueness, and propose the following explicit underspecification:
>
> ---
>
> Messages that contain the specified string in the body of the message.
>
> The server SHOULD decode the content-transfer-encoding, so that a message
> matches independent of its content-transfer-encoding. Apart from that rule,
> this specification explicitly allows much server behaviour that has been
> common in IMAP4rev1 servers, including:
>
> Most servers interpret "contains" on a character level, ie. "BODY range"
> matches a message that contains the word "orange", but some servers
> interpret it on a token or word level, ie. "BODY range" does not match a
> message that contains "orange", because "orange" is one token. It may
> however match a message that contains "ranges", if the server uses stemming
> (and perhaps language detection).
>
> Some servers search only in text/plain. Others also search in other types,
> for example Microsoft Word and PDF attachments.
>
> Some servers search HTML on a source level ("BODY range" does not match
> ra&shy;nge), others search HTML as normally displayed ("BODY range" matches
> ra&shy;nge).
>
> Most servers interpret the messages and search terms independent of
> encoding, such that a message that uses charset=iso-8859-8 may match a
> search term that uses UTF-8. However, not even this is required.
>
> Rationale: IMAP4rev1 servers vary in the kind and quality of their search
> implementation. This document chooses to avoid adding requirements on the
> business logic in such servers.
>
> ---
>
> (I'm open to changing any of this. IMO any variant is good, and good is
> better than best. I'd prefer that quarrelsome people can't beat each other
> over the head with the RFC, that's all.)
>

This seems overly wordy, almost more like a BCP than the spec itself.

I'm not opposed to it, but I wouldn't call myself an expert on the fine
craft of spec writing either.

Can we state the intent itself? I.e., that there are several ways to
implement search (substring, word-based) and to interpret the body (raw
text, text extracted from formatting, or even OCR).
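
To make those two axes concrete, here is a rough Python sketch. It is not
taken from any real IMAP server; the function names (substring_match,
word_match, rendered_text) are hypothetical and exist only for illustration.
It contrasts character-level and word-level matching, and raw HTML source
with text as it would roughly be displayed. Stemming and OCR are left out.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML document (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &shy; into characters.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def rendered_text(html_source):
    """Approximate the displayed text: drop tags, decode entities,
    and remove soft hyphens (U+00AD), which are normally invisible."""
    parser = _TextExtractor()
    parser.feed(html_source)
    parser.close()
    return "".join(parser.chunks).replace("\u00ad", "")


def substring_match(body, term):
    """Character-level 'contains': 'range' matches inside 'orange'."""
    return term.casefold() in body.casefold()


def word_match(body, term):
    """Word-level 'contains': the term must equal a whole token."""
    words = re.findall(r"\w+", body.casefold())
    return term.casefold() in words


if __name__ == "__main__":
    text = "I bought an orange today"
    print(substring_match(text, "range"))  # character-level interpretation
    print(word_match(text, "range"))       # word-level interpretation

    source = "<p>ra&shy;nge</p>"
    print(substring_match(source, "range"))                 # source-level search
    print(substring_match(rendered_text(source), "range"))  # displayed-text search
```

All four combinations are behaviour the proposed text would permit, which is
the point: a short statement of intent could cover them without listing each
one.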

Brandon
