[Extra] IMAP4rev2 body search
Arnt Gulbrandsen <arnt@gulbrandsen.priv.no> Fri, 17 January 2020 14:56 UTC
From: Arnt Gulbrandsen <arnt@gulbrandsen.priv.no>
To: extra@ietf.org
Date: Fri, 17 Jan 2020 15:56:06 +0100
Mime-Version: 1.0
Message-Id: <9d918580-70be-4e9b-90ef-372523afe359@gulbrandsen.priv.no>
User-Agent: Trojita/0.7; Qt/5.7.1; xcb; Linux; Devuan GNU/Linux 2.1 (ascii)
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Transfer-Encoding: quoted-printable
Archived-At: <https://mailarchive.ietf.org/arch/msg/extra/_D3clWBT8Q33zamPdIRNUxuGRUM>
Subject: [Extra] IMAP4rev2 body search
Hi,

a discussion the other day^Wweek jars in my mind. I think we should underspecify the BODY search key more clearly. All it now says is "matches if contains", which is IMO correct (it matches the running code) but underspecified and vague.

I think we ought to eliminate the vagueness, and propose the following explicit underspecification:

---

Messages that contain the specified string in the body of the message. The server SHOULD decode the content-transfer-encoding, so that a message matches independent of its content-transfer-encoding.

Apart from that rule, this specification explicitly allows much server behaviour that has been common in IMAP4rev1 servers, including:

Most servers interpret "contains" on a character level, i.e. "BODY range" matches a message that contains the word "orange", but some servers interpret it on a token or word level, i.e. "BODY range" does not match a message that contains "orange", because "orange" is one token. It may however match a message that contains "ranges", if the server uses stemming (and perhaps language detection).

Some servers search only in text/plain. Others also search in other types, for example Microsoft Word and PDF attachments.

Some servers search HTML on a source level ("BODY range" does not match ra&shy;nge), others search HTML as normally displayed ("BODY range" matches ra&shy;nge).

Most servers interpret the messages and search terms independent of encoding, such that a message that uses charset=iso-8859-8 may match a search term that uses UTF-8. However, not even this is required.

Rationale: IMAP4rev1 servers vary in the kind and quality of their search implementation. This document chooses to avoid adding requirements on the business logic in such servers.

---

(I'm open to changing any of this. IMO any variant is good, and good is better than best. I'd prefer that quarrelsome people can't beat each other over the head with the RFC, that's all.)

Arnt
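
To make the difference between the permitted behaviours concrete, here is a minimal Python sketch. It is not taken from any actual IMAP server; the function names (decode_body, substring_match, token_match) are hypothetical, and case-insensitive matching is an assumption carried over from how IMAP4rev1 string searches are usually described. It shows the one SHOULD (undo the content-transfer-encoding first), charset-independent comparison, and the character-level versus token-level reading of "contains". Stemming and HTML rendering are left out.

```python
import base64
import quopri
import re


def decode_body(raw: bytes, cte: str, charset: str) -> str:
    """Undo the content-transfer-encoding, then decode the charset to text."""
    cte = cte.lower()
    if cte == "base64":
        raw = base64.b64decode(raw)
    elif cte == "quoted-printable":
        raw = quopri.decodestring(raw)
    # 7bit, 8bit and binary need no transfer decoding.
    return raw.decode(charset, errors="replace")


def substring_match(needle: str, text: str) -> bool:
    """Character-level 'contains' (assumed case-insensitive)."""
    return needle.casefold() in text.casefold()


def token_match(needle: str, text: str) -> bool:
    """Word-level 'contains': the needle must equal a whole token."""
    tokens = re.findall(r"\w+", text.casefold())
    return needle.casefold() in tokens


if __name__ == "__main__":
    # The same body matches whether or not it was sent as base64.
    stored = base64.b64encode(b"An orange a day")
    text = decode_body(stored, "base64", "us-ascii")
    print(substring_match("range", text))  # True: character-level server
    print(token_match("range", text))      # False: "orange" is one token
    print(token_match("orange", text))     # True
```

Because both the message and the search term are turned into Unicode text before comparison, a body stored as charset=iso-8859-8 can match a search term sent as UTF-8 in this sketch. The proposed text allows that behaviour without requiring it.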