Re: [imapext] multisort proposals (was Re: draft-ietf-appsawg-multimailbox-search)

Bron Gondwana <brong@fastmail.fm> Wed, 19 March 2014 02:51 UTC

Message-Id: <1395197454.14596.96179945.074A46ED@webmail.messagingengine.com>
From: Bron Gondwana <brong@fastmail.fm>
To: imapext@ietf.org
Date: Wed, 19 Mar 2014 13:50:54 +1100
In-Reply-To: <F3D5C85995CEFDDB8D22F0A6@96B2F16665FF96BAE59E9B90>
References: <CAC4RtVCgLuvpAqEs6Nzz+358dvg4q2YkBNmeq39SYFmnYOLoKw@mail.gmail.com> <1394103473.14059.91258445.054B3082@webmail.messagingengine.com> <06B38F8D7C86FDF50230A07E@caldav.corp.apple.com> <5B508A7BE305AE6FA48093DB@96B2F16665FF96BAE59E9B90> <1394152669.15703.91551029.06D49A6D@webmail.messagingengine.com> <F3D5C85995CEFDDB8D22F0A6@96B2F16665FF96BAE59E9B90>
Archived-At: http://mailarchive.ietf.org/arch/msg/imapext/0XNTSWMSh-u0IeD3WxECS8hG0vI
Subject: Re: [imapext] multisort proposals (was Re: draft-ietf-appsawg-multimailbox-search)

On Tue, Mar 11, 2014, at 07:43 AM, Chris Newman wrote:
> It sounds like IMAP cross-folder sort is a big non-interoperable mess
> right now due to lack of standardization.
>
> Have any of you considered writing a specification of your multisort
> protocol? Or maybe arrange a phone call between proponents of the
> "xconvmultisort" proposal, the "X-MAILBOX, X-REAL-UID" and the "\all"
> proposals to see if consensus on a standard approach is possible?

Apologies for the late reply on this - I've been busy dealing with a
nasty server partial-failure which led to corruption, and the associated
cleanup and recovery.  The "joy" of devops.

> Standards have several benefits for the vendors who participate:
>
> 1. If you have an early implementation, that's a key marketing/selling
>    point.
>
> 2. It gives your customers more choice of the other side when they
>    choose your product. (more client options if you're a server
>    vendor, or vice-versa if you're a client vendor).
>
> 3. It improves your reputation. It shows you care about the future of
>    your customers and the future of the Internet.
>
> 4. It raises the innovation baseline. Once a standard is written and
>    successful, it becomes "common practice" over time. Then there's no
>    more need to proselytize that standard, new innovation can start
>    with that standard as a baseline.
>
> 5. Once a standard is approved, you're much less likely to have to
>    waste time and effort implementing two or more ways of doing the
>    same thing to interoperate with different implementations.
>
> 6. It's one way a smaller vendor can get a leg-up on a bigger vendor
>    who ignores standards.
>
> While private IMAP extensions are fine for experiments, they're an
> interoperability problem when more than one for the same purpose show
> up. Why not take the time to write and progress a protocol
> specification so you and your customers reap the benefits and the
> community's innovation baseline is raised?

We have indeed.  Greg Banks (who doesn't work at FastMail any more, but
wrote the initial conversations support) did some initial work towards a
standards proposal for our conversations.  It was paused mainly because
we wanted to iron out the issues which would become obvious with real-
world usage first.

We have also done some work towards a more general mail access protocol
over HTTP, which is shared at http://jmap.io/

My general guiding principle is that the client should be describing
what it wants as clearly as possible, and the language of the protocol
needs to have the facility for it to do so.

The secondary principle is that the server should never do work that the
client has not explicitly requested.

This means that MSGNO by default is out, because a message sequence number
is effectively an MVCC snapshot of mailbox data.  It's an ordering that
existed at a point in time, and the server has to maintain it for an
unbounded length of time.  The client can keep the session alive by issuing
a command every 29 minutes, choosing commands which don't allow EXPUNGE
notifications - and the server can't forget those messages until the
mailbox is unselected.
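
As a rough sketch of that cost (the class and method names are made up for
illustration and are not taken from Cyrus or any other server): every open
session has to carry its own sequence-number snapshot, plus a queue of
removals it has not yet been allowed to announce.

```python
class SessionView:
    """Per-connection snapshot mapping sequence numbers to UIDs (hypothetical)."""

    def __init__(self, uids):
        # Sequence number N refers to self.snapshot[N - 1].
        self.snapshot = list(uids)
        # UIDs removed by other sessions but not yet announced to this one.
        self.pending_expunges = []

    def message_removed_elsewhere(self, uid):
        # The snapshot cannot change until this client has been told,
        # so the server has to remember the removal.
        if uid in self.snapshot and uid not in self.pending_expunges:
            self.pending_expunges.append(uid)

    def uid_for_msgno(self, msgno):
        # Resolved against the snapshot, not against the live mailbox.
        return self.snapshot[msgno - 1]

    def flush_expunges(self):
        # Only callable during a command that permits EXPUNGE responses.
        # Returns the sequence numbers to announce, in the order sent;
        # each announcement renumbers the messages after it.
        announced = []
        for uid in self.pending_expunges:
            msgno = self.snapshot.index(uid) + 1
            announced.append(msgno)
            del self.snapshot[msgno - 1]
        self.pending_expunges.clear()
        return announced
```

Until flush_expunges() runs, message 2 in this session still has to resolve
to a message that no longer exists in the live mailbox.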

Also, UNSEEN and RECENT cost something to calculate, and many clients don't
need them.

One thing that really frustrates me is optimisation hacks encoded in the
standard and guidance.  Things like "do a listing with %, then recurse
into each sub mailbox".  It might be more efficient when you're talking
to a server which is storing the folders in a filesystem and doing the
moral equivalent of 'ls' on each directory.

It's significantly less efficient with Cyrus, where the mailboxes database
is cheap to iterate in mailbox order (including sub-folders).  You're
basically inspecting the sub-folders anyway - at least far enough to decide
between \HasChildren and \HasNoChildren.

Generally, I would like the client to be able to say "I want the full
folder tree thanks, but limit it to the first 1000 items".
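
A minimal sketch of that kind of listing, assuming the names arrive in an
order where descendants immediately follow their parent (sorting by the
list of path components gives that).  The function name, the "." separator
and the has_children flag are assumptions for illustration, not a
description of how Cyrus does it.

```python
def list_tree(ordered_names, limit, sep="."):
    """Yield (name, has_children) for at most `limit` mailboxes.

    One pass over the names: a mailbox has children exactly when the
    next name in the ordering starts with "<name><sep>".
    """
    names = iter(ordered_names)
    current = next(names, None)
    emitted = 0
    while current is not None and emitted < limit:
        following = next(names, None)
        has_children = (
            following is not None and following.startswith(current + sep)
        )
        yield current, has_children
        emitted += 1
        current = following


mailboxes = ["INBOX-old", "INBOX", "INBOX.sub", "INBOX.sub.deep", "Trash"]
# Sort by path components so "INBOX.sub" lands right after "INBOX",
# ahead of "INBOX-old".
ordered = sorted(mailboxes, key=lambda name: name.split("."))
first_three = list(list_tree(ordered, limit=3))
```

The whole tree, with child flags, falls out of a single iteration, and the
limit stops the walk early without any per-folder round trips.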

So my ideal protocol for an X-MULTISEARCH would, I suspect, smell a bit more
like SQL.  The offset / limit and anchor / limit syntaxes are both very
useful ways of describing exactly what you want to the server.
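
To make the difference between the two concrete, here is a small sketch
over an already-sorted list of message ids.  The function names, the
anchor_offset parameter and the "return None if the anchor is gone"
convention are assumptions for illustration, not the syntax of any
existing extension.

```python
def window_by_offset(sorted_ids, offset, limit):
    """Return `limit` ids starting at absolute position `offset`."""
    return sorted_ids[offset:offset + limit]


def window_by_anchor(sorted_ids, anchor, limit, anchor_offset=0):
    """Return `limit` ids starting `anchor_offset` positions from `anchor`.

    Returns None when the anchor is no longer in the result set, so the
    caller can fall back to an offset-based request.
    """
    try:
        start = sorted_ids.index(anchor) + anchor_offset
    except ValueError:
        return None
    start = max(start, 0)
    return sorted_ids[start:start + limit]


ids = ["m1", "m2", "m3", "m4", "m5", "m6"]
shifted = ["m0"] + ids  # a new message sorts to the front

by_offset_before = window_by_offset(ids, 2, 2)
by_offset_after = window_by_offset(shifted, 2, 2)
by_anchor_before = window_by_anchor(ids, "m4", 2)
by_anchor_after = window_by_anchor(shifted, "m4", 2)
```

The offset window slides when something arrives ahead of it; the anchored
window stays pinned to the message the client was actually looking at.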

------

It would also be really great to merge the X-GM-MSGID and similar tricks
that various sites are using into one standard way to uniquely identify
messages across copy/move operations that were performed by a DIFFERENT
client.

COPYUID is all very nice if you're using the client which did the COPY,
but it falls down if a different client did the copy.  You wind up having
to fetch the entire message again if you don't trust the BODYSTRUCTURE and
ENVELOPE to identify the message uniquely - and most clients don't, because
Message-Id isn't 100% reliable in this broken age :(

I think those are the two areas.  I've said before that I'm willing to put
the time in to do this - and you're right - I really should stop talking
and start producing!

------

One final thing that we have found insanely useful in Cyrus replication:
checksums all over the place.  This is not so much useful to a traditional
"partial view" client, but many clients seem to synchronise the entire
contents of a mailbox, and a CRC would have value there.
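
One possible shape for such a mailbox-level checksum, as a sketch only: XOR
together a CRC32 of each message's state, so the result doesn't depend on
iteration order.  The record layout (uid, sorted flags, digest) and the
function names are assumptions for illustration, not the format Cyrus
replication uses.

```python
import zlib


def message_crc(uid, flags, digest_hex):
    """CRC32 over one message's state; flags are sorted for stability."""
    record = f"{uid} {' '.join(sorted(flags))} {digest_hex}"
    return zlib.crc32(record.encode("utf-8"))


def mailbox_crc(messages):
    """XOR of per-message CRCs; `messages` yields (uid, flags, digest_hex)."""
    crc = 0
    for uid, flags, digest_hex in messages:
        crc ^= message_crc(uid, flags, digest_hex)
    return crc


server_side = [(1, {"\\Seen"}, "aa"), (2, set(), "bb")]
client_side = [(2, set(), "bb"), (1, {"\\Seen"}, "aa")]  # same state, other order
drifted = [(1, {"\\Seen"}, "aa"), (2, set(), "bc")]      # one digest differs

in_sync = mailbox_crc(server_side) == mailbox_crc(client_side)
out_of_sync = mailbox_crc(server_side) != mailbox_crc(drifted)
```

A client that mirrors the whole mailbox could compare one number with the
server and only fall back to a per-message comparison when they differ.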

The issue that's kept me from having time to reply for the past week is
documented here:

http://blog.fastmail.fm/2014/03/13/cleaning-up-from-an-imap-server-failure/

Having the SHA-1 of the spool files in our logs enabled data recovery in a
way that would otherwise have been impossible on an untrusted filesystem.
We could be sure that the emails we recovered were not corrupted.

Having a strong digest available for every message on the server would also
allow clients to consistency-check their local cache without resyncing all
the data.  I used to think it was important to be able to detect mailbox
renames done by another client (right now, if another client renames a
folder, your client needs to fetch all the messages again, for the same
reason as with message copies) - but I think re-fetching just the flags/
annotations/digest isn't too expensive if you can re-use the cached message
contents.
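
A sketch of what that could look like on the client side, assuming the
server could hand back a SHA-1 for each message (base IMAP has no such
item, so the server_digests mapping here is hypothetical, as are the
function names):

```python
import hashlib


def sha1_hex(raw_message: bytes) -> str:
    return hashlib.sha1(raw_message).hexdigest()


def plan_resync(server_digests, local_cache):
    """Decide which messages can be reused from cache after UIDs changed.

    server_digests: {new_uid: sha1 hex} as reported by the server.
    local_cache:    {old_uid: raw message bytes} held by the client.
    Returns (reuse, fetch): reuse maps new_uid -> cached bytes, and
    fetch lists the new UIDs whose bodies must be downloaded.
    """
    by_digest = {sha1_hex(raw): raw for raw in local_cache.values()}
    reuse, fetch = {}, []
    for uid, digest in server_digests.items():
        if digest in by_digest:
            reuse[uid] = by_digest[digest]
        else:
            fetch.append(uid)
    return reuse, sorted(fetch)


# The folder was renamed by another client, so every UID is new.
cache = {1: b"msg one", 2: b"msg two"}
server = {
    101: sha1_hex(b"msg one"),
    102: sha1_hex(b"msg two"),
    103: sha1_hex(b"msg three"),
}
reuse, fetch = plan_resync(server, cache)
```

Only message 103 needs its body fetched; the other two are matched to the
cached copies by digest alone.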

Bron.


-- 
  Bron Gondwana
  brong@fastmail.fm