Re: [Jmap] Hacker News Discussion - getMailboxes single data set vs tree primitives or paging

Mike Cardwell <jmap@lists.grepular.com> Sun, 07 May 2017 12:39 UTC

Date: Sun, 07 May 2017 13:39:12 +0100
From: Mike Cardwell <jmap@lists.grepular.com>
To: jmap@ietf.org
Message-ID: <20170507123912.GA22520@snake.grepular.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg="pgp-sha512"; protocol="application/pgp-signature"; boundary="TB36FDmn/VVEgNH/"
Content-Disposition: inline
In-Reply-To: <1494155213.3081934.968406608.5F9E8767@webmail.messagingengine.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Archived-At: <https://mailarchive.ietf.org/arch/msg/jmap/0vRDv5XSRwpU6isqkCo079eCHMI>
Subject: Re: [Jmap] Hacker News Discussion - getMailboxes single data set vs tree primitives or paging

> https://news.ycombinator.com/item?id=14284526
> 
> It's a rather silly discussion that went off onto a tangent, that's HN for
> you.

I am the person responsible for that silly discussion and also, ironically, the
person who said he wasn't invested enough in the success of JMAP to bring it to
this very list. Well, here I am.

> Me:
>  A quick look at the protocol looks like in order to get a list of mailboxes,
>  you have to ask for them all. In IMAP, you can say, show me top level
>  mailboxes, show me direct child mailboxes of mailbox x, etc. In a previous
>  role, I was responsible for working on a mail archiving solution which had a
>  large hierarchy of folders (into the thousands IIRC) in a single account.
>  Access to the archive was via IMAP. Looks to me that JMAP here would require
>  the client to potentially fetch megabytes of data just to start showing a
>  list of folders, where the solution we had with IMAP probably took a couple
>  of kilobytes at most to get started.

> and the followup:

> Bron:
>  Of course nothing stops you storing thousands of mailboxes in a
>  non-hierarchical layout and making IMAP have large LIST fetches too.

> Me:
>  Yes. I would issue the LIST command, and as the IMAP server started
>  streaming the results to the client, I would immediately process each mailbox
>  as soon as it was downloaded, rather than waiting for them all to be retrieved. I
>  fear that with a JSON based protocol, most server and client implementations
>  wouldn't deal with streaming (although they technically could), and would just
>  deal with completed blobs.
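
To make the streaming point concrete, here is a minimal Python sketch that
handles each mailbox as soon as its LIST line has been read. It assumes the
response is available as an iterable of lines; the name iter_list_mailboxes
and the simplified regex are illustrative only and are not a full IMAP parser
(literals and escaped quotes are not handled).

```python
import re
from typing import Iterable, Iterator, Tuple

# Simplified pattern for an untagged LIST response such as:
#   * LIST (\HasChildren) "/" "Archive/2016"
# Assumption: mailbox names are plain quoted strings or atoms.
_LIST_RE = re.compile(r'^\* LIST \(([^)]*)\) (?:"(.)"|NIL) (?:"([^"]*)"|(\S+))$')


def iter_list_mailboxes(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (flags, name) for each LIST line as soon as it is read."""
    for line in lines:
        line = line.rstrip("\r\n")
        match = _LIST_RE.match(line)
        if match:
            flags, _delimiter, quoted, atom = match.groups()
            yield flags, quoted if quoted is not None else atom
        elif not line.startswith("*"):
            # A tagged line (e.g. "001 OK LIST completed") ends the response.
            return
```

Because this is a generator, a client can start drawing the first folders
while later lines are still in flight. A JSON client would need an incremental
parser to get the same behaviour from a single large response.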

> I think it is worth discussing the tradeoffs in terms of additional complexity
> to offer paging, tree filtering (parentIds: [...]) or both in the standard.
> The pro side, you can access potentially millions of mailboxes without needing
> to know all the folder ids up front.  On the con side, it's more complexity for
> the server (client can always just fetch everything anyway if it doesn't care)
> - and most clients seem to fetch the entire tree anyway, so it just means more
> round trips for them.

I know it's from a different time, but RFC 2683, "IMAP4 Implementation
Recommendations", says the following:

https://tools.ietf.org/html/rfc2683

   Some servers present Usenet newsgroups to IMAP users.  Newsgroups,
   and other such hierarchical mailbox structures, can be very numerous
   but may have only a few entries at the top level of hierarchy.  Also,
   some servers are built against mail stores that can, unbeknownst to
   the server, have circular hierarchies - that is, it's possible for
   "a/b/c/d" to resolve to the same file structure as "a", which would
   then mean that "a/b/c/d/b" is the same as "a/b", and the hierarchy
   will never end.  The LIST response in this case will be unlimited.

   Clients that will have trouble with this are those that use

       C: 001 LIST "" *

   to determine the mailbox list.  Because of this, clients should not
   use an unqualified "*" that way in the LIST command.  A safer
   approach is to list each level of hierarchy individually, allowing
   the user to traverse the tree one limb at a time
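
As a rough illustration of why the per-level approach stays bounded even with
a circular hierarchy, here is a small Python sketch. The CHILDREN table and
list_one_level are hypothetical stand-ins for a mail store and for a
one-level LIST using the "%" wildcard; they are not taken from any real
server.

```python
from typing import Dict, List

# Hypothetical store with a circular hierarchy: "a/b/c/d" resolves to the
# same place as "a", so "d" has the same children as "a".
CHILDREN: Dict[str, List[str]] = {
    "": ["a"],
    "a": ["b"],
    "b": ["c"],
    "c": ["d"],
    "d": ["b"],
}


def list_one_level(parent: str) -> List[str]:
    """Emulate a one-level listing: return only the direct children."""
    last = parent.rsplit("/", 1)[-1]
    prefix = parent + "/" if parent else ""
    return [prefix + child for child in CHILDREN[last]]


# Each call returns a finite answer, however deep the user descends.
top = list_one_level("")              # top-level mailboxes
deeper = list_one_level("a/b/c/d")    # one step into the cycle
```

A recursive expansion of the same table would never terminate, which is the
unlimited LIST response the RFC warns about. With Python's standard imaplib,
the real counterpart would be IMAP4.list() called with a "%" pattern instead
of the default "*".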

IMAP can essentially work with folders of infinite depth when clients
don't request the full list of folders up front. It can also work
with slow backends. What if you had a giant mail store backed by
Amazon S3? Your data could be spread over multiple buckets. Why wait
for all buckets to be retrieved before you start sending data to
the client? IMAP also lets you present a dynamically generated folder
structure where each new level is calculated on demand. Perhaps
someone will want to stick a dynamic JMAP frontend in front of the
https://www.mail-archive.com/ website where the folder structure is
"List Name/Post Year/Post Month". That would be a lot of folders.

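A sketch of what "calculated on demand" could look like for that layout,
in Python. Everything here is hypothetical: make_archive_lister, and the
list_names and years_for callbacks, stand in for whatever slow backend
lookups a real frontend would use.

```python
from typing import Callable, List


def make_archive_lister(
    list_names: Callable[[], List[str]],
    years_for: Callable[[str], List[int]],
) -> Callable[[str], List[str]]:
    """Build a function that computes one level of folders per request."""

    def children(path: str) -> List[str]:
        parts = [part for part in path.split("/") if part]
        if len(parts) == 0:
            # Top level: one folder per mailing list.
            return list_names()
        if len(parts) == 1:
            # Second level: only now ask the backend for this list's years.
            return [f"{parts[0]}/{year}" for year in years_for(parts[0])]
        if len(parts) == 2:
            # Third level: twelve month folders, no backend call needed.
            return ["/".join(parts) + f"/{month:02d}" for month in range(1, 13)]
        return []

    return children
```

The backend is only consulted for the branch the user actually opens, so the
cost of showing the first screen does not depend on the total folder count.
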
I posit that allowing you to request a complete list of all folders in
one go was a mistake in IMAP, and one that shouldn't be repeated in
JMAP.

-- 
Mike Cardwell  https://www.grepular.com
OpenPGP Key    DF70 D8E5 FBD6 8519 9257  C44C 0DA6 8B1E 1801 A332
XMPP OTR Key   8924 B06A 7917 AAF3 DBB1  BF1B 295C 3C78 3EF1 46B4