[Jmap] recommendations for gigantic mailboxes

Eric Wong <e@80x24.org> Thu, 25 June 2020 09:03 UTC

Date: Thu, 25 Jun 2020 09:03:36 +0000
From: Eric Wong <e@80x24.org>
To: jmap@ietf.org
Message-ID: <20200625090336.GA32628@dcvr>
Archived-At: <https://mailarchive.ietf.org/arch/msg/jmap/QsnI1ZLl4pACDK9IpqOqO5U7nJ4>
Subject: [Jmap] recommendations for gigantic mailboxes

Hi all, I'm considering implementing JMAP in public-inbox[1].

TL;DR: would it be reasonable to expose 10 million messages in a
single JMAP mailbox?  It can already be done for an NNTP newsgroup :>


public-inbox started with NNTP and HTTP support, but I recently
started implementing an IMAP server and it seems fine.  It's
mainly tested with mutt, isync|mbsync, offlineimap, and Gnus.

Whereas newsreaders typically have a rolling window where
messages expire, MUAs and IMAP <=> Maildir syncing tools tend to
hold messages forever.  That gets slow once a mailbox reaches
the order of 100K messages...

MUAs and Maildirs on traditional filesystems fall down with the
400K messages in the git@vger.kernel.org mailing list, so right now
it's split into 9 segments of at most 50K messages each:

  imaps://public-inbox.org/INBOX.comp.version-control.git.0
  ..
  imaps://public-inbox.org/INBOX.comp.version-control.git.8
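As a sketch of how such virtual segments could be addressed, assume
(hypothetically; this is not taken from public-inbox's source) that
segment N holds the message numbers where `number // 50_000 == N`, so
segment .0 holds 1..49,999.  The helper names below are made up for
illustration.  Under that assumption ~400K messages span .0 through .8,
which is at least consistent with the list above.

```python
from typing import List

# Assumed cap per virtual mailbox; the exact cutoff rule is an assumption.
SEGMENT_SIZE = 50_000


def segment_index(msg_num: int, size: int = SEGMENT_SIZE) -> int:
    """Return the virtual segment holding message number msg_num (1-based)."""
    if msg_num < 1:
        raise ValueError("message numbers start at 1")
    # Floor division: 1..49_999 -> 0, 50_000..99_999 -> 1, and so on.
    return msg_num // size


def segment_mailbox(base: str, msg_num: int, size: int = SEGMENT_SIZE) -> str:
    """Name of the virtual mailbox containing msg_num, e.g. 'BASE.3'."""
    return f"{base}.{segment_index(msg_num, size)}"


def all_segments(base: str, highest_msg_num: int,
                 size: int = SEGMENT_SIZE) -> List[str]:
    """Every virtual mailbox name needed to cover messages 1..highest_msg_num."""
    last = segment_index(highest_msg_num, size)
    return [f"{base}.{i}" for i in range(last + 1)]


if __name__ == "__main__":
    base = "INBOX.comp.version-control.git"
    print(segment_mailbox(base, 123_456))    # INBOX.comp.version-control.git.2
    print(len(all_segments(base, 400_000)))  # 9
```

Only the mailbox names change; the message store underneath is
untouched, so the segmenting is purely a presentation layer.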

Even 400K is small.  LKML has 3.6 million messages and grows by
around 30K a month[2]:

  https://lore.kernel.org/lkml/
  nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel

JMAP is fresh ground, so I wonder whether a single mailbox on the
order of 10 million messages is reasonable for JMAP clients.
NNTP clients have no problem handling a newsgroup that large.
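For comparison, JMAP (RFC 8620 and RFC 8621) lets a client page
through a mailbox with the `position` and `limit` arguments of
`Email/query` instead of listing every message, and chain an
`Email/get` onto it with a result reference.  The sketch below only
builds the request body; the account and mailbox IDs are placeholders,
and the page size, sort order and requested properties are arbitrary
assumptions.  Whether real clients stick to windows like this is
exactly the open question above.

```python
import json
from typing import Any, Dict


def email_query_page(account_id: str, mailbox_id: str,
                     position: int, limit: int = 100) -> Dict[str, Any]:
    """Build a JMAP request fetching one page of a (possibly huge) mailbox."""
    return {
        "using": ["urn:ietf:params:jmap:core", "urn:ietf:params:jmap:mail"],
        "methodCalls": [
            # Ask only for a window of message IDs, newest first.
            ["Email/query", {
                "accountId": account_id,
                "filter": {"inMailbox": mailbox_id},
                "sort": [{"property": "receivedAt", "isAscending": False}],
                "position": position,
                "limit": limit,
                # A total can be expensive for the server, so don't ask.
                "calculateTotal": False,
            }, "q"],
            # Fetch a few properties for just those IDs via a result reference.
            ["Email/get", {
                "accountId": account_id,
                "#ids": {"resultOf": "q", "name": "Email/query", "path": "/ids"},
                "properties": ["subject", "from", "receivedAt"],
            }, "g"],
        ],
    }


if __name__ == "__main__":
    # Placeholder IDs; a real client gets them from the session and Mailbox/get.
    print(json.dumps(email_query_page("A1", "M-lkml", position=0, limit=50),
                     indent=2))
```

RFC 8620 also lets a server clamp `limit` to its own maximum and report
the value it used, which gives the server some say in how much of a
10M-message mailbox a client pulls at once.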

Or should I go the route I took with IMAP and virtually partition
mailboxes into chunks of 50K or so?  That doesn't affect the
underlying storage at all.  I wasn't really happy to do it for
IMAP, but the prevalence and limitations of existing MUAs had to
be taken into account (and I still work on a Centrino laptop from
2005, sometimes).
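To put rough numbers on that trade-off, here is a back-of-the-envelope
sketch assuming the same 50K cap and the LKML figures given above
(3.6 million messages, roughly 30K new ones a month).  The function
names are hypothetical.

```python
SEGMENT_SIZE = 50_000  # assumed per-mailbox cap, as with the IMAP segments


def min_segments(total_messages: int, size: int = SEGMENT_SIZE) -> int:
    """Smallest number of virtual mailboxes that can hold total_messages."""
    return (total_messages + size - 1) // size  # integer ceiling division


def months_per_new_segment(monthly_growth: int,
                           size: int = SEGMENT_SIZE) -> float:
    """How often a fresh segment fills up at a steady growth rate."""
    return size / monthly_growth


if __name__ == "__main__":
    print(min_segments(3_600_000))                    # 72
    print(min_segments(10_000_000))                   # 200
    print(round(months_per_new_segment(30_000), 2))   # 1.67
```

So partitioning LKML alone would mean at least 72 mailboxes today,
with another one appearing roughly every seven weeks, and at least
200 for a 10M-message archive.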

Thanks for reading!


[1] https://public-inbox.org/README.html
    Archives mail in git; indexes using Xapian + SQLite.

[2] LKML required designing the v2 public-inbox format
    back in 2018 for scalability:
    https://public-inbox.org/public-inbox-v2-format.html
    I've tested it up to 10M messages or so before running out of
    space, but I expect it to handle more in ~10 years (if
    civilization survives that long!)