[Jmap] the large email (attachment) problem

Jamey Sharp <jamey@minilop.net> Sat, 31 July 2021 01:06 UTC

I've been thinking about Bron's "large email problem" since the meeting 
on Monday. I know this isn't really in scope for the JMAP WG, but since 
it came up here, I'm going to share some thoughts anyway…

I think the requirements on any solution are:
- preserve whatever level of confidentiality, integrity, and 
  authenticity is provided by attaching a file as a MIME part;
- enable mail agents to apply content filtering and retention policies 
  as if the file were attached as a MIME part;
- and, at the sender's discretion, allow recipients to access the 
  attachment even if they don't implement this solution.

On the other hand, I think a solution must not impose any particular file 
storage or lookup mechanism. In particular, we can't tell people to just 
send the hash of the content after making it available in some specific 
DHT. For example, I like IPFS as a content-addressed store, but 
mandating its use would be a mistake. I do think it's important to give 
senders the option to choose systems like IPFS, though.


A sketch of a solution that I think meets all of these requirements 
looks like this:

In the text (plain or HTML) parts of the email, large attachments are 
still referenced by a URL provided by the sender, as has become common 
practice today. However, we add new attachment-like parts that serve 
three purposes: to indicate that a given URL should be treated like an 
attachment at the recipient, to provide metadata hints, and to provide 
alternate URLs for retrieving the attachment.

Alternative URLs can be provided so that the default URL exposed to 
unaware recipients can be an HTTP URL, while more capable recipients can 
use, e.g., an ipfs:// URL. Each source URL MAY be accompanied by an 
encryption key that can be used to decrypt the representation from that 
source. If populating that source exposes either the URL or the contents 
more widely than the email itself is exposed, then that source SHOULD 
include an encryption key. (This extends any confidentiality protection 
applied to the email to cover the attachment as well.)
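To make the source-selection idea concrete, here is a minimal Python sketch of how a capable recipient might pick among alternate sources, plus a check for the "SHOULD include an encryption key" rule. The `Source` type, the `choose_source` and `sources_lacking_keys` helpers, and the scheme-preference ordering are hypothetical illustrations, not part of any specification; whether a source is exposed more widely than the email is left to a caller-supplied predicate.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Source:
    """One place the attachment can be retrieved from (hypothetical model)."""
    url: str
    key: Optional[bytes] = None  # key to decrypt this source's representation, if any


def choose_source(sources: Sequence[Source],
                  preferred_schemes: Sequence[str]) -> Optional[Source]:
    """Return the source whose URL scheme comes earliest in preferred_schemes.

    An HTTP-only recipient might pass ("https", "http"); one that also
    speaks IPFS might pass ("ipfs", "https", "http").
    """
    for scheme in preferred_schemes:
        for source in sources:
            if urlsplit(source.url).scheme.lower() == scheme:
                return source
    return None  # nothing this recipient knows how to retrieve


def sources_lacking_keys(sources: Sequence[Source],
                         is_widely_exposed: Callable[[Source], bool]) -> List[Source]:
    """Sources exposed more widely than the email that nonetheless carry no key."""
    return [s for s in sources if is_widely_exposed(s) and s.key is None]
```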

Metadata hints include content type, length, and disposition, which 
SHOULD be provided. These apply to the attachment regardless of which 
source the recipient retrieves it from. In addition, a cryptographically 
secure hash of the attachment contents MAY be provided. If any of the 
sources does not already bind the URL to the content immutably, then a 
hash SHOULD be provided. (This extends any integrity protection applied 
to the email to cover the attachment as well, and similarly for any 
authenticity protection.)
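A sketch of the metadata hints and the hash check, in Python. It assumes SHA-256 as the "cryptographically secure hash" and treats only ipfs:// as a scheme that binds a URL to its content immutably; both choices, and the names `AttachmentHints`, `hash_recommended`, and `matches_hints`, are illustrative assumptions.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Sequence
from urllib.parse import urlsplit

# Assumption: schemes whose URLs are content-addressed and so bind immutably.
IMMUTABLE_SCHEMES = frozenset({"ipfs"})


@dataclass(frozen=True)
class AttachmentHints:
    """Metadata hints that apply whichever source the attachment comes from."""
    content_type: str
    length: int
    disposition: str = "attachment"
    sha256_hex: Optional[str] = None  # hash of the (decrypted) attachment contents


def hash_recommended(source_urls: Sequence[str]) -> bool:
    """True if any source is not immutable, in which case a hash SHOULD be sent."""
    return any(urlsplit(u).scheme.lower() not in IMMUTABLE_SCHEMES for u in source_urls)


def matches_hints(data: bytes, hints: AttachmentHints) -> bool:
    """Check decrypted attachment bytes against the length and optional hash hints."""
    if len(data) != hints.length:
        return False
    if hints.sha256_hex is not None:
        return hashlib.sha256(data).hexdigest() == hints.sha256_hex.lower()
    return True
```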


So, for example, Gmail could generate a Drive link for the unencrypted 
file, but also pin an encrypted copy of the file into IPFS and provide 
the key and IPFS URL as an alternative source. A recipient that doesn't 
understand the attachment information would use the Drive link when the 
user asks for it, as usual. But a mail delivery agent supporting this 
extension could fetch from either source, apply content filtering, store 
a copy of the attachment for as long as the email is retained, and 
provide the local copy to the recipient instead of the original link.
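The delivery-agent behavior described above might look roughly like this Python sketch. `fetch`, `decrypt`, and `content_ok` are hypothetical callables standing in for network retrieval, whatever cipher a source's key is meant for, and the agent's content filter; the sketch also assumes the sender supplied a SHA-256 hash of the plaintext.

```python
import hashlib
from typing import Callable, Optional, Sequence, Tuple

Fetch = Callable[[str], bytes]             # hypothetical: bytes at a URL, or raises OSError
Decrypt = Callable[[bytes, bytes], bytes]  # hypothetical: (ciphertext, key) -> plaintext


def retrieve_attachment(sources: Sequence[Tuple[str, Optional[bytes]]],
                        expected_sha256: str,
                        fetch: Fetch,
                        decrypt: Decrypt,
                        content_ok: Callable[[bytes], bool]) -> Optional[bytes]:
    """Try each (url, key) source in turn and return verified, filtered plaintext.

    Returns None if no source yields content matching the expected hash, or
    if the content filter rejects it.
    """
    for url, key in sources:
        try:
            raw = fetch(url)
        except OSError:
            continue  # source unreachable; fall back to the next one
        plaintext = decrypt(raw, key) if key is not None else raw
        if hashlib.sha256(plaintext).hexdigest() != expected_sha256:
            continue  # this source served the wrong bytes; try another
        if not content_ok(plaintext):
            return None  # filter applied as if the file were a MIME part
        return plaintext  # caller stores this copy for as long as the email is kept
    return None
```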

The recommendation that either every source URL is immutable or the hash 
of the content is provided means that a mail delivery agent doesn't need 
to fetch the attachment again if it sees a second email with the same 
source (assuming the hash of the copy it already has matches the new 
expected one). For messages with many recipients, this may have both 
privacy and performance advantages.
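One way a delivery agent could take advantage of this is a cache keyed on the expected content hash rather than on the URL, sketched below in Python under the same SHA-256 assumption. `AttachmentCache` and its `fetches` counter are hypothetical; keying on the hash means a reused URL whose content has changed gets refetched and checked instead of being answered from another email's copy.

```python
import hashlib
from typing import Callable, Dict


class AttachmentCache:
    """Per-agent store of attachment bodies keyed by SHA-256 of the contents."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, bytes] = {}
        self.fetches = 0  # counts network retrievals, for illustration

    def get(self, url: str, expected_sha256: str,
            fetch: Callable[[str], bytes]) -> bytes:
        cached = self._by_hash.get(expected_sha256)
        if cached is not None:
            return cached  # a later email naming the same content: no refetch
        self.fetches += 1
        data = fetch(url)
        if hashlib.sha256(data).hexdigest() != expected_sha256:
            raise ValueError(f"content at {url} does not match the expected hash")
        self._by_hash[expected_sha256] = data  # only verified bytes are cached
        return data
```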

On the other hand, a sender that violates the immutability 
recommendation may cause recipients to be shown an attachment that was 
fetched for a different email, possibly one with different recipients, 
which could violate confidentiality expectations. Should that 
recommendation be upgraded to a MUST, or is this just a note for the 
Security Considerations section?

As an alternative to IPFS that this design enables, I could envision 
major email providers jointly operating a DHT for looking up the 
encrypted attachments that any of them hold copies of. Given an HTTP 
interface to that DHT that takes a content ID and returns a redirect to 
one of the email providers' storage platforms, all of the participating 
providers could generate alternate links for their attachments using 
URLs to that interface. (Note that the content IDs here need to be a 
hash over the encrypted contents, not the plaintext.)
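A toy model of that lookup interface, in Python. A plain dictionary stands in for the DHT, `LOOKUP_BASE` is a made-up URL, and `LookupService`, `content_id`, and `alternate_link` are hypothetical names; the only points illustrated are that the content ID is a hash over the ciphertext and that a lookup answers with a redirect to a provider's storage.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

# Hypothetical base URL of the jointly operated lookup interface.
LOOKUP_BASE = "https://attachment-lookup.example/v1/"


def content_id(ciphertext: bytes) -> str:
    """Content ID over the encrypted bytes, so the lookup never sees a plaintext hash."""
    return hashlib.sha256(ciphertext).hexdigest()


class LookupService:
    """Stand-in for the DHT: maps content IDs to provider storage URLs."""

    def __init__(self) -> None:
        self._holders: Dict[str, List[str]] = {}

    def announce(self, cid: str, storage_url: str) -> None:
        self._holders.setdefault(cid, []).append(storage_url)

    def handle_get(self, request_target: str) -> Tuple[int, Optional[str]]:
        """Return (HTTP status, Location header) for a GET on <LOOKUP_BASE><cid>."""
        cid = request_target.rsplit("/", 1)[-1]
        holders = self._holders.get(cid)
        if not holders:
            return 404, None
        return 302, holders[0]  # a real service might pick a nearer or less loaded holder


def alternate_link(ciphertext: bytes) -> str:
    """URL a participating provider could list as an alternate source."""
    return LOOKUP_BASE + content_id(ciphertext)
```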


I'm curious what people think of this approach and I hope it turns out 
to be helpful in conversations with DISPATCH.

Jamey