[Jmap] the large email (attachment) problem

Jamey Sharp <jamey@minilop.net> Sat, 31 July 2021 01:06 UTC

Date: Fri, 30 Jul 2021 18:06:25 -0700
From: Jamey Sharp <jamey@minilop.net>
To: jmap@ietf.org
Message-ID: <CAJi=jadiwMGvLXG7nK93Ht=TzN-QdmAqyE4Nf7UnGG47f3zSwg@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/jmap/K4UL5clgmOx9_c9Gw_HhUVBHQcA>
Subject: [Jmap] the large email (attachment) problem
Precedence: list

I've been thinking about Bron's "large email problem" since the meeting 
on Monday. I know this isn't really in scope for the JMAP WG, but given
that it came up here I'm going to share some thoughts anyway…

I think the requirements on any solution are:
- preserve whatever level of confidentiality, integrity, and 
  authenticity are provided by attaching a file as a MIME part;
- enable mail agents to apply content filtering and retention policies 
  as if the file were attached as a MIME part;
- and, at the sender's discretion, allow recipients to access the 
  attachment even if they don't implement this solution.

On the other hand, I think it must not impose any particular file 
storage or lookup mechanism. In particular, we can't tell people to just 
send the hash of the content after making it available in some specific 
DHT. For example, I like IPFS as a content addressed store, but 
mandating its use would be a mistake. Giving senders the option to 
choose systems like IPFS is important, though, I think.


A sketch of a solution that I think meets all of these requirements 
looks like this:

In the text (plain or HTML) parts of the email, large attachments are 
still referenced by a URL provided by the sender, as has become common 
practice today. However, we add new attachment-like parts which serve 
several purposes: to indicate that the recipient should treat a given URL
like an attachment, to provide metadata hints, and to provide alternate
URLs for retrieving the attachment.

Alternate URLs can be provided so that the default URL exposed to 
unaware recipients can be an HTTP URL while more capable recipients can 
use e.g. an ipfs:// URL. Each source URL MAY be accompanied by an 
encryption key which can be used to decrypt the representation from that 
source. If populating that source exposes either the URL or the contents 
more broadly than the contents of the email, then that source SHOULD 
include an encryption key. (This extends any confidentiality protection 
applied to the email to cover the attachment as well.)
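
As a rough sketch of how a recipient might choose among the sources: the
`Source` class, its field names, and `pick_source` are all hypothetical
names made up for illustration, not a proposed wire format. The sketch
assumes the first source listed is the default URL that unaware
recipients would see.

```python
from dataclasses import dataclass
from typing import Optional, Sequence
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Source:
    """One place the attachment can be fetched from (hypothetical shape)."""
    url: str
    key: Optional[bytes] = None  # decryption key, if this source is encrypted


def pick_source(sources: Sequence[Source], supported: Sequence[str]) -> Source:
    """Return the source whose URL scheme the recipient prefers most.

    `supported` lists lowercase URL schemes in order of preference. If none
    of them match, fall back to the first source, which this sketch assumes
    is the default URL shown in the message text.
    """
    if not sources:
        raise ValueError("at least one source is required")
    for scheme in supported:
        for source in sources:
            if urlsplit(source.url).scheme == scheme:
                return source
    return sources[0]
```

A capable recipient would call something like
`pick_source(sources, ["ipfs", "https"])`, while an HTTP-only one would
pass `["https"]` and end up with the default link.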

Metadata hints include content type, length, and disposition, which 
SHOULD be provided. These apply to the attachment regardless of which 
source the recipient retrieves it from. In addition, a cryptographically 
secure hash of the attachment contents MAY be provided. If any of the 
sources does not already bind the URL to the content immutably, then a 
hash SHOULD be provided. (This extends any integrity protection applied 
to the email to cover the attachment as well, and similarly for any 
authenticity protection.)
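
To make the hash check concrete, here is a minimal sketch of how a
recipient could compare what it fetched (after any decryption) against
the hints. The `Hints` class and `matches_hints` are hypothetical names,
SHA-256 is only an assumed choice of hash, and rejecting on a length
mismatch is a design choice of this sketch rather than something the
proposal requires.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hints:
    """Metadata hints carried in the email (hypothetical shape)."""
    content_type: str
    length: int
    disposition: str
    sha256_hex: Optional[str] = None  # assumed hash algorithm and encoding


def matches_hints(data: bytes, hints: Hints) -> bool:
    """Check decrypted attachment bytes against the declared length and hash."""
    if len(data) != hints.length:
        return False
    if hints.sha256_hex is None:
        # No hash given: integrity then rests on the source binding the
        # URL to the content immutably.
        return True
    actual = hashlib.sha256(data).hexdigest()
    return hmac.compare_digest(actual, hints.sha256_hex.lower())
```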


So for example, Gmail could generate a Drive link for the unencrypted 
file, but also pin an encrypted copy of the file into IPFS and provide 
the key and IPFS URL as an alternative source. A recipient that doesn't 
understand the attachment information would use the Drive link when the 
user asks for it, as usual. But a mail delivery agent supporting this 
extension could fetch from either source, apply content filtering, store 
a copy of the attachment for as long as the email is retained, and 
provide the local copy to the recipient instead of the original link.

The recommendation that either every source URL is immutable or the hash 
of the content is provided means that a mail delivery agent doesn't need 
to fetch the attachment again if it sees a second email with the same 
source (assuming the hash of the copy it already has matches the new 
expected one). For messages with many recipients this may have both 
privacy and performance advantages.
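
The skip-the-refetch behaviour could look roughly like the following.
This is only a sketch under assumptions: `AttachmentCache` and its
`fetch` callback are hypothetical, SHA-256 is an assumed hash, and it
only covers the case where the email supplies an expected hash.

```python
import hashlib
from typing import Callable, Dict, Tuple


class AttachmentCache:
    """Remembers fetched attachments by source URL (hypothetical helper)."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch  # caller-supplied function that retrieves a URL
        self._by_url: Dict[str, Tuple[str, bytes]] = {}

    def get(self, url: str, expected_sha256: str) -> bytes:
        """Return the attachment, fetching only if no matching copy is held."""
        expected = expected_sha256.lower()
        cached = self._by_url.get(url)
        if cached is not None and cached[0] == expected:
            return cached[1]  # same source, same expected hash: reuse the copy
        data = self._fetch(url)
        actual = hashlib.sha256(data).hexdigest()
        if actual != expected:
            raise ValueError("fetched content does not match expected hash")
        self._by_url[url] = (actual, data)
        return data
```

Because the cached copy is reused only when its hash equals the one in
the new email, a source whose content changed underneath the URL is
fetched again rather than silently served from the old copy.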

On the other hand, a sender which violates the immutability 
recommendation may cause recipients to see attachments from different 
emails, possibly with different recipients, which could violate 
confidentiality expectations. Should the recommendation be upgraded to
a MUST, or is this just a note for the security considerations section?

As an alternative to IPFS that this design enables, I could envision 
major email providers jointly operating a DHT for looking up encrypted
attachments that any of them holds a copy of. Given an HTTP
interface to that DHT that takes a content ID and returns a redirect to 
one of the email providers' storage platforms, all of the participating 
providers could generate alternate links for their attachments using 
URLs to that interface. (Note that the content IDs here need to be a 
hash over the encrypted contents, not the plaintext.)
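
A small sketch of what generating such a link might involve. Everything
named here is an assumption for illustration: the gateway URL
`https://dht.example/lookup`, the use of SHA-256, and the hex encoding of
the content ID. The point it tries to show is that the ID is computed
from the ciphertext, so a provider holding only the encrypted blob can
presumably check it without ever having the key.

```python
import hashlib
from urllib.parse import quote


def content_id(ciphertext: bytes) -> str:
    """Content ID as a hash over the encrypted bytes, never the plaintext."""
    return hashlib.sha256(ciphertext).hexdigest()


def lookup_url(gateway: str, ciphertext: bytes) -> str:
    """Build an alternate-source URL pointing at the shared lookup interface."""
    return gateway.rstrip("/") + "/" + quote(content_id(ciphertext), safe="")


def provider_can_verify(stored_blob: bytes, cid: str) -> bool:
    """A provider checks a stored encrypted blob against a requested ID."""
    return content_id(stored_blob) == cid
```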


I'm curious what people think of this approach and I hope it turns out 
to be helpful in conversations with DISPATCH.

Jamey
