Re: [openpgp] [Cfrg] streamable AEAD construct for stored data?

Bryan Ford <brynosaurus@gmail.com> Sat, 07 November 2015 01:46 UTC

Return-Path: <brynosaurus@gmail.com>
X-Original-To: openpgp@ietfa.amsl.com
Delivered-To: openpgp@ietfa.amsl.com
Received: from localhost (ietfa.amsl.com [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id B6C011A6FAC for <openpgp@ietfa.amsl.com>; Fri, 6 Nov 2015 17:46:46 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.999
X-Spam-Level:
X-Spam-Status: No, score=-1.999 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, SPF_PASS=-0.001] autolearn=unavailable
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id JffX1Iwiknex for <openpgp@ietfa.amsl.com>; Fri, 6 Nov 2015 17:46:43 -0800 (PST)
Received: from mail-pa0-x22e.google.com (mail-pa0-x22e.google.com [IPv6:2607:f8b0:400e:c03::22e]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 7B32D1A6F9A for <openpgp@ietf.org>; Fri, 6 Nov 2015 17:46:43 -0800 (PST)
Received: by pabfh17 with SMTP id fh17so139086603pab.0 for <openpgp@ietf.org>; Fri, 06 Nov 2015 17:46:43 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=content-type:mime-version:subject:from:in-reply-to:date:cc :message-id:references:to; bh=mVwr8IO+IhwOh1lDN/ur96fqw4/X/VqFfIEwbLdNE4Q=; b=Vrc5G34TIFzb3YowkesoWujqLdyoKEgX8VVXAs5FHWyFNdlod16w+z4aWJRiWpjbKN k8IhgIl521w3xHaQg/7C3kFJ54uPyPHXN47w3C3iVNQLAOzjOznKBGYYaOALr6Y03ToX DWxgxV6/n5ep6jyO0/gTMdJ+5XXfCciZ552jhlTb2uubgxbIlGBEWMqQFeV4uQz80EPS PoHORT08sfCHh8qxoanVgniuVaVyXe/bUDRqQOMMbVj0nT37ppDVg4ABtjs7g4p58DbP IKdrdTjKLnrxC9tWPGjjxdD9jRmcAbwtzf82gmekB3hPyTqDw6aUWm5BxaSkka+4+dz1 U/sQ==
X-Received: by 10.68.233.233 with SMTP id tz9mr21723869pbc.15.1446860803136; Fri, 06 Nov 2015 17:46:43 -0800 (PST)
Received: from [192.168.11.11] (i223-218-148-140.s41.a014.ap.plala.or.jp. [223.218.148.140]) by smtp.gmail.com with ESMTPSA id ko11sm183461pbd.51.2015.11.06.17.46.39 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Fri, 06 Nov 2015 17:46:40 -0800 (PST)
Content-Type: multipart/signed; boundary="Apple-Mail=_7DF1BB2D-A879-4D92-82E9-EBBCB882AE05"; protocol="application/pkcs7-signature"; micalg="sha1"
Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\))
From: Bryan Ford <brynosaurus@gmail.com>
In-Reply-To: <CAM_a8Jy-ZoGJ3qTgN5PFA2ZKnbtSy5GWhWhUeF2NHYgWUQ0zYA@mail.gmail.com>
Date: Sat, 07 Nov 2015 10:46:34 +0900
Message-Id: <3A98EA92-0C2F-46A7-8D06-880FC83CB110@gmail.com>
References: <87twp91d8r.fsf@alice.fifthhorseman.net> <CAM_a8Jy-ZoGJ3qTgN5PFA2ZKnbtSy5GWhWhUeF2NHYgWUQ0zYA@mail.gmail.com>
To: openpgp@ietf.org, "cfrg@irtf.org" <cfrg@irtf.org>
X-Mailer: Apple Mail (2.2104)
Archived-At: <http://mailarchive.ietf.org/arch/msg/openpgp/hg2zzgFxmKBM9dZbB3yvkkOJq90>
Cc: Zooko Wilcox-OHearn <zooko@leastauthority.com>, Daniel Kahn Gillmor <dkg@fifthhorseman.net>
Subject: Re: [openpgp] [Cfrg] streamable AEAD construct for stored data?
X-BeenThere: openpgp@ietf.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: "Ongoing discussion of OpenPGP issues." <openpgp.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/openpgp>, <mailto:openpgp-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/openpgp/>
List-Post: <mailto:openpgp@ietf.org>
List-Help: <mailto:openpgp-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/openpgp>, <mailto:openpgp-request@ietf.org?subject=subscribe>
X-List-Received-Date: Sat, 07 Nov 2015 01:46:46 -0000

To return to this thread - DKG brought up one important potential functionality goal for the next OpenPGP message format (streaming-mode integrity protection); then the thread diverged into a different and I think orthogonal - though equally interesting - potential functionality goal (namely random-access capability via Merkle trees as in Tahoe-LAFS).

I included a slide on this topic in my OpenPGP WG presentation (https://drive.google.com/file/d/0BwK1bcoczINtMEI2Y3A1Rm52SXc/view?usp=sharing <https://drive.google.com/file/d/0BwK1bcoczINtMEI2Y3A1Rm52SXc/view?usp=sharing> slide 10) and was hoping to solicit discussion but there wasn’t time, so perhaps we can continue here?

To be clear, there are two separate use-cases, each of which make sense without the other and require different technical solutions (but could also make sense together):

1. Streaming-mode integrity protection:  We want to make sure OpenPGP can be used Unix filter-style on both encryption and decryption sides, to process arbitrarily large files (e.g., huge backup tarballs), while satisfying the following joint requirements:

	(a) Ensure that neither the encryptor nor decryptor ever has to buffer the entire stream in memory or any other intermediate storage.
	(b) Ensure that the decryptor integrity-checks everything it decrypts BEFORE passing it onto the next pipeline stage (e.g., un-tar).

2. Random-access: Once a potentially-huge OpenPGP-encrypted file has been written to some random-access-capable medium, allow a reader to decrypt and integrity-check parts of that encrypted file without (re-)processing the whole thing: i.e., support integrity-protected random-access reads.

Let’s call these goals #1 and #2, respectively.

Achieving either goal will require dividing encrypted files into chunks of some kind, but the exact metadata these chunks need to have will vary depending on which goal we want to achieve (or both).

To achieve goal #1 properly, it appears that what we need is not only a MAC per chunk but a signature per chunk.  If the encryptor only signs a single aggregate MAC at the end, then the decryptor needs to process its input all the way to that signature at the end before it can be certain that any (even the first) bytes of the decrypted data are valid.  If the encryptor produces a Merkle tree at the end and signs its root as in Tahoe-LAFS (e.g., in pursuit of goal #2), the decryptor still needs to read to the end of its input before being able to integrity-check anything, and hence still fails to achieve goal #1.

To achieve goal #2, the Merkle tree approach used in Tahoe-LAFS seems well-suited.  But it assumes/requires the decryptor has a complete, random-access seekable copy of the encrypted file sitting somewhere, which is certainly a reasonable assumption in the context of a file system like Tahoe-LAFS, but violates requirement (a) of goal #1 and in particular may not hold when OpenPGP is used to filter-decrypt from unseekable media or a network stream to an un-tar type process.

We could very well design an OpenPGP format that addresses both goals together, if we decide both goals are valuable.  The encryptor would need to build its Merkle tree incrementally - without initially knowing its total size - and after encrypting and outputting each chunk, generate and sign a new Merkle tree root after each chunk.  We wouldn’t necessarily have to store an entirely new Merkle tree after each chunk, only a log2(N)-size list of hashes comprising the “update path” from the new chunk back to the root, reusing the un-updated parts of the Merkle tree already stored in earlier parts of the encrypted file.  (But this will require that earlier Merkle trees be “incrementally updatable”, which might violate the properties of some of the Merkle tree proposals suggested elsewhere in other threads.)

There are some obvious tradeoffs here, both in storage and complexity costs.  I’m not that worried about the storage efficiency costs, because even the costs of a log2(N)-size list of hashes per chunk can be amortized fairly effectively by making the chunks moderately large, e.g., 65KB or even 1MB.  There’s also the compute cost of performing a public-key signature rather than just a MAC per chunk, but that can be similarly amortized by using relatively large chunks.  On the other hand, large chunks might not be ideal for embedded/constrained devices.  And the implementation-complexity is certainly an issue regardless.

So some questions about this:

1. How important is the ability to achieve goal #1 above in the OpenPGP format (streaming-mode integrity-checking)?

2. How important is the ability to achieve goal #2 above in the OpenPGP format (random-access integrity-checking)? 

3. For whichever goal(s) we wish to be able to achieve, should those be *mandatory* or *optional* in the format?  That is, should *every* OpenPGPv5-encrypted file satisfy either or both of these goals, or should they be configurable or user-selectable (such that some encrypted files might contain per-chunk signatures and/or Merkle trees while others do not)?  Making either of these goals “supported but optional” might help mitigate any performance/storage cost concerns with either of them, but would only further increase the complexity of the overall OpenPGP spec and increase the “usability risk” of a user accidentally failing to enable a relevant option when he really should have (e.g., streaming-mode protection for backups).

4. What are reasonable upper- and lower-bounds for chunk sizes, and what are the considerations behind them?  

Thanks
Bryan