Re: [openpgp] [Cfrg] streamable AEAD construct for stored data?

Bryan Ford <brynosaurus@gmail.com> Sat, 07 November 2015 01:46 UTC

Content-Type: multipart/signed; boundary="Apple-Mail=_7DF1BB2D-A879-4D92-82E9-EBBCB882AE05"; protocol="application/pkcs7-signature"; micalg="sha1"
Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\))
From: Bryan Ford <brynosaurus@gmail.com>
In-Reply-To: <CAM_a8Jy-ZoGJ3qTgN5PFA2ZKnbtSy5GWhWhUeF2NHYgWUQ0zYA@mail.gmail.com>
Date: Sat, 07 Nov 2015 10:46:34 +0900
Message-Id: <3A98EA92-0C2F-46A7-8D06-880FC83CB110@gmail.com>
References: <87twp91d8r.fsf@alice.fifthhorseman.net> <CAM_a8Jy-ZoGJ3qTgN5PFA2ZKnbtSy5GWhWhUeF2NHYgWUQ0zYA@mail.gmail.com>
To: openpgp@ietf.org, "cfrg@irtf.org" <cfrg@irtf.org>
Archived-At: <http://mailarchive.ietf.org/arch/msg/openpgp/hg2zzgFxmKBM9dZbB3yvkkOJq90>
Cc: Zooko Wilcox-OHearn <zooko@leastauthority.com>, Daniel Kahn Gillmor <dkg@fifthhorseman.net>
Subject: Re: [openpgp] [Cfrg] streamable AEAD construct for stored data?
Precedence: list

To return to this thread - DKG brought up one important potential functionality goal for the next OpenPGP message format (streaming-mode integrity protection); then the thread diverged into a different and I think orthogonal - though equally interesting - potential functionality goal (namely random-access capability via Merkle trees as in Tahoe-LAFS).

I included a slide on this topic in my OpenPGP WG presentation (https://drive.google.com/file/d/0BwK1bcoczINtMEI2Y3A1Rm52SXc/view?usp=sharing <https://drive.google.com/file/d/0BwK1bcoczINtMEI2Y3A1Rm52SXc/view?usp=sharing> slide 10) and was hoping to solicit discussion but there wasn’t time, so perhaps we can continue here?

To be clear, there are two separate use-cases, each of which make sense without the other and require different technical solutions (but could also make sense together):

1. Streaming-mode integrity protection: We want to make sure OpenPGP can be used Unix filter-style on both encryption and decryption sides, to process arbitrarily large files (e.g., huge backup tarballs), while satisfying the following joint requirements:

(a) Ensure that neither the encryptor nor decryptor ever has to buffer the entire stream in memory or any other intermediate storage.
(b) Ensure that the decryptor integrity-checks everything it decrypts BEFORE passing it onto the next pipeline stage (e.g., un-tar).

2. Random-access: Once a potentially-huge OpenPGP-encrypted file has been written to some random-access-capable medium, allow a reader to decrypt and integrity-check parts of that encrypted file without (re-)processing the whole thing: i.e., support integrity-protected random-access reads.

Let’s call these goals #1 and #2, respectively.

Achieving either goal will require dividing encrypted files into chunks of some kind, but the exact metadata these chunks need to have will vary depending on which goal we want to achieve (or both).

To achieve goal #1 properly, it appears that what we need is not only a MAC per chunk but a signature per chunk. If the encryptor only signs a single aggregate MAC at the end, then the decryptor needs to process its input all the way to that signature at the end before it can be certain that any (even the first) bytes of the decrypted data are valid. If the encryptor produces a Merkle tree at the end and signs its root as in Tahoe-LAFS (e.g., in pursuit of goal #2), the decryptor still needs to read to the end of its input before being able to integrity-check anything, and hence still fails to achieve goal #1.

To achieve goal #2, the Merkle tree approach used in Tahoe-LAFS seems well-suited. But it assumes/requires the decryptor has a complete, random-access seekable copy of the encrypted file sitting somewhere, which is certainly a reasonable assumption in the context of a file system like Tahoe-LAFS, but violates requirement (a) of goal #1 and in particular may not hold when OpenPGP is used to filter-decrypt from unseekable media or a network stream to an un-tar type process.

We could very well design an OpenPGP format that addresses both goals together, if we decide both goals are valuable. The encryptor would need to build its Merkle tree incrementally - without initially knowing its total size - and after encrypting and outputting each chunk, generate and sign a new Merkle tree root after each chunk. We wouldn’t necessarily have to store an entirely new Merkle tree after each chunk, only a log2(N)-size list of hashes comprising the “update path” from the new chunk back to the root, reusing the un-updated parts of the Merkle tree already stored in earlier parts of the encrypted file. (But this will require that earlier Merkle trees be “incrementally updatable”, which might violate the properties of some of the Merkle tree proposals suggested elsewhere in other threads.)

There are some obvious tradeoffs here, both in storage and complexity costs. I’m not that worried about the storage efficiency costs, because even the costs of a log2(N)-size list of hashes per chunk can be amortized fairly effectively by making the chunks moderately large, e.g., 65KB or even 1MB. There’s also the compute cost of performing a public-key signature rather than just a MAC per chunk, but that can be similarly amortized by using relatively large chunks. On the other hand, large chunks might not be ideal for embedded/constrained devices. And the implementation-complexity is certainly an issue regardless.

So some questions about this:

1. How important is the ability to achieve goal #1 above in the OpenPGP format (streaming-mode integrity-checking)?

2. How important is the ability to achieve goal #2 above in the OpenPGP format (random-access integrity-checking)?

3. For whichever goal(s) we wish to be able to achieve, should those be *mandatory* or *optional* in the format? That is, should *every* OpenPGPv5-encrypted file satisfy either or both of these goals, or should they be configurable or user-selectable (such that some encrypted files might contain per-chunk signatures and/or Merkle trees while others do not)? Making either of these goals “supported but optional” might help mitigate any performance/storage cost concerns with either of them, but would only further increase the complexity of the overall OpenPGP spec and increase the “usability risk” of a user accidentally failing to enable a relevant option when he really should have (e.g., streaming-mode protection for backups).

4. What are reasonable upper- and lower-bounds for chunk sizes, and what are the considerations behind them?

Thanks
Bryan

Attachment: smime.p7s

[openpgp] streamable AEAD construct for stored da… Daniel Kahn Gillmor
Re: [openpgp] [Cfrg] streamable AEAD construct fo… Watson Ladd
Re: [openpgp] [Cfrg] streamable AEAD construct fo… Daniel Kahn Gillmor
Re: [openpgp] [Cfrg] streamable AEAD construct fo… Watson Ladd
Re: [openpgp] streamable AEAD construct for store… vedaal
Re: [openpgp] [Cfrg] streamable AEAD construct fo… Zooko Wilcox-OHearn
Re: [openpgp] [Cfrg] streamable AEAD construct fo… Natanael
Re: [openpgp] [Cfrg] streamable AEAD construct fo… Natanael
Re: [openpgp] [Cfrg] streamable AEAD construct fo… Taylor R Campbell
Re: [openpgp] [Cfrg] streamable AEAD construct fo… Andy Lutomirski
Re: [openpgp] [Cfrg] streamable AEAD construct fo… Adam Langley
Re: [openpgp] [Cfrg] streamable AEAD construct fo… Andy Lutomirski
Re: [openpgp] [Cfrg] streamable AEAD construct fo… Björn Edström
Re: [openpgp] [Cfrg] streamable AEAD construct fo… Andrey Jivsov
Re: [openpgp] [Cfrg] streamable AEAD construct fo… Peter Todd
Re: [openpgp] [Cfrg] streamable AEAD construct fo… Andy Lutomirski
Re: [openpgp] [Cfrg] streamable AEAD construct fo… Peter Todd
Re: [openpgp] [Cfrg] streamable AEAD construct fo… Andy Lutomirski
Re: [openpgp] [Cfrg] streamable AEAD construct fo… Peter Todd
Re: [openpgp] [Cfrg] streamable AEAD construct fo… Bryan Ford
Re: [openpgp] [Cfrg] streamable AEAD construct fo… Andy Lutomirski
Re: [openpgp] [Cfrg] streamable AEAD construct fo… Bryan Ford
Re: [openpgp] [Cfrg] streamable AEAD construct fo… ianG
Re: [openpgp] [Cfrg] streamable AEAD construct fo… Nils Durner
Re: [openpgp] [Cfrg] streamable AEAD construct fo… Bryan Ford
Re: [openpgp] [Cfrg] streamable AEAD construct fo… Bryan Ford

Re: [openpgp] [Cfrg] streamable AEAD construct for stored data?

Attachment: smime.p7s