[openpgp] streamable AEAD construct for stored data?

Daniel Kahn Gillmor <dkg@fifthhorseman.net> Fri, 30 October 2015 10:01 UTC

From: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
To: cfrg@irtf.org, openpgp@ietf.org
User-Agent: Notmuch/0.20.2 (http://notmuchmail.org) Emacs/24.5.1 (x86_64-pc-linux-gnu)
Date: Fri, 30 Oct 2015 08:11:48 +0900
Message-ID: <87twp91d8r.fsf@alice.fifthhorseman.net>
Archived-At: <http://mailarchive.ietf.org/arch/msg/openpgp/9-97IksogqDYdLZ7OYW3f8wnM-0>

Hi CFRG folks--

We're looking into fixing the OpenPGP symmetrically-encrypted data
formats for RFC4880bis.  The structures are used for mail messages but
also for large file encryption.  It's clear that the OpenPGP CFB mode
isn't designed to modern symmetric encryption standards, so we're hoping
to introduce a better approach.

We need, among other things, to address integrity protection in a more
meaningful way than the current OpenPGP MDC (modification detection
code), which is basically a SHA-1 hash of the cleartext.  This was never
much better than a band-aid.  And as discussed in the recent "OpenPGP
SEIP downgrade attack" thread, an "integrity-protected" packet with an
MDC can be stripped down to produce a syntactically-valid packet without
integrity protection.

But one of our constraints is the OpenPGP use case that streams
decrypted data, like this:

 pgp --decrypt <backup.pgp | tar x

It's unlikely that this use case will go away.

With the MDC approach, or even with a full-packet AEAD approach, the
decryption command has to either (a) buffer all data before producing
output, or (b) risk producing intermediate output that it later
discovers fails the integrity check.

A better option would be a streamable/chunked AEAD construct -- this
would allow the decrypting process to release each chunk as soon as it
has been integrity-checked.

AGL describes this problem here:

 https://www.imperialviolet.org/2014/06/27/streamingencryption.html

and he roughly outlines a generic construction of a composable/chunkable
approach using AEAD:

> Ideally such a scheme would take an AEAD and produce something very
> like an AEAD in that it takes a key, nonce and additional data, but
> can safely work in a streaming fashion. I don't think it need be very
> complex: take 64 bits of the nonce from the underlying AEAD as the
> chunk number, always start with chunk number zero and feed the
> additional data into chunk zero with a zero byte prefix. Prefix each
> chunk ciphertext with a 16 bit length and set the MSB to indicate the
> last chunk and authenticate that indication by setting the additional
> data to a single, 1 byte. The major worry might be that for many
> underlying AEADs, taking 64 bits of the nonce for the chunk counter
> leaves one with very little (or none!) left.
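
To make the quoted outline concrete, here is a rough Python sketch of
how i read it.  It is not a formalization and several details are my
own assumptions: toy_seal/toy_open are a stand-in encrypt-then-MAC
built from HMAC-SHA256 (NOT a vetted AEAD, only there so the sketch
runs with the standard library), the per-stream nonce prefix is fixed
at 4 bytes to mimic a 96-bit nonce, and a stream whose chunk zero is
also the last chunk uses additional data of a 1 byte followed by the
caller's additional data (the quote doesn't cover that case).  All
function names are hypothetical.

```python
import hashlib
import hmac
import struct

TAG_LEN = 16
LAST_FLAG = 0x8000             # MSB of the 16-bit length prefix: final chunk
MAX_CHUNK = 0x7FFF - TAG_LEN   # largest plaintext piece that fits in 15 bits


def _prf(key, label, data):
    return hmac.new(key, label + data, hashlib.sha256).digest()


def _keystream(key, nonce, n):
    blocks = (n + 31) // 32
    stream = b"".join(
        _prf(key, b"enc", nonce + struct.pack(">Q", i)) for i in range(blocks)
    )
    return stream[:n]


def toy_seal(key, nonce, plaintext, ad):
    """STAND-IN AEAD (encrypt-then-MAC from HMAC-SHA256). Illustration only."""
    ks = _keystream(key, nonce, len(plaintext))
    ct = bytes(p ^ k for p, k in zip(plaintext, ks))
    mac_input = nonce + struct.pack(">Q", len(ad)) + ad + ct
    return ct + _prf(key, b"mac", mac_input)[:TAG_LEN]


def toy_open(key, nonce, sealed, ad):
    """Check the tag first; return the plaintext or raise ValueError."""
    ct, tag = sealed[:-TAG_LEN], sealed[-TAG_LEN:]
    mac_input = nonce + struct.pack(">Q", len(ad)) + ad + ct
    if not hmac.compare_digest(tag, _prf(key, b"mac", mac_input)[:TAG_LEN]):
        raise ValueError("chunk failed authentication")
    return bytes(c ^ k for c, k in zip(ct, _keystream(key, nonce, len(ct))))


def chunk_nonce(base_nonce, index):
    # 4-byte per-stream prefix (assumption) + 64-bit chunk number
    if len(base_nonce) != 4:
        raise ValueError("this sketch expects a 4-byte nonce prefix")
    return base_nonce + struct.pack(">Q", index)


def chunk_ad(index, last, ad):
    # Chunk zero carries the caller's additional data behind a flag byte;
    # the last chunk is marked with a single 1 byte; others get nothing.
    if index == 0:
        return (b"\x01" if last else b"\x00") + ad
    return b"\x01" if last else b""


def seal_stream(key, base_nonce, ad, plaintext, chunk_size=MAX_CHUNK):
    if not 0 < chunk_size <= MAX_CHUNK:
        raise ValueError("chunk_size out of range")
    pieces = [plaintext[i:i + chunk_size]
              for i in range(0, len(plaintext), chunk_size)] or [b""]
    out = []
    for i, piece in enumerate(pieces):
        last = i == len(pieces) - 1
        sealed = toy_seal(key, chunk_nonce(base_nonce, i), piece,
                          chunk_ad(i, last, ad))
        prefix = len(sealed) | (LAST_FLAG if last else 0)
        out.append(struct.pack(">H", prefix) + sealed)
    return b"".join(out)


def open_stream(key, base_nonce, ad, blob):
    """Yield each chunk's plaintext as soon as that chunk verifies."""
    pos, index = 0, 0
    while True:
        if pos + 2 > len(blob):
            raise ValueError("truncated: no final chunk seen")
        (prefix,) = struct.unpack_from(">H", blob, pos)
        last, length = bool(prefix & LAST_FLAG), prefix & 0x7FFF
        sealed = blob[pos + 2:pos + 2 + length]
        if len(sealed) < length:
            raise ValueError("truncated: incomplete chunk")
        yield toy_open(key, chunk_nonce(base_nonce, index), sealed,
                       chunk_ad(index, last, ad))
        if last:
            return
        pos += 2 + length
        index += 1
```

Because open_stream is a generator, a caller sees each chunk only after
its tag has verified.  A reordered, modified, or missing chunk surfaces
as a ValueError, but only after the earlier chunks have already been
handed out.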

Two examples of projects that take something like this approach are
miniLock and Tahoe-LAFS:

 https://github.com/kaepora/miniLock
 https://tahoe-lafs.org/trac/tahoe-lafs/browser/docs/specifications/file-encoding.rst

I'm unaware of any formalization of this approach, though.  Does anyone
know of one?  If OpenPGP were to adopt AGL's construct, are there
specific risks to be aware of?

This approach still has two notable problems i can see, which may or may
not be addressable (but if they are, i'd love to hear it):

 a) it doesn't deal with truncation -- the initially-streamed data has
    already been streamed by the time a truncation is discovered.
    (there may be no way to fix this; it seems kind of like a fact of
    nature, and if so, systems should only do streaming decryption if
    they're capable of coping with truncation)

 b) it doesn't seem to compose as well with asymmetric signatures as one
    might like: a signature over the whole material can't itself be
    verified until one full pass through the data; and a signature over
    just the symmetric key would prove nothing, since anyone getting the
    symmetric key could forge an arbitrary valid, decryptable stream.
    Is there an intermediate approach that would combine an asymmetric
    signature with a chunkable authenticated encryption such that a
    decryptor could stream one pass and be certain of its origin (at
    least up until truncation, if (a) can't be resolved)?
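
As one illustration of what "coping with truncation" in (a) could mean
for a consumer that writes to disk, here is a minimal Python sketch: it
writes verified chunks to a temporary file and only moves it into place
once the chunk source finishes without error.  write_atomically and
truncated_source are hypothetical names, and the sketch assumes the
chunk source raises ValueError when the stream ends without a final
chunk.

```python
import os
import tempfile


def write_atomically(chunks, dest_path):
    """Write chunks to a temp file; publish it only if the stream ends cleanly."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:      # may raise on a bad or missing chunk
                tmp.write(chunk)
        os.replace(tmp_path, dest_path)
    except BaseException:
        os.unlink(tmp_path)           # never leave partial output behind
        raise


def truncated_source():
    # Hypothetical stand-in for a streaming decryptor hitting truncation.
    yield b"first chunk "
    yield b"second chunk "
    raise ValueError("truncated: no final chunk seen")
```

This only helps when the output is a file.  A consumer on the far side
of a pipe, like the tar example above, has already acted on the earlier
chunks by the time the error shows up.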

Thoughts, pointers, or suggestions would be much appreciated.

         --dkg