Re: [openpgp] [Cfrg] streamable AEAD construct for stored data?

Natanael <natanael.l@gmail.com> Fri, 30 October 2015 12:25 UTC

In-Reply-To: <87twp91d8r.fsf@alice.fifthhorseman.net>
References: <87twp91d8r.fsf@alice.fifthhorseman.net>
Date: Fri, 30 Oct 2015 13:25:03 +0100
Message-ID: <CAAt2M19s-grQ9No9kAhBDCKKTw6oj9wHdiU5XuGjLO5Y8B87dA@mail.gmail.com>
From: Natanael <natanael.l@gmail.com>
To: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
Archived-At: <http://mailarchive.ietf.org/arch/msg/openpgp/fHySyiEID2UA1L9b0jtQ8rC_ZHg>
Cc: openpgp@ietf.org, cfrg@irtf.org
Subject: Re: [openpgp] [Cfrg] streamable AEAD construct for stored data?

On 30 Oct 2015 11:01, "Daniel Kahn Gillmor" <dkg@fifthhorseman.net> wrote:
>
> Hi CFRG folks--
>
> We're looking into fixing the OpenPGP symmetrically-encrypted data
> formats for RFC4880bis.  The structures are used for mail messages but
> also for large file encryption.  It's clear that the OpenPGP CFB mode
> isn't designed to modern symmetric encryption standards, so we're hoping
> to introduce a better approach.
>
> We need, among other things to address integrity protection in a more
> meaningful way than the current OpenPGP MDC (modification detection
> code), which is basically a SHA-1 hash of the cleartext.  This was never
> much better than a band-aid.  And as discussed in the recent "OpenPGP
> SEIP downgrade attack" thread, an "integrity-protected" packet with an
> MDC can be stripped down to produce a syntactically-valid packet without
> integrity protection.
>
> But one of our constraints is the OpenPGP use case that streams
> decrypted data, like this:

[...]

> This approach still has two notable problems i can see, which may or may
> not be addressable (but if they are, i'd love to hear it):
>
>  a) it doesn't deal with truncation -- the initially-streamed data has
>     already been streamed by the time a truncation is discovered.
>     (there may be no way to fix this; it seems kind of like a fact of
>     nature, and if so, systems should only do streaming decryption if
>     they're capable of coping with truncation)

Would it help to declare the ciphertext length up front and check it
before decrypting? That doesn't help if the file isn't local and the
connection breaks, but then at least your software should detect that and
halt if necessary.
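
As a rough sketch of that length check: the layout below (an 8-byte
big-endian length in front of the ciphertext) and the names wrap and
check_length are made up for illustration and are not part of any OpenPGP
format. It only covers the local, seekable case.

```python
import io
import struct

# Hypothetical layout: 8-byte big-endian ciphertext length, then ciphertext.
HEADER = struct.Struct(">Q")


def wrap(ciphertext: bytes) -> bytes:
    """Prefix the ciphertext with its declared length."""
    return HEADER.pack(len(ciphertext)) + ciphertext


def check_length(stream) -> int:
    """Return the declared length if the seekable stream holds exactly that
    many ciphertext bytes; raise ValueError before any decryption starts."""
    header = stream.read(HEADER.size)
    if len(header) != HEADER.size:
        raise ValueError("truncated header")
    (declared,) = HEADER.unpack(header)
    start = stream.tell()
    end = stream.seek(0, io.SEEK_END)
    stream.seek(start)  # rewind so the caller can begin streaming
    if end - start != declared:
        raise ValueError(f"expected {declared} bytes, found {end - start}")
    return declared
```

An unauthenticated length field could be rewritten by whoever truncates the
file, so the declared length would also need to be covered by the signature
or fed in as associated data.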

>  b) it doesn't seem to compose as well with asymmetric signatures as one
>     might like: a signature over the whole material can't itself be
>     verified until one full pass through the data; and a signature over
>     just the symmetric key would prove nothing, since anyone getting the
>     symmetric key could forge an arbitrary valid, decryptable stream.
>     Is there an intermediate approach that would combine an asymmetric
>     signature with a chunkable authenticated encryption such that a
>     decryptor could stream one pass and be certain of its origin (at
>     least up until truncation, if (a) can't be resolved)?
>
> Thoughts, pointers, or suggestions would be much appreciated.

To solve (b), you need something like a signature over a list of
ciphertext hashes or authentication tags.

One thought I've had before (my idea is to use it for FDE*): when creating
the message, either compute an HMAC over each segment (including a counter)
or extract the AEAD tags (prefixed with counters to preserve order), build a
Merkle tree hash over that list, and sign the Merkle tree root. When you
decrypt and recreate the tags for comparison, you can then confirm that
nothing has been modified (with the level of assurance that the tags can
provide). Alternatively, skip the standard AEAD and MAC constructs and use a
signed Merkle tree hash of the ciphertext itself as your own custom MAC.
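
A minimal sketch of the HMAC variant, under my own assumptions: SHA-256
throughout, an 8-byte counter prefix, 0x00/0x01 domain-separation bytes for
leaves and inner nodes, and an odd node promoted unchanged to the next
level. The function names are hypothetical, and the asymmetric signature
over the root is left out.

```python
import hashlib
import hmac
from typing import List, Sequence


def segment_tags(mac_key: bytes, segments: Sequence[bytes]) -> List[bytes]:
    """HMAC-SHA256 over (8-byte counter || ciphertext segment)."""
    return [
        hmac.new(mac_key, i.to_bytes(8, "big") + seg, hashlib.sha256).digest()
        for i, seg in enumerate(segments)
    ]


def merkle_root(leaves: Sequence[bytes]) -> bytes:
    """Root of a binary hash tree; an odd node is promoted unchanged."""
    if not leaves:
        raise ValueError("need at least one leaf")
    # Leaves and inner nodes get different prefixes so they cannot collide.
    level = [hashlib.sha256(b"\x00" + leaf).digest() for leaf in leaves]
    while len(level) > 1:
        nxt = [
            hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]
```

The sender would sign merkle_root(segment_tags(key, segments)); the
receiver recomputes the tags while decrypting and compares against the
signed root. Reordering, altering, or dropping a segment changes the root.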

One benefit of using a hash-tree-like algorithm with a signature is reduced
storage overhead and memory usage, while you retain the ability to
independently verify each segment. Ideally you would use a hash-tree-like
algorithm that can also be generated efficiently in a single pass over even
large ciphertexts with reasonable memory usage (does anybody know of one?).
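
One candidate, as a sketch rather than a vetted design: a binary hash tree
root can be computed in one pass by keeping a stack of finished subtree
roots, which needs memory logarithmic in the number of segments. The class
name, SHA-256, and the 0x00/0x01 prefixes are my assumptions.

```python
import hashlib


class StreamingMerkle:
    """Incrementally computes a binary hash-tree root in O(log n) memory."""

    def __init__(self):
        # (height, digest) of completed perfect subtrees, tallest first.
        self._stack = []

    def add_segment(self, segment: bytes) -> None:
        node = (0, hashlib.sha256(b"\x00" + segment).digest())
        # Merge while the subtree on top of the stack has the same height.
        while self._stack and self._stack[-1][0] == node[0]:
            height, left = self._stack.pop()
            merged = hashlib.sha256(b"\x01" + left + node[1]).digest()
            node = (height + 1, merged)
        self._stack.append(node)

    def root(self) -> bytes:
        if not self._stack:
            raise ValueError("no segments added")
        # Fold the leftover subtrees together from right to left.
        digest = self._stack[-1][1]
        for _, left in reversed(self._stack[:-1]):
            digest = hashlib.sha256(b"\x01" + left + digest).digest()
        return digest
```

This only yields the root. Verifying individual segments later still
requires storing (or recomputing) the intermediate nodes on each segment's
path.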

Now that I've also read the referenced Tahoe-LAFS link, it looks like
they're doing something very close to what I described above, but slightly
different:
They use AES-CTR over the whole file with a unique key, and three hashes, if
I'm reading it correctly: one plain hash over the entire ciphertext, one
Merkle tree hash over the ciphertext blocks, and one Merkle tree hash over
the erasure-coded shares of the blocks. All three hashes are stored in
plaintext with the shares and then hashed together (also including the
length of the file). They also have some sort of hash chain, but the
graphics don't load and I can't figure out exactly how it is applied, beyond
potentially having to do with confirming the order of blocks. Instead of
HMAC they use double SHA256 in a particular format.

Minus the erasure coding** and with an added signature over the file
hashes/header, that's almost exactly what I imagined. I'm just worried that
the performance penalty might limit adoption. If the performance of this
method is considered acceptable or can be improved to an acceptable level,
I definitely support using it.

* A bit off topic here, but for FDE I imagine changing the encryption key
every write session using a KDF and a session counter, keeping an
authenticated, encrypted list of which segments use which session write
keys. This way you prevent partial ciphertext reversal and prevent
detection of when ciphertext segments repeat over time (re-zeroized or
restored files). You can also arbitrarily re-encrypt random segments to
obscure your real write patterns (and occasionally reduce the list size by
purging unused keys after re-encrypting the last segments that use them).
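
A rough sketch of that bookkeeping, with HMAC-SHA256 standing in for a
proper KDF and all names (session_key, SegmentKeyMap, the label string)
invented for illustration. Storing the map itself under authenticated
encryption is not shown.

```python
import hashlib
import hmac


def session_key(master_key: bytes, session_counter: int) -> bytes:
    """Derive a per-write-session key: HMAC-SHA256(master, label || counter)."""
    info = b"fde-session-key" + session_counter.to_bytes(8, "big")
    return hmac.new(master_key, info, hashlib.sha256).digest()


class SegmentKeyMap:
    """Tracks which write session last encrypted each segment."""

    def __init__(self):
        self.session_of = {}  # segment index -> session counter

    def record_write(self, segment: int, session_counter: int) -> None:
        self.session_of[segment] = session_counter

    def live_sessions(self) -> set:
        """Sessions still needed; keys for any other session can be purged."""
        return set(self.session_of.values())
```

Re-encrypting the last segments of an old session under the current one
removes that session from live_sessions(), which is the purge step
described above.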

** In Tahoe-LAFS, erasure coding of blocks is used to allow you to split
files across storage nodes with minimal risk of data loss. That's not
needed here, since it can be layered on independently when considered
necessary.