Re: [openpgp] [Cfrg] streamable AEAD construct for stored data?

Zooko Wilcox-OHearn <> Sun, 01 November 2015 00:48 UTC

From: Zooko Wilcox-OHearn <>
To: Daniel Kahn Gillmor <>

Hi folks:

I'm one of the designers of Tahoe-LAFS.

First of all, I'd like to agree that this is an important issue — we
do need a data integrity protocol with which a reader can verify a
subset of the data. This is especially useful for heads — leading
bytes — of the data, so that when someone executes a command like

curl | gpg --decrypt | tar x

the "gpg" process can operate with limited RAM/storage and can also be
sure that any data is authentic before it passes that data to the tar
process.

In addition, some techniques can also allow authenticated reads of
arbitrary spans of the data. The Merkle Tree approach, such as is used
in Tahoe-LAFS, allows this. So if you're reading a file from
Tahoe-LAFS and you do something like "fseek(filehandle,
1000000000, 0); fread(buf, 1, 1000000, filehandle);" you'll get
the one million bytes of data that begin one billion bytes into the
file, along with the assurance from Tahoe-LAFS that those one
million bytes are cryptographically authenticated.
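
To make the span read concrete, here is a minimal sketch of how a
reader might work out which blocks it has to fetch and authenticate
before it can return the requested bytes. The 128 KiB block size and
the function name are assumptions chosen for illustration, not a
description of the Tahoe-LAFS implementation.

```python
def blocks_for_span(offset, length, block_size):
    """Return the range of block indices covering bytes [offset, offset + length)."""
    if length <= 0:
        return range(0)
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return range(first, last + 1)


# Assumed block size, for illustration only.
BLOCK_SIZE = 128 * 1024

# The fseek/fread example above: one million bytes starting one
# billion bytes into the file.
needed = blocks_for_span(1_000_000_000, 1_000_000, BLOCK_SIZE)
print("blocks to fetch and verify:", list(needed))
```

Only those blocks, plus whatever per-block authentication data the
format carries, need to be downloaded and checked; the rest of the
file is never touched.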

Second, I'd like to emphasize something that dkg pointed out in the
first post: even if we do this correctly at this layer, there is
still a risk of truncation attacks. For example, imagine that
someone runs:

curl | gpg --decrypt | sh

Even if gpg can ensure cryptographic integrity of every byte before it
passes that byte to sh, the pipeline is still vulnerable to potentially
damaging attacks that interrupt the flow of ciphertext to gpg. This is a
problem that can't be solved by the gpg process in this example. It
has to be solved by the next process in the chain — tar in the first
example or sh in the second. But let's remember that it is a real
problem.

>  b) it doesn't seem to compose as well with asymmetric signatures as one
>     might like: a signature over the whole material can't itself be
>     verified until one full pass through the data; and a signature over
>     just the symmetric key would prove nothing, since anyone getting the
>     symmetric key could forge an arbitrary valid, decryptable stream.
>     Is there an intermediate approach that would combine an asymmetric
>     signature with a chunkable authenticated encryption such that a
>     decryptor could stream one pass and be certain of its origin (at
>     least up until truncation, if (a) can't be resolved)?

This is a big deal, and an under-appreciated one. A lot of modern
cryptography was developed in the model of a bilateral and synchronous
connection between two parties. In that model, this isn't a problem.
You have a shared secret, and you can assume that anything you receive
that *you* didn't send was sent by the other party. (So you have to
prevent replay and reflection attacks, but if you've done so, then
this isn't a problem.)
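
A small sketch of why the shared-secret model stops at two parties.
It uses HMAC-SHA256 from the Python standard library; the key and
messages are made up for illustration. Anyone who holds the key can
produce a tag that verifies, so a valid tag says "someone with the
key wrote this", not "the original writer wrote this".

```python
import hashlib
import hmac

# Hypothetical secret shared by a writer and a reader.
shared_key = b"key known to both writer and reader"


def tag(key, message):
    """Compute an HMAC-SHA256 authentication tag."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key, message, mac):
    """Constant-time check of a tag against a message."""
    return hmac.compare_digest(tag(key, message), mac)


# The writer authenticates a message.
original = b"install version 1.0"
original_mac = tag(shared_key, original)
print("original verifies:", verify(shared_key, original, original_mac))

# The reader (or anyone else who obtains the key) can do the same
# for text the writer never produced.
forged = b"install version 6.6.6"
forged_mac = tag(shared_key, forged)
print("forgery verifies:", verify(shared_key, forged, forged_mac))
```

Both checks succeed, which is exactly the forgery that point (b) in
the quoted text describes.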

But in a more asynchronous/persistent model, or in a model with more
than two parties, you can't rely on that and you need something
more.

The way we do this in Tahoe-LAFS (as Taylor Campbell explained in
this thread) is that the writer generates a Merkle Tree over the data
and transmits, along with each block, the Merkle Branch needed to
authenticate that block.
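
As a rough illustration of the general technique (not the actual
Tahoe-LAFS format), the sketch below builds a binary Merkle tree over
a list of blocks, extracts the branch for one block, and verifies that
block against the root alone. SHA-256, the one-byte leaf/node
prefixes, and the duplication of the last node on odd-sized levels are
all assumptions made to keep the example short.

```python
import hashlib


def leaf_hash(block):
    # The prefix separates leaf hashes from interior-node hashes.
    return hashlib.sha256(b"\x00" + block).digest()


def node_hash(left, right):
    return hashlib.sha256(b"\x01" + left + right).digest()


def build_levels(blocks):
    """Return every level of the tree, leaves first, root last."""
    level = [leaf_hash(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # pair an odd last node with itself
        level = [node_hash(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def branch(levels, index):
    """Collect the sibling hashes on the path from leaf `index` to the root."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index  # odd level: the node was paired with itself
        path.append(level[sibling])
        index //= 2
    return path


def verify_block(block, index, path, root):
    """Recompute the root from one block and its branch."""
    node = leaf_hash(block)
    for sibling in path:
        if index % 2 == 0:
            node = node_hash(node, sibling)
        else:
            node = node_hash(sibling, node)
        index //= 2
    return node == root


# Two hypothetical versions of the same five-block file.
blocks_v1 = [b"version 1 block %d" % i for i in range(5)]
blocks_v2 = [b"version 2 block %d" % i for i in range(5)]
levels_v1 = build_levels(blocks_v1)
levels_v2 = build_levels(blocks_v2)
root_v1 = levels_v1[-1][0]

# The reader trusts root_v1 and checks block 3 of each version against it.
ok = verify_block(blocks_v1[3], 3, branch(levels_v1, 3), root_v1)
swapped = verify_block(blocks_v2[3], 3, branch(levels_v2, 3), root_v1)
print("block from version 1 accepted:", ok)
print("block from version 2 accepted:", swapped)
```

Because the reader checks every block against one root, a block taken
from a different version of the file is rejected even if that other
version was once legitimately signed.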

Now the reason we do it that way in Tahoe-LAFS is that we want to bind
all the bytes of (one version of) the file together. If a reader reads
the first million bytes of a file, and then reads the second million
bytes of the file, we don't want an attacker to have the option of
supplying the first million from one version of the file and the next
million from a different version of the file, without the reader
realizing this, even if both versions of the file were, at some point,
signed by the legitimate writer.

So using the Merkle Tree provides a convenient way to:

* bundle all the bytes of the file into a single crypto value (the
Merkle Tree root) which we can then use as a "stand-in" for the
complete contents of a single version of the file, for authentication

* suffer low worst-case overhead for a read of an arbitrary span of
data (i.e., if you're reading a random span out of the middle of a
file, not just reading from the beginning)

But we made that decision back before we were willing to rely on ECC,
so our public-key digital signatures were big old 2048-bit RSA sigs.
Now that we would be willing to rely on svelte Ed25519 sigs, we might
consider including a public-key digital signature with each block
instead of a Merkle Branch with each block. Either way, it would
require careful engineering of the identification and versioning of
the file.
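
For a sense of scale, here is a back-of-the-envelope comparison of
the per-block authentication overhead under each option. The
signature sizes are the standard ones (64 bytes for Ed25519, 256
bytes for 2048-bit RSA); the 32-byte hash, the balanced binary tree,
and the block counts are assumptions for illustration, and the sketch
ignores signing and verification cost entirely.

```python
HASH_BYTES = 32          # assumption: 256-bit hash in the Merkle tree
ED25519_SIG_BYTES = 64   # size of an Ed25519 signature
RSA2048_SIG_BYTES = 256  # size of a 2048-bit RSA signature


def merkle_branch_bytes(num_blocks):
    """Bytes of sibling hashes sent with one block in a balanced binary tree."""
    if num_blocks <= 1:
        return 0
    depth = (num_blocks - 1).bit_length()  # exact ceil(log2(num_blocks))
    return depth * HASH_BYTES


print("blocks  merkle  ed25519  rsa2048")
for n in (16, 256, 8192, 1_000_000):
    print(n, merkle_branch_bytes(n), ED25519_SIG_BYTES, RSA2048_SIG_BYTES)
```

The Merkle Branch grows with the logarithm of the number of blocks,
while a per-block signature stays constant in size, so which option
is smaller depends on how many blocks the file has.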