Re: [openpgp] [Cfrg] streamable AEAD construct for stored data?

On Oct 30, 2015 8:25 AM, "Natanael" <natanael.l@gmail.com> wrote:
>
>
> On 30 Oct 2015 11:01, "Daniel Kahn Gillmor" <dkg@fifthhorseman.net> wrote:
> >
> > Hi CFRG folks--
> >
> > We're looking into fixing the OpenPGP symmetrically-encrypted data
> > formats for RFC4880bis.  The structures are used for mail messages but
> > also for large file encryption.  It's clear that the OpenPGP CFB mode
> > isn't designed to modern symmetric encryption standards, so we're hoping
> > to introduce a better approach.
> >
> > We need, among other things, to address integrity protection in a more
> > meaningful way than the current OpenPGP MDC (modification detection
> > code), which is basically a SHA-1 hash of the cleartext.  This was never
> > much better than a band-aid.  And as discussed in the recent "OpenPGP
> > SEIP downgrade attack" thread, an "integrity-protected" packet with an
> > MDC can be stripped down to produce a syntactically-valid packet without
> > integrity protection.
> >
> > But one of our constraints is the OpenPGP use case that streams
> > decrypted data, like this:
>
> [...]
>
> > This approach still has two notable problems i can see, which may or may
> > not be addressable (but if they are, i'd love to hear it):
> >
> >  a) it doesn't deal with truncation -- the initially-streamed data has
> >     already been streamed by the time a truncation is discovered.
> >     (there may be no way to fix this; it seems kind of like a fact of
> >     nature, and if so, systems should only do streaming decryption if
> >     they're capable of coping with truncation)
>
> Does it help to define the ciphertext length and check that first, before
> decrypting? It doesn't help if the file isn't local and the connection is
> broken, but then at least your software should detect that and halt if
> necessary.
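
A minimal sketch of the length check suggested above, assuming a hypothetical 8-byte big-endian length prefix in front of the ciphertext (the names `wrap` and `check_length` are illustrative and not part of any OpenPGP format). The check only helps if the declared length is itself authenticated, for example by feeding it to the AEAD as associated data; an unauthenticated prefix can simply be rewritten by whoever truncates the file.

```python
import io
import struct

# Hypothetical framing: 8-byte big-endian ciphertext length, then the ciphertext.
HEADER = struct.Struct(">Q")

def wrap(ciphertext: bytes) -> bytes:
    """Prefix the ciphertext with its declared length."""
    return HEADER.pack(len(ciphertext)) + ciphertext

def check_length(stream, available: int) -> int:
    """Compare the declared length with the bytes actually available.

    `available` is the total size of the stored object (header included).
    Raises ValueError before any decryption would start.
    """
    raw = stream.read(HEADER.size)
    if len(raw) != HEADER.size:
        raise ValueError("truncated header")
    (declared,) = HEADER.unpack(raw)
    if available - HEADER.size != declared:
        raise ValueError("ciphertext length mismatch")
    return declared
```

For a remote stream the total size is not known up front, so the same comparison can only be made when the connection closes.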
>
> >  b) it doesn't seem to compose as well with asymmetric signatures as one
> >     might like: a signature over the whole material can't itself be
> >     verified until one full pass through the data; and a signature over
> >     just the symmetric key would prove nothing, since anyone getting the
> >     symmetric key could forge an arbitrary valid, decryptable stream.
> >     Is there an intermediate approach that would combine an asymmetric
> >     signature with a chunkable authenticated encryption such that a
> >     decryptor could stream one pass and be certain of its origin (at
> >     least up until truncation, if (a) can't be resolved)?
> >
> > Thoughts, pointers, or suggestions would be much appreciated.

Use authenticated encryption so no signatures are required. Detached
signature verification is used for large public messages already: no
streaming needed.
>
> To solve (b), what you need to do is something like signing a list of
> ciphertext hashes/authentication tags.

The idea below demands conditions beyond MAC security.

>
> One thought I've had before (my idea is to use it for full-disk
> encryption, FDE*) is, for example, to use HMAC over segments (including
> counters), or to extract AEAD tags (prefixed with counters to preserve
> order), and to create a Merkle tree hash of those lists when creating the
> message. You then sign that Merkle tree, so that when you decrypt and
> recreate the tags for comparison you can confirm that nothing has been
> modified (with the level of assurance that the tags can provide). Or you
> skip the standard AEAD and MAC constructs and use a signed Merkle tree hash
> of the ciphertext itself as your own custom MAC.
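
The construction described above can be sketched roughly as follows, assuming HMAC-SHA256 as the per-segment MAC and SHA-256 for the tree. The function names, the 8-byte counter encoding, and the leaf/node prefix bytes are all assumptions made for illustration; the asymmetric signature over the root is left out because it needs a third-party library.

```python
import hashlib
import hmac

def segment_tag(key: bytes, index: int, segment: bytes) -> bytes:
    """HMAC over (counter || segment); the counter pins the segment's position."""
    return hmac.new(key, index.to_bytes(8, "big") + segment, hashlib.sha256).digest()

def merkle_root(leaves) -> bytes:
    """Merkle root over a list of byte strings.

    Leaves and inner nodes use different prefix bytes so a leaf can never be
    confused with an inner node. An unpaired node is promoted unchanged.
    """
    if not leaves:
        return hashlib.sha256(b"").digest()
    level = [hashlib.sha256(b"\x00" + leaf).digest() for leaf in leaves]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest())
            else:
                nxt.append(level[i])
        level = nxt
    return level[0]

def tags_root(key: bytes, segments) -> bytes:
    """Root over the per-segment tags; this is the value one would sign."""
    return merkle_root([segment_tag(key, i, s) for i, s in enumerate(segments)])
```

A decryptor would recompute each tag as segments arrive and compare the rebuilt root against the signed one; modifying or reordering segments changes the root.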
>
> One benefit of using a hash-tree-like algorithm with a signature is the
> reduction of storage overhead and memory usage, and that you retain the
> ability to independently verify each segment. Ideally you would use a
> hash-tree-like algorithm which can also be generated efficiently in a
> single pass over even large ciphertexts with reasonable memory usage (does
> anybody know of one?).
>
> Now that I've also read the referenced Tahoe-LAFS link, it looks like
> they're doing something very close to what I described above, but slightly
> different:
> They use AES-CTR over the whole file with a unique key, one plain hash
> over the entire ciphertext, one Merkle tree hash over the ciphertext blocks
> and one Merkle tree hash over the erasure-coded shares of the blocks, if
> I'm reading it correctly, with all three hashes stored in plaintext with
> the shares and then hashed together (also including the length of the
> file). They also have some sort of hash chain, but the graphics don't load
> and I can't figure out how exactly it is applied, beyond potentially having
> to do with confirming the order of blocks. Instead of using HMAC they use
> double SHA256 in a particular format.
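
To make "double SHA256 in a particular format" concrete, here is a small sketch of a tagged double hash. The message does not spell out the format, so the netstring-style tag prefix and the tag values used here are assumptions for illustration and should not be read as Tahoe-LAFS's actual layout.

```python
import hashlib

def netstring(data: bytes) -> bytes:
    """Length-prefixed encoding, e.g. b"abc" -> b"3:abc,"."""
    return b"%d:%s," % (len(data), data)

def tagged_sha256d(tag: bytes, data: bytes) -> bytes:
    """SHA-256 applied twice, with a purpose tag prepended to the input.

    The tag separates hashes computed for different purposes; hashing the
    inner digest again avoids SHA-256's length-extension property.
    """
    inner = hashlib.sha256(netstring(tag) + data).digest()
    return hashlib.sha256(inner).digest()
```

With distinct tags, a digest computed for one purpose (say, a ciphertext block) cannot be replayed as a digest for another (say, a share).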
>
> Minus the erasure coding** and with an added signature of the file
> hashes/header, that's almost exactly what I imagined. I'm just worried
> about the risk of the performance penalty limiting adoption. If the
> performance of this method is considered acceptable or can be improved to
> an acceptable level, I definitely support using it.
>
> * A bit off topic here, but for FDE I imagine changing the encryption key
> every write session using a KDF and session counter, keeping an
> authenticated encrypted list of which segments use which session write
> keys. This way you prevent partial ciphertext reversal and prevent
> detection of when ciphertext segments repeat over time (re-zeroized or
> restored files). You can also arbitrarily re-encrypt random segments to
> obscure your real write patterns (and reduce the list size occasionally by
> purging unused keys after re-encrypting the last segments using them).
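
The per-session keying in this footnote could look roughly like the sketch below. It assumes a derivation step modelled on a single-block HKDF-Expand with HMAC-SHA256; the label `fde-session-key`, the class `SegmentKeyMap`, and its method names are hypothetical. Persisting the map under authenticated encryption, as the footnote requires, is not shown.

```python
import hashlib
import hmac

def session_key(master_key: bytes, session_counter: int, length: int = 32) -> bytes:
    """Derive a write-session key from the master key and a session counter.

    Single HMAC-SHA256 block, so at most 32 bytes of output.
    """
    if not 0 < length <= 32:
        raise ValueError("length must be between 1 and 32")
    info = b"fde-session-key" + session_counter.to_bytes(8, "big")
    return hmac.new(master_key, info + b"\x01", hashlib.sha256).digest()[:length]

class SegmentKeyMap:
    """Tracks which write session last encrypted each segment."""

    def __init__(self, master_key: bytes):
        self.master_key = master_key
        self.session = 0
        self.segment_session = {}

    def begin_session(self) -> None:
        self.session += 1

    def record_write(self, segment_index: int) -> bytes:
        """Note that the segment is written now; return the key to encrypt it."""
        self.segment_session[segment_index] = self.session
        return session_key(self.master_key, self.session)

    def key_for(self, segment_index: int) -> bytes:
        """Key needed to decrypt the segment as it currently sits on disk."""
        return session_key(self.master_key, self.segment_session[segment_index])

    def live_sessions(self) -> set:
        """Sessions still referenced; any others can be purged from the list."""
        return set(self.segment_session.values())
```

Re-encrypting a segment is just another `record_write` in a later session, which is what lets old sessions drop out of `live_sessions()` over time.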
>
> ** In Tahoe-LAFS, erasure coding of blocks is used to allow you to split
> files across storage nodes with minimal risk of data loss. That's not
> applicable here, as it can be applied independently when considered
> necessary.