[openpgp] The AEAD Conundrum as it pertains to OpenPGP

Jon Callas <joncallas@icloud.com> Fri, 29 March 2019 22:30 UTC

From: Jon Callas <joncallas@icloud.com>
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Mime-Version: 1.0 (Mac OS X Mail 12.2 \(3445.102.3\))
Message-Id: <BFC5E4EE-01D8-4552-BBB0-F6B09D511160@icloud.com>
Date: Fri, 29 Mar 2019 15:30:15 -0700
Cc: Jon Callas <joncallas@icloud.com>
To: "openpgp@ietf.org" <openpgp@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/openpgp/8Lk5H36wXZmnm-f1hvBJ49aB46A>
Subject: [openpgp] The AEAD Conundrum as it pertains to OpenPGP
Precedence: list

I’m going to lay this out quasi-rigorously. Or at least as rigorously as I feel like doing.

In this missive, I am going to show that the stated goals of AEAD in OpenPGP are impossible to obtain, and that one of the goals is going to have to give.

Premises:

(O1) An OpenPGP implementation MUST be able to process a message of arbitrary size.
(O2) An OpenPGP implementation MUST have a reasonable bound on memory; in fact, it must be possible to make an OpenPGP implementation that works on some tightly-bound system. Not as tight as an Arduino, but really tight.

(A1) An AEAD implementation must be all or nothing; by this we mean that if it detects an error in the authentication, it must return an error and no plaintext.
(A2) An AEAD implementation must never release unauthenticated plaintext

A1 and A2 are almost the same thing. I mean A1 to say you can’t release plaintext that you know is bad. I mean A2 to say that you can’t release plaintext that you don’t know is good. I am this positing that there is something that at least one bit that didn’t fail, but doesn’t yet pass. In short, we have known good data, known bad data, and something in the middle. I also presume that we will be able to put the middle stuff into either the pass or fail buckets, we just can’t yet.

Example: There’s a 10 byte data blob. 9 bytes have come in on the wire. Those nine bytes are in that middle area where we don’t know (yet) if they’re good or bad. We presume that we *will* know, but there’s exponential backoff happening in the interpipes and so we gotta wait. Should the connection break, then we’ll consider them to be bad, because of premise A2.

Thus:

O1 and O2 tell us that we must have a chunking system. Since a message can be of arbitrary size and we have to have a memory bound, we have to process things in chunks.

From A1 and A2, we know that any given chunk gets released either as a unified whole, either good or bad.

Consider this small example.

Alice sends to Bob “AXBX” and somewhere along the way, Eve modifies the B. When Bob receives it, his implementation detects the error perfectly, knowing that the third position was modified. (How? Beats me.) Despite knowing that the AX is good, Bob’s implementation tosses the whole thing away and tells Bob there was an error.

Okay, now let’s look at “AXXBX” and again, Eve modified the B. (How? My presumption is that Eve is modifying the next to the last byte, but there are other explanations.) Same as the case above, toss the whole thing out. Ditto for “AXXXBX” and so on.

I am thus creating the set of messages M where each m ∈ M is of the form “A” followed by some number of “X”s and then “BX”. In each of these, Eve modifies the B. I think that we can agree that for all of these messages, the message gets rejected because Eve modified the penultimate byte. In short, all messages m ∈ M are invalid and must be rejected, with an error message and no plaintext.

But wait...

Consider the case with “AXX...BX” where the B falls on the first byte of the second chunk.

The first chunk is valid and gets released, but there’s an error in the second chunk, so we error and reject.

This is a contradiction; we have a message that is in error, and yet we’ve released almost all of it.

What this means is that you cannot both satisfy O1 and O2 along with A1 and A2.

QED.

Discussion:

In the AEAD discussion we’ve had, we’re wanting to do what I just showed is impossible. A strict reading of AEAD semantics along with arbitrary message sizes is contradictory.

Obviously, we *can* relax the semantics to say that we only mean O1,O2,A1,A2 within a chunk. That’s easy.

Similarly, we can *approximate* strict behavior in many circumstances. Example: consider decrypting a large file, and we validate each chunk and write it to the file, then when we get to the end, we clean up the file, delete it, and return an error. There are many places where this sort of backtracking is indistinguishable from the strict semantics.

Nonetheless, there are places where you must release plaintext that will be processed by some other system (imagine a shell session) and then you learn there is an error. Moving finger writes, and so forth and so on.

I also want to note that the smaller the chunk size, the bigger this contradiction is a real issue. That’s why I started with something small in my layout of it. It seems utterly obvious to me that with a four-byte message we must, must, must reject it and burn it with fire. But if it was a petabyte with an indeterminate length (we don’t know it’s over until it’s over)? Ennnnnnh, sure, whatever; it doesn’t bother me that we don’t learn it right away. Even for 256K, I’m not bothered. I’m willing to nod and say, “Yeah, it’s a problem, but not a big one” at 64K, even. But even as we approach smaller and smaller messages, I believe that the solution is that you have as large a chunk as is reasonable. Bigger chunk means better security.

This affects the AEAD discussion in some ways that I’ve noticed.

Some of the agitation has been over wanting to preserve the strict AEAD behavior, in both directions. The major reason for an arbitrarily large chunk size is that you cannot in all cases both chunk and preserve strict semantics. You have to pick one, and relax (or give up on) the other.

As I said, in many cases this doesn’t matter, and as many of us (like Peter Gutmann and I) have noted, OpenPGP is most often used for something that is more like storage than communications, so you can do the emulation fallback. If I am decrypting a file, an S3 bucket, or even a MIME part in an email, I can backtrack. In an interactive protocol like TLS, backtracking is not possible in general. In fact, I’m hard pressed to think of a specific where it’s possible.

I believe that underlying this is a confusion about encryption semantics. There are often huge semantic differences between interactive protocols and static protocols when it comes to classes of errors. There are many breaks that are interactive-only. The breaks of PKCS#1 V1.5, almost all compression problems, and more are huge problems in an interactive protocol, and within epsilon of zero-problem for many, many deltas.

I think this is a source of many things that I’ve thought were confusing.

I am happy with some looseness on the AEAD semantics. In short, this problem doesn’t really bother me, and I can cope with, “just deal with it” as an answer to the contradiction I lay out. However, I recognize that not everyone is that way, and there are people who have really good reasons for wanting strict semantics. I am puzzled that the people who are most concerned about strict security appear to be the once who are most adamant about small chunks. And that makes me at least raise an eyebrow.

I think that the best solution for us all is some compromise similar to the one that I laid out yesterday. Make a chunk size that’s reasonable (64K, 256K, 10K, who cares) and *allow* people who want more to go there knowing that they will be trading strict security for interoperability. The tighter the AEAD semantics, the more you need to allow for larger chunks. I believe that the more you believe tight security is necessary, then the more *willing* you ought to be to allow people with special needs to go off in the weeds on their own. Since that is not the case — the people who want strict semantics are arguing for small chunks and listing their belief in strictness as a reason for small chunks, I am confused.

Jon

[openpgp] The AEAD Conundrum as it pertains to Op… Jon Callas