Re: [openpgp] AEAD Chunk Size

Bart Butler <bartbutler@protonmail.com> Wed, 17 April 2019 00:05 UTC

Date: Wed, 17 Apr 2019 00:05:48 +0000
To: Jon Callas <joncallas@icloud.com>
From: Bart Butler <bartbutler@protonmail.com>
Cc: "openpgp@ietf.org" <openpgp@ietf.org>, Justus Winter <justuswinter@gmail.com>, "Neal H. Walfield" <neal@walfield.org>, Peter Gutmann <pgut001@cs.auckland.ac.nz>
Reply-To: Bart Butler <bartbutler@protonmail.com>
Message-ID: <YMBMgZGGCSQb4Bnp9xRFkBfOn-I97FrycqHK4NvuHUkgtmL6_UaumtHJwJc-4nbmACSHrA4CWqEeLMDUuoVFMq0Vc6M0fwO8G40Mq1heEgI=@protonmail.com>
In-Reply-To: <18FF6D9C-B285-406E-A344-E6362646DE68@icloud.com>
References: <87mumh33nc.wl-neal@walfield.org> <878swzp4fb.fsf@europa.jade-hamburg.de> <E65F6E9D-8B0B-466D-936B-E8852F26E1FF@icloud.com> <87d0m9hl62.wl-neal@walfield.org> <FEE9711C-3C64-493C-8125-89696B882E0A@icloud.com> <2di2bK8m-7HtDeoUEH9oPqs-bL-IKSE0CjkgFShPMLOlUyeDBVkVGApdjnIpS6YRAeKU3ibGCZCtwLden-N6zK5W4fqIghRGDa5dU720nEs=@protonmail.com> <73739F8A-5E9F-4277-B053-FDD2E8D81B17@icloud.com> <cc75QwJwTIffqLK7fzZ3A2Pw1Vb3_lkhSHfYRPyASZcxceG2c0Cpbld529WsXosP7X9x4agikpGD4dVTXK8iaRkblS9Jokv1tD2TceQBbyE=@protonmail.com> <18FF6D9C-B285-406E-A344-E6362646DE68@icloud.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/openpgp/gzQtqVcYmCSIAr4q6anrKQv-t94>
Subject: Re: [openpgp] AEAD Chunk Size

Hi Jon, 


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, April 15, 2019 5:00 PM, Jon Callas <joncallas@icloud.com> wrote:

> > On Mar 30, 2019, at 9:11 PM, Bart Butler bartbutler=40protonmail.com@dmarc.ietf.org wrote:
> 

> [...]
> 

> > > OpenPGP is in general the latter case rather than the former. I believe it’s less important to have strict semantics on failures because it’s usually storage.
> > 

> > I agree. I would say my point is that with sufficiently small chunks, the user/decrypter can choose what kind of failure behavior is appropriate. Large chunks rob the decrypter of that.
> 

> We are mostly in violent agreement, I do believe. I feel like I'm saying something like "a quarter is a coin with George Washington on one side and an eagle on the other" and you're saying "a quarter is a coin with an eagle on one side and George Washington on the other." We're talking about the same coin, with a slightly different point of view.
> 

> I wouldn't use a term like "rob" because that assigns value to the condition. I think there are places where rejection matters and is a Good Thing. I think there are places where it is not a good thing and is even a Bad Thing. That's why I was using terms like "strict semantics" and a lot of conditionals.
> 


I said 'rob' because I think fundamentally that the release semantics should be something that is decided by the decrypter, not the encrypter, as only the decrypter knows what kind of release semantics are safe or not. For example, I have a 32 MB PGP/MIME message. I want to show a preview in my email client. If we use 8K chunks, I can read the first chunk, know that it hasn't been messed with, and display it safely. If the spec allows a 32 MB chunk, as an application developer I have some choices:

1. I can load the entire 32 MB and be really slow/bandwidth intensive
2. I can not show a preview for this message
3. I can ignore release semantics and do it anyway, risking the Problem That Shall Not Be Named

All of the options are terrible from a UX perspective. Meanwhile, if the chunk size is capped, this makes it easy, and if I, as an application developer, need strict release semantics for the entire file/message, I can do that too.
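
To put rough numbers on the preview example, here is a minimal sketch of how many bytes would have to be fetched and authenticated before a preview could be released safely. The 16-byte tag, the 4 KiB preview length, and the function name are assumptions for illustration only; packet headers and the final tag are ignored.

```python
import math

TAG_BYTES = 16  # assumption: one 16-byte authentication tag per chunk


def bytes_needed_for_preview(message_len, chunk_size, preview_len):
    """Bytes to fetch and authenticate before preview_len plaintext bytes
    can be released without skipping any tag check (hypothetical helper)."""
    preview_len = min(preview_len, message_len)
    if preview_len == 0:
        return 0
    chunks = math.ceil(preview_len / chunk_size)
    # The last chunk needed may be short if it is the final chunk of the message.
    plaintext = min(chunks * chunk_size, message_len)
    return plaintext + chunks * TAG_BYTES


MESSAGE = 32 * 1024 * 1024  # the 32 MB message from the example above
PREVIEW = 4 * 1024          # hypothetical 4 KiB preview

small = bytes_needed_for_preview(MESSAGE, 8 * 1024, PREVIEW)  # 8K chunks
large = bytes_needed_for_preview(MESSAGE, MESSAGE, PREVIEW)   # one 32 MB chunk
print(small, large)
```

Under these assumptions the 8K case needs a single chunk plus its tag, while the single-chunk case needs the whole message before anything can be shown.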

Now, with your proposal, the other implementations and I can come to some agreement that hey, we just aren't going to allow chunks meaningfully higher than the cap, what you call "normative" agreement. That's fine, but I'm worried that these norms don't tend to be well-documented (I'm not sure the MAY in the RFC will be sufficient), and someone somewhere is going to write an implementation at some point which exclusively uses big chunks. When they do, our implementations will reject them, and then their users will complain to the app developers, who will in turn complain to the implementers.

I'm certainly not so arrogant as to assume I can anticipate all future needs here. But I think it's telling that we can come up with several negative consequences of allowing the large chunks and the only benefit is something that can be achieved at the application layer (or as an option at the implementation layer even) if desired anyway.

I also think that forcing no-release semantics via packet structure is misguided because app developers/implementers are likely to ignore it if it becomes too annoying. That is, I anticipate some implementers just allowing some kind of unsafe mode that releases plaintext early with no integrity checks if this comes up (essentially streaming not along AEAD chunk boundaries), and I'm in general uncomfortable with choosing to build a feature whose failure case is "massive security hole", not to mention one that we've seen before with the Problem That Shall Not Be Named. Do we want to allow people to create messages which *cannot* be safely streamed when we have the choice not to do this with zero functional downside? Strict release semantics can always be enforced at the implementation or application level.

> I don't want to bury my lede any deeper than this. What I'm saying is:
> 

> -   The more you want strict AEAD semantics of no-release, the fewer chunks you want.
> -   It seems to me that the people who most believe in strict AEAD release are also the ones who are arguing for smaller packets. These seem to be in opposition to each other. I've been confused through this discussion because the rationales seem in opposition and confused. I don't get it, and I want to understand; you all are smart people whom I respect, so if I'm confused, maybe I'm not getting something.

This feeling is completely mutual. I respect everyone in this discussion and know that all of you are smart people. I will try to rephrase what I think is the fundamental question here, and it's not what release semantics should be--those can be enforced in lots of places, and as you said, vary by use case, which is very compatible with my views. I think the fundamental question here is this: 


*Should we allow creation of valid messages which cannot be streamed and attempt to force strict no-release at the protocol layer?*

I think, in the absence of a compelling reason to, the answer to this is a pretty clear no.

>     

>     We might differ in that I have a nuanced opinion about AEAD rejection. I think that there are places where it matters, and places where it doesn't. For example, in networking, particularly the parts of the network stack where you can easily get a forged packet. You want to reject that packet as early as possible. Moreover, these places are always using very small packets. (I'm going to wave my hand and say that under a megabyte is "very small" for these purposes.)
>     

>     But in archival storage, you don't want to reject something because there's a media error, you want to recover as much as possible. You might even be required to do so by law. I have real-world anecdotes if you want to hear them.
>     

>     On a network, rejection is a good thing. You reply a NAK to the sender and they retransmit. In archival storage, there's no retransmitting on a media error. That's the case where it's a Bad Thing, and in fact, it might even be better to use CFB mode and an MDC than AEAD. It also might not, and much depends on which AEAD mode one used.
>     

>     Nonetheless, if you believe in strict semantics, you also likely want the fewest number of chunks. If there is more than one chunk, you have to stage the output, you have to process everything (unless you're going to say that the timing side-channel is not important)

Why do you have to stage the output in the multi-chunk case? The only difference in the multi-chunk case is that I'd check AEAD tags multiple times instead of just at the end. There's no reason why I'd have to do anything with the output differently than a single chunk if I embrace strict no-release. I could buffer it the exact same way I was buffering the single chunk and the application/consumer doesn't have to know there is any difference.

Fundamentally, multi-chunk just gives you options. There is nothing stopping an implementation from doing strict no-release. 
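
As a rough illustration of that point, here is a toy sketch in which the same chunked message can be consumed either in streaming mode or in strict no-release mode. It is not real AEAD and not any implementation's API: HMAC-SHA256 stands in for the per-chunk tag, nothing is encrypted, and the names `seal`, `open_chunks`, and `tag` are made up for the example.

```python
import hashlib
import hmac


def tag(key, index, data):
    # Stand-in for an AEAD tag: HMAC over the chunk index and chunk bytes.
    return hmac.new(key, index.to_bytes(8, "big") + data, hashlib.sha256).digest()


def seal(key, message, chunk_size):
    """Split message into (chunk, tag) pairs, plus a final tag over the
    chunk count so that truncation is detectable."""
    chunks = [message[i:i + chunk_size] for i in range(0, len(message), chunk_size)]
    sealed = [(c, tag(key, i, c)) for i, c in enumerate(chunks)]
    return sealed, tag(key, len(chunks), b"final")


def open_chunks(key, sealed, final_tag, strict):
    """Yield verified data. With strict=True nothing is released until
    every tag, including the final one, has been checked."""
    held = []
    for i, (data, t) in enumerate(sealed):
        if not hmac.compare_digest(t, tag(key, i, data)):
            raise ValueError(f"chunk {i} failed authentication")
        if strict:
            held.append(data)  # buffer, exactly as for a single big chunk
        else:
            yield data         # streaming: release each chunk once verified
    if not hmac.compare_digest(final_tag, tag(key, len(sealed), b"final")):
        raise ValueError("message truncated or final tag invalid")
    if strict:
        yield b"".join(held)


key = b"k" * 32
sealed, final = seal(key, b"abcdefghij", 4)
print(b"".join(open_chunks(key, sealed, final, strict=True)))
```

The caller of the strict mode sees one buffer at the end or an error, the same as it would with a single chunk; only the streaming mode exposes the chunk boundaries.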


>     

>     Sometimes this is not possible. Ironically, the place where it's most possible is in storage, where it's the least needed. In online protocols,
>     

> 

> > OK, I think this is the part that I don't understand. Why does it matter what chunking scheme is used here? If my app requires all-or-nothing semantics, I would program my app to enforce that all chunks must pass and not release plaintext unless that happened, with no truncation, etc. So why would every joint be a vulnerability?
> > 

> > > > What value does large-chunk AEAD actually provide? What I'm getting from the AEAD Conundrum message is that it's a way for the message encrypter to leverage the "don't release unauthenticated chunks" prohibition to force the decrypter to decrypt the whole message before releasing anything. Why do we want to give the message creator this kind of power? Why should the message creator be given the choice to force her recipient to either decrypt the entire message before release or be less safe than she would have been with smaller chunks?
> > 

> > > Let me summarize the conundrum: If you want strict AEAD no-release semantics, you want a fewer number of chunks.
> > 

> > I guess this is my fundamental question. You can force no-release semantics at the application level for any chunk size scheme, right?
> 

> Yes, you can, provided that there's a way to report that back, and your caller checks the return value.

You (as an implementation) could just not return the plaintext until the entire message was read. There's nothing stopping implementations from having a strict no-release mode.

> 

> I suppose this really means no, you can't force it, because the library writer can't force the application code to check the error return.

Well, the library can always just not return the plaintext if we don't think it's safe. I just don't think it's the encrypter's business to be deciding what is safe or not for the decrypter.

> 

> I have heard that, among the issues that we're Not Going To Talk About, improper checking of GnuPG's report of an MDC failure was an issue in at least one place.
> 


Sure, but this could have been configured as a hard failure. The apps didn't configure it as a hard failure because that would have collided with UX/application concerns, and I fear that that collision will occur again if we allow it to, with likely the same result.

> > > If you respond to a security request with a performance answer, you literally don’t know what you’re talking about. So let’s toss that aside.
> > 

> > I apologize, I was not trying to create a strawman here, but I am completely at a loss for what the benefit of large chunks is.
> 

> From a standpoint of debate technique, coming up with a strawman makes your whole side of it weaker because attacking a strawman is attacking a strawman. It makes it look like you don't understand, when you actually have a different issue. I think it has added to the confusion I have been suffering from. The chunk size question is about adjusting security parameters, and thus when you say, "it won't help performance" I can't help but think that we're not discussing the same thing at all, as I'm talking security, and you're talking performance.
> 

> Good to put that to bed. Back to the chunk size debate.
> 

> I don't know the specific benefits, either. I heard people asking for it, and I'm defending the idea for them.
> 

> I believe that an underlying difference between your thinking and mine is that you're looking at this as an application writer, and I'm looking at it like a protocol / API that has many clients, some of whom (and the largest ones) aren't written yet.
> 

> Moreover, there are a lot of people who use OpenPGP for a lot of things that we don't know about. As Peter Gutmann pointed out there are a lot of EDI systems, back ends of financial systems, and so on that internally use OpenPGP implementations. They're not here. I'm trying to watch out for them.
> 

> There are also people around who want to do something and for a lot of reasons find it difficult to speak up. I'm not editor any more, Werner is and I have every faith in him. Sometimes, though, old habits die hard.
> 


I'm sympathetic to all of this, and I don't want to put anyone on the spot. It would be really great if anyone who has a use case for large chunks speaks up though, either through this thread or privately to me, Jon, or anyone else they feel comfortable speaking with, because I do not want anyone's voice to go unheard, and if there is a use case for large chunks I do want to hear about it before this decision is finalized.

> I tend to see the AEAD packet format as being a successor to the existing streaming, indefinite length things. That allows chunking up to 2^30 and while absurdly large, it has never been an issue.
> 


Well, except that streaming this old stuff is unsafe if ciphertext modification is a threat.

> In my head, I think why not allow up to that, since it would preserve anyone's weird thing?
> 

> On the other side, implementers need guidance. Today, the guidance is folklore with all the issues that go with it. It's better not to have folklore. But, if we basically said, "do what you're doing today" then we'd be looking at 8K chunks, as that's what GnuPG does today.
> 

> The clauses I suggested about MAY support larger / MAY give larger the finger seemed to be a compromise that would work because it gives you the guidance you need; it lets whoever these people are the ability to do what they want; and lastly should there be a consensus that it needs to be larger in the real world, a consensus of implementers can change it without a new document. It seemed to me that everyone wins.
> 


For the record, I'm pretty much OK with this, I just think it's opening us up to future problems that it would be best to avoid.

> Yet I thought I perceived that you not only wanted to win, but you wanted to salt the earth in the other people's territory. Fixing an upper bound on memory has a long history of Famous Last Words going back to the old clichéd "640K is more than enough for anyone." The gods punish hubris.

I'm sorry I gave that impression or was overly strident. I consider this a rare opportunity to fix something before it becomes a problem rather than afterward with a bunch of legacy baggage in tow. I have no interest in "winning" this argument for its own sake--I would be happy to get a counter-argument for large chunks that made me think "yes, there is a use case and that's why we want to risk having these future problems".

> 

> Okay -- let's sort all this out. I really think we are ALMOST done here.
> 

> Here's what I stated before.
> 

> > > (1) MUST support up to <small-chunk-size> chunks.
> > > (2) SHOULD support up to <larger-chunk-size> chunks, as these are common.
> > > (3) MAY support larger up to the present very large size.
> > > (4) MAY reject or error out on chunks larger than <small-chunk-size>, but repeating ourselves, SHOULD support <larger-chunk-size>.
> > 

> > > Clauses (3) and (4) set up a sandbox for the people who want very large chunks. They can do whatever they want, and the rest of us can ignore them. Why get rid of that? It doesn’t add any complexity to the code. It lets the people who want the huge ones do them in their own environment and not bother other people.
> > 

> > > My concern is over (1) and (2) and specifically that there’s both <small> and <large> sizes.
> > 

> > > I think that’s an issue. If there are two numbers we are apt to end up with skew before settling on one, so it’s better to agree on just one. That’s the real wart in my proposal.
> > 

> > I'm OK with eliminating (2) and just using the MAY part to take care of any legacy 256K messages OpenPGP.js users might have. As I said, we don't have any of these messages in production yet and I'd err on the side of a cleaner spec.
> 

> Me too. I think saying 256K is fine. I have an intuition it ought to be at least as large as the largest Jumbo Frame, and that's 9K so round to 16K. Let me restate the proposal.
> 

> (1) MUST support up to <chunk-size> chunks.
> (3) MAY support larger up to the present very large size.
> (4) MAY reject or error out on chunks larger than <chunk-size>
> 

> And it seems that 256K is the proposal for <chunk-size>. Are we agreed on all that?
> 


As some respondents would like 8K or 16K, I'm fine with doing that instead of 256K. I would like to check with the maintainers of our libraries to find out if there's any reason I'm overlooking that would favor one or the other before committing, though.
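
For what it's worth, the restated clauses (1), (3) and (4) could be sketched on the receiving side roughly as below. This is only an illustration: the function name and parameters are hypothetical, and the 16 KiB and 256 KiB figures are placeholders since <chunk-size> has not been agreed.

```python
def accept_chunk_size(chunk_size, must_support, optional_limit=None):
    """Decide whether this implementation processes a message whose
    chunks are chunk_size bytes (hypothetical helper).

    must_support:   the <chunk-size> every implementation MUST handle (1).
    optional_limit: largest size this implementation chooses to handle
                    under the MAY in (3); None means it rejects anything
                    above must_support, as (4) permits.
    """
    if chunk_size <= must_support:
        return True   # clause (1): always supported
    if optional_limit is not None and chunk_size <= optional_limit:
        return True   # clause (3): this implementation opted in
    return False      # clause (4): reject or error out


CAP = 16 * 1024  # placeholder value for <chunk-size>

print(accept_chunk_size(8 * 1024, CAP))                               # within the cap
print(accept_chunk_size(256 * 1024, CAP))                             # rejected under (4)
print(accept_chunk_size(256 * 1024, CAP, optional_limit=256 * 1024))  # opted in under (3)
```

The third call is the case I'm worried about: whether a larger message decrypts depends on which implementation the recipient happens to use.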

> > I just really want to understand the benefit of large chunks for security and right now I clearly do not.
> 

> If you believe that no-release is a Good Thing, then you want fewer chunks, ideally only 1 chunk. That's it. That's the ONLY reason.
>

I think I discussed this to death above so I won't add to the word count here.

-Bart

> I believe that no-release can be a Good Thing, but rarely is for OpenPGP's primary use case. As I said in my other missive, I don't think that it's even possible in the general case. Networking packets, yes -- both possible and desirable. Files, no -- neither possible nor desirable.
> 

> Jon