Re: [openpgp] [Cfrg] streamable AEAD construct for stored data?

ianG <iang@iang.org> Sun, 08 November 2015 10:43 UTC

To: openpgp@ietf.org
References: <87twp91d8r.fsf@alice.fifthhorseman.net> <CAM_a8Jy-ZoGJ3qTgN5PFA2ZKnbtSy5GWhWhUeF2NHYgWUQ0zYA@mail.gmail.com> <3A98EA92-0C2F-46A7-8D06-880FC83CB110@gmail.com>
From: ianG <iang@iang.org>
Message-ID: <563F2736.6090603@iang.org>
Date: Sun, 08 Nov 2015 10:43:02 +0000
In-Reply-To: <3A98EA92-0C2F-46A7-8D06-880FC83CB110@gmail.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/openpgp/UmExcnCEyB64K20SSD5Mvq-C3I0>
Subject: Re: [openpgp] [Cfrg] streamable AEAD construct for stored data?

On 7/11/2015 01:46 am, Bryan Ford wrote:
> To be clear, there are two separate use-cases, each of which make sense
> without the other and require different technical solutions (but could
> also make sense together):
>
> 1. Streaming-mode integrity protection:  We want to make sure OpenPGP
> can be used Unix filter-style on both encryption and decryption sides,
> to process arbitrarily large files (e.g., huge backup tarballs), while
> satisfying the following joint requirements:
>
> (a) Ensure that neither the encryptor nor decryptor ever has to buffer
> the entire stream in memory or any other intermediate storage.

Yes.

> (b) Ensure that the decryptor integrity-checks everything it decrypts
> BEFORE passing it onto the next pipeline stage (e.g., un-tar).


OK.  So this is where a program-level option comes in.  In streaming 
mode, the streamer can keep decrypting and passing the data across to 
the reader, and then stop when an integrity check fails.

In streaming mode, this is how we would expect it to operate.  A user 
program can, however, offer some options here.  E.g., do an 
integrity-check pass beforehand as a separate option; or turn the 
integrity checks into warnings and keep decrypting and streaming the 
data, knowing that there is garble in there.  Both are useful options a 
program could offer.

So I'd say NO: streaming is streaming, and there isn't a requirement in 
the spec to be sure about the entire file beforehand.  That's just a 
quirk of streaming mode that users will have to accept.
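To make the reader-side options above concrete, here is a minimal Python sketch. It assumes a hypothetical chunked format in which every chunk carries its own HMAC-SHA256 tag bound to the chunk's index and a "last chunk" flag (similar in spirit to the published STREAM construction). Encryption is left out to keep the focus on integrity, and the names `seal_stream`, `open_stream`, `verify_all` and `IntegrityError` are illustrative only, not part of any OpenPGP draft. `strict=True` stops at the first bad chunk without releasing it, `strict=False` is the "turn the checks into warnings and keep streaming" option, and `verify_all` is the separate pre-pass.

```python
import hashlib
import hmac
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical framing: each chunk travels with a 32-byte HMAC-SHA256 tag.
Sealed = Tuple[bytes, bytes]


class IntegrityError(Exception):
    """Raised in strict mode when a chunk fails authentication."""


def _chunk_tag(key: bytes, index: int, final: bool, chunk: bytes) -> bytes:
    # Bind the tag to the chunk's position and to a "last chunk" flag so that
    # reordering, dropping, or truncating chunks changes the expected tag.
    header = index.to_bytes(8, "big") + (b"\x01" if final else b"\x00")
    return hmac.new(key, header + chunk, hashlib.sha256).digest()


def seal_stream(key: bytes, chunks: Iterable[bytes]) -> Iterator[Sealed]:
    # Writer: one chunk of lookahead tells us which chunk is the last one.
    it = iter(chunks)
    current: Optional[bytes] = next(it, b"")  # empty input -> one empty final chunk
    index = 0
    while current is not None:
        following = next(it, None)
        yield current, _chunk_tag(key, index, following is None, current)
        current, index = following, index + 1


def _fail(message: str, strict: bool, warnings: Optional[List[str]]) -> None:
    if strict:
        raise IntegrityError(message)
    if warnings is not None:
        warnings.append(message)


def open_stream(key: bytes, sealed: Iterable[Sealed], strict: bool = True,
                warnings: Optional[List[str]] = None) -> Iterator[bytes]:
    # Reader: check each chunk's tag before yielding it downstream.
    it = iter(sealed)
    current = next(it, None)
    if current is None:
        _fail("empty stream: no final chunk", strict, warnings)
        return
    index = 0
    while current is not None:
        following = next(it, None)
        chunk, tag = current
        expected = _chunk_tag(key, index, following is None, chunk)
        if not hmac.compare_digest(tag, expected):
            # strict: raise before this chunk is released;
            # warn: record the problem and pass the suspect chunk along.
            _fail(f"chunk {index} failed authentication", strict, warnings)
        yield chunk
        current, index = following, index + 1


def verify_all(key: bytes, sealed: Iterable[Sealed]) -> bool:
    # Optional pre-pass: read everything once, release nothing.
    try:
        for _ in open_stream(key, sealed):
            pass
    except IntegrityError:
        return False
    return True
```

Binding the index and final flag into each tag is what lets a streaming reader notice reordered, dropped, or truncated chunks while only ever holding one chunk (plus one of lookahead). With a single tag at the very end, the reader could only report a problem after everything had already been passed downstream. Note that the pre-pass needs a source that can be read twice, e.g. a file rather than a pipe.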


> 2. Random-access: Once a potentially-huge OpenPGP-encrypted file has
> been written to some random-access-capable medium, allow a reader to
> decrypt and integrity-check parts of that encrypted file without
> (re-)processing the whole thing: i.e., support integrity-protected
> random-access reads.
>
> Let’s call these goals #1 and #2, respectively.
...
> We could very well design an OpenPGP format that addresses both goals
> together, if we decide both goals are valuable. ...
>
> There are some obvious tradeoffs here, both in storage and complexity
> costs.  I’m not that worried about the storage efficiency costs,...
>   And the implementation-complexity is certainly an issue regardless.


Nod.  Let's see how the requirements go first, and whether there is a 
reasonable design possible second.

> So some questions about this:
>
> 1. How important is the ability to achieve goal #1 above in the OpenPGP
> format (streaming-mode integrity-checking)?


It's certainly important.  If we want to bring everyone across to a new 
format, and start ditching the old one from the standard, then we have 
to provide an equivalent for the common use cases.

I'm inclined to say that stream-mode must be integrity checked.  We want 
to achieve the same standard across the board; we don't want to say "if 
X, then Y, but if Z, then not Y and maybe W..." and complicate the 
users' understanding.


> 2. How important is the ability to achieve goal #2 above in the OpenPGP
> format (random-access integrity-checking)?


Random access is a new feature.  It's certainly an *attractive* feature 
for the inner geek, just because.  But I am not seeing a clear use case 
as yet, at the user level.  If I think about the command line, I can't 
see a way a user would say "decrypt from blocks 1234 to 8960" without 
getting into some arcane geeky construction like doing dd(1) or somesuch 
... which no sane end-user does.

What I am seeing is that this would be an API call for other systems 
that do know what they want.  This would be quite useful for a backup, 
for example, or an rsync-like tool.  Being able to restart the backup 
is incredibly useful; being able to set off a backup to do a sort of 
"rsync" phased copy from "state N" without phase errors would be fantastic.
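For the backup/rsync case, the random-access read could look roughly like the following Python sketch. It assumes a hypothetical fixed layout (4096-byte chunks, each followed by a 32-byte HMAC-SHA256 tag bound to the chunk index and a last-chunk flag), so a reader can compute where chunk N lives and check just that chunk. `write_chunked`, `read_chunk`, `CHUNK_SIZE` and `RECORD` are illustrative names, not a proposed API, and encryption is again omitted.

```python
import hashlib
import hmac
import io
from typing import BinaryIO

CHUNK_SIZE = 4096  # hypothetical plaintext bytes per chunk
TAG_LEN = 32       # HMAC-SHA256 tag size in bytes
RECORD = CHUNK_SIZE + TAG_LEN  # on-disk size of every record except possibly the last


def chunk_tag(key: bytes, index: int, final: bool, chunk: bytes) -> bytes:
    # Same binding a streaming reader would use: chunk index + "last chunk" flag.
    header = index.to_bytes(8, "big") + (b"\x01" if final else b"\x00")
    return hmac.new(key, header + chunk, hashlib.sha256).digest()


def write_chunked(key: bytes, data: bytes, out: BinaryIO) -> None:
    # Layout: chunk_0 || tag_0 || chunk_1 || tag_1 || ...; only the last chunk may be short.
    pieces = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)] or [b""]
    for i, piece in enumerate(pieces):
        out.write(piece)
        out.write(chunk_tag(key, i, i == len(pieces) - 1, piece))


def read_chunk(key: bytes, f: BinaryIO, index: int) -> bytes:
    """Seek directly to chunk `index`, verify it, and return its bytes."""
    total = f.seek(0, io.SEEK_END)
    count = -(-total // RECORD)  # ceiling division: number of stored records
    if not 0 <= index < count:
        raise IndexError(f"chunk {index} out of range (file has {count} chunks)")
    start = index * RECORD
    f.seek(start)
    body = f.read(min(RECORD, total - start))
    chunk, tag = body[:-TAG_LEN], body[-TAG_LEN:]
    # The final flag is derived from the file length, so reading the last
    # chunk of a file truncated at a record boundary fails verification.
    expected = chunk_tag(key, index, index == count - 1, chunk)
    if not hmac.compare_digest(tag, expected):
        raise ValueError(f"chunk {index} failed authentication")
    return chunk
```

Because every record has the same size, restarting from "state N" is a single seek to `N * RECORD` rather than a re-read of everything before it. A real design would also need to bind a per-file nonce or key into each tag; otherwise chunks could be swapped between two files sealed under the same key.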

We would be then entering into the library space rather than the 
end-user interface space.  This might actually be a good thing, it might 
tear our childlike grip from the command line and drag us into the new 
millennium in time for the next decade.  It might finally kill off our
obsession with email :)

Or it could be mission creep, scope enlargement, or the sinking of the 
project if we become all things to all other projects building GUIs on top?


> 3. For whichever goal(s) we wish to be able to achieve, should those be
> *mandatory* or *optional* in the format?


I'd really like to see one format.  The boolean logic that goes with 
different formats just ripples through users' minds and creates 
confusion.  Every confusion costs us users.  Every user we lose to 
confusion is a breach of security, because they go on to do it in 
cleartext or with some other inadequate tool.  If we have 10 such 
confusions scattered across the code, we'll probably halve the number 
of users.

That's without even talking about bugs, security snafus, and the 
potential for choosing the wrong mode and breaking the lot...  E.g., it 
took me two years to find out that the reason SVN broke every month 
was that the client side was mounted on a Mac OS X drive that had an 
*option* to select case insensitivity...  Dozens of man-days were lost in 
rectification/recovery/rebuilding of client repos because of an obscure option.

There is a reason the MiB run around and insert multiple-mode madness 
into people's minds in groups.  It makes security brittle.  It makes it 
easy for them to futz.


> That is, should *every*
> OpenPGPv5-encrypted file satisfy either or both of these goals, or
> should they be configurable or user-selectable (such that some encrypted
> files might contain per-chunk signatures and/or Merkle trees while
> others do not)?  Making either of these goals “supported but optional”
> might help mitigate any performance/storage cost concerns with either of
> them, but would only further increase the complexity of the overall
> OpenPGP spec and increase the “usability risk” of a user accidentally
> failing to enable a relevant option when he really should have (e.g.,
> streaming-mode protection for backups).


Yup.  And then he goes off and uses another tool, because the sales 
force have realised that taking options away makes the sale easier, and 
the user can't see the schlock under the hood anyway.


> 4. What are reasonable upper- and lower-bounds for chunk sizes, and what
> are the considerations behind them?


Defer to later.
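Whenever chunk size does get picked up, the basic trade-off is simple arithmetic, sketched below in Python. It assumes one 16-byte tag per chunk (the usual tag size for AEAD modes such as GCM and OCB) and nothing else in the framing; `chunk_costs` is a hypothetical helper. Smaller chunks mean more tags and more storage overhead; larger chunks mean the decryptor must hold more unreleased plaintext before it can pass anything downstream, and a random-access read has to fetch more bytes. For a 1 TiB backup, 64 KiB chunks work out to 2^24 tags, i.e. 256 MiB of tags, roughly 0.024% overhead.

```python
def chunk_costs(file_size: int, chunk_size: int, tag_len: int = 16) -> dict:
    """Rough storage and memory costs of one authentication tag per chunk."""
    chunks = max(1, -(-file_size // chunk_size))  # ceiling; an empty file still has one chunk
    tag_bytes = chunks * tag_len
    return {
        "chunks": chunks,
        "tag_bytes": tag_bytes,
        "overhead": tag_bytes / max(file_size, 1),  # extra bytes per plaintext byte
        "reader_buffer": chunk_size,  # plaintext a verifier holds before releasing it
    }


# Compare a few candidate chunk sizes for a 1 TiB (2**40 byte) backup.
for size in (1 << 10, 1 << 16, 1 << 20):
    c = chunk_costs(1 << 40, size)
    print(f"{size:>8}-byte chunks: {c['chunks']:>10} tags, "
          f"{c['tag_bytes'] / 2**20:.0f} MiB of tags, overhead {c['overhead']:.4%}")
```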



iang