Re: [openpgp] review of the SOP draft

Daniel Kahn Gillmor <dkg@fifthhorseman.net> Mon, 11 November 2019 23:00 UTC

From: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
To: Antoine Beaupré <anarcat@torproject.org>, openpgp@ietf.org
In-Reply-To: <87mud28fds.fsf@curie.anarc.at>
References: <87mud28fds.fsf@curie.anarc.at>
Autocrypt: addr=dkg@fifthhorseman.net; prefer-encrypt=mutual; keydata= mDMEXEK/AhYJKwYBBAHaRw8BAQdAr/gSROcn+6m8ijTN0DV9AahoHGafy52RRkhCZVwxhEe0K0Rh bmllbCBLYWhuIEdpbGxtb3IgPGRrZ0BmaWZ0aGhvcnNlbWFuLm5ldD6ImQQTFggAQQIbAQUJA8Jn AAULCQgHAgYVCgkICwIEFgIDAQIeAQIXgBYhBMS8Lds4zOlkhevpwvIGkReQOOXGBQJcQsbzAhkB AAoJEPIGkReQOOXG4fkBAO1joRxqAZY57PjdzGieXLpluk9RkWa3ufkt3YUVEpH/AP9c+pgIxtyW +FwMQRjlqljuj8amdN4zuEqaCy4hhz/1DbgzBFxCv4sWCSsGAQQB2kcPAQEHQERSZxSPmgtdw6nN u7uxY7bzb9TnPrGAOp9kClBLRwGfiPUEGBYIACYWIQTEvC3bOMzpZIXr6cLyBpEXkDjlxgUCXEK/ iwIbAgUJAeEzgACBCRDyBpEXkDjlxnYgBBkWCAAdFiEEyQ5tNiAKG5IqFQnndhgZZSmuX/gFAlxC v4sACgkQdhgZZSmuX/iVWgD/fCU4ONzgy8w8UCHGmrmIZfDvdhg512NIBfx+Mz9ls5kA/Rq97vz4 z48MFuBdCuu0W/fVqVjnY7LN5n+CQJwGC0MIA7QA/RyY7Sz2gFIOcrns0RpoHr+3WI+won3xCD8+ sVXSHZvCAP98HCjDnw/b0lGuCR7coTXKLIM44/LFWgXAdZjm1wjODbg4BFxCv50SCisGAQQBl1UB BQEBB0BG4iXnHX/fs35NWKMWQTQoRI7oiAUt0wJHFFJbomxXbAMBCAeIfgQYFggAJhYhBMS8Lds4 zOlkhevpwvIGkReQOOXGBQJcQr+dAhsMBQkB4TOAAAoJEPIGkReQOOXGe/cBAPlek5d9xzcXUn/D kY6jKmxe26CTws3ZkbK6Aa5Ey/qKAP0VuPQSCRxA7RKfcB/XrEphfUFkraL06Xn/xGwJ+D0hCw==
Date: Mon, 11 Nov 2019 18:00:17 -0500
Message-ID: <87h83arpby.fsf@fifthhorseman.net>
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="=-=-="; micalg="pgp-sha256"; protocol="application/pgp-signature"
Archived-At: <https://mailarchive.ietf.org/arch/msg/openpgp/IKdMvexDdxE3WBa-4hSJglPWNfM>
Subject: Re: [openpgp] review of the SOP draft
Precedence: list

Hi Antoine--

Thanks for the thoughtful review.  It was super long though!  I've
opened a bunch of issues from the stuff you raised here, but more
discussion follows inline below.

On Mon 2019-11-11 12:57:51 -0500, Antoine Beaupré wrote:
> https://gitlab.com/dkg/openpgp-stateless-cli/merge_requests/9

I've reviewed and merged the changes requested there, definitely useful
to have this clarifying editorial work done as patches, thanks!

>> This separation should make it easier to provide interoperability testing for the object security work, and to allow implementations to consume and produce new cryptographic primitives as needed.
>  
> I don't understand the part after ", and allow..." what does that
> actually mean? What do we mean by "new cryptographic primitives" here
> exactly?

The intent is that if every OpenPGP implementation provides a `sop`
interface (or potentially a superset of it), then we can write
interoperability tests and the like that drive all of them in a reliable
way.

We can, for example, generate new OpenPGP objects that incorporate new
primitives, and feed them to a stable of `sop` implementations, to
determine whether those implementations can consume them.

Or, we can drive them with simple inputs, and see which cryptographic
primitives they choose to use as they produce output.

If you think this text is clearer, i can incorporate it directly in the
draft.
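
As a rough illustration of the interoperability testing described
above, a harness could feed the same input to several `sop` binaries
and record which ones accept it.  This is only a sketch: the `survey`
helper and any binary names passed to it are hypothetical, and the
`run` parameter exists so the harness can be exercised without a real
implementation installed.

```python
import subprocess
from typing import Callable, Dict, List


def survey(
    implementations: List[str],
    args: List[str],
    stdin: bytes,
    run: Callable[..., "subprocess.CompletedProcess"] = subprocess.run,
) -> Dict[str, int]:
    """Run the same sop subcommand against each implementation.

    Returns a mapping from implementation name to its return code, so a
    test suite can see which implementations consumed the input.
    """
    results = {}
    for sop in implementations:
        # Every implementation is assumed to expose the same CLI.
        proc = run([sop, *args], input=stdin, capture_output=True)
        results[sop] = proc.returncode
    return results
```

Swapping the roles (simple input, then inspecting what each
implementation writes to stdout) would cover the "which primitives do
they choose" case with the same loop.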

> [...]
>
>> Obviously, the user will need to manage their secret keys (and their
>> peers' certificates) somehow, but the goal of this interface is to
>> separate out that task from the task of interacting with OpenPGP
>> messages.
>  
> It's unclear to me how or if the SOP specification takes into account
> the current design of GnuPG, specifically the part where secrets are
> handled by a separate process, gpg-agent, which is designed to be
> separate from the other parts of gnupg. From what I understand, the
> "agents" keep the secrets and do the operations on behalf of other parts
> of gnupg. But here sop would do so. Are we designing an agent here?
>
> What about OpenPGP cards like the Yubikey? How does sop interoperate
> with those?

sop is not GnuPG, and it doesn't claim or intend to be.

An implementation of sop *may* choose to support other forms of secret
key access (the "@FOO:" namespace is carved out to allow for that), but
the goal of sop is that it is simple enough that every implementation
that can handle OpenPGP secret keys and certificates should be capable
of implementing the interface.

> I find those examples confusing. Multiple arguments, in particular,
> seems ambiguous. Is it "CERT DATA"? or "CERT DATA"?

???  i think those are the same thing, but i'll just assume you meant
"DATA CERTS" at the end.  The answer is that there must be exactly one
SIGNATURE object to verify and there may be multiple certs, so the only
possible way to do it is SIGNATURE first, then CERTS.
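
To make the "exactly one SIGNATURE, then one or more CERTS" shape
concrete, here is a minimal sketch using Python's argparse.  The
parser and the file names are illustrative assumptions, not anything
the draft mandates; the point is only that this ordering parses
unambiguously.

```python
import argparse

parser = argparse.ArgumentParser(prog="sop verify")
# Exactly one SIGNATURE comes first...
parser.add_argument("signature", metavar="SIGNATURE")
# ...followed by one or more CERTS (hypothetical file names below).
parser.add_argument("certs", metavar="CERTS", nargs="+")

args = parser.parse_args(["message.sig", "alice.cert", "bob.cert"])
print(args.signature, args.certs)
```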

But that doesn't address your larger point:

> There should be *mandatory* commandline *options* instead, that clearly
> state the purpose of (say) the "CERT" argument. It's common in APIs,
> to rely on the order of arguments for meaning, and I think it's often a
> mistake. We should explicitly use *options* instead of *arguments* (as
> in `--foo=bar` instead of just `bar`) for critical parameters like
> secret or verification keys.

I would welcome a MR that makes this change across the entire spec, to
see whether it looks reasonable.  If people on this list prefer that
structure, i would be fine with adopting it, though i think it makes
the CLI more verbose than i would like.

I've trimmed out a lot of your comments below that were to do with
whether positional arguments are sensible or not.  that's not because i
don't care, but i don't think arguing about it in this one message is
helpful.

i've opened https://gitlab.com/dkg/openpgp-stateless-cli/issues/7 to
record the overall concern.  if you decide to make an MR for it, please
reference that issue in the MR!

> In general, I feel using the numeric error codes in the document make it
> (needlessly?) harder to read. When i got to this section, my first
> reaction was: "69?? why 69? and why 37? where the heck do those come
> from and why do they matter?" We should at least include a reference to
> the "Failure modes section" in the Introduction section. In Terminology
> maybe? And maybe refer to it here.
>
> In general, I'm worried there might be inconsistencies between the table
> in the "Failure modes" section and the various hardcoded integers
> peppered through the document. This practice also makes the document
> more difficult to review and maintain in the future. We might instead
> use constant names like `SUCCESS`, `NO_GOOD_SIG` that *then* have
> integer values in the later section. This could also provide for good
> constants to use in a library implementation.

I really like this idea, and i encourage someone™ who also likes it to
make a MR that implements it.

i've opened https://gitlab.com/dkg/openpgp-stateless-cli/issues/1 to
keep track of it.
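
A sketch of what named constants could look like in a library, using
Python's IntEnum.  The names are hypothetical (the draft does not
define them), `SUCCESS = 0` is assumed from the usual exit-status
convention, and only the integers 23 and 53 are taken from the draft
text quoted in this thread.

```python
from enum import IntEnum


class SopError(IntEnum):
    """Hypothetical names for sop return codes."""

    SUCCESS = 0  # assumed: conventional exit status for success
    INCOMPLETE_VERIFICATION = 23  # only one of --verify-out / --verify-with
    EXPECTED_TEXT = 53  # --as=text input was not valid UTF-8


# The name documents intent, while the value stays usable as an exit code.
print(SopError.EXPECTED_TEXT.name, int(SopError.EXPECTED_TEXT))
```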

>> For all commands that have an `--armor|--no-armor` option, it defaults
>> to `--armor`, meaning that any output OpenPGP material should be
>> ASCII-armored (section 6 of {{I-D.ietf-openpgp-rfc4880bis}})
>> by default.
>  
> Is this on input or output? or both? It's clarified later, I
> think, but it should be made explicit here as well.

the text you've quoted says "any output OpenPGP material".  I'm not sure
how to make that more visible, but i welcome proposed edits.

> How do we generate purpose-specific subkeys?

With `sop`, you do not ;)

If you want to do fancy OpenPGP certificate generation, you do that with
your toolkit's own fancy features.

I've opened https://gitlab.com/dkg/openpgp-stateless-cli/issues/2 to
track that maybe we do want some rough guidance about what kinds of
secret key capabilities we want any `sop` to be able to generate here
though.

>> sign: Create a Detached Signature {#sign}
> […]
>> If `--as=text` and the input `DATA` is
>> not valid `UTF-8`, `sop sign` fails with a return code of 53.
>
> Why do we mandate UTF-8 here? Explain.

We don't mandate UTF-8 unless the signer claims that the thing being
signed is text.  If so, it really does need to be UTF-8.  I have no
patience for non-UTF-8-encoded text in 2019.

OpenPGP embeds UTF-8 explicitly in its User ID formatting.  Any OpenPGP
implementation must already handle UTF-8.

if anyone thinks that dealing with different character encodings is a
good idea, please consider that the character encoding is not recorded
in the signature itself, leading to charset-switching attacks like those in
https://dkg.fifthhorseman.net/notes/inline-pgp-harmful/

Do you think this information belongs in this document?
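
For illustration, the check itself is small.  This sketch assumes a
hypothetical `check_text_input` helper; the only value taken from the
draft is the return code 53 for text input that is not valid UTF-8.

```python
EXPECTED_TEXT = 53  # return code the draft gives for non-UTF-8 text


def check_text_input(data: bytes, as_type: str) -> int:
    """Return 0 if DATA is acceptable for the given --as value, else 53."""
    if as_type != "text":
        return 0  # a binary signature places no constraint on the bytes
    try:
        data.decode("utf-8")  # strict decoding rejects malformed input
    except UnicodeDecodeError:
        return EXPECTED_TEXT
    return 0
```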

> In general, I find the `--as` arguments to be a little confusing and I
> don't understand what they bring to the table.

OpenPGP has two different forms of cryptographic signature, which are
dealt with by recipients differently.  The textual form canonicalizes
line-endings and the binary does not.  When signing a document, you need
to indicate to the thing doing the signing which form you are expecting
to create.

If you have a different suggestion, i'm happy to hear it.
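
A minimal sketch of the difference, assuming the textual form
normalizes line endings to CRLF before hashing (as RFC 4880 describes
for text signatures).  The helper name is hypothetical, and a bare CR
is deliberately left alone here as a simplification.

```python
def canonicalize_text(data: bytes) -> bytes:
    """Convert line endings to CRLF, as a textual signature expects."""
    # Collapse existing CRLF first so it is not doubled, then expand.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


unix = b"hello\nworld\n"
dos = b"hello\r\nworld\r\n"
# As text, both inputs hash identically; as binary, they stay distinct.
print(canonicalize_text(unix) == canonicalize_text(dos), unix == dos)
```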

>> Example:
>> 
>>     $ sop sign --as=text alice.sec < message.txt >message.txt.asc
>>     $ head -n1 < message.txt.asc
>>     -----BEGIN PGP SIGNATURE-----
>>     $
>
> Another good example of the "argument vs option" problem. If I would see
> a `sop sign` command, the first thing I would try would be:
>
>     sop sign document
>
> and expect it to find the right private key to sign the document
> with. Of course, we don't do this in sop, which is fine, but I'll note
> that we allow implementations to do so.

We deliberately *do not* allow implementations to do so.  sop requires
that you indicate which secret keys you want to sign with.  it is one or
more KEY objects, not zero or more.

If you think that `sop` is supposed to sign with other objects, then
i've done a bad job at drafting this proposal.  I've opened
https://gitlab.com/dkg/openpgp-stateless-cli/issues/5 to try to clarify this.

> By forcing the arguments here to be the signing key, we make it
> difficult to let the implementation pick the right key.

The entire point of `sop` is that `sop` *does not know* anything about
the keys.  it is stateless, and you have to tell it explicitly which key
to use.

>> If `--as` is set to either `text` or `mime`, then `--sign-with`
>> will sign as a canonical text document.  In this case, if the input
>> `DATA` is not valid `UTF-8`, `sop encrypt` fails with a return code of
>> 53.
>  
> What is `mime` here? Why is it necessary? Expand.

i've opened https://gitlab.com/dkg/openpgp-stateless-cli/issues/6 to
track this.

>> `--session-key-out` can be used to learn the session key on
>> successful decryption.
>
> "learn"? What does that mean? It seems it means "write to a file". 
> If so that should be said explicitly here.

*What* the user of sop does is that they learn of the session key that
was used for the encrypted message.  *How* they learn it is by receiving
it from the program, either via the filesystem, or via @FD:NNN.

If you think that's unclear, i'd welcome a clarifying patch.

>> If `sop decrypt` fails for any reason and the identified `--session-key-out`
>> file already exists in the filesystem, the file will be unlinked.
>  
> This seems dangerous! Why do we delete a file we haven't created?
> Explain.

We don't want the user to run `sop`, and then inspect a file that was
already in the filesystem thinking that it is `sop`'s output.  If you
think that's a bad decision, please suggest what we should do
differently.
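
The behaviour in question, as a small sketch.  The helper name is
hypothetical; it only shows the "remove whatever is at the output path
if the run failed" rule so that a stale file cannot be mistaken for
fresh output.

```python
import os


def discard_stale_output(path: str, succeeded: bool) -> None:
    """After a failed run, remove any file already at the output path."""
    if succeeded:
        return  # the file at `path` is genuine output; keep it
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass  # nothing was there, so nothing can be misread as output
```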

>> [`--with-session-key`] enables decryption of the `CIPHERTEXT` using the session key directly against the `SEIPD` packet.
>> This option can be used multiple times if several possible session keys should be tried.
>
> What happens if both "in" and "out" are provided? I can venture a guess,
> but it would be important to make that explicit as there can be horrible
> bugs there.

Please do venture a guess, in the form of proposed text! I'd also love
to hear what the horrible bugs are.  I don't see them.

>> `--with-password` enables decryption based on any `SKESK` packets in the `CIPHERTEXT`.
>> This option can be used multiple times if the user wants to try more than one password.
>  
> We should include SKESK in terminology, because it's the first time we
> encounter it here and I have close to no idea what it means.

https://gitlab.com/dkg/openpgp-stateless-cli/issues/12

I'm not sure we should put it in "terminology", but we surely should
document it better the first time it appears (it is documented the
second time it appears in the text).

>> If `sop decrypt` tries and fails to use a supplied `PASSWORD`, and it
>> observes that there is trailing `UTF-8` whitespace at the end of the
>> `PASSWORD`, it will retry with the trailing whitespace stripped.
>  
> Explain why we do magic things with whitespace. Consider not doing magic
> at all as magic can be evil.

I expect the following use case to be common:

    echo correct horse battery staple > password.txt
    sop decrypt --with-password=password.txt < ciphertext > cleartext

If we don't strip whitespace, it will fail on one side or the other.

sop tries to define what sensible magic should be here, and i think i've
gotten it right.

as for magic being evil: if we don't try to do magic, then we will be
failing in ways that users don't understand, and implementers will get
bug reports that they respond to with "have you tried trimming trailing
whitespace and then trying again?"  that kind of round trip is pointless
when the tool could just avoid the problem in the first place by being
reasonable about what kinds of things people tend to do.

    https://gitlab.com/dkg/openpgp-stateless-cli/issues/13
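
A sketch of the retry rule, under some assumptions: the helper names
are hypothetical, `try_password` stands in for whatever the
implementation does to test a password against the ciphertext, and
"trailing whitespace" is taken to mean what Python's `str.rstrip()`
strips.

```python
from typing import Callable, List, Optional


def candidate_passwords(raw: bytes) -> List[bytes]:
    """The password as supplied, then a trimmed retry if it differs."""
    candidates = [raw]
    try:
        trimmed = raw.decode("utf-8").rstrip().encode("utf-8")
    except UnicodeDecodeError:
        return candidates  # not UTF-8, so no whitespace to reason about
    if trimmed != raw:
        candidates.append(trimmed)
    return candidates


def unlock(raw: bytes, try_password: Callable[[bytes], bool]) -> Optional[bytes]:
    """Return the first candidate that works, or None if none does."""
    for candidate in candidate_passwords(raw):
        if try_password(candidate):
            return candidate
    return None
```

With the `echo` example above, `password.txt` ends in a newline, so
the first attempt uses the bytes as given and the second drops the
newline.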

>> `--verify-out` produces signature verification status to the
>> designated file.
>> 
>> `sop decrypt` does not fail (that is, the return code is not modified)
>> based on the results of signature verification.  The caller MUST check
>> the returned `VERIFICATIONS` to confirm signature status.  An empty
>> `VERIFICATIONS` output indicates that no valid signatures were found.
>> If `sop decrypt` itself fails for any reason, and the identified
>> `VERIFICATIONS` file already exists in the filesystem, the file will
>> be unlinked.
>> 
>> `--verify-with` identifies a set of certificates whose signatures would be
>> acceptable for signatures over this message.
>
> Not failing explicitly on verification seems very dangerous. It relies
> on callers properly reading the spec and realizing this is the only
> exception where exit codes don't suffice in providing a general state of
> the program. I would strongly recommend failing here, just like regular
> verify.

if the person invokes `sop decrypt` with no arguments, should it fail?

if the person invokes `sop decrypt` with `--verify-out` and
`--verify-with`, should `sop decrypt` produce any output?

I'm trying to imagine using this in a MUA.  In my MUA i have what i
think are the keys for my peer, and i see a message from them.

I don't yet know whether it's signed.

I want to decrypt the message to read it, and in the process, i want to
find out whether it has been signed.  I'd prefer to avoid two passes in
this common use case.

If i supply the --verify-* arguments, and sop fails, then i don't get
the cleartext of the message (not in any reliable way, anyhow).  If i
don't supply the --verify-* arguments, and sop succeeds, i've lost any
signature data.

The subcommand is "decrypt", so `sop` treats successful decryption as a
success.  If you want to understand some additional state, you have to
inspect that state somewhere else, and that's in the contents of
`--verify-out`.

does that make sense?  do you think any of that explanation belongs in
the document itself?
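
From the caller's side, the two questions can be kept apart like
this.  It is a sketch with hypothetical names: `returncode` is the
exit status of `sop decrypt`, `verifications` is whatever was written
to `--verify-out`, and no particular format for that output is
assumed beyond "empty means no valid signatures".

```python
from typing import NamedTuple


class DecryptOutcome(NamedTuple):
    decrypted: bool  # did `sop decrypt` itself succeed?
    signed: bool  # was at least one valid signature reported?


def interpret_decrypt(returncode: int, verifications: bytes) -> DecryptOutcome:
    decrypted = returncode == 0
    # The return code says nothing about signatures, so the signature
    # status comes only from the VERIFICATIONS output.
    signed = decrypted and bool(verifications.strip())
    return DecryptOutcome(decrypted, signed)
```

A MUA can then show the cleartext whenever `decrypted` is true and
decide separately how to present `signed`, in a single pass.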

> As an aside, why can't we compose verify and decrypt here and just keep
> "verify" out of "decrypt" altogether? I would guess that's (a
> limitation?) part of the OpenPGP standard, but maybe it would be nice to
> explicitly expand on this here as well.

Do you want to propose a way to compose them?  I suppose we could have
`sop decrypt` offer a `--signatures-to` argument, which the caller could
then use for `sop verify`, if anything comes out.  That's kind of an
interesting suggestion, and it might reduce the overall complexity of
`sop`.

However:

 * It means that a caller would need to handle the data twice (a minor
   inefficiency)

 * the signatures might not be verifiable if the thing that is signed is
   not a simple literal data packet, but is, say, a compressed data
   packet wrapped around a literal data packet.

So i don't see how to do this safely (or efficiently, but the
verifiability of the signature is more important than the efficiency
argument).

>> If the caller is interested in signature verification, both
>> `--verify-out` and at least one `--verify-with` must be supplied.  If
>> only one of these arguments is supplied, `sop decrypt` fails with a
>> return code of 23.
>  
> Another argument for failing on bad signatures: if we fail on bad
> arguments of --verify, why don't we fail on bad signatures?

failing on bad arguments is "you've asked an ill-formed question".
That's legitimate and useful to let the operator know that things are
not what they seem.

Failing on a bad signature would be "the answer is no".

If the primary operation you're asking for is verification, then failing
on bad signatures is a reasonable outcome -- only succeed if you succeed
in verifying.

But if the primary operation is decryption, i don't think we should fail
on signature validity for reasons outlined above.
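
The "ill-formed question" case is cheap to detect before any
cryptography happens.  A sketch, with a hypothetical helper name; the
return code 23 is the one the quoted draft text gives for supplying
only one of the two arguments.

```python
from typing import List, Optional

INCOMPLETE_VERIFICATION = 23  # code quoted from the draft for this case


def check_verify_args(verify_out: Optional[str], verify_with: List[str]) -> int:
    """Return 23 if only one of --verify-out / --verify-with was given."""
    has_out = verify_out is not None
    has_with = len(verify_with) > 0
    if has_out != has_with:
        return INCOMPLETE_VERIFICATION  # the question itself is malformed
    return 0  # both or neither: a well-formed request
```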

>> armor: armor: Add ASCII Armor
>> -----------------------------
>
> [...]
>
>> If the incoming data is already armored, and the `--allow-nested` flag
>> is not specified, the data MUST be output with no modifications.
>> Data is considered ASCII armored iff the first 14 bytes are exactly
>> `-----BEGIN PGP`. This operation is thus idempotent by default.
>  
> Explain why we want idempotent and why we want to do this guessing game.

I'm at a bit of a loss here, because these seem obvious to me.

We want a guessing game because users who don't know what they want are
going to guess anyway, and they're likely to guess wrong.  We might as
well guess on their behalf if they don't know.

We want idempotent because "ensure this thing is armored" is a
reasonable semantic request -- indeed, it's probably the *only*
reasonable semantic request.  Perhaps we could drop --allow-nested?
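
The detection rule and the resulting idempotence fit in a few lines.
This is a sketch: `armor` is a hypothetical callable standing in for
the real armoring code, and the only rule taken from the draft is the
14-byte prefix test.

```python
from typing import Callable

ARMOR_PREFIX = b"-----BEGIN PGP"  # exactly 14 bytes


def is_armored(data: bytes) -> bool:
    """Armored iff the first 14 bytes are exactly the prefix."""
    return data[:14] == ARMOR_PREFIX


def ensure_armored(
    data: bytes,
    armor: Callable[[bytes], bytes],
    allow_nested: bool = False,
) -> bytes:
    if is_armored(data) and not allow_nested:
        return data  # already armored: output with no modifications
    return armor(data)
```

Calling `ensure_armored` on its own output returns the same bytes,
which is the "ensure this thing is armored" semantic.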

> This @ENV: and @FD: stuff really makes me uncomfortable. It's a neat
> hack for commandline applications, but it would break down when
> designing a library API, as the "type" of data passed around the API
> would be ambiguous, or at least with possible side effects. That feels
> like "design smell" here and I would like this to be changed.

FWIW, the @ENV: and @FD: conventions are specifically CLI conventions
and i don't expect them to be translated to any programmatic API.

for example, in the pythonic variant of this framework that i'm working
on, i've treated these objects as labeled bytestreams. ("labeled"
meaning -- if you get a set of these objects, each one has a bytes-like
thing, and a textual name that you can use in error reporting)

I admit that i'm struggling a bit with whether i want to pass around
bytes objects directly (simple, what i've got now) or some sort of
asyncio handle that the bytes are readable from (fancier, probably nicer
to use in the kinds of programs that i will eventually want to write
with this). https://gitlab.com/dkg/python-sop/issues/1 But that has
nothing to do with @ENV and @FD, as those aren't exposed to the python
functions at all.
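
To illustrate the split, here is a sketch in which only a CLI-side
loader knows about the designators, and everything behind it sees a
labeled bytestream.  The `Labeled` and `load` names are hypothetical
and not taken from python-sop; the sketch also assumes `@ENV:NAME`
means "the value of environment variable NAME" and `@FD:N` means
"read from file descriptor N".

```python
import os
from dataclasses import dataclass


@dataclass
class Labeled:
    name: str  # textual name, usable in error reporting
    data: bytes  # the bytes-like payload


def load(designator: str) -> Labeled:
    """Resolve a CLI designator into a labeled bytestream."""
    if designator.startswith("@ENV:"):
        data = os.fsencode(os.environ[designator[5:]])
    elif designator.startswith("@FD:"):
        # closefd=False: the descriptor belongs to the caller.
        with os.fdopen(int(designator[4:]), "rb", closefd=False) as f:
            data = f.read()
    else:
        with open(designator, "rb") as f:
            data = f.read()
    return Labeled(designator, data)
```

A library function would then accept `Labeled` objects (or plain
bytes) and never parse an `@` prefix itself.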

> I would recommend using equivalent environment variables to the
> parameters instead, for example SIGN_WITH for --sign-with and so
> on. This would, of course, require switching positional arguments to
> options but I already explain why that would be a good idea anyways
> earlier.

environment variables won't work for arguments that can be supplied
multiple times, because then we have to invent a new delimiting scheme.
I definitely don't want to do that.

> File descriptors could be passable as distinct options, like
> --sign-with-fd for --sign-with.

This is an interesting proposal, though i don't see how --sign-with=@FD:3
is much different from --sign-with-fd=3  -- i guess it lets you use
files that are literally named @FD:3 ?  Is that important?

If you could open a merge request that proposes this change i'd be happy
to consider it.

I've opened https://gitlab.com/dkg/openpgp-stateless-cli/issues/14 to
keep track of it.

> I've dealt with commandline applications that have special meanings with
> @, and in retrospect, it was a bad idea. In particular, Python's
> argparse module supports using a prefix argument to mean "read options
> from this file" and I've used it to implement crude configuration file
> support for monkeysign and other programs. It's confusing for users and
> does not work very well.

afaict, this is not mandatory in argparse, and i wouldn't recommend it
for any sop implementation:
https://docs.python.org/3/library/argparse.html#fromfile-prefix-chars

I think the way it's specified in `sop` right now is pretty principled
and hard to screw up, but i could be wrong.
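
For what it's worth, a quick check that argparse leaves `@` alone
unless `fromfile_prefix_chars` is set.  The parser below is an
illustrative assumption about how an implementation might declare its
arguments, not a prescribed layout.

```python
import argparse

# fromfile_prefix_chars is not set, so "@" has no special meaning here.
parser = argparse.ArgumentParser(prog="sop encrypt")
parser.add_argument("--sign-with", action="append", default=[])
parser.add_argument("certs", metavar="CERTS", nargs="*")

args = parser.parse_args(["--sign-with=@FD:3", "@ENV:CERT"])
# Both values arrive as plain strings for sop's own code to interpret.
print(args.sign_with, args.certs)
```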

> Specifically in this case, I would also worry about security
> vulnerabilities with untrusted filenames being passed to the program.

can you explain this more?

>> CERTS {#certs}
>> -----
>> 
>> One or more OpenPGP certificates (section 11.1 of {{I-D.ietf-openpgp-rfc4880bis}}), aka "Transferable Public Key".
>> May be armored.
>> 
>> Although some existing workflows may prefer to use one `CERTS` object with multiple certificates in it (a "keyring"), supplying exactly one certificate per `CERTS` input will make error reporting clearer and easier.
>
> This last bit is in contradiction with `extract-cert` command
> documentation which says it will "only contain one cert". Maybe we
> should just pick one and stick with it here?

I have carefully considered this, and i do not think these are in
contradiction.

extract-cert explicitly says it will contain only one cert *because* the
other places where `CERTS` might be supplied could contain more than
one.

The fact is that people use "keyrings" today, including some that
contain hundreds or thousands of keys.  If a distro, for example, wants
to use `sop` to verify that a package is signed by one of their
developers, i don't want the distro to need to put each developer's key
in a separate file.

> That last part doesn't *look* like "arbitrary text" to me. It looks like
> some explanatory message of the operation. If that's the case, we should
> make that explicit and say why the text is present at all. Calling it a
> "note" or "message" would already be an improvement.

patches welcome, particularly for this kind of editorial cleanup :)

>> A `sop` implementation MAY return other error codes than those listed
>> above.
>  
> This sounds like a bad idea. I interpret that as meaning that I can
> return an error code 2 instead of error code 3 if i fancy. If we're
> going to pick numbers, we should either enforce them or not, but don't
> dance around the issue and encourage people to diverge from the spec.
>
> Or at least, if you allow divergence, explain why it can be allowed.

I've documented this concern as
https://gitlab.com/dkg/openpgp-stateless-cli/issues/10

> It would also be great if we could explain where those magic numbers
> come from in the first place. I suspect they were chosen to not overlap
> with existing error codes, but that's just a guess.

Justus picked 69 in his OpenPGP Interoperability Test Suite.  I chose
the others as "reasonable-sized primes" just for fun.  I don't think
this information belongs in this document, as it doesn't matter.

>> Detached Signatures {#detached-signatures}
>> -------------------
>> 
>> `sop` deals with detached signatures as the baseline form of OpenPGP signatures.
>> 
>> The main problem this avoids is the trickiness of handling a signature that is mixed inline into the data that it is signing.
>
> Should we expand on "trickiness" here?
>  
> Also: how *do* we deal with inline signatures? Are those deprecated now?

From my POV, yes, inline signatures are recipes for disaster, because
the difficulty of figuring out what specifically was signed when you
have an inline signature is not at all obvious.  I've noted that this
needs to be reflected in the spec at:

  https://gitlab.com/dkg/openpgp-stateless-cli/issues/9

That said, i've heard from some of the folks handling package managers
who have specific reasons for wanting inline signatures (also, Neal
Walfield appears to think they're useful in his
https://gitlab.com/sequoia-pgp/pgpcat).  so we probably need to tackle
them somehow here.  (it's even in "future work"!)  I've recorded that
concern here:

   https://gitlab.com/dkg/openpgp-stateless-cli/issues/11

>> FIXME: if an encrypted OpenPGP message arrives without metadata, it is difficult to know which signers to consider when decrypting.
>> How do we do this efficiently without invoking `sop decrypt` twice, once without `--verify-*` and again with the expected identity material?
>  
> Maybe we could use a "sop probe" command for this and other things?

I don't understand what you think a "sop probe" command would do.  if
you'd like to propose it as an MR, i'd consider it, though.

>> Compression {#compression}
> […]
> How about decryption? Do we attempt decompression during decrypt?

It will be interesting to see what implementers do!  I've left `sop`
deliberately agnostic there, and i would like to learn from test suites
what the answer is.

     --dkg

Attachment: signature.asc
