[openpgp] review of the SOP draft

Antoine Beaupré <anarcat@torproject.org> Mon, 11 November 2019 17:58 UTC

Return-Path: <anarcat@torproject.org>
X-Original-To: openpgp@ietfa.amsl.com
Delivered-To: openpgp@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 7CF83120B15 for <openpgp@ietfa.amsl.com>; Mon, 11 Nov 2019 09:58:06 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.898
X-Spam-Level:
X-Spam-Status: No, score=-1.898 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_NONE=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id BcqCeiAUiC1F for <openpgp@ietfa.amsl.com>; Mon, 11 Nov 2019 09:58:01 -0800 (PST)
Received: from marcos.anarc.at (marcos.anarc.at [206.248.172.91]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 385F9120AFB for <openpgp@ietf.org>; Mon, 11 Nov 2019 09:57:53 -0800 (PST)
Received: by marcos.anarc.at (Postfix, from userid 1000) id 6110D10E087; Mon, 11 Nov 2019 12:57:52 -0500 (EST)
Received: by curie.anarc.at (Postfix, from userid 1000) id 842711208E2; Mon, 11 Nov 2019 12:57:51 -0500 (EST)
From: Antoine Beaupré <anarcat@torproject.org>
To: openpgp@ietf.org
Organization: Tor
Date: Mon, 11 Nov 2019 12:57:51 -0500
Message-ID: <87mud28fds.fsf@curie.anarc.at>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: <https://mailarchive.ietf.org/arch/msg/openpgp/ICYPRUVckvqcmprIKoLQbe3iydg>
X-Mailman-Approved-At: Mon, 11 Nov 2019 10:57:01 -0800
Subject: [openpgp] review of the SOP draft
X-BeenThere: openpgp@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Ongoing discussion of OpenPGP issues." <openpgp.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/openpgp>, <mailto:openpgp-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/openpgp/>
List-Post: <mailto:openpgp@ietf.org>
List-Help: <mailto:openpgp-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/openpgp>, <mailto:openpgp-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 11 Nov 2019 18:00:09 -0000

Hi,

I've taken some time to do a a short review of dkg's "SOP" interface
specification, version 01, as provided here:

https://tools.ietf.org/html/draft-dkg-openpgp-stateless-cli-01

First, I want to say this is excellent and much-needed work. OpenPGP
interfaces have traditionnally been hard to use and a major obstacle to
adoption of strong cryptography in our communities. Having a standard
and *simple* way of interoperating with the underlying OpenPGP
primitives would go a *long* way in significantly improving OpenPGP
adoption and online privacy in general. So I salute this (first) effort
in improving this situation.

Second, I have proposed a series of changes to the document here:

https://gitlab.com/dkg/openpgp-stateless-cli/merge_requests/9

Patch set:

https://gitlab.com/dkg/openpgp-stateless-cli/merge_requests/9.patch

Those are mostly small fixes that might not require a full discussion
here, although I of course welcome feedback on those here as well. It
might be preferable to comment on the issue directly if you have a
GitLab account, obviously.

Finally, there are some things about the document I wanted to comment on
and that I don't have an easy patch for, so I will comment on the
document, inline, here instead. I hope that is proper form, let me know
if there is a better way to do this.

> Introduction
> ============
> 

[...]

> This separation should make it easier to provide interoperability testing for the object security work, and to allow implementations to consume and produce new cryptographic primitives as needed.
 
I don't understand the part after ", and allow..." what does that
actually mean? What do we mean by "new cryptographic primitives" here
exactly?

[...]

> Obviously, the user will need to manage their secret keys (and their
> peers' certificates) somehow, but the goal of this interface is to
> separate out that task from the task of interacting with OpenPGP
> messages.
 
It's unclear to me how or if the SOP specification takes into account
the current design of GnuPG, specifically the part where secrets are
handled by a separate process, gpg-agent, which is designed to be
separate from the other parts of gnupg. From what I understand, the
"agents" keep the secrets and do the operations on behalf of other parts
of gnupg. But here sop would do so. Are we designing an agent here?

What about OpenPGP cards like the Yubikey? How does sop interoperate
with those?

[...]

> Examples
> ========
> 
> These examples show no error checking, but give a flavor of how `sop` might be used in practice from a shell.
> 
> The key and certificate files described in them (e.g. `alice.sec`) could be for example those found in {{I-D.draft-bre-openpgp-samples-00}}.
> 
> ~~~
> sop generate-key "Alice Lovelace <alice@openpgp.example>" > alice.sec
> sop extract-cert < alice.sec > alice.pgp
> 
> sop sign --as=text alice.sec < announcement.txt > announcement.txt.asc
> sop verify announcement.txt.asc alice.pgp < announcement.txt
> 
> sop encrypt --sign-with=alice.sec --as=mime bob.pgp < msg.eml > encrypted.asc
> sop decrypt alice.sec < ciphertext.asc > cleartext.out
> ~~~

I find those examples confusing. Multiple arguments, in particular,
seems ambiguous. Is it "CERT DATA"? or "CERT DATA"?

There should be *mandatory* commandline *options* instead, that clearly
state the purpose of (say) the "CERT" argument. It's a common in APIs,
to rely on the order of arguments for meaning, and I think it's often a
mistake. We should explicitely use *options* instead of *arguments* (as
in `--foo=bar` instead of just `bar`) for critical parameters like
secret or verification keys.

Then "arguments" are left for more regular file parameters, the primary
purpose of the command (e.g. "sign, encrypt, decrypt this file", with
"verify" being of course the tricky bit).

So instead of:

> sop sign --as=text alice.sec < announcement.txt > announcement.txt.asc

I would suggest:

> sop sign --as=text --sign-with=alice.sec < announcement.txt >announcement.txt.asc

For example. The idea would be to use `--PURPOSE-with` pattern (where
PURPOSE is `sign`, `verify`, etc) that is already used in `encrypt
--sign-with`. Maybe it could be shortened to just --with in some cases
(like decrypt, sign and verify). The above example would become:

sop generate-key "Alice Lovelace <alice@openpgp.example>" > alice.sec
sop extract-cert < alice.sec > alice.pgp

sop sign --as=text --sign-with=alice.sec < announcement.txt > announcement.txt.asc
sop verify announcement.txt.asc --verify-with=alice.pgp < announcement.txt

sop encrypt --sign-with=alice.sec --as=mime bob.pgp < msg.eml > encrypted.asc
sop decrypt --decrypt-with=alice.sec < ciphertext.asc > cleartext.out

Or, as a diff:

@@ -99,11 +103,11 @@ The key and certificate files described in them (e.g. `alice.sec`) could be for
 sop generate-key "Alice Lovelace <alice@openpgp.example>" > alice.sec
 sop extract-cert < alice.sec > alice.pgp
 
-sop sign --as=text alice.sec < announcement.txt > announcement.txt.asc
-sop verify announcement.txt.asc alice.pgp < announcement.txt
+sop sign --as=text --sign-with=alice.sec < announcement.txt > announcement.txt.asc
+sop verify announcement.txt.asc --verify-with=alice.pgp < announcement.txt
 
 sop encrypt --sign-with=alice.sec --as=mime bob.pgp < msg.eml > encrypted.asc
-sop decrypt alice.sec < ciphertext.asc > cleartext.out
+sop decrypt --decrypt-with=alice.sec < ciphertext.asc > cleartext.out
 ~~~
 
 Subcommands

> Subcommands
> ===========
> 
> `sop` uses a subcommand interface, similar to those popularized by systems like `git` and `svn`.
>
> If the user supplies a subcommand that `sop` does not implement, it
> fails with a return code of 69.  If a `sop` implementation does not
> handle a supplied option for a given subcommand, it fails with a
> return code of 37.
 
In general, I feel using the numeric error codes in the document make it
(needlessly?) harder to read. When i got to this section, my first
reaction was: "69?? why 69? and why 37? where the heck do those come
from and why do they matter?" We should at least include a reference to
the "Failure modes section" in the Introduction section. In Terminology
maybe? And maybe refer to it here.

In general, I'm worried there might be inconsistencies between the table
in the "Failure modes" section and the various hardcoded integers
peppered through the document. This practice also makes the document
more difficult to review and maintain in the future. We might instead
use constant names like `SUCCESS`, `NO_GOOD_SIG` that *then* have
integer values in the later section. This could also provide for good
constants to use in a library implementation.

> For all commands that have an `--armor|--no-armor` option, it defaults
> to `--armor`, meaning that any output OpenPGP material should be
> ASCII-armored (section 6 of {{I-D.ietf-openpgp-rfc4880bis}})
> by default.
 
Is this on input or output? or both? It's clarified later, I
think, but it should be made explicit here as well.

[...]

> generate-key: Generate a Secret Key {#generate-key}
> -----------------------------------
> 
>     sop generate-key [--armor|--no-armor] [--] [USERID…]
> 
>  - Standard Input: ignored
>  - Standard Output: `KEY` ({{key}})
> 
> Generate a single default OpenPGP certificate with zero or more User
> IDs.

[...]

How do we generate purpose-specific subkeys?

[...]

> sign: Create a Detached Signature {#sign}
> ---------------------------------
> 
>     sop sign [--armor|--no-armor]
>          [--as={binary|text}] [--] KEY [KEY...]
> 
>  - Standard Input: `DATA` ({{data}})
>  - Standard Output: `SIGNATURE` ({{signature}})
> 
> `--as` defaults to `binary`.  If `--as=text` and the input `DATA` is
> not valid `UTF-8`, `sop sign` fails with a return code of 53.
 
Why do we mandate UTF-8 here? Explain.

In general, I find the `--as` arguments to be a little confusing and I
don't undrestand what they bring to the table.
 
> Example:
> 
>     $ sop sign --as=text alice.sec < message.txt >message.txt.asc
>     $ head -n1 < message.txt.asc
>     -----BEGIN PGP SIGNATURE-----
>     $

Another good example of the "argument vs option" problem. If I would see
a `sop sign` command, the first thing I would try would be:

    sop sign document

and expect it to find the right private key to sign the document
with. Of course, we don't do this in sop, which is fine, but I'll note
that we allow implementations to do so. By forcing the arguments here to
be the signing key, we make it difficult to let the implementation pick
the right key.

We should follow POLA (Principle Of Least Astonishment) here and allow
users to provide the document as an argument, and use something like
`--sign-with=KEY` (possibly multiple times) to provide the private key
material.

[...]

> encrypt: Encrypt a Message {#encrypt}
> --------------------------
> 
>     sop encrypt [--as={binary|text|mime}]
>         [--armor|--no-armor]
>         [--with-password=PASSWORD...]
>         [--sign-with=KEY...]
>         [--] [CERTS...]

[...]

> `--as` defaults to `binary`.

[...]

> If `--as` is set to either `text` or `mime`, then `--sign-with`
> will sign as a canonical text document.  In this case, if the input
> `DATA` is not valid `UTF-8`, `sop encrypt` fails with a return code of
> 53.
 
What is `mime` here? Why is it necessary? Expand.

[...]

> Example:
> 
> (In this example, `bob.bin` is a file containing Bob's binary-formatted OpenPGP certificate.
> Alice is encrypting a message to both herself and Bob.)
> 
>     $ sop encrypt --as=mime --sign-with=alice.key alice.asc bob.bin < message.eml >encrypted.asc
>     $ head -n1 encrypted.asc
>     -----BEGIN PGP MESSAGE-----
>     $

This is another good example of how the arguments and options can become
confusing. Looking at the above commandline, I can't tell what alice.asc
or bob.bin are. Is bob.bin a private key? Maybe not, because it's a
`.bin`. But what if it was also named `.asc`? What if I reverse the two
arguments by mistake? I could encrypt to bob instead of alice! Or wait,
those are *both* keys I encrypt to! Phew, I'm safe... But wait, what if
I forgot the `<`!!

See where I'm going? :)

Having an explicit --encrypt-with=alice.asc (or --encrypt-to?) would be
much better here. It would also make much more sense in an API.
 
> decrypt: Decrypt a Message {#decrypt}
> --------------------------
> 
>     sop decrypt [--session-key-out=SESSIONKEY]
>         [--with-session-key=SESSIONKEY...]
>         [--with-password=PASSWORD...]
>         [--verify-out=VERIFICATIONS
>          [--verify-with=CERTS...]
>          [--verify-not-before=DATE]
>          [--verify-not-after=DATE] ]
>         [--] [KEY...]
> 
>  - Standard Input: `CIPHERTEXT` ({{ciphertext}})
>  - Standard Output: `DATA` ({{data}})
> 
> `--session-key-out` can be used to learn the session key on
> successful decryption.

"learn"? What does that mean? It seems it means "write to a file". 
If so that should be said explicitely here.
 
> If `sop decrypt` fails for any reason and the identified `--session-key-out`
> file already exists in the filesystem, the file will be unlinked.
 
This seems dangerous! Why do we delete a file we haven't created?
Explain.

> `--session-key-in` enables decryption of the `CIPHERTEXT` using the session key directly against the `SEIPD` packet.
> This option can be used multiple times if several possible session keys should be tried.

What happens if both "in" and "out" are provided? I can venture a guess,
but it would be important to make that explicit as there can be horrible
bugs there.
 
> `--with-password` enables decryption based on any `SKESK` packets in the `CIPHERTEXT`.
> This option can be used multiple times if the user wants to try more than one password.
 
We should include SKESK in terminology, because it's the first time we
encounter it here and I have close to no idea what it means.
 
> If `sop decrypt` tries and fails to use a supplied `PASSWORD`, and it
> observes that there is trailing `UTF-8` whitespace at the end of the
> `PASSWORD`, it will retry with the trailing whitespace stripped.
 
Explain why we do magic things with whitespace. Consider not doing magic
at all as magic can be evil.
 
> `--verify-out` produces signature verification status to the
> designated file.
> 
> `sop decrypt` does not fail (that is, the return code is not modified)
> based on the results of signature verification.  The caller MUST check
> the returned `VERIFICATIONS` to confirm signature status.  An empty
> `VERIFICATIONS` output indicates that no valid signatures were found.
> If `sop decrypt` itself fails for any reason, and the identified
> `VERIFICATIONS` file already exists in the filesystem, the file will
> be unlinked.
> 
> `--verify-with` identifies a set of certificates whose signatures would be
> acceptable for signatures over this message.

Not failing explicitely on verification seems very dangerous. It relies
on callers properly reading the spec and realizing this is the only
exception where exit codes don't suffice in providing a general state of
the program. I would strongly recommend failing here, just like regular
verify.

As an aside, why can't we compose verify and decrypt here and just keep
"verify" out of "decrypt" altogether? I would guess that's (a
limitation?) part of the OpenPGP standard, but maybe it would be nice to
explicitely expand on this here as well.

> If the caller is interested in signature verification, both
> `--verify-out` and at least one `--verify-with` must be supplied.  If
> only one of these arguments is supplied, `sop decrypt` fails with a
> return code of 23.
 
Another argument for failing on bad signatures: if we fail on bad
arguments of --verify, why don't we fail on bad signatures?

> armor: armor: Add ASCII Armor
> -----------------------------

[...]

> If the incoming data is already armored, and the `--allow-nested` flag
> is not specified, the data MUST be output with no modifications.
> Data is considered ASCII armored iff the first 14 bytes are exactly
> `-----BEGIN PGP`. This operation is thus idempotent by default.
 
Explain why we want idempotent and why we want to do this guessing game.

[...]

> Input/Output Indirect Types
> ===========================
> 
> Some material is passed to `sop` indirectly, typically by referring to
> a filename containing the data in question.  This type of data may
> also be passed to `sop` on Standard Input, or delivered by `sop` to
> Standard Output.
> 
> If the filename for any indirect material used as input has the
> special form `@ENV:xxx`, then contents of environment variable `$xxx`
> is used instead of looking in the filesystem.
> 
> If the filename for any indirect material used as either input or
> output has the special form `@FD:nnn` where `nnn` is a decimal
> integer, then the associated data is read from file descriptor `nnn`.

This @ENV: and @FD: stuff really makes me uncomfortable. It's a neat
hack for commandline applications, but it would break down when
designing a library API, as the "type" of data passed around the API
would be ambiguous, or at least with possible side effects. That feels
like "design smell" here and I would like this to be changed.

I would recommend using equivalent environment variables to the
parameters instead, for example SIGN_WITH for --sign-with and so
on. This would, of course, require switching positional arguments to
options but I already explain why that would be a good idea anyways
earlier.

File descriptors could be passable as distinct options, like
--sign-with-fd for --sign-with.

I've dealt with commandline applications that have special meanings with
@, and in retrospect, it was a bad idea. In particular, Python's
argparse module supports using a prefix argument to mean "read options
from this file" and I've used it to implement crude configuration file
support for monkeysign and other programs. It's confusing for users and
does not work very well.

Specifically in this case, I would also worry about security
vulnerabilities with untrusted filenames being passed to the program.

[...]

> CERTS {#certs}
> -----
> 
> One or more OpenPGP certificates (section 11.1 of {{I-D.ietf-openpgp-rfc4880bis}}), aka "Transferable Public Key".
> May be armored.
> 
> Although some existing workflows may prefer to use one `CERTS` object with multiple certificates in it (a "keyring"), supplying exactly one certificate per `CERTS` input will make error reporting clearer and easier.

This last bit is in contradiction with `extract-cert` command
documentation which says it will "only contain one cert". Maybe we
should just pick one and stick with it here?
 

[...]

> VERIFICATIONS {#verifications}
> -------------
> 
> One line per successful signature verification.  Each line has three
> structured fields delimited by a single space, followed by arbitrary
> text to the end of the line.
> 
>  - ISO-8601 UTC datestamp
>  - Fingerprint of the signing key (may be a subkey)
>  - Fingerprint of primary key of signing certificate (if signed by primary key, same as the previous field)
>  - arbitrary text
 
That last part doesn't *look* like "arbitrary text" to me. It looks like
some explanatory message of the operation. If that's the case, we should
make that explicit and say why the text is present at all. Calling it a
"note" or "message" would already be an improvement.

> Example:
> 
>     2019-10-24T23:48:29Z C90E6D36200A1B922A1509E77618196529AE5FF8 C4BC2DDB38CCE96485EBE9C2F20691179038E5C6 certificate from dkg.asc

For the record, arbitrary text looks like:

> This is some garbage I just jklfa bldasjkl ajklblablabla.

I can provide more samples or arbitrary text as required. I'm an
experience freelance writer and large samples can be found on
https://anarc.at/ ;)

[...]

> Failure Modes
> =============
> 
> When `sop` succeeds, it will return 0 and emit nothing to Standard
> Error.  When `sop` fails, it fails with a non-zero return code, and
> emits one or more warning messages on Standard Error.  Known return
> codes include:
> 
> Return | Meaning
> ---:|--------------------------------------------------
>  0 | Success
>  3 | No acceptable signatures found (`sop verify`)
> 13 | Asymmetric algorithm unsupported (`sop encrypt`)
> 17 | Certificate not encryption-capable (e.g., expired, revoked, unacceptable usage flags) (`sop encrypt`)
> 19 | Missing required argument
> 23 | Incomplete verification instructions (`sop decrypt`)
> 29 | Unable to decrypt (`sop decrypt`)
> 31 | Non-`UTF-8` password (`sop encrypt`)
> 37 | Unsupported option
> 41 | Invalid data type (no secret key where `KEY` expected, etc)
> 53 | Non-text input where text expected
> 69 | Unsupported subcommand

I already mentioned some problems with those failure codes, but I will
repeat here the suggestion that symbolic names instead of integers
should be used for primary referencing in the document here again.

> A `sop` implementation MAY return other error codes than those listed
> above.
 
This sounds like a bad idea. I interpret that as meaning that I can
return an error code 2 instead of error code 3 if i fancy. If we're
going to pick numbers, we should either enforce them or not, but don't
dance around the issue and encourage people to diverge from the spec.

Or at least, if you allow divergence, explain why it can be allowed.
 
It would also be great if we could explain where those magic numbers
come from in the first place. I suspect they were chosen to not overlap
with existing error codes, but that's just a guess.

[...]

> Detached Signatures {#detached-signatures}
> -------------------
> 
> `sop` deals with detached signatures as the baseline form of OpenPGP signatures.
> 
> The main problem this avoids is the trickiness of handling a signature that is mixed inline into the data that it is signing.

Should we expand on "trickiness" here?
 
Also: how *do* we deal with inline signatures? Are those deprecated now?

[...]

> Guidance for Consumers
> ======================
> 
> While `sop` is originally conceived of as an interface for interoperability testing, it's conceivable that an application that uses OpenPGP for object security would want to use it.
> 
> FIXME: more guidance for how to use such a tool safely and efficiently goes here.
> 
> FIXME: if an encrypted OpenPGP message arrives without metadata, it is difficult to know which signers to consider when decrypting.
> How do we do this efficiently without invoking `sop decrypt` twice, once without `--verify-*` and again with the expected identity material?
 
Maybe we could use a "sop probe" command for this and other things?

> Security Considerations
> =======================
> 
> The OpenPGP object security model is typically used for confidentiality and authenticity purposes.
> 
> Signature Verification {#signature-verification}
> ----------------------
> 
> In many contexts, an OpenPGP signature is verified to prove the origin and integrity of an underlying object.
> 
> When `sop` checks a signature (e.g. via `sop verify` or `sop decrypt --verify-with`), it MUST NOT consider it to be verified unless all of these conditions are met:
> 
>  * The signature must be made by a signing-capable public key that is present in one of the supplied certificates
>  * The certificate and signing subkey must have been created before or at the signature time
>  * The certificate and signing subkey must not have been expired at the signature time
>  * The certificate and signing subkey must not be revoked with a "hard" revocation
>  * If the certificate or signing subkey is revoked with a "soft" revocation, then the signature time must predate the revocation
>  * The signing subkey must be properly bound to the primary key, and cross-signed
>  * The signature (and any dependent signature, such as the cross-sig or subkey binding signatures) must be made with strong cryptographic algorithms (e.g., not `MD5` or a 1024-bit `RSA` key)
 
Latter seems to contradict section 7.5, which says:

>    For performance reasons, an implementation may choose to ignore
>    validation on certificate and key material supplied to it.  The
>    security implications are of doing so depend on how the certs and
>    keys are managed outside of "sop".

So should we check supplied keys or not? but I guess that's covered by...

[...]

> Signature validity is a complex topic, and this documentation cannot list all possible details.

Is there a reference we could add here to cover that topic?

> Compression {#compression}
> -----------
> 
> The interface as currently specified does not allow for control of compression.
> Compressing and encrypting data that may contain both attacker-supplied material and sensitive material could leak information about the sensitive material (see the CRIME attack).
> 
> Unless an application knows for sure that no attacker-supplied material is present in the input, it should not compress during encryption.

How about decryption? Do we attempt decompression during decrypt?

That's about it! My comments might be a little dry and are the results
of notes I jotted down on "paper" (a PDF really), a copy of which is
available here:

https://paste.anarc.at/publish/2019-11-10-sxrDKhJL9R0/sop-Exported.pdf

I hope the comments are still useful and please don't interpret them as
me being upset or too critical. I love this work and it's only in a
spirit of collaboration that I bring up those concerns. :)

Thank you for your work!

A.

PS: I mistakenly took the modified version of the document, as available
on the HEAD of the merge request, to comment on the document. I have
tried, as much as possible, to revert changes to the original in the
examples, but some other changes I suggested might have crept up in the
quoted text. That shouldn't affect the comments I have brought up in any
other way.

-- 
Antoine Beaupré
torproject.org system administration