[openpgp] Proposing a Simplification of Message Syntax
Paul Schaub <vanitasvitae@fsfe.org> Fri, 07 October 2022 12:57 UTC
Message-ID: <ee13a1f7-95b8-425a-24e0-73d42f8f7b1e@fsfe.org>
Date: Fri, 07 Oct 2022 14:57:48 +0200
MIME-Version: 1.0
To: openpgp@ietf.org
Content-Language: en-US
From: Paul Schaub <vanitasvitae@fsfe.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/openpgp/uepOF6XpSegMO4c59tt9e5H1i4g>
Subject: [openpgp] Proposing a Simplification of Message Syntax

Hey List!

I hope this mail can distract a bit from the current drama ;)

Recently I attempted to create a parser for OpenPGP messages which *only* accepts messages that are valid according to the syntax of OpenPGP messages (see https://www.rfc-editor.org/rfc/rfc4880#section-11.3 for the syntax). This syntax is very *very* flexible, probably too flexible. The interop test suite shows how different implementations do or do not reject malformed messages: https://tests.sequoia-pgp.org/#Malformed_messages

This triggered my efforts to create a new parser. While I have come up with a solution that validates the syntax for every possible valid message, I think it is worthwhile trying to simplify the syntax to make implementing parsers easier. In case you are interested, I wrote a blog post about my findings: https://blog.jabberhead.tk/2022/09/14/using-pushdown-automata-to-verify-packet-sequences/

One stumbling block I came across while implementing is that it is not totally clear what data is even hashed into a signature. When creating a signed message over the plaintext "Foo" using a One-Pass-Signature, I understand that, depending on the signature type, the line endings of "Foo" are converted and the result is hashed. Note that the Literal Data Packet itself is not hashed, so neither the filename nor the modification date is protected. This makes sense to some degree and I understand this decision. So when verifying the message 'OPS LIT("Foo") SIG', only "Foo" is used to calculate the hash.

But are signatures always created over the plaintext, regardless of the position of the signature in the message? When creating a signed message while using compression, most implementations generate a message like this: 'COMP(OPS LIT("Foo") SIG)'. Verifying such a message is pretty straightforward: decompress the message and use the content "Foo" to calculate the signature hash.

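To make the point about unprotected metadata concrete, here is a minimal Python sketch of which part of a Literal Data Packet ends up in the hash. The `LiteralData` class and the function name are made up for illustration, the line-ending handling is a simplification (only LF and CRLF are considered), and the signature trailer that is hashed after the document is left out entirely.

```python
import hashlib
from dataclasses import dataclass

SIG_BINARY = 0x00  # signature of a binary document
SIG_TEXT = 0x01    # signature of a canonical text document


@dataclass
class LiteralData:
    """Hypothetical stand-in for a Literal Data Packet."""
    filename: str
    mtime: int
    body: bytes


def document_bytes_to_hash(sig_type: int, lit: LiteralData) -> bytes:
    """Return only the document part that is fed into the signature hash.

    The filename and modification time are deliberately ignored.
    """
    if sig_type == SIG_TEXT:
        # Assumption: treat LF and CRLF as line endings, emit CRLF.
        return lit.body.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
    return lit.body


# Same body, different metadata: the hashed document part is identical.
a = LiteralData("foo.txt", 0, b"Foo\n")
b = LiteralData("renamed.txt", 1665147468, b"Foo\n")
digest_a = hashlib.sha256(document_bytes_to_hash(SIG_TEXT, a)).hexdigest()
digest_b = hashlib.sha256(document_bytes_to_hash(SIG_TEXT, b)).hexdigest()
```

Both digests come out equal, which is exactly why a verifier cannot detect a changed filename or modification date.
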
But the message syntax allows for crazier constructions: the message 'OPS COMP(LIT("Foo")) SIG' is valid according to the message syntax. But what data must be used to calculate the signature? "Foo" or COMP(LIT("Foo"))? I understand that most implementations assume "Foo" to be the signed data once again.

The message 'OPS ENC(LIT("Foo")) SIG' illustrates my point best, I think: if in this case the signature is once again computed over "Foo", then a) the verifier needs to decrypt the message before verifying and b) information about "Foo" is leaked as part of the hash value outside of the encrypted container. It is therefore not very far-fetched to assume that the signature might have been made over 'ENC(LIT("Foo"))' instead of "Foo", perhaps as some kind of notarization. This would not require access to the decryption key, neither for generating nor for verifying the signature. While I think the specification's intent is that "Foo" is signed, other interpretations are not impossible.

Speaking of notarizations: the One-Pass-Signature packet has a special "nested" flag which, if I understand correctly, can be used to make "Signatures over Signatures". In the following, OPS[1] means a One-Pass-Signature with the nested flag set to 1. In the message 'OPS[1](OPS[0] LIT SIG) SIG', for example, the "inner" part of the message, 'OPS[0] LIT SIG', is a normal signed message, while the outer part 'OPS[1] [...] SIG' is a nested signature. But what information is used to calculate the hash for the outer signature? I *think* some implementations calculate the signature over the encoding of the inner message, so 'OPS[0] LIT SIG'.

The description of the nested flag is very ambiguous. Does, for instance, 'OPS[1] LIT("Foo") SIG' need to be handled differently from 'OPS[0] LIT("Foo") SIG'? What about 'OPS[1] COMP(OPS[0] LIT SIG) SIG'?

Handling 'OPS[0] OPS[1](OPS[0] COMP(OPS[0] OPS[1](SIG ENC(OPS[0] LIT SIG)) SIG) SIG) SIG) SIG SIG' would be a nightmare (I'm not even sure if I bracketed this example properly), although it is a perfectly valid message according to the syntax. Sure, this example is exaggerated, but it still illustrates my point.

I would like to propose simplifying the message syntax to get rid of some of the craziness:

* PLAINTEXT := Plaintext
* LITERAL_MESSAGE := Literal Data Packet over PLAINTEXT
* PREPEND-SIGNED_MESSAGE := Signature Packet, LITERAL_MESSAGE | Signature Packet, PREPEND-SIGNED_MESSAGE
* ONE-PASS-SIGNED_MESSAGE := One-Pass-Signature Packet, LITERAL_MESSAGE, Signature Packet | One-Pass-Signature Packet, ONE-PASS-SIGNED_MESSAGE, Signature Packet
* SIGNED_MESSAGE := PREPEND-SIGNED_MESSAGE | ONE-PASS-SIGNED_MESSAGE
* COMPRESSED_MESSAGE := Compressed Data Packet over LITERAL_MESSAGE or over SIGNED_MESSAGE
* ESKS := ESK | ESK ESKS
* ESK := Public-Key-Encrypted Session-Key Packet | Symmetric-Key-Encrypted Session-Key Packet
* ENCRYPTED_DATA := Symmetrically Encrypted Integrity Protected Data Packet over LITERAL_MESSAGE or over SIGNED_MESSAGE or over COMPRESSED_MESSAGE
* ENCRYPTED_MESSAGE := ENCRYPTED_DATA | ESKS ENCRYPTED_DATA

(I left out Symmetrically Encrypted Data Packets (without Integrity Protection) for the sake of simplicity. The position of the Modification Detection Packet is left as an exercise to the reader as well.)

Effectively this means:

* OPS packets can only be followed by a LIT packet or other OPS packets.
* "Prepended" SIG packets can also only be followed by either a LIT packet or other SIG packets. No mixing of One-Pass-Signatures and "prepended"-style signatures.
* COMP packets can only be created around LIT packets, OPS* LIT SIG* combos or SIG* LIT combos. A message can further contain at most one COMP layer. No more COMP(ENC) or COMP(COMP(COMP(...))) shenanigans, as that doesn't make sense to do anyway.
  I know that there are proponents of the idea of getting rid of the Compressed Data Packet altogether.
* At most one ENC layer per message. If a message uses encryption, the ENC layer MUST be the outermost layer of the message.

Further, I'd like to propose the following interpretation for the nested flag:

* If the "nested" flag of an OPS packet is set, the encoding of subsequent packets until its corresponding signature packet is used to calculate the signature. This means that 'OPS[1] LIT("Foo") SIG' is calculated over 'LIT("Foo")' instead of "Foo", which might break with existing implementations, though only if the nested flag is set. This should make it very clear what the nested flag does.
* In case of 'OPS[0] OPS[1] OPS[0] LIT("Foo") SIG SIG SIG', the signatures are calculated over (from inner to outer): "Foo", 'OPS[0] LIT("Foo") SIG', 'OPS[0] LIT("Foo") SIG'.
* In case of 'OPS[1] OPS[1] OPS[1] LIT("Foo") SIG SIG SIG', the signatures are calculated over (from inner to outer): LIT("Foo"), 'OPS[1] LIT("Foo") SIG', 'OPS[1] OPS[1] LIT("Foo") SIG SIG'.

The nested flag acts kind of like a bracket, so in 'OPS[0] OPS[1] OPS[0] LIT("Foo") SIG SIG SIG' the two outermost signatures are "on the same level" and both are over the innermost sig + message. This means that a parser which encounters an OPS packet with flag 1 can assume that all subsequent data needs to be hashed. If instead an OPS packet with flag 0 is found, hashing can be postponed until either another OPS with flag 1 is found (in which case the data following that OPS is hashed) or a literal data packet is found, in which case the literal data packet's content is hashed.

What do you think? Did I miss or misunderstand anything? Are you depending on use cases that might break with this simplification?

Happy Hacking!
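
As a rough sketch of how little code the proposal above would need, the simplified grammar and the proposed nested-flag interpretation could be checked with a few recursive Python functions. Everything here is made up for illustration: the tuple encoding of packets, the function names, and the assumption that any of the four message kinds may appear at the top level. It is not taken from any existing implementation.

```python
# Hypothetical packet encoding, for illustration only:
#   ("LIT", b"Foo"), ("SIG",), ("OPS", nested_flag), ("ESK",),
#   ("COMP", [packets]), ("ENC", [packets])

def tag(pkt):
    return pkt[0]

def is_literal(pkts):
    return len(pkts) == 1 and tag(pkts[0]) == "LIT"

def is_prepend_signed(pkts):
    # Signature Packet, then LITERAL_MESSAGE or PREPEND-SIGNED_MESSAGE
    if len(pkts) < 2 or tag(pkts[0]) != "SIG":
        return False
    rest = pkts[1:]
    return is_literal(rest) or is_prepend_signed(rest)

def is_one_pass_signed(pkts):
    # OPS, then LITERAL_MESSAGE or ONE-PASS-SIGNED_MESSAGE, then SIG
    if len(pkts) < 3 or tag(pkts[0]) != "OPS" or tag(pkts[-1]) != "SIG":
        return False
    inner = pkts[1:-1]
    return is_literal(inner) or is_one_pass_signed(inner)

def is_signed(pkts):
    return is_prepend_signed(pkts) or is_one_pass_signed(pkts)

def is_compressed(pkts):
    if len(pkts) != 1 or tag(pkts[0]) != "COMP":
        return False
    inner = pkts[0][1]
    return is_literal(inner) or is_signed(inner)

def is_encrypted(pkts):
    i = 0
    while i < len(pkts) and tag(pkts[i]) == "ESK":  # optional ESKS prefix
        i += 1
    rest = pkts[i:]
    if len(rest) != 1 or tag(rest[0]) != "ENC":
        return False
    inner = rest[0][1]
    return is_literal(inner) or is_signed(inner) or is_compressed(inner)

def is_valid_message(pkts):
    # Assumption: any of the four message kinds is allowed at top level.
    return (is_literal(pkts) or is_signed(pkts)
            or is_compressed(pkts) or is_encrypted(pkts))

def signed_data(pkts):
    """What each signature of a ONE-PASS-SIGNED_MESSAGE covers under the
    proposed nested-flag rule, listed from innermost to outermost."""
    if not is_one_pass_signed(pkts):
        raise ValueError("not a ONE-PASS-SIGNED_MESSAGE")
    k = len(pkts) // 2                # k OPS packets, one LIT, k SIG packets
    covered = []
    for i in reversed(range(k)):      # innermost OPS first
        # First OPS at or after position i that has the nested flag set.
        j = next((n for n in range(i, k) if pkts[n][1] == 1), None)
        if j is None:
            covered.append(pkts[k][1])                    # plaintext only
        else:
            covered.append(pkts[j + 1:len(pkts) - 1 - j]) # packets inside OPS j
    return covered

foo, sig = ("LIT", b"Foo"), ("SIG",)
ok = [("COMP", [("OPS", 0), foo, sig])]       # COMP(OPS LIT SIG)
bad = [("OPS", 0), ("COMP", [foo]), sig]      # OPS COMP(LIT) SIG
mixed = [("OPS", 0), ("OPS", 1), ("OPS", 0), foo, sig, sig, sig]
```

With these definitions `is_valid_message(ok)` accepts the common compressed-and-signed layout, `is_valid_message(bad)` rejects the signature wrapped around a compressed packet, and `signed_data(mixed)` reproduces the inner-to-outer list given above for 'OPS[0] OPS[1] OPS[0] LIT("Foo") SIG SIG SIG'.
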