[openpgp] text vs. binary in an OpenPGP "Signed Message"

Daniel Kahn Gillmor <dkg@fifthhorseman.net> Sat, 22 February 2025 00:25 UTC

From: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
To: openpgp@ietf.org
Date: Fri, 21 Feb 2025 19:25:17 -0500
Message-ID: <871pvq4yhe.fsf@fifthhorseman.net>
Subject: [openpgp] text vs. binary in an OpenPGP "Signed Message"
List-Id: "Ongoing discussion of OpenPGP issues." <openpgp.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/openpgp/RLMBugGhg_c9xT7zmDrkBklRZB8>

I've been looking into interoperability for the OpenPGP "Signed Message"
construct, and found myself in a corner of unspecified behavior due to
the distinct ways that OpenPGP differentiates between "text" and
"binary" data.

The simplest modern OpenPGP "Signed Message" looks like this:

- One-Pass Signature Packet (OPS)
- Literal Data Packet (LIT)
- Signature Packet (SIG)

Compactly, we can represent that as "OPS LIT SIG".

Each Literal Data Packet can be tagged with a "format" octet [0].  The
two well-specified values for this format octet are 'b' (binary) and
'u' (Unicode text).  A 'u'-formatted packet…

> MUST be encoded with UTF-8 (see [RFC3629]) and stored with <CR><LF>
> text endings (that is, network-normal line endings).

[0] https://www.rfc-editor.org/rfc/rfc9580.html#section-5.9-3.1.1
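As a rough illustration of the quoted requirement, here is a minimal
Python sketch that checks whether a byte string could serve as the body
of a 'u'-formatted Literal Data Packet.  The function name is
hypothetical, and the check covers only the two conditions quoted above
(UTF-8 encoding and <CR><LF> line endings); it is not taken from any
implementation.

```python
def looks_like_valid_litu_body(data: bytes) -> bool:
    """Return True if data is valid UTF-8 and every line ending is CRLF.

    Hypothetical helper for illustration only; it checks nothing beyond
    the two conditions quoted from the spec.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # Remove every CRLF pair; any CR or LF left over is a bare line ending.
    remainder = data.replace(b"\r\n", b"")
    return b"\r" not in remainder and b"\n" not in remainder
```

For example, b"line one\r\nline two\r\n" passes this check, while
b"line one\nline two\n" does not.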

I'll denote a Literal Data Packet with format 'b' as LITb, and with 'u'
as LITu.

The Signature packet has a Signature Type octet [1], which for the
OpenPGP Message construct is probably constrained to one of two values:

- 0x00 - Binary Document Signature
- 0x01 - Text Document Signature

Signature type 0x01 is expected to be made over …

> the text data with its line endings converted to <CR><LF>.

[1] https://www.rfc-editor.org/rfc/rfc9580.html#sigtype-text
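A minimal Python sketch of that line-ending conversion might look like
the following.  It assumes that only bare <LF> needs converting and
leaves bare <CR> untouched; how an implementation treats bare <CR> is
not covered by the sentence quoted above, so that choice is an
assumption of this sketch.  The name to_crlf is hypothetical.

```python
import re

# Matches an LF that is not already preceded by a CR.
_BARE_LF = re.compile(rb"(?<!\r)\n")


def to_crlf(data: bytes) -> bytes:
    """Convert bare LF line endings to CRLF, leaving existing CRLF alone.

    Assumption: bare CR is left untouched. This is a sketch, not any
    particular implementation's canonicalization routine.
    """
    return _BARE_LF.sub(b"\r\n", data)
```

Under these assumptions the conversion is idempotent: applying it to
data that already uses <CR><LF> returns the data unchanged.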

I'll denote a Binary Document Signature Packet as SIG0 and a Text
Document Signature Packet as SIG1.

The OPS has a corresponding Signature Type octet [2], which I believe is
expected to match the Signature Type octet of the corresponding signature.
I'll denote these as OPS0 and OPS1.

[2] https://www.rfc-editor.org/rfc/rfc9580.html#section-5.4-3.2.1

Let's start by assuming that the OPS needs to match the SIG (though
i don't see where the spec mandates that), and that we shouldn't expect
any implementation to validate a Signed Message where the two differ.

So, we still have four different valid kinds of "Signed Message" with a
single signature:

- (a) OPS0 LITb SIG0
- (b) OPS1 LITu SIG1
- (c) OPS0 LITu SIG0
- (d) OPS1 LITb SIG1
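
To make the four cases concrete, here is a small Python sketch that
maps the three relevant octets of an "OPS LIT SIG" message onto the
labels used in this list, using the 'b'/'u' format octets and the
0x00/0x01 signature types described earlier.  The function name and
the "mismatched"/"other" return values are hypothetical, chosen only
for illustration.

```python
# Labels follow the list in this message; keys are (signature type, format).
_CASES = {
    (0x00, "b"): "a",
    (0x01, "u"): "b",
    (0x00, "u"): "c",
    (0x01, "b"): "d",
}


def classify_signed_message(ops_type: int, lit_format: str, sig_type: int) -> str:
    """Return 'a'..'d' for the four cases, or a marker for anything else.

    Hypothetical helper: "mismatched" covers an OPS whose type differs
    from the SIG, and "other" covers any format or type not discussed here.
    """
    if ops_type != sig_type:
        return "mismatched"
    return _CASES.get((sig_type, lit_format), "other")
```

For instance, a text signature over a binary-format literal,
classify_signed_message(0x01, "b", 0x01), falls into case (d).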

Every implementation i've tested appears to agree on how to verify (a) and
(b).  I've yet to find an implementation that generates (c).  But i've
found multiple implementations that produce (d):

    https://gitlab.com/sequoia-pgp/sequoia-sop/-/issues/46 and
    https://github.com/pgpainless/pgpainless/issues/465, so far…

There are at least two different interpretations about what to do with
(d): GnuPG attempts to verify the unmodified bytestream within the LITb,
without converting the line-endings to CRLF, while every other
implementation i've tried (including at least RNP, Sequoia, PGPainless,
rpgp, and gosop) appears to try to convert the bytestream to CRLF before
verification.
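
The practical effect of the two interpretations can be sketched in
Python by hashing the LITb body both ways.  This is only an
illustration: a real OpenPGP signature digest also covers data beyond
the document itself, and the function names, the choice of SHA-512,
and the handling of bare <CR> (left untouched) are assumptions of this
sketch rather than the behavior of any implementation named above.

```python
import hashlib


def document_digests(lit_body: bytes) -> tuple[str, str]:
    """Hash the document portion as-is and after conversion to CRLF.

    Returns (raw_digest, converted_digest). Only the document bytes are
    hashed here; a real signature hashes additional data as well.
    """
    raw = hashlib.sha512(lit_body).hexdigest()
    # Collapse existing CRLF to LF first so that no CR gets doubled,
    # then expand every LF to CRLF. Bare CR is left untouched.
    converted_body = lit_body.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
    converted = hashlib.sha512(converted_body).hexdigest()
    return raw, converted


def interpretations_agree(lit_body: bytes) -> bool:
    """True if both readings of (d) would hash the same document bytes."""
    raw, converted = document_digests(lit_body)
    return raw == converted
```

With a body such as b"hello\nworld\n" the two digests differ, so a
signature made under one interpretation would not verify under the
other; with b"hello\r\nworld\r\n" they coincide.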

So i guess my questions for the WG are:

- Is it a bug if a signer produces (c) or (d)?

- Should a verifier reject (c) or (d) automatically as malformed?

- If not, should a verifier that encounters (d) attempt to apply CRLF
  line endings to the LITb?

What do you think?

     --dkg

PS For now, maybe we can set aside the uglier scenarios, like mismatched
signature types in the OPS+SIG combo, multiple SIGs of different 
types in the same message, or these constructs within an encrypted or
compressed wrapping. 😬