Re: [openpgp] Ambiguity about signature vs literal packet encoding formats

Daniel Huigens <d.huigens@protonmail.com> Wed, 24 January 2024 19:52 UTC

Date: Wed, 24 Jan 2024 19:52:07 +0000
To: Andrew Gallagher <andrewg=40andrewg.com@dmarc.ietf.org>
From: Daniel Huigens <d.huigens@protonmail.com>
Cc: openpgp@ietf.org
Message-ID: <aHEmWkAO8Z1wm2aFOWHmNc760rHPteBvjs5avYK4yMot8_AX1zPfXTmSGmPVXqIfFV0FHAjbcw9YX-8qA_GDxFS6qP4ed0dAXZ8RyN56dy0=@protonmail.com>
In-Reply-To: <69543475-D5B7-474C-AAB0-0CF9989D4B4D@andrewg.com>
References: <69543475-D5B7-474C-AAB0-0CF9989D4B4D@andrewg.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/openpgp/2KN_wxt4CMoh_9-Qb5kp7epLWYQ>
Subject: Re: [openpgp] Ambiguity about signature vs literal packet encoding formats

Hi Andrew,

After struggling with similar issues in OpenPGP.js some years back,
my mental model is as follows:

- The literal data type gives a hint about what kind of data it might
  contain*
- The signature type gives an instruction about how to normalize the
  data before hashing it during signing and verification

Signature type 0x01 "simply" means that 0x0A bytes (ASCII Line Feed)
without a preceding 0x0D byte (ASCII Carriage Return) should be
converted to 0x0D 0x0A (CR LF) before hashing.
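As a rough illustration (Python, with a hypothetical helper name; this is not OpenPGP.js code), the rule above is a pure byte substitution. Every LF that is not already preceded by CR becomes CR LF, and everything else, including a lone CR, passes through unchanged:

```python
import re

# Matches an LF (0x0A) byte that is NOT immediately preceded by a CR (0x0D).
_LONE_LF = re.compile(rb"(?<!\r)\n")


def normalize_for_text_signature(data: bytes) -> bytes:
    """Return the bytes to hash for a type 0x01 (text) signature.

    Hypothetical helper: converts lone LF to CR LF. Existing CR LF pairs
    and lone CR bytes are passed through unchanged.
    """
    return _LONE_LF.sub(b"\r\n", data)


print(normalize_for_text_signature(b"one\ntwo\r\nthree\rfour\n"))
# b'one\r\ntwo\r\nthree\rfour\r\n'
```

Because the lookbehind checks the original bytes, a CR LF pair is never doubled, so applying the helper twice gives the same result as applying it once.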

We used to also normalize lone 0x0D (CR) to CR LF, but no longer do,
and haven't had complaints, so this doesn't appear to be necessary
for interoperability.
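For comparison, a Python sketch of both policies side by side (both helper names are hypothetical, not OpenPGP.js code). They only disagree on inputs that contain a lone CR, which is consistent with dropping the CR rule causing no visible interop problems:

```python
import re

# Current rule: only an LF not preceded by CR counts as a line ending to fix.
_LONE_LF = re.compile(rb"(?<!\r)\n")
# Older rule: any line ending. CR LF is tried first so that a CR LF pair is
# treated as one line ending, not as a lone CR followed by a lone LF.
_ANY_EOL = re.compile(rb"\r\n|\r|\n")


def normalize_lf_only(data: bytes) -> bytes:
    """Current behaviour: only lone LF becomes CR LF."""
    return _LONE_LF.sub(b"\r\n", data)


def normalize_lf_and_cr(data: bytes) -> bytes:
    """Older behaviour: lone LF and lone CR both become CR LF."""
    return _ANY_EOL.sub(b"\r\n", data)


for sample in (b"a\nb\r\nc", b"a\rb"):
    same = normalize_lf_only(sample) == normalize_lf_and_cr(sample)
    print(sample, "same" if same else "different")
```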

Note that the above doesn't involve interpreting the data as text, nor
does it rely on its encoding, and thus it may seem "wrong" if the data
contains text in an encoding such as UTF-16, where the above procedure
does not result in converting \n to \r\n. (In fact it might result in
invalid UTF-16, but again, the data is not interpreted as text and the
normalized copy never ends up on the wire; it's only what gets passed to
the hash function.)
It's a bit strange, but this seems to be the most interoperable approach :)
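To make the UTF-16 point concrete, a small Python sketch that applies the byte-level rule (lone 0x0A becomes 0x0D 0x0A) to UTF-16LE text. The inserted CR byte lands inside the two-byte LF code unit, so the result is not a UTF-16 CR LF and, having an odd length, does not even decode:

```python
import re

# Byte-level rule for text signatures: lone LF (0x0A) becomes CR LF (0x0D 0x0A).
_LONE_LF = re.compile(rb"(?<!\r)\n")

text = "a\nb"
utf16 = text.encode("utf-16-le")        # b'a\x00\n\x00b\x00'
hashed = _LONE_LF.sub(b"\r\n", utf16)   # b'a\x00\r\n\x00b\x00'

expected_crlf = "a\r\nb".encode("utf-16-le")
print(hashed == expected_crlf)          # False: not a UTF-16 CR LF
print(len(hashed) % 2)                  # 1: odd length, so not valid UTF-16

try:
    hashed.decode("utf-16-le")
except UnicodeDecodeError as exc:
    print("not decodable:", exc.reason)
```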

I think I brought this topic up at some point, but it was
(reasonably) deemed out of scope for the crypto refresh. Still, I agree
it would be valuable to write this down somewhere.

---

(*) In the crypto refresh, this was made stronger, so that in the future
the mental model could be:

- The literal data type indicates what kind of data it contains,
  where `b` could be any data, and `u` must be UTF-8 encoded text
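Under that stricter reading, a receiver could sanity-check the body against the format octet roughly as follows. This is a Python sketch with a hypothetical function name; it covers only `b` (0x62) and `u` (0x75), and treats every other octet as out of scope rather than guessing how it should be handled:

```python
FORMAT_BINARY = 0x62  # 'b'
FORMAT_UTF8 = 0x75    # 'u'


def check_literal_body(format_octet: int, body: bytes) -> bool:
    """Hypothetical check of the stricter crypto-refresh semantics.

    'b': any bytes are acceptable (UTF-8 text is binary data too).
    'u': the body must be valid UTF-8.
    Any other octet is outside the scope of this sketch.
    """
    if format_octet == FORMAT_BINARY:
        return True
    if format_octet == FORMAT_UTF8:
        try:
            body.decode("utf-8")
        except UnicodeDecodeError:
            return False
        return True
    raise ValueError(f"format octet {format_octet:#04x} not covered by this sketch")


print(check_literal_body(FORMAT_UTF8, "héllo".encode("utf-8")))  # True
print(check_literal_body(FORMAT_UTF8, b"\xff\xfe"))              # False
```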

But, to be pedantic, the statement that "the Literal packet contains
binary data" does not, in my opinion, preclude it from containing text,
because UTF-8 encoded text is also binary data. So I don't think the
crypto refresh is contradictory on that point.

That being said, I do agree that generating a literal data packet with
type `b` and then a signature with type 0x01 (text) is quite strange,
and perhaps we could discourage or disallow that in the future. But
currently I would argue it's allowed and receiving implementations
should handle it (for better or worse).
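On the receiving side, that amounts to letting only the signature type decide what gets hashed. A Python sketch with hypothetical names (not taken from any implementation); it covers only the document bytes, while the signature trailer that is also hashed is left out:

```python
import re

SIG_BINARY = 0x00  # signature of a binary document
SIG_TEXT = 0x01    # signature of a canonical text document

# Lone LF (0x0A, not preceded by 0x0D) is rewritten to CR LF for text signatures.
_LONE_LF = re.compile(rb"(?<!\r)\n")


def document_bytes_to_hash(sig_type: int, literal_format: int, body: bytes) -> bytes:
    """Pick the document bytes to feed the hash when verifying.

    literal_format is accepted but intentionally unused: a 'b' literal
    packet under a 0x01 signature is normalized exactly like a 'u' one,
    which also keeps inline and detached verification consistent.
    """
    if sig_type == SIG_TEXT:
        return _LONE_LF.sub(b"\r\n", body)
    if sig_type == SIG_BINARY:
        return body
    raise ValueError(f"signature type {sig_type:#04x} not handled in this sketch")


body = b"line one\nline two\n"  # non-normalized text inside a 'b' packet
print(document_bytes_to_hash(SIG_TEXT, ord("b"), body)
      == document_bytes_to_hash(SIG_TEXT, ord("u"), body))  # True
```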

Best,
Daniel


On Wednesday, January 24th, 2024 at 19:52, Andrew Gallagher wrote:

> Hi, all.
> 
> It has come to my attention that it is possible (and apparently permitted!) to generate a text (0x01) signature over a binary literal data packet. The test suite does not currently cover this scenario well, seemingly because of a lack of support for the required SOP commands:
> 
> https://tests.sequoia-pgp.org/#Signed_messages
> 
> https://gitlab.com/sequoia-pgp/openpgp-interoperability-test-suite/-/issues/17
> https://gitlab.com/sequoia-pgp/openpgp-interoperability-test-suite/-/issues/63
> https://gitlab.com/sequoia-pgp/openpgp-interoperability-test-suite/-/issues/73
> 
> But it would now appear that this is a real interop issue:
> 
> https://github.com/rpgp/rpgp/pull/263
> 
> tl;dr: A text document can be encoded using non-normalised line endings in a binary literal data packet, and have a text signature over it that is correctly calculated over a normalised (but ephemeral) copy of that document. If this signature is detached, and then verified over the detached document, it is obvious that the document must be normalised before verification, and the verification should therefore succeed. But it does not appear to be obvious that the same is true of that document before detachment, since at least two implementations do not currently perform this normalisation.
> 
> Does the group feel this is a case of the receiving implementations not being sufficiently lenient in what they consume, or the generating implementations not being sufficiently strict in what they produce? ;-)
> 
> Crypto-refresh appears to be contradictory. It allows generating applications to put text data in a binary-marked literal data packet, even if the application knows for sure it will be text-signed:
> 
> > If the implementation is certain that the data is textual and is encoded with UTF-8 (for example, if it will follow this literal data packet with a signature packet of type 0x01 (see Section 5.2.1), it MAY set the format octet to u. Otherwise, it MUST set the format octet to b.
> 
> 
> But it tells a receiving application that `b` means binary data, without exception:
> 
> > The body of this packet consists of:
> > • A one-octet field that describes how the data is formatted.
> > If it is a b (0x62), then the Literal packet contains binary data.
> 
> 
> https://datatracker.ietf.org/doc/html/draft-ietf-openpgp-crypto-refresh-13#name-literal-data-packet-type-id
> 
> If this field does not contain authoritative information about the literal data packet encoding, would it not make more sense to default it to NUL, or some other meaningless value, rather than a potentially misleading one? Otherwise, surely the literal data packet encoding MUST match the signature type? Alternatively, a receiving application MUST ignore this field and infer it from the signature type when verifying that signature.
> 
> A
> 
> PS: Before DKG panics, this is aimed at the future semantics clarification document, which appears more and more necessary as time passes :-)