Re: [openpgp] text signatures

Daniel Huigens <d.huigens@protonmail.com> Fri, 02 December 2022 18:15 UTC

Date: Fri, 02 Dec 2022 18:15:24 +0000
To: "Neal H. Walfield" <neal@walfield.org>
From: Daniel Huigens <d.huigens@protonmail.com>
Cc: IETF OpenPGP WG <openpgp@ietf.org>
Message-ID: <MKqSg9Ykhc47sN0BfE2hC72p8ZTNGOhDprlFpUTRnCl3SiXls8iEVQ8S1Zq_jmzBvseTpO91SudCVirz0UYAQ8a_qbeVF8c7RtHb4aTo1pg=@protonmail.com>
In-Reply-To: <87bkol4x1c.wl-neal@walfield.org>
References: <87mt8b5dmk.wl-neal@walfield.org> <DoVNV3lGG-ohQLhfXZW5zx49v5_y8QHxza1uhOXUhjpY_bhVEX8B1hd8OG-ZHx1--EiV0039t-Oz9zTUBEGROaaHWWBp8ejfXhZzXFMIjc0=@protonmail.com> <87bkol4x1c.wl-neal@walfield.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/openpgp/u6ZdubpZubjAlerB7WgNzhnXDko>
Subject: Re: [openpgp] text signatures

Hey Neal,

> My understanding of the current text in 5.2.4. is that the signing and
> verification routines have to do the conversion to UTF-8 on the fly,
> i.e., the signed data may not actually be encoded with UTF-8, and the
> converted data is not necessarily emitted. That, of course, implies
> that the implementation somehow knows the correct encoding, which I
> don't think is always true.

The intention of this change was not to introduce a conversion between
different encodings when computing signatures. IMO, any such conversion
should either have happened earlier, or the implementation should use
type=binary instead.

RFC 4880 already says that:

   Unless otherwise specified, the character set for text is the UTF-8
   [RFC3629] encoding of Unicode [ISO10646].

So the original change was just trying to clarify that, at this point,
the string (conceptually, say, an array of Unicode code points) needs to
be encoded to UTF-8 bytes when hashing it.
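A minimal Python sketch of that step, assuming the application holds the text as a Python `str` (a sequence of Unicode code points). The helper name `text_signature_data` is hypothetical, and the sketch only produces the document bytes that go into the hash; a real OpenPGP signature hash also covers the signature packet fields and trailer, which are omitted here.

```python
import hashlib


def text_signature_data(text: str) -> bytes:
    """Return the document bytes hashed for a type=text (0x01) signature.

    Hypothetical helper: line endings are normalized to CRLF at the
    code-point level, then the code points are encoded as UTF-8.
    """
    # Collapse existing CRLF first so a CR is never doubled, then LF -> CRLF.
    normalized = text.replace("\r\n", "\n").replace("\n", "\r\n")
    return normalized.encode("utf-8")


data = text_signature_data("Grüße\nfrom OpenPGP\n")
print(data)  # b'Gr\xc3\xbc\xc3\x9fe\r\nfrom OpenPGP\r\n'

# Only the document part of the hash input; the signature packet
# fields and trailer that OpenPGP also hashes are omitted here.
digest = hashlib.sha256(data).hexdigest()
```

In a language whose strings are Unicode, the `.encode("utf-8")` call is the explicit step the clarified text asks for.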

In implementations and programming languages whose string type represents
an array of Unicode code points, this seems obvious. In implementations
and languages whose string type is just an array of bytes, it is less so.
Worse, some existing implementations of the latter kind have taken
non-UTF-8 strings and used them here, violating RFC 4880.

But I think the intended fix here is not to introduce a conversion step
during signing. Rather, implementations should either convert much
earlier (and also emit the converted, UTF-8-encoded string in the literal
data packet, if any), or even require the application to pass only
Unicode strings. If they are passed binary data or a string of unknown
encoding (if that's possible in the programming language used), they
should generate a signature with type=binary.
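The guidance above could look roughly like this in Python, where `str` and `bytes` are distinct types. This is a sketch under assumptions: `prepare_for_signing` is a hypothetical helper, and the constants use the RFC 4880 signature type values (0x00 binary, 0x01 canonical text).

```python
from typing import Tuple, Union

SIG_BINARY = 0x00  # signature of a binary document
SIG_TEXT = 0x01    # signature of a canonical text document


def prepare_for_signing(message: Union[str, bytes]) -> Tuple[int, bytes]:
    """Pick the signature type and the bytes to emit and sign.

    Hypothetical helper: Unicode strings are encoded to UTF-8 up front, so
    the same bytes go into the literal data packet and into the hash.
    Bytes of unknown encoding are passed through and signed as binary.
    """
    if isinstance(message, str):
        # The encoding step happens here, before signing, not while hashing.
        # (CRLF normalization for the text hash is still applied later.)
        return SIG_TEXT, message.encode("utf-8")
    if isinstance(message, (bytes, bytearray)):
        return SIG_BINARY, bytes(message)
    raise TypeError("message must be str or bytes")
```

For example, `prepare_for_signing("héllo")` yields a text signature over `b"h\xc3\xa9llo"`, while `prepare_for_signing(b"h\xe9llo")` (Latin-1 bytes) yields a binary signature over the bytes unchanged.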

This is particularly important for detached signatures: if the
(non-UTF-8) text is transmitted separately from the signature, and we
convert it during signing but not during verification, the signature
won't verify. So I think it's better to just treat the data as opaque
bytes.
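To illustrate the failure mode, assume a hypothetical signer that transcodes Latin-1 input to UTF-8 only inside its hashing step, and a verifier that hashes the detached text exactly as received. SHA-256 stands in for whatever hash algorithm the signature actually uses, and only the document bytes are hashed.

```python
import hashlib
import re


def crlf(data: bytes) -> bytes:
    # Byte-level line ending normalization: lone LF -> CRLF.
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)


def hash_as_is(data: bytes) -> str:
    # Verifier: treats the signed data as opaque bytes.
    return hashlib.sha256(crlf(data)).hexdigest()


def hash_after_conversion(data: bytes) -> str:
    # Hypothetical signer that converts Latin-1 to UTF-8 during hashing.
    utf8 = data.decode("latin-1").encode("utf-8")
    return hashlib.sha256(crlf(utf8)).hexdigest()


# Latin-1 text transmitted as-is next to a detached signature.
original = "café\n".encode("latin-1")  # b'caf\xe9\n'

signed_digest = hash_after_conversion(original)  # what the signer hashed
verifier_digest = hash_as_is(original)           # what the verifier hashes
print(signed_digest == verifier_digest)  # False: the signature won't verify
```

The mismatch comes purely from 0xE9 becoming 0xC3 0xA9 on one side only; hashing the bytes as-is on both sides avoids it.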

> Perhaps the change to how signatures are computed in 5.2.4 should only
> apply to v5 signatures, since it represents a change in semantics.

The above seems like backwards-compatible guidance, so I don't think we
really need to restrict this to v5 signatures?

> Would it make sense to add a warning to the text that older software
> may not have performed the encoding when hashing the signature data?

Yeah, we could add a warning that older software may have signed
non-UTF-8-encoded data as type=text, indeed.

> Then, for historical v3 and v4 signatures, an OpenPGP implementation
> could create two hash contexts, one where only line ending
> normalization is performed, and another where both line ending and
> UTF-8 conversions are performed?

This shouldn't be needed, even to verify the non-conformant signatures
out there. I believe it should be sufficient to always take the signed
data as-is, perform line ending normalization on it, and verify that. The
only thing to keep in mind is that, since the data might not be
UTF-8-encoded text, it's best to do the line ending normalization on
bytes (i.e. convert a lone 0x0A to 0x0D 0x0A). For spec-conformant
signatures, you can instead operate on it as a Unicode string (i.e.
convert \n to \r\n).
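A minimal sketch of the two variants, assuming "lone" means an LF not already preceded by a CR. The function names are hypothetical. The byte-level version never decodes, so it works on data in any encoding.

```python
import re

# A lone LF is an LF that is not already preceded by a CR.
_LONE_LF_BYTES = re.compile(rb"(?<!\r)\n")
_LONE_LF_STR = re.compile(r"(?<!\r)\n")


def normalize_bytes(data: bytes) -> bytes:
    """Convert lone 0x0A to 0x0D 0x0A without decoding (any encoding)."""
    return _LONE_LF_BYTES.sub(b"\r\n", data)


def normalize_str(text: str) -> str:
    """Convert lone LF to CRLF on a Unicode string (spec-conformant data)."""
    return _LONE_LF_STR.sub("\r\n", text)
```

For example, `normalize_bytes(b"caf\xe9\nok\r\n")` returns `b"caf\xe9\r\nok\r\n"` even though the input is not valid UTF-8. For UTF-8 input both variants produce the same bytes, because UTF-8 never uses 0x0A or 0x0D inside a multi-byte sequence; that is why always verifying over the byte-level result is safe.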

> I'd be for requiring that text signatures (signature type 0x1) made by
> v5 key be over UTF-8 data. Otherwise, they should be considered
> invalid.

Yeah, I agree.
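The rule agreed on here could be expressed as a simple check. This is a hedged sketch of the proposal in this thread, not settled spec text: `text_data_acceptable` and its `key_version` parameter are hypothetical, and the check would run alongside, not instead of, the cryptographic verification.

```python
SIG_TEXT = 0x01  # signature of a canonical text document


def text_data_acceptable(sig_type: int, key_version: int, data: bytes) -> bool:
    """Hypothetical policy check for the rule proposed in this thread.

    Text signatures made by v5 keys must cover valid UTF-8 data; other
    signature types and older keys are checked as opaque bytes.
    """
    if sig_type != SIG_TEXT or key_version < 5:
        return True
    try:
        data.decode("utf-8")  # strict by default: raises on invalid UTF-8
    except UnicodeDecodeError:
        return False
    return True
```

Under this sketch, `text_data_acceptable(0x01, 5, b"caf\xe9")` is `False` (Latin-1, not UTF-8), while the same data under a v4 key is still accepted for backwards compatibility.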

Best,
Daniel
