Re: [openpgp] text signatures

Daniel Huigens <d.huigens@protonmail.com> Fri, 02 December 2022 18:15 UTC

Date: Fri, 02 Dec 2022 18:15:24 +0000
To: "Neal H. Walfield" <neal@walfield.org>
From: Daniel Huigens <d.huigens@protonmail.com>
Cc: IETF OpenPGP WG <openpgp@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/openpgp/u6ZdubpZubjAlerB7WgNzhnXDko>

Hey Neal,

> My understanding of the current text in 5.2.4. is that the signing and
> verification routines have to do the conversion to UTF-8 on the fly,
> i.e., the signed data may not actually be encoded with UTF-8, and the
> converted data is not necessarily emitted. That, of course, implies
> that the implementation somehow knows the correct encoding, which I
> don't think is always true.

The intention of this change was not to introduce a conversion between
different encodings when computing signatures. IMO it should either have
happened earlier, or the implementation should use type=binary, instead.

RFC 4880 already says that:

   Unless otherwise specified, the character set for text is the UTF-8
   [RFC3629] encoding of Unicode [ISO10646].

So the original change was just trying to clarify that, at this point,
the string (conceptually an array of Unicode code points, say) needs to
be encoded to UTF-8 bytes when hashing it.

In implementations / programming languages whose string type represents
an array of Unicode code points, this seems obvious. In implementations
/ languages whose string type is just an array of bytes, less so. Worse,
existing such implementations have taken (non-UTF-8) strings and used
them here, violating RFC 4880.

But I think the intended fix here is not to introduce a conversion step
during signing. Rather, implementations should either do the conversion
much earlier (and also emit the converted UTF-8-encoded string in the
literal data packet, if any), or even require the application to pass
only Unicode strings. If the implementation is passed binary data or a
string of unknown encoding (if that's possible in the programming
language used), it should generate a signature with type=binary.
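
To make that concrete, here is a rough Python sketch of the decision.
The helper name prepare_for_signing and the constant names are
hypothetical and not taken from any OpenPGP library; only the signature
type values 0x00 (binary) and 0x01 (text) come from RFC 4880. Line
ending normalization is deliberately left out here.

```python
from typing import Tuple, Union

# Signature type values as defined in RFC 4880.
SIG_TYPE_BINARY = 0x00  # signature of a binary document
SIG_TYPE_TEXT = 0x01    # signature of a canonical text document


def prepare_for_signing(data: Union[str, bytes]) -> Tuple[int, bytes]:
    """Return (signature type, bytes to sign and emit as literal data).

    Hypothetical helper: a Unicode string is encoded to UTF-8 once, up
    front, and signed as text. Bytes of unknown encoding are left
    untouched and signed as binary; no conversion is guessed at.
    """
    if isinstance(data, str):
        return SIG_TYPE_TEXT, data.encode("utf-8")
    return SIG_TYPE_BINARY, bytes(data)


text_type, text_bytes = prepare_for_signing("é")          # Unicode string
binary_type, binary_bytes = prepare_for_signing(b"\xe9")  # unknown encoding
```

The point of returning the bytes alongside the type is that the same
bytes go into both the hash and the literal data packet, so there is no
separate conversion step at signing time.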

Particularly for detached signatures, this is important because if the
(non-UTF-8) text is transmitted separately from the signature, it won't
verify if we convert it during signing but not verification, so I think
it's better to just consider it as opaque bytes.
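
A small Python illustration of that failure mode, under simplifying
assumptions: a bare SHA-256 over the data stands in for the real
signature hash (which also covers the signature trailer), and Latin-1
is just an example of a non-UTF-8 encoding.

```python
import hashlib

# Non-UTF-8 text, as it would be transmitted alongside a detached signature.
original = "Grüße\r\n".encode("latin-1")

# The signer happens to know the encoding and converts to UTF-8 before hashing...
signed_digest = hashlib.sha256(
    original.decode("latin-1").encode("utf-8")
).hexdigest()

# ...but the verifier only has the transmitted bytes and no encoding hint.
verified_digest = hashlib.sha256(original).hexdigest()

# The two sides hashed different byte sequences.
digests_match = signed_digest == verified_digest

# Treating the data as opaque bytes on both sides hashes the same input.
opaque_match = hashlib.sha256(original).hexdigest() == verified_digest
```

Here digests_match ends up False and opaque_match True, which is the
argument for not converting at signing time.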

> Perhaps the change to how signatures are computed in 5.2.4 should only
> apply to v5 signatures, since it represents a change in semantics.

The above seems like backwards-compatible guidance, so I don't think we
really need to restrict this to v5 signatures?

> Would it make sense to add a warning to the text that older software
> may not have performed the encoding when hashing the signature data?

Yeah, we could add a warning that older software may have signed
non-UTF-8-encoded data as type=text, indeed.

> Then, for historical v3 and v4 signatures, an OpenPGP implementation
> could create two hash contexts, one where only line ending
> normalization is performed, and another where both line ending and
> UTF-8 conversions are performed?

This shouldn't be needed, even to verify non-conformant signatures out
there. I believe it should be sufficient to always take the signed data
as-is, perform line ending normalization on it, and verify that. The
only thing to keep in mind is that, since it might not be UTF-8-encoded
text, it's best to do the line ending normalization in terms of bytes
(i.e. convert lone 0x0A to 0x0D 0x0A). For spec-conformant signatures,
you can instead operate on it as a Unicode string (i.e. convert \n to
\r\n).
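
As a sketch of the two variants in Python (function names are
hypothetical; only lone LF is handled, as described above, and lone CR
is left alone):

```python
import re

# A "lone" line feed is one that is not already preceded by a carriage return.
_LONE_LF_BYTES = re.compile(rb"(?<!\r)\n")
_LONE_LF_TEXT = re.compile(r"(?<!\r)\n")


def normalize_line_endings_bytes(data: bytes) -> bytes:
    """Convert lone 0x0A to 0x0D 0x0A without assuming any text encoding."""
    return _LONE_LF_BYTES.sub(b"\r\n", data)


def normalize_line_endings_text(text: str) -> str:
    """Convert lone \\n to \\r\\n on a Unicode string."""
    return _LONE_LF_TEXT.sub("\r\n", text)


# Works on data that is not valid UTF-8 (0xE9 is Latin-1 for "é").
legacy = normalize_line_endings_bytes(b"caf\xe9\nmenu\r\n")

# For UTF-8 input, both routes give the same bytes to hash.
via_text = normalize_line_endings_text("café\nmenu").encode("utf-8")
via_bytes = normalize_line_endings_bytes("café\nmenu".encode("utf-8"))
```

The byte-level version is also safe for valid UTF-8 input, since 0x0A
never occurs inside a multi-byte UTF-8 sequence.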

> I'd be for requiring that text signatures (signature type 0x1) made by
> v5 key be over UTF-8 data. Otherwise, they should be considered
> invalid.

Yeah, I agree.

Best,
Daniel