Re: [openpgp] A way to securely define cleartext signature charset

Jon Callas <joncallas@icloud.com> Sat, 15 September 2018 06:47 UTC

From: Jon Callas <joncallas@icloud.com>
In-reply-to: <4583135.Hku5QGgJE0@esus>
Date: Fri, 14 Sep 2018 23:47:34 -0700
Cc: Jon Callas <joncallas@icloud.com>, openpgp@ietf.org
Message-id: <3C389CCA-5C16-44AE-B2D2-13A028B2DC6A@icloud.com>
References: <BY2PR16MB0278DB57063BDB6F519B882BE9050@BY2PR16MB0278.namprd16.prod.outlook.com> <8B546F88-AD17-4EBE-B8F8-F2D72D02CE8A@icloud.com> <4583135.Hku5QGgJE0@esus>
To: Andre Heinecke <aheinecke@intevation.de>
Archived-At: <https://mailarchive.ietf.org/arch/msg/openpgp/Oaj_ewVd39XfxfqWR-8zST-yN04>
Subject: Re: [openpgp] A way to securely define cleartext signature charset


> On Sep 11, 2018, at 1:36 AM, Andre Heinecke <aheinecke@intevation.de> wrote:
> 
> Hi Jon, Neil,
> 
> thanks for your comments!
> 
> On Monday, September 10, 2018 4:53:03 PM CEST Jon Callas wrote:
>>> On Sep 10, 2018, at 11:23 AM, Neil Hunsperger <Neil_Hunsperger=40symantec.com@dmarc.ietf.org> wrote:
>>> I'll add a data point. Some years back, the PGP Desktop product added an
>>> unsigned "Charset" header to its ASCII armor. The result looked like this:
> 
> That would also be an option. I don't prefer it because it would be unsigned 
> but it would already help with usability issues.

[…]

> 
> And that is the problem. E.g. a webmailer into which you paste UTF-8 text
> sees that it can encode that message as Latin-1 and sends it as Latin-1.
> Now on the receiving side you have a content-type saying "Latin-1", but the
> message was actually signed in UTF-8. And so you have to "try" multiple
> charsets if you wish to verify the message (as it could also have been
> signed as Latin-1).
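Andre's webmailer scenario can be sketched in a few lines. This is a minimal illustration under stated assumptions, not an OpenPGP implementation: a plain SHA-256 digest stands in for real signature verification (which also hashes signature-packet trailer data), the canonicalization is simplified to the cleartext framework's trailing-whitespace removal plus CRLF line endings (dash-escaping is omitted), and the helpers `canonicalize_text`, `digest`, and `find_signed_charset`, as well as the candidate charset list, are hypothetical. The point it shows is that the receiver recovers the same characters but not the same bytes, so it has to guess which encoding the signer hashed.

```python
import hashlib

def canonicalize_text(text: str) -> str:
    # Simplified stand-in for OpenPGP text canonicalization: strip trailing
    # spaces/tabs on each line and use CRLF line endings.
    lines = text.replace("\r\n", "\n").split("\n")
    return "\r\n".join(line.rstrip(" \t") for line in lines)

def digest(text: str, charset: str) -> str:
    # SHA-256 over the canonical text encoded in `charset`; a stand-in for
    # what a real signature would cover.
    return hashlib.sha256(canonicalize_text(text).encode(charset)).hexdigest()

def find_signed_charset(text, signed_digest, candidates=("utf-8", "latin-1")):
    # Re-encode the received text under each candidate charset and return the
    # first one whose digest matches what the signer hashed, or None.
    for charset in candidates:
        try:
            if digest(text, charset) == signed_digest:
                return charset
        except UnicodeEncodeError:
            continue  # text is not representable in this charset
    return None

message = "Grüße aus Düsseldorf"
signed = digest(message, "utf-8")  # the signer hashed UTF-8 bytes

# The webmailer re-encodes to Latin-1; the receiver decodes using the
# Content-Type charset and gets the same characters back...
received = message.encode("latin-1").decode("latin-1")
assert received == message
# ...but hashing the Latin-1 bytes does not match the UTF-8 signature.
assert digest(received, "latin-1") != signed
print(find_signed_charset(received, signed))  # utf-8
```

The "try multiple charsets" loop only works when the candidate list happens to contain the signer's encoding, which is exactly the guessing Andre wants to avoid.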

I think you’re over-thinking this, Andre.

OpenPGP has many things in it from its past, and that past includes the computing environment of 1991, namely DOS, and its use as a system for posting messages on BBSes. There are many things that we’d do differently if we were starting over. There are many things that are basically layering violations, as well.

For example, the compression is there so that people wouldn’t have to edit something up, then run Zip, and then encrypt that. Another such thing, and the one relevant here, is the distinction between “Text” and “Binary,” which is inherited in some backwards way from FTP.

With RFC 2440, we took the (then) bold step of effectively declaring that OpenPGP Text is Unicode, encoded in UTF-8. By the early 2000s, just about everyone had moved to Unicode, but with holdouts, particularly in Japan, where they still liked (and like) Shift-JIS. The amusingly named ISO 8859-5 “Latin/Cyrillic” set is actually Cyrillic, which is not Latin at all (the set formally called “Latin-5” is ISO 8859-9, for Turkish), but it was an eight-bit character set for them. I’m sure I am oversimplifying and there are other holdouts as well, because long tails are like that. Nonetheless, every day that tail becomes more tail-like, because Unicode is what just about everyone uses.

> I'm not sure whether you are saying that we should not add a standardized way
> to define the charset for cleartext signatures, or that we should?

I’m saying we *have* a standardized way. The charset is Unicode encoded in UTF-8.

And then we have this bag on the side so that the long-tail people can put hints in. Going back to Neil’s comments, we found out in the mid-2000s that Unicode wasn’t getting traction in Japan and they were doing all sorts of contortions to deal with crossed character sets. The final compromise within the working group was to put this in so that we could both affirm the official stance (text is Unicode) and let the dissenters have a better experience.

It’s actually good that the header isn’t signed, because if it were, it would end up causing more issues: to start with, is it part of the message or not? That header is metadata, and it’s metadata that can’t be definitively stated because of layering. It’s good that a component can slide that hint in without having to be integrated with the whole of the crypto system.
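Treating the Charset header as an unsigned hint can be sketched as follows. This is an illustration, not any particular implementation's parser: `armor_headers` and `charset_hint` are hypothetical helpers, and because the placement of PGP Desktop's header is not shown here, the sketch assumes a Charset line may follow either the "BEGIN PGP SIGNED MESSAGE" line or the "BEGIN PGP SIGNATURE" line, reading "Key: Value" lines until the first blank line. The sample message is made up, with the signature data elided.

```python
from typing import Dict, List

def armor_headers(armored: str) -> Dict[str, List[str]]:
    # Collect "Key: Value" armor headers that directly follow any
    # "-----BEGIN PGP ...-----" line, stopping at the first blank line
    # (or the first line that does not look like a header).
    headers: Dict[str, List[str]] = {}
    in_headers = False
    for line in armored.splitlines():
        if line.startswith("-----BEGIN PGP ") and line.endswith("-----"):
            in_headers = True
            continue
        if not in_headers:
            continue
        if line.strip() and ": " in line:
            key, value = line.split(": ", 1)
            headers.setdefault(key, []).append(value)
        else:
            in_headers = False
    return headers

def charset_hint(armored: str) -> str:
    # The Charset header is unsigned metadata: use it only as a hint for
    # interpreting the text, and fall back to the UTF-8 default.
    values = armor_headers(armored).get("Charset", [])
    return values[0] if values else "utf-8"

sample = "\n".join([
    "-----BEGIN PGP SIGNED MESSAGE-----",
    "Hash: SHA256",
    "",
    "Hello, world",
    "-----BEGIN PGP SIGNATURE-----",
    "Charset: ISO-8859-1",
    "",
    "<base64 signature data elided>",
    "-----END PGP SIGNATURE-----",
])

print(armor_headers(sample))  # {'Hash': ['SHA256'], 'Charset': ['ISO-8859-1']}
print(charset_hint(sample))   # ISO-8859-1
```

Nothing in this parser touches the hashed text, which is the layering point: the hint can be added or read by a component that knows nothing about the signature.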

> 
> I don't really see the problem with either silly character sets or Latin-5 /
> JIS messages. As long as it can be converted to the display charset / for
> passage through the OpenPGP engine, it should be ok.

I concur; the OpenPGP engine should not have to worry about character sets. It’s got too much to worry about as it is.

> 
>> Thus, this section lets an implementation throw its hands up in the air and
>> scream wherever and whenever it wants, while giving a decent way to
>> clearsign Japanese text.
> 
> Yeah, but from a usability standpoint I do not like guessing, screaming and
> failing if it can be avoided at all :-).

The only time you have to guess is if someone is not using Unicode. That’s the whole point.
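A decoding policy in that spirit might look like the sketch below: try strict UTF-8 first, consult the unsigned hint only when the bytes are not valid UTF-8, and label anything past that as a guess. `decode_text` and its Latin-1 last resort are hypothetical choices for illustration, not behavior specified by OpenPGP.

```python
from typing import Optional, Tuple

def decode_text(data: bytes, charset_hint: Optional[str] = None) -> Tuple[str, str]:
    # OpenPGP text is UTF-8 by default, so try a strict UTF-8 decode first.
    # Valid UTF-8 needs no guessing, whatever the (unsigned) hint says.
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    # Not UTF-8: this is the only case where the hint matters.
    if charset_hint:
        try:
            return data.decode(charset_hint), charset_hint
        except (LookupError, UnicodeDecodeError):
            pass  # unknown codec name or wrong hint; fall through
    # Last resort: Latin-1 maps every byte, so this cannot fail, but the
    # result may be garbled; the label tells the caller it was a guess.
    return data.decode("latin-1"), "latin-1 (guessed)"

print(decode_text("Grüße".encode("utf-8")))                      # ('Grüße', 'utf-8')
print(decode_text("日本語".encode("shift_jis"), "Shift_JIS"))     # ('日本語', 'Shift_JIS')
print(decode_text(b"\xe9t\xe9", "no-such-charset"))              # ('été', 'latin-1 (guessed)')
```

Checking UTF-8 before the hint means a stale or wrong hint cannot break a Unicode message. The trade-off is that some short non-UTF-8 byte strings happen to be valid UTF-8 (the Latin-1 bytes for "Ã©", for example, decode as "é"), so the hint is not consulted in those rare cases.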

	Jon