[openpgp] Character encodings

Wyllys Ingersoll <wyllys@gmail.com> Tue, 17 March 2015 15:38 UTC

MIME-Version: 1.0
From: Wyllys Ingersoll <wyllys@gmail.com>
Date: Tue, 17 Mar 2015 15:38:31 +0000
Message-ID: <CAHRa8=UbKKnmAmHCxsGwONsgM5udRbbKkm=Nyzf7Jrgg70+j5A@mail.gmail.com>
To: "openpgp@ietf.org" <openpgp@ietf.org>
Content-Type: multipart/alternative; boundary="001a113e2bb851205105117dc21e"
Archived-At: <http://mailarchive.ietf.org/arch/msg/openpgp/YmVM3DIe8vd7oaaExjxlRD6X3Ko>
Subject: [openpgp] Character encodings

One area that I think needs some attention is the character encoding and
charsets for encrypted text messages.

RFC 4880 says that everything should be UTF-8.  However, the reality is that
UTF-8 is not used everywhere, and there are lots of clients that compose
messages in their native preferred character set (Latin5, Greek, Kanji,
etc.).  It's very difficult as an implementor to figure out which one was
used after the fact without some indication from the sender.
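
To illustrate why guessing after the fact is unreliable, here is a minimal
Python sketch.  The sample bytes, the candidate list, and the helper names
are all hypothetical and chosen only for illustration.  It shows one
unlabeled byte string that is not valid UTF-8 but decodes without error,
and to different text, under several single-byte charsets.

```python
# Hypothetical sample: two bytes that arrived with no charset label.
SAMPLE = b"\xe1\xfd"

# Single-byte charsets a sender might plausibly have used (an assumption).
CANDIDATES = ["iso8859-1", "iso8859-7", "iso8859-9"]


def plausible_decodings(data, charsets):
    """Map each charset that decodes `data` without error to the text it yields."""
    results = {}
    for charset in charsets:
        try:
            results[charset] = data.decode(charset)
        except UnicodeDecodeError:
            # This charset cannot represent the byte sequence; skip it.
            pass
    return results


def is_valid_utf8(data):
    """Return True if `data` is well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


decodings = plausible_decodings(SAMPLE, CANDIDATES)
for charset, text in decodings.items():
    print(charset, repr(text))
print("valid UTF-8:", is_valid_utf8(SAMPLE))
```

Every candidate succeeds and each yields different text, so the bytes alone
cannot tell the receiver which rendering the sender meant.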

The literal data packet format byte only allows 3 possible values: binary
('b'), text ('t'), or UTF-8 text ('u').  The ASCII Armor header may specify a
different charset (though unfortunately very few agents add the "Charset"
armor header).  Additionally, if the message had MIME headers, there may be
yet another charset indicated in MIME that differs from both the ASCII Armor
charset and the literal packet data format byte.
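
As a rough sketch of how a decrypting agent might reconcile these three
hints, the Python below tries them in a fixed order.  The priority order
(format byte 'u', then the armor "Charset" value, then the MIME charset,
then UTF-8 as a last resort) is an assumption made for illustration, not
something RFC 4880 mandates, and the function names are hypothetical.

```python
import codecs


def candidate_charsets(format_byte, armor_charset=None, mime_charset=None):
    """Return charsets to try, in an assumed (not standardized) priority order."""
    candidates = []
    if format_byte == "u":
        # The literal packet itself claims UTF-8 text.
        candidates.append("utf-8")
    for name in (armor_charset, mime_charset):
        if not name:
            continue
        try:
            # Normalize aliases (e.g. "latin5") to Python's canonical codec name.
            normalized = codecs.lookup(name).name
        except LookupError:
            # Unknown charset label; ignore this hint.
            continue
        if normalized not in candidates:
            candidates.append(normalized)
    if "utf-8" not in candidates:
        # Fall back to the RFC 4880 default when no hint worked out.
        candidates.append("utf-8")
    return candidates


def decode_text(data, format_byte, armor_charset=None, mime_charset=None):
    """Decode literal text data, returning (text, charset_used)."""
    if format_byte == "b":
        raise ValueError("binary literal data carries no character set")
    for charset in candidate_charsets(format_byte, armor_charset, mime_charset):
        try:
            return data.decode(charset), charset
        except UnicodeDecodeError:
            continue
    # Nothing decoded cleanly; show replacement characters rather than fail.
    return data.decode("utf-8", errors="replace"), "utf-8"
```

For example, `decode_text(b"caf\xe9", "u", armor_charset="latin1")` rejects
the UTF-8 claim and falls back to the armor hint.  Note that a single-byte
charset accepts almost any byte sequence, so a wrong hint placed early in
the order silently produces wrong text; that is the ambiguity described
above.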

If the encrypting PGP software knows what character encoding was used to
compose the original message, there should be some way to communicate this
in the message that would be definitive so that the decrypting software can
present it the way it was originally intended.  As an implementor, this is
one of the trickiest areas to get right so that the end user sees the
message as the sender composed it.