Re: [openpgp] Character encodings

Wyllys Ingersoll <wyllys@gmail.com> Tue, 17 March 2015 19:27 UTC

References: <CAHRa8=UbKKnmAmHCxsGwONsgM5udRbbKkm=Nyzf7Jrgg70+j5A@mail.gmail.com> <CAHBU6isuaGx_=0hBQUJ6LNMdSGJDJ8t0s0jhiZCVOe6znB7G2g@mail.gmail.com>
In-Reply-To: <CAHBU6isuaGx_=0hBQUJ6LNMdSGJDJ8t0s0jhiZCVOe6znB7G2g@mail.gmail.com>
From: Wyllys Ingersoll <wyllys@gmail.com>
Date: Tue, 17 Mar 2015 19:26:55 +0000
Message-ID: <CAHRa8=VTK17roDf0aPsnNJuME0oghNPV8rN=5Mfh3eLKsLWqoA@mail.gmail.com>
To: Tim Bray <tbray@textuality.com>
Content-Type: multipart/alternative; boundary="089e011603e431030a051180f328"
Archived-At: <http://mailarchive.ietf.org/arch/msg/openpgp/fSwGHUnAl3EZhv5U0SGSl-NJ_8w>
Cc: openpgp@ietf.org
Subject: Re: [openpgp] Character encodings

In my experience it's not getting any better for PGP messages that are not
composed in a basic text editor.  Users composing messages on mobile
devices, for example, do not always default to UTF-8; they use the
system-wide character encoding setting (or the charset encoding specified
by the composing app itself).

For example, Apple's iOS documentation basically says that if you don't
know the original encoding, you have to "guess" by trying various encodings
until you find one that works.
https://developer.apple.com/library/ios/documentation/Cocoa/Conceptual/Strings/Articles/readingFiles.html#//apple_ref/doc/uid/TP40003459-SW4
Fortunately, it usually only takes a few tries to get it right if it's not
UTF-8.
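
To make the "guessing" concrete, here is a minimal sketch in Python of the
try-until-it-decodes approach. The function name guess_decode and the
candidate list are my own assumptions for illustration, not anything taken
from Apple's documentation or from RFC 4880.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical candidate order: strict UTF-8 first, then legacy
# single-byte charsets. A real client would pick these from the
# locale or from user settings.
DEFAULT_CANDIDATES = ("utf-8", "iso-8859-7", "iso-8859-9")


def guess_decode(
    data: bytes, candidates: Sequence[str] = DEFAULT_CANDIDATES
) -> Optional[Tuple[str, str]]:
    """Return (text, encoding) for the first candidate that decodes
    without error, or None if every candidate fails."""
    for name in candidates:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    return None


if __name__ == "__main__":
    greek = "Καλημέρα".encode("iso-8859-7")
    print(guess_decode(greek))
```

Note that a successful decode only shows the bytes are *valid* in that
charset, not that it is the one the sender used. Single-byte charsets accept
almost any byte sequence, so a Latin5 message can be silently decoded as
Greek if Greek comes first in the list.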

I agree that UTF-8 should be preferred and enforced wherever possible. But
in cases where it is not, it would help if the sender were able to provide a
hint as to what the encoding actually is, and do so in a standardized
manner that can be easily implemented.
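
For reference, the one hint RFC 4880 already defines is the "Charset" armor
header mentioned in the quoted message below. A rough sketch of how a
receiver might honor it follows. The function name armor_charset, the
case-insensitive key match, and the fallback to UTF-8 for unknown names are
my own assumptions, and the sample armor body is a placeholder rather than
a real message.

```python
import codecs


def armor_charset(armored: str, default: str = "utf-8") -> str:
    """Return the value of the "Charset" armor header, or `default`
    if the header is missing or names a charset Python does not know."""
    lines = armored.splitlines()
    # Find the armor header line, e.g. "-----BEGIN PGP MESSAGE-----".
    start = next(
        (i for i, line in enumerate(lines) if line.startswith("-----BEGIN PGP ")),
        None,
    )
    if start is None:
        return default
    for line in lines[start + 1:]:
        if not line.strip():
            break  # a blank line ends the armor headers
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "charset":
            name = value.strip()
            try:
                codecs.lookup(name)  # raises LookupError if unknown
            except LookupError:
                return default
            return name
    return default


if __name__ == "__main__":
    sample = (
        "-----BEGIN PGP MESSAGE-----\n"
        "Charset: ISO-8859-7\n"
        "\n"
        "AAAA\n"  # placeholder body, not real armored data
        "-----END PGP MESSAGE-----\n"
    )
    print(armor_charset(sample))
```

This only covers the armor layer. It does not settle what to do when the
armor header, the MIME charset and the literal packet format byte disagree.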



On Tue, Mar 17, 2015 at 3:00 PM Tim Bray <tbray@textuality.com> wrote:

> This would be a huge step backward. The proportion of text on the internet
> that is UTF-8 is monotonically increasing toward 100%. Thank goodness.
> On Mar 18, 2015 4:38 AM, "Wyllys Ingersoll" <wyllys@gmail.com> wrote:
>
>> One area that I think needs some attention is the character encoding and
>> charsets for encrypted text messages.
>>
>> RFC 4880 says that everything should be UTF-8.  However, the reality is
>> that UTF-8 is not used everywhere and there are lots of clients that compose
>> messages in their native preferred character set (Latin5, Greek, Kanji,
>> etc.) and it's very difficult as an implementor to figure it out after the
>> fact without some indication from the sender.
>>
>> The literal packet format only specifies 3 possible values - binary,
>> UTF8, or plain.  The ASCII Armor header may specify a different charset
>> (though unfortunately very few agents add the "Charset" PGP header).
>> Additionally, if the message had MIME headers, there may be yet another
>> charset indicated in MIME that differs from the ASCII Armor charset and the
>> literal packet data format byte.
>>
>> If the encrypting PGP software knows what character encoding was used to
>> compose the original message, there should be some way to communicate this
>> in the message that would be definitive so that the decrypting software can
>> present it the way it was originally intended.  As an implementor, this is
>> one of the trickiest areas to get right so that the end user sees the
>> message as it was originally intended.