[EAI] draft-klensin-encoded-word-type-u-00

John C Klensin <klensin@jck.com> Thu, 24 November 2011 21:58 UTC

Hi.

One of John Levine's comments about the mailing list document and
some issues with pop-imap-downgrade have convinced me that there
is a problem with the use of encoded words in strings that users
are expected to see and work with, and that it is a problem we
might actually know how to fix.

Historically, encoded words have been used in contexts where we
expected MUAs to turn them back into their original (native
character) forms, with display of the encoded forms to users
being a (hopefully infrequent) last resort.  That decoding is
less likely in the pop-imap-downgrade and mailing list
situations: in both, if the relevant clients could handle native
UTF-8 addresses, the encoded word forms would not be necessary.

The problem with encoded words in those contexts is that
encoding form "Q" is pretty useless except for mostly-ASCII
text, and of somewhat dubious value even then, while encoding
form "B" pretty much requires a computer to decode.  %-encoded
UTF-8 octets are even worse: one needs a computer or a subtle
calculation to turn them into a Unicode code point reference,
which must then be looked up in a table, and one actually has to
understand how UTF-8 works to be able to tell whether %NN%MM%OO
is one character, two characters, or three characters.
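To make the comparison concrete, here is a rough Python sketch of
what a reader ends up looking at in each case.  The sample string,
the simplified q_encode helper, and count_code_points are my own
illustrative assumptions, not anything taken from a specification;
the "Q" helper in particular escapes more than RFC 2047 requires.

```python
import base64
from urllib.parse import quote

# Sample text (an assumption for illustration): U+00E9 then U+20AC.
text = "\u00e9\u20ac"
utf8 = text.encode("utf-8")  # 2 octets for U+00E9, 3 octets for U+20AC


def q_encode(data: bytes) -> str:
    """Simplified "Q"-style encoding: keep ASCII letters and digits,
    map space to "_", and hex-escape every other octet."""
    out = []
    for b in data:
        ch = chr(b)
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        elif ch == " ":
            out.append("_")
        else:
            out.append(f"={b:02X}")
    return "=?utf-8?Q?" + "".join(out) + "?="


def b_encode(data: bytes) -> str:
    """"B"-style encoding: base64 over the UTF-8 octets."""
    return "=?utf-8?B?" + base64.b64encode(data).decode("ascii") + "?="


def count_code_points(octets: bytes) -> int:
    """Count characters by skipping UTF-8 continuation octets (10xxxxxx).
    This is the step a human reader would have to do by hand."""
    return sum(1 for b in octets if (b & 0xC0) != 0x80)


q_form = q_encode(utf8)
b_form = b_encode(utf8)
percent_form = quote(text, safe="")  # percent-encodes the UTF-8 octets

print(q_form)
print(b_form)
print(percent_form)
print(count_code_points(utf8), "characters in", len(utf8), "octets")
```

Nothing in the five escaped octets of the "Q" or %-encoded forms
tells a reader where one character ends and the next begins
without knowing the UTF-8 lead and continuation octet patterns.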

One possible solution is to mix the encoded word strategy with
direct encoding of Unicode characters by code point.  I've just
posted draft-klensin-encoded-word-type-u-00 as a strawman for
doing just that.  It is not a WG document.  It is not even a
serious proposal at this stage.  But it provides a relatively
concrete proposal about "something else" that we might do to get
at least slightly dug out of the present hole that some of you
might consider worth thinking about.
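As a rough illustration of the general idea only, the Python
sketch below writes each non-ASCII character as a single code
point token.  The <U+XXXX> notation and the codepoint_form name
are hypothetical, invented for this example; they are not the
syntax defined in the draft, and the sketch makes no attempt to
escape input that already contains such tokens.

```python
def codepoint_form(text: str) -> str:
    """Replace each non-ASCII character with one <U+XXXX> token.

    The token notation is a made-up placeholder, not the draft's syntax.
    """
    parts = []
    for ch in text:
        if ch.isascii():
            parts.append(ch)  # ASCII passes through unchanged
        else:
            parts.append(f"<U+{ord(ch):04X}>")  # one token per character
    return "".join(parts)


print(codepoint_form("caf\u00e9 \u20ac5"))
```

The point of any form along these lines is that there is exactly
one token per character, and the token is the value one looks up
in the Unicode tables, with no UTF-8 arithmetic in between.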

I have deliberately not added normalization considerations to
that draft, but they could easily be added and I suspect they
would be necessary if the proposal were to be as useful as it
might be.

    john