Re: i18n requirements (was: Re: NF* (Re: PKCS#11 URI slot attributes & last call))

John C Klensin <john-ietf@jck.com> Thu, 08 January 2015 18:01 UTC

Date: Thu, 08 Jan 2015 13:01:17 -0500
From: John C Klensin <john-ietf@jck.com>
To: Jan Pechanec <jan.pechanec@oracle.com>
Subject: Re: i18n requirements (was: Re: NF* (Re: PKCS#11 URI slot attributes & last call))
Archived-At: <http://mailarchive.ietf.org/arch/msg/ietf/z_6opqeAL9qHxR5EyQbahUrwIiY>
Cc: Darren J Moffat <Darren.Moffat@oracle.com>, Stef Walter <stef@thewalter.net>, Jaroslav Imrich <jaroslav.imrich@gmail.com>, ietf@ietf.org, =?UTF-8?Q?Patrik_F=C3=A4ltstr=C3=B6m?= <paf@frobbit.se>, Shawn Emery <shawn.emery@oracle.com>, saag@ietf.org, Christian Huitema <huitema@microsoft.com>, Nikos Mavrogiannopoulos <n.mavrogiannopoulos@gmail.com>


--On Wednesday, January 07, 2015 11:16 -0800 Jan Pechanec
<jan.pechanec@oracle.com> wrote:

> On Sat, 3 Jan 2015, Jan Pechanec wrote:
> 
> 	hi, I haven't received any other comments on the draft 
> recently (I know the LC already ended on Dec 29 though) so I
> think I  can file changes discussed and drafted in this thread
> as draft 18 on  Friday.  Thank you all for feedback, I really
> appreciate it.
> 
> 	one more change for the draft 18 (v2 attached) is to spell 
> "NFC" and reference the Unicode Annex on normalization based
> on  comments from Jaroslav and Christian.
>...

Jan,

I don't have a lot of time to spend on this and am not an expert
on either X.509 or PKCS (#11 or otherwise).  At least the first
may be unfortunate, but it is what it is.

While I think the changes you have made are definitely
improvements, this i18n stuff is complicated.  As with Security,
there is a completely inadequate supply of magic pixie dust that
can be thrown at problems to make them go away.  "Normalize to
NFC" (with spelling-out and references) is a vast improvement on
"use [valid] UTF-8" but there are many other issues.  You have
noted some and omitted others.  For example, while case-independent
matching is a very simple and completely deterministic issue for
ASCII (one essentially just masks off one bit within a certain
range), it can get very messy if one tries to be sensitive to
different locales that have different conventions about what to
do with diacritical marks when lower-case characters are
converted to upper case.  There are Unicode "CaseFold" rules
that are at least self-consistent, but they contain what amount
to exceptions for some language contexts (e.g., for dotless "i")
and are wildly unpopular in some places.
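
A minimal Python sketch of the contrast drawn above, assuming the
standard `unicodedata` module and the built-in `str.casefold()` (which
applies Unicode default case folding, with no locale tailoring).  The
helper names `ascii_upper`, `nfc`, and `caseless_key` are hypothetical
and exist only for illustration; nothing here comes from the PKCS#11
URI draft.

```python
import unicodedata


def ascii_upper(text: str) -> str:
    """Upper-case ASCII letters by clearing bit 0x20; leave everything else alone."""
    return "".join(
        chr(ord(ch) & ~0x20) if "a" <= ch <= "z" else ch
        for ch in text
    )


def nfc(text: str) -> str:
    """Normalize to Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


def caseless_key(text: str) -> str:
    """Locale-independent matching key: NFC, default case folding, NFC again."""
    return nfc(nfc(text).casefold())


# ASCII case mapping is a one-bit operation.
print(ascii_upper("token-1"))                        # TOKEN-1

# NFC reconciles composed and decomposed spellings of the same character...
print(nfc("\u00e9") == nfc("e\u0301"))               # True

# ...but says nothing about case.
print(nfc("\u00c9") == nfc("\u00e9"))                # False

# Default case folding is self-consistent but locale-blind: Turkish treats
# "I" and dotless "i" (U+0131) as one letter in two cases, yet the default
# fold maps "I" to "i" and leaves U+0131 untouched.
print(caseless_key("I") == caseless_key("\u0131"))   # False
print(caseless_key("I") == caseless_key("i"))        # True
```

Whether the last two results are right or wrong depends entirely on
the language context of the person doing the matching, which is the
difficulty described above.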

We used to joke that, every time we tried to carefully examine a
new script and set of languages for IDN-related purposes, it was
like turning over rocks with vipers hiding under them.  Each new
script or language context turned up a different set of
difficult issues -- the only surprise was what sort of
creatures crawled out, not whether there would be creatures
there.  The joke has gone out of fashion, but the realities that
inspired it survive.

Part of this is an inherent problem with trying to create a
universal character set -- languages and writing systems are
diverse enough that any "one size fits all" model or set of
decision rules is guaranteed to be deeply problematic and
upsetting for some people (and legitimately so) while developing
too many script-specific (or language-specific) rules or
exceptions is almost certain to upset those who feel a need for
simpler approaches that can be incorporated into generic
software.  For your reading pleasure,
draft-klensin-idna-5892upd-unicode70 discusses one set of cases
in which application of different rules and criteria led to a
conclusion that may be just right for some communities but that
is definitely problematic for others.

I don't know how far in explaining this your document should go.
I would urge, as I think I did before, some fairly strong
warnings that, at least until the issues are clarified in
PKCS#11 itself, one should be very certain one knows what one is
doing (and what the consequences of one's choices will be) if
one decides to move beyond the safety and general understanding
of the ASCII/ ISO 646/ IA5 letter and digit repertoire.  That
sort of warning should supplement your NFC language, not replace
it -- neither is a substitute for the other.  Whether you
incorporate it or not, your I-D should not assume that, by
saying "NFC", you have somehow resolved the full range of issues
in this area, any more than saying "UTF-8" did. 
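
As a rough illustration of why NFC alone does not close the matter, the
following Python sketch flags characters outside the ASCII letter and
digit repertoire and shows two distinctions that NFC leaves in place.
The function name `outside_ascii_repertoire` and the choice to treat
only letters and digits as "safe" are assumptions made for this
example, not rules taken from PKCS#11 or the draft.

```python
import string
import unicodedata
from typing import List

# Assumed "safe" repertoire for this sketch: ASCII letters and digits only.
ASCII_LETTERS_DIGITS = frozenset(string.ascii_letters + string.digits)


def outside_ascii_repertoire(value: str) -> List[str]:
    """Return, in order, the characters of value that are not ASCII letters or digits."""
    return [ch for ch in value if ch not in ASCII_LETTERS_DIGITS]


def nfc(text: str) -> str:
    """Normalize to Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


print(outside_ascii_repertoire("Token42"))                    # []
print(outside_ascii_repertoire("caf\u00e9") == ["\u00e9"])    # True

# NFC does not merge look-alike letters from different scripts:
# Latin "a" (U+0061) and Cyrillic "a" (U+0430) remain distinct.
print(nfc("a") == nfc("\u0430"))                              # False

# NFC also leaves compatibility characters alone: the "fi" ligature
# (U+FB01) is not rewritten to the two letters "f" and "i".
print(nfc("\ufb01") == "fi")                                  # False
```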

For more information, you might have a look at some of the
PRECIS work, notably draft-ietf-precis-framework.

I also remain convinced that the best place to fix this is in
the PKCS#11 spec itself.  One is always at a disadvantage when
trying to work around an inadequate specification in a different
specification that has to depend on it, and your work is no
exception.  I wish there were whatever liaison arrangements are
needed between the IETF and others (presumably notably RSA) to be
sure that happened, or at least that there was clear awareness on
the PKCS side of the deficiencies.

Happy New Year,
    john