Re: [precis] Ambiguity in specification of case mapping in RFC 7613 and draft-ietf-precis-nickname

John C Klensin <john-ietf@jck.com> Thu, 01 October 2015 17:27 UTC

Date: Thu, 01 Oct 2015 13:27:22 -0400
From: John C Klensin <john-ietf@jck.com>
To: Tom Worster <fsb@thefsb.org>, precis@ietf.org, Peter Saint-Andre - &yet <peter@andyet.net>, Alexey Melnikov <Alexey.Melnikov@isode.com>
Message-ID: <AA9E723B80AC724FB163CDFA@JcK-HP8200.jck.com>
In-Reply-To: <D232C6F6.65904%fsb@thefsb.org>
References: <D230767C.6587A%fsb@thefsb.org> <560C5149.5090607@andyet.net> <588752141F4228C805E674FC@JcK-HP8200.jck.com> <D232C6F6.65904%fsb@thefsb.org>
Archived-At: <http://mailarchive.ietf.org/arch/msg/precis/b19MS2FTv6NOLdvATs7X_Q06rwg>
Subject: Re: [precis] Ambiguity in specification of case mapping in RFC 7613 and draft-ietf-precis-nickname


--On Thursday, October 01, 2015 12:15 -0400 Tom Worster
<fsb@thefsb.org> wrote:

> Thanks, John, for the most interesting email I got this week:)

:-)   I wish that this work were a lot less interesting, but so
it goes.
 
> It raises a question I would like to ask not only to you. As an
> implementer working through the specs, I got the clear
> impression that, while the PRECIS framework can potentially be
> useful for various things, the four profiles are explicitly
> specified exclusively for the purpose of string comparison
> operations on usernames, passwords and nicknames. If someone
> uses them for some other purpose and thereby causes problems,
> it's not RFC 7613's fault.

I'm in a bit of a difficult position on this because I was never
a big PRECIS fan.  It is probably best for you to assume that
the reasons are very technical (most are) but the main reason
why I didn't spot the paragraph you found and provide that
explanation before or during IETF Last Call is that I simply
gave up (after getting the WG to agree to resolve some problems
I considered much more significant).  So, if you have questions
about intent, Peter, Alexey, or others may be better qualified
to answer them.

> The question is, did I get the wrong impression?
> 
> If not then the discussion of how the profiles should specify
> case map/folding, which is specified only in the profiles, is
> simpler. The points to resolve being, as John put it, the
> "false positives".

Yes and no.  Maybe it is useful to think about it this way.  If
I'm a user, I want the systems and interfaces to be as intuitive
as possible relative to whatever I already know or think I do.
If I'm looking at text written in a particular language, I
expect the identifiers embedded in that text to obey that
language's rules (or what I think they are).  That argues for a
highly localized environment in which, e.g.,
French-as-used-in-France and French-as-used-in-Quebec have
different case matching rules and Arabic-language use of Arabic
script is a lot different from Persian-language use of Arabic
script even before one gets to "two sets of digits" or an
interesting discussion known to some specialists as "a digit is
a digit".

On the other hand, if I'm a system or library designer who
really wants to develop a single library, based around a single
profile and/or easily-switched tables, I really want a single
worldwide approach because trying to work things out separately
for every language, script, and location is almost impossible to
think about (and trying probably leads to madness).

Each of those groups is justified in believing that the other
group is living in an alternate, and profoundly unrealistic,
reality.  For some purposes, we can have a common code base and
allow considerable user choice, but if you and I could make
different choices about whether two valid strings match as
identifiers, we (and everyone else) are in big trouble.  In
IDNA2008, we tried (and failed, btw) to simply disallow all of
the characters that led to difficult edge cases.  PRECIS made a
different, profile-based, set of decisions that, on the one
hand, allow doing things that may be more intuitive for users
and that, on the other, may lead to surprises or contradictions.

For basic Latin script, the case-matching problem was solved
with Multics circa 40 years ago, albeit brutally (a solution
carried into Unix): if something doesn't match at the bitstring
level, it doesn't match and people just need to get used to it.
That causes obvious problems with expectations and assumptions,
but is at least 100% unambiguous.  As soon as one starts marching
down the path of case-independence for non-ASCII and then
non-Latin characters, it gets hard to explain to users -- users
whom we've managed to convince that computers are smart and
adaptable -- why "Color" and "color" should match but "colour"
(or even "couleur" or "Farbe") don't match and are different.
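
To make that distinction concrete, here is a minimal Python sketch.
It is illustrative only: the two function names are invented for
this example, and str.casefold() (Unicode default case folding)
merely stands in for whatever case mapping a given PRECIS profile
actually specifies.

```python
def bitstring_match(a: str, b: str) -> bool:
    # Multics/Unix-style rule: the encoded bytes are identical or
    # the strings do not match. (Hypothetical helper name.)
    return a.encode("utf-8") == b.encode("utf-8")


def caseless_match(a: str, b: str) -> bool:
    # Assumption: Unicode default case folding as exposed by
    # str.casefold(); this is not a full PRECIS profile.
    return a.casefold() == b.casefold()


print(bitstring_match("Color", "color"))  # False
print(caseless_match("Color", "color"))   # True
print(caseless_match("color", "colour"))  # False
```

The first rule is harsh but leaves nothing to interpretation; the
second already requires a decision about which folding to apply, and
it still treats "color" and "colour" as different identifiers.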

> Fwiw, the (immature and inadequately tested) implementation is
> open at https://github.com/tom--/precis

Good luck.
    john