Re: [I18ndir] I-D on filesystem I18N

John C Klensin <john-ietf@jck.com> Tue, 07 July 2020 12:27 UTC

Return-Path: <john-ietf@jck.com>
X-Original-To: i18ndir@ietfa.amsl.com
Delivered-To: i18ndir@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id C81543A0C1C for <i18ndir@ietfa.amsl.com>; Tue, 7 Jul 2020 05:27:46 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.897
X-Spam-Level:
X-Spam-Status: No, score=-1.897 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_NONE=0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id q6fuhr9FzcRc for <i18ndir@ietfa.amsl.com>; Tue, 7 Jul 2020 05:27:45 -0700 (PDT)
Received: from bsa2.jck.com (ns.jck.com [70.88.254.51]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id A3BE03A0C16 for <i18ndir@ietf.org>; Tue, 7 Jul 2020 05:27:45 -0700 (PDT)
Received: from [198.252.137.10] (helo=PSB) by bsa2.jck.com with esmtp (Exim 4.82 (FreeBSD)) (envelope-from <john-ietf@jck.com>) id 1jsmhK-000CuV-1e; Tue, 07 Jul 2020 08:27:42 -0400
Date: Tue, 07 Jul 2020 08:27:36 -0400
From: John C Klensin <john-ietf@jck.com>
To: Nico Williams <nico@cryptonector.com>, Patrik Fältström <patrik@frobbit.se>
cc: i18ndir@ietf.org
Message-ID: <B0FAFBAF9EA570CCFB2575CF@PSB>
In-Reply-To: <20200707070456.GK3100@localhost>
References: <20200706225139.GJ3100@localhost> <B8BC0F0A-94AB-4BEF-8A5F-449049E28D8F@frobbit.se> <20200707070456.GK3100@localhost>
X-Mailer: Mulberry/4.0.8 (Win32)
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline
X-SA-Exim-Connect-IP: 198.252.137.10
X-SA-Exim-Mail-From: john-ietf@jck.com
X-SA-Exim-Scanned: No (on bsa2.jck.com); SAEximRunCond expanded to false
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/l7l1kZ0rohm8EP83OrrScCMTG9w>
Subject: Re: [I18ndir] I-D on filesystem I18N
X-BeenThere: i18ndir@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Internationalization Directorate <i18ndir.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/i18ndir>, <mailto:i18ndir-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/i18ndir/>
List-Post: <mailto:i18ndir@ietf.org>
List-Help: <mailto:i18ndir-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/i18ndir>, <mailto:i18ndir-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 07 Jul 2020 12:27:47 -0000


--On Tuesday, July 7, 2020 02:04 -0500 Nico Williams
<nico@cryptonector.com> wrote:

>...
>> When you talk about what context this document is about, I
>> feel you should explicitly say that you do not deal with
>> RTL/LTR issues. This ends up being something that is very
>> very important as well, but display issues are definitely not
>> within scope for this document.
> 
> Quite true!

Except that "do not deal with RTL/LTR issues" is equivalent to
"blow off a significant number of people in the world who use
such writing systems".  There are probably ways to deal with the
issues in things like file system paths, but saying "display
systems are not in scope" and then moving on is probably not a
reasonable member of that set.


>...
>> The reason for this is that it makes it easier to explain in
>> the next step that the function might very well be (as you
>> say) locale dependent, and I think more important that
>> lower_case() and upper_case() are two functions that might
>> not be inverses of each other, i.e., t = lower_case(s) does
>> not necessarily imply s = upper_case(t).
> 
> And case folding can be designed for case-insensitive
> comparisons, so it need not be the same as tolower().  That's
> why we speak of case folding, not "lowering case" or such like.
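A minimal sketch of the quoted point, in Python chosen only for illustration: uppercasing and lowercasing are not inverses, and a case fold intended for comparisons differs from plain lowercasing. It uses Python's built-in str.upper(), str.lower(), and str.casefold(), which apply Unicode's default, locale-independent mappings; caseless_equal is a hypothetical helper, not part of any proposed file system API.

```python
def caseless_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names using Unicode default full case folding."""
    return a.casefold() == b.casefold()

s = "\u03c2"            # GREEK SMALL LETTER FINAL SIGMA
t = s.upper()           # "\u03a3" GREEK CAPITAL LETTER SIGMA
print(t.lower() == s)   # False: a lone capital sigma lowercases to "\u03c3", not "\u03c2"

# lower() keeps the two small sigmas distinct; casefold() maps both to "\u03c3".
print("\u03c2".lower() == "\u03c3".lower())   # False
print(caseless_equal("\u03c2", "\u03c3"))     # True
print(caseless_equal("\u03c2", "\u03a3"))     # True
```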

With the understanding that it is case folding, regardless of
its other advantages, that causes some of the nasty edge cases,
e.g.,

* At least before the upper-case Eszett ("Sharp S") was
introduced as a code point (and still for backwards
compatibility reasons) 

   toLower(ß) -> ß  (already lower case)

* Similarly,

   toLower(ı) -> ı (U+0131, already lower case)

Only when one does case folding, mapping the former to "SS" and
then to "ss" and the latter to "I" and then to "i", do nasty edge
cases requiring locale- or language-specific treatment arise.
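The two edge cases above can be sketched with the same Python built-ins. naive_fold is a hypothetical helper that models the fold described in the text (uppercase, then lowercase) using Python's default, locale-independent mappings; it is an assumption for illustration, not how any particular file system folds names. Because no Turkish tailoring is applied, dotless and dotted i collapse to the same string, which is exactly where language-specific handling would be needed.

```python
def naive_fold(s: str) -> str:
    """Hypothetical fold: uppercase, then lowercase (default Unicode mappings)."""
    return s.upper().lower()

# Eszett is already lower case, so lower() leaves it alone ...
print("ß".lower())                    # ß
# ... but its default uppercase mapping is still "SS", even though a
# capital sharp s (U+1E9E) exists and lowercases back to "ß".
print("ß".upper(), "\u1e9e".lower())  # SS ß
print(naive_fold("Straße"))           # strasse

# Dotless i (U+0131) is also already lower case ...
print("\u0131".lower() == "\u0131")   # True
# ... but folding merges it with dotted i, a distinct letter in Turkish.
print(naive_fold("\u0131") == naive_fold("i"))  # True
```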

Priority question: we have a whole series of long-outstanding
and core i18n documents more or less in the queue, with
draft-faltstrom-unicode12 (and the now-required
draft-faltstrom-unicode13), draft-klensin-idna-rfc5891bis,
draft-freytag-troublesome-characters, and some SMTPUTF8 ("EAI")
tweaks that have not been turned into I-Ds as obvious examples,
along with, perhaps, draft-sullivan-lucid-prob-stmt, the
notorious draft-klensin-idna-5892upd-unicode70, and others.  How
do we want to prioritize those probably tedious (but important
to make or keep existing standards-track specs working well)
specs versus work on Nico's much more exciting file
system proposal?

    best,
     john