Re: [I18ndir] I-D on filesystem I18N

Asmus Freytag <asmusf@ix.netcom.com> Tue, 07 July 2020 23:43 UTC

To: Nico Williams <nico@cryptonector.com>
References: <20200706225139.GJ3100@localhost>
From: Asmus Freytag <asmusf@ix.netcom.com>
Cc: i18ndir@ietf.org
Message-ID: <90740541-ab72-ffaf-ff3e-5a27b5805eae@ix.netcom.com>
Date: Tue, 07 Jul 2020 16:43:42 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/7Qpdt2tiHgNqf5ex9zktwn_fa30>

On 7/6/2020 3:51 PM, Nico Williams wrote:
> I've submitted draft-williams-filesystem-18n-00.

Here are my comments:

>     This document describes requirements for internationalization (I18N)
>     of filesystems specifically in the context of Internet protocols, the
>     architecture for filesystems in most currently popular general
>     purpose operating systems, and their implications for filesystem
>     I18N.  From the I18N requirements for filesystems and the

The first sentence doesn't scan - the constructions joined by "and" 
seemingly are not parallel enough.

>     [TBD: Add references galore.  How to reference Unicode?  How to

If you go on the Unicode site and look for the page on the latest
version (currently https://www.unicode.org/versions/Unicode13.0.0/), you
can navigate to suggested ways to reference it. I would cite the latest
version as of the time of drafting of the I-D and then also write: "the
latest version is available at" and give the URL as
https://www.unicode.org/versions/latest/.

If you need to reference specifically the properties, the character 
database is at https://www.unicode.org/Public/13.0.0/ or  
https://www.unicode.org/Public/UCD/latest/ -- You could also link to 
UAX#44, which is the overview of the Unicode Character Database. The 
"latest" is at: https://www.unicode.org/reports/tr44/ and the current 
one, as of today, is at 
https://www.unicode.org/reports/tr44/tr44-26.html (see the "This 
version" link for the latest).

If you need to discuss a particular topic specifically, you might
instead identify a specific UAX (e.g. UAX #9 for Bidi) or chapter.
Links to chapters are redirected correctly from the "latest" link.

If this is a bit complex, it's because Unicode really is a family of 
standards.

>   To deal with the equivalence problem, Unicode defines Normal Forms
Unicode calls these "Normalization Forms" (see Section 3.11 in TUS, The
Unicode Standard), so that is the term to use when capitalized. Ditto
for the formal names of NFC and NFD.

I think you need to mention already here that NFC/NFD represent a
semantic identity (and generally identical appearance) while NFKC and
NFKD abstract away rather noticeable differences in appearance that may
in some cases imply strong semantic differences to some users (e.g. math
alphabets).
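
To make the distinction concrete, here is a minimal Python sketch using
the standard unicodedata module. The sample characters and the helper
name same_under are my own choices for illustration, not anything taken
from the draft.

```python
import unicodedata

# U+00C5 (precomposed) and "A" + U+030A (combining ring above) are
# canonically equivalent: same meaning, normally the same appearance.
precomposed = "\u00C5"
decomposed = "A\u030A"

# U+1D400 MATHEMATICAL BOLD CAPITAL A is only compatibility-equivalent
# to "A"; to a mathematician it is a different symbol.
math_bold_a = "\U0001D400"


def same_under(form, a, b):
    """Return True if a and b are identical after normalizing to `form`."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Canonical equivalence: NFC treats the two spellings as one name.
canonical_match = same_under("NFC", precomposed, decomposed)

# NFC leaves the math letter alone, while NFKC folds it to plain "A".
math_kept_by_nfc = unicodedata.normalize("NFC", math_bold_a) == math_bold_a
math_folded_by_nfkc = unicodedata.normalize("NFKC", math_bold_a) == "A"
```

Under NFC (or NFD) only the two spellings of the same letter collapse;
under NFKC (or NFKD) the bold math letter and plain "A" collapse as well,
which is the visible and potentially semantic difference I mean.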

>   Unicode compatibility equivalence allows equivalence between
>     different representations of the same abstract character that may
>     nonetheless have different visual appearance of behavior.  There are
>     two canonical forms that support compatibility equivalence: NFKC and
>     NFKD.  Using NoCL with NFKC or NFKD may be surprising to users in a
>     visual way.  While form-insensitivity with NFKC or NFKD may surprise
>     users who might consider two file names distinct even when Unicode
>     considers them equivalent under compatibility equivalence.  The
>     latter seems less likely and less surprising, though that is an
>     entirely subjective judgement.
You really MUST NOT use the term "canonical" with NFKC and NFKD, because
for Unicode the two forms NFC and NFD are the ones considered
"canonical", and that term is used contrastively to "compatibility".

There's an "of" that should be an "or", but it's not just behavior - 
it's also meaning. Mathematicians would object strenuously to having 
their math alphabets "normalized" to standard A-Z.

NFK(C/D) is useful in a different way: if it is used to disallow code
points that aren't stable under these normalization forms, then one
sidesteps the issue of whether the distinct appearance etc. is
meaningful, but without ever changing a filename (which would happen if
it were normalized when stored). (There may be no file systems that take
this approach; nevertheless it's worth discussing, as it is used in other
naming schemes.)
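
A rough Python sketch of that "reject, don't rewrite" idea follows. The
function names is_nfkc_stable and check_new_name are hypothetical, and
the policy shown is only one possible reading of the approach, not
something any particular filesystem is known to implement.

```python
import unicodedata


def is_nfkc_stable(name: str) -> bool:
    """True if normalizing to NFKC would leave the name unchanged."""
    return unicodedata.normalize("NFKC", name) == name


def check_new_name(name: str) -> str:
    """Accept a proposed file name as-is, or refuse it.

    The name is never rewritten: it is either stored exactly as given
    or rejected. Note that this also rejects names that are merely not
    in NFC (e.g. "A" + U+030A), since those are not NFKC-stable either.
    """
    if not is_nfkc_stable(name):
        raise ValueError(f"name is not stable under NFKC: {name!r}")
    return name
```

For example, check_new_name("report.txt") returns the name unchanged,
while a name containing U+FB01 (the "fi" ligature) or U+1D400 is refused
rather than silently folded.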

>     foldings are defined by Unicode.  Generally, case-insensitive
>     filesystems preserve original case just form-insensitive filesystems
>     preserve original form.
There's an "as" missing.

However, early case-insensitive file systems did not preserve case. Not 
sure how rare this has become.
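
As a sketch of the difference, here is a toy case-insensitive but
case-preserving directory in Python. The class and method names are
hypothetical, and str.casefold() merely stands in for whatever folding a
real filesystem would apply.

```python
class CasePreservingDirectory:
    """Toy directory: lookups ignore case, listings keep original case."""

    def __init__(self):
        # Maps the case-folded key to the name exactly as it was created.
        self._entries = {}

    def create(self, name):
        key = name.casefold()  # assumption: stand-in for the real folding
        if key in self._entries:
            raise FileExistsError(self._entries[key])
        self._entries[key] = name

    def lookup(self, name):
        """Return the stored (original-case) name, or None if absent."""
        return self._entries.get(name.casefold())

    def listing(self):
        return sorted(self._entries.values())
```

A non-preserving filesystem of the early kind would in effect store and
list the folded key itself, so the name as originally typed would be
lost.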

>     listings that work the same way as on the server.  We do not specify
>     any case foldings here.  Instead we will either create a registry of
The "here" is unclear. Does it refer to the recommendations in this I-D
that are relevant to caching clients? If so, link to the section.

>    "just-use-8" or "just-use-16" (as in UTF-16 [UNICODE]), with no
>     attempt at normalization or case folding done anywhere in between.
In Unicode parlance: "just use strings of code units".

Specifically for UTF-8, this would imply that there are also no 
guarantees of well-formedness of the UTF-8 strings (likewise for 
surrogate pairs in UTF-16).
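
To illustrate what "no guarantee of well-formedness" means in practice,
here is a small Python sketch. The two helper names are hypothetical; the
UTF-8 check relies on Python's strict decoder, and the UTF-16 check only
verifies that surrogate code units pair up correctly.

```python
from typing import Sequence


def is_well_formed_utf8(raw: bytes) -> bool:
    """True if the byte string decodes as strict UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def is_well_formed_utf16(units: Sequence[int]) -> bool:
    """True if every surrogate in a sequence of 16-bit code units is paired."""
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # A high surrogate must be followed by a low surrogate.
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                return False
            i += 2
        elif 0xDC00 <= unit <= 0xDFFF:
            # A low surrogate with no preceding high surrogate.
            return False
        else:
            i += 1
    return True
```

A "just use code units" filesystem would happily store names for which
these checks fail, such as a truncated UTF-8 sequence or a lone
surrogate, so anything consuming the names cannot assume they decode.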

-- reached end of section 1 and out of time slot --

A./