Re: [I18ndir] I-D on filesystem I18N

Nico Williams <nico@cryptonector.com> Tue, 07 July 2020 19:07 UTC

Date: Tue, 07 Jul 2020 14:07:16 -0500
From: Nico Williams <nico@cryptonector.com>
To: John C Klensin <john-ietf@jck.com>
Cc: Patrik Fältström <patrik@frobbit.se>, i18ndir@ietf.org
Message-ID: <20200707190715.GR3100@localhost>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/EJgCNCkX2_3p8Uz0_COZJnng_eY>
Subject: Re: [I18ndir] I-D on filesystem I18N

On Tue, Jul 07, 2020 at 02:20:43PM -0400, John C Klensin wrote:
> --On Tuesday, July 7, 2020 10:58 -0500 Nico Williams
> <nico@cryptonector.com> wrote:
> > Hmm, unless you meant that mixing of RTL and LTR scripts
> > should be forbidden.  But that's not a realistic proposition
> > given that file extensions tend to be a) ASCII (LTR) and b)
> > required, so even within a single directory it may not be
> > possible.  And even where file extensions are not required,
> > there is the problem of "dotfile" conventions for hiding
> > files.  And even if all of those were non-issues and we could
> > ban mixing of RTL & LTR scripts in any given _filename_, we
> > could never keep script mixing from happening in absolute
> > _paths_.  So we can't effectively avoid display issues by
> > forbidding mixing of RTL/LTR scripts.
> > 
> > Display issues for mixed RTL/LTR paths affect a very large
> > variety of software, from terminal emulators to browsers
> > (because paths tend to be used as URI/IRI local parts) to
> > native GUI apps, and much more.  What more can we do about
> > them than reference Unicode?
> 
> First of all, remember that, as long as numerals are expressed
> in positional form (e.g., with Indo-Arabic digits and their
> equivalent in other scripts), there is no such thing as a
> strictly RTL script.  That is at least one reason why "bidi" is
> much more appropriate as a characterization of those scripts
> than RTL, even with no script mixing.   The problems you are

Sure.

> discussing are, however, exactly where "just referencing
> Unicode" breaks down.  The Bidi algorithm was designed for, and
> works really well for, ordinary running text, at least for
> scripts with syntactically unambiguous "words" and word
> boundaries.  As soon as something has a special syntax because
> it is not running text but a special type of object expressed in
> text form -- such as a filename -- unthinkingly applying the Bidi
> algorithm leads to trouble.  

That's fair, but again, running code that displays file paths, filenames,
and URIs/IRIs is common and far-flung.  I agree that we could borrow from
RFC 5893 to provide advice to implementors of code that displays any of
those things.  We just need to understand that we won't be boiling any
oceans, and that advice regarding display targets a very different set of
implementors than advice regarding filesystem protocols or filesystems
themselves.
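
A rough sketch of what borrowing from RFC 5893 might look like, in Python.
It assumes (my assumption, not anything the I-D says) that each
'/'-separated path component plays the role of a DNS label, and it applies
the six conditions of the RFC 5893 Bidi Rule to each component, only when
the path contains an R, AL or AN character (RFC 5893's "Bidi domain name"
trigger).  The function names are hypothetical.  Strings are written as
\u escapes so that the example itself is not garbled by bidi display.

```python
import unicodedata

# Bidi classes RFC 5893 (Section 2) permits in RTL and LTR labels.
RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
LTR_ALLOWED = {"L", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}


def component_ok(component):
    """Apply the six RFC 5893 Bidi Rule conditions to one path component."""
    if not component:
        return False
    classes = [unicodedata.bidirectional(ch) for ch in component]
    first = classes[0]
    # Condition 1: the first character must be L, R or AL.
    if first not in ("L", "R", "AL"):
        return False
    # Conditions 3 and 6 look at the last character, skipping trailing NSMs.
    # The loop stops at index 0 at the latest, since classes[0] is not NSM.
    end = len(classes) - 1
    while classes[end] == "NSM":
        end -= 1
    last = classes[end]
    if first in ("R", "AL"):
        # Condition 2: only RTL-compatible classes may appear.
        if any(c not in RTL_ALLOWED for c in classes):
            return False
        # Condition 3: must end with R, AL, EN or AN (then optional NSMs).
        if last not in ("R", "AL", "EN", "AN"):
            return False
        # Condition 4: European (EN) and Arabic-Indic (AN) digits may not mix.
        return not ("EN" in classes and "AN" in classes)
    # Condition 5: only LTR-compatible classes may appear.
    if any(c not in LTR_ALLOWED for c in classes):
        return False
    # Condition 6: must end with L or EN (then optional NSMs).
    return last in ("L", "EN")


def path_satisfies_bidi_rule(path):
    """Hypothetical path check: components stand in for DNS labels."""
    # Like RFC 5893, only constrain paths containing an R, AL or AN character.
    if not any(unicodedata.bidirectional(ch) in ("R", "AL", "AN") for ch in path):
        return True
    return all(component_ok(c) for c in path.split("/") if c)


hebrew = "\u05e9\u05dc\u05d5\u05dd"  # a four-letter Hebrew word
print(path_satisfies_bidi_rule("/home/" + hebrew))   # True
print(path_satisfies_bidi_rule(hebrew + ".txt"))     # False: ASCII extension
print(path_satisfies_bidi_rule("/home/." + hebrew))  # False: dotfile prefix
```

The last two cases fail, which is the file-extension and dotfile problem
described earlier in the thread; a filesystem-specific rule would
presumably need to treat such separators differently than RFC 5893 does.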

> We've recognized that in the past.  For example, RFC 5893 does
> not say "just use Unicode Bidi".  It recognizes that domain
> names provide special challenges (with the most obvious example
> being that DNS labels are not required to follow the rules for
> "words" in relevant languages and that the fundamental syntax
> for non-TLD FQDNs is Label.Label...) and so provides rules for
> dealing with domain names that are different from the Unicode
> Bidi rules.   It seems to me that precisely the argument you've
> made leads to a conclusion that file names and file paths are
> special types of objects and, if they are allowed to include
> code points drawn from nominally RtoL scripts, then they are
> going to need their own rules.  

Possibly.  Keep in mind that we don't and won't have a filesystem
equivalent of A-labels, and we don't have a notion of filesystem-aware
apps or protocol slots.
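
A small Python illustration of the A-label point.  The DNS has a canonical
ASCII-compatible form for a label, so composed and decomposed spellings of
the same name converge, while a byte-oriented filesystem only has whatever
bytes it was handed.  The example name is arbitrary; the filesystem remark
assumes a typical POSIX filesystem that does no normalization (some, such
as macOS HFS+, do normalize).

```python
import unicodedata

composed = "b\u00fccher"  # "bücher" with precomposed U+00FC
decomposed = unicodedata.normalize("NFD", composed)  # "u" + U+0308 COMBINING DIAERESIS

# DNS: both spellings map to a single ASCII-compatible A-label.
# (Python's built-in "idna" codec implements IDNA2003 / RFC 3490, not IDNA2008.)
print(composed.encode("idna"))    # b'xn--bcher-kva'
print(decomposed.encode("idna"))  # b'xn--bcher-kva'

# Filesystem: a non-normalizing filesystem stores whatever UTF-8 it is given,
# so the two spellings are two different names with no canonical ASCII form.
print(composed.encode("utf-8"))    # b'b\xc3\xbccher'
print(decomposed.encode("utf-8"))  # b'bu\xcc\x88cher'
```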

On the plus side, these are _filesystems_, not _the DNS_.  The problems
that arise have much, much smaller impact.

(Unless we bring IRI local parts into scope, which we might do since
often IRI local parts are taken from filesystems.  But still, the impact
of bidi issues there is much smaller than for the DNS.  Mind you, I
don't want to blow up the I-D's scope too much, and would rather say
anything about IRI local parts in a subsequent document, if we do it at
all.)

> Now, of course, you could write your spec to say "this works
> only for filesystems that do not allow characters to which
> Unicode assigned RTL properties and it may not be appropriate
> for scripts that encounter significant issues when putting
> identifiers through Case Folding or that have compatibility
> equivalents to other characters either".   I think you would
> have trouble getting IETF consensus (or adoption in the field by
> people who already have their own systems --even if ASCII-only--
> and would need to change them) with such restrictions but, if
> you disagree, go for it.

I don't think we need to do that.  We could say that bidi can cause
problems, describe the impact, and provide implementation advice, just
as long as we understand that it's unlikely that all the relevant
software will be updated accordingly any time soon.
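
To make the Case Folding and compatibility-equivalence concerns concrete,
a minimal Python sketch of a hypothetical filename comparison key: NFKC
normalization followed by Unicode full case folding (str.casefold()).
This is not a PRECIS profile and not what any particular filesystem does;
it only shows which names such a key would and would not conflate.

```python
import unicodedata


def fold_key(name):
    """Hypothetical comparison key: NFKC normalization, then full case folding."""
    return unicodedata.normalize("NFKC", name).casefold()


# Compatibility equivalence: U+FB01 LATIN SMALL LIGATURE FI becomes "fi".
print(fold_key("\ufb01le") == fold_key("file"))           # True
# Full case folding maps U+00DF (sharp s) to "ss".
print(fold_key("Stra\u00dfe") == fold_key("STRASSE"))    # True
# Case folding is locale-independent: Turkish "DIŞ" and its Turkish
# lowercase "dış" (dotless i, U+0131) do NOT get the same key.
print(fold_key("DI\u015e") == fold_key("d\u0131\u015f"))  # False
```

A filesystem or protocol comparing names this way would treat the first two
pairs as the same file, which may or may not match user expectations, and
would still not match Turkish users' expectations in the third case.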

> Again, I'd recommend careful study of the PRECIS specs, at least
> to the extent of either being able to appropriately profile them
> for some of your needs or being able to explain why they are not
> relevant.

Noted.

Nico
--