Re: [nfsv4] one more try at RFC3530 internationalization.

Nico Williams <nico@cryptonector.com> Wed, 07 August 2013 22:26 UTC

In-Reply-To: <5DEA8DB993B81040A21CF3CB332489F607B5AACC00@MX31A.corp.emc.com>
References: <5DEA8DB993B81040A21CF3CB332489F607B5AACC00@MX31A.corp.emc.com>
Date: Wed, 07 Aug 2013 17:26:24 -0500
Message-ID: <CAK3OfOi5HL6gdEi8LKoUQ3L+hEhA+oCLbKTtd3hTSVxjBuneJw@mail.gmail.com>
From: Nico Williams <nico@cryptonector.com>
To: "Noveck, David" <david.noveck@emc.com>
Content-Type: text/plain; charset="UTF-8"
Cc: "nfsv4 list (nfsv4@ietf.org)" <nfsv4@ietf.org>
Subject: Re: [nfsv4] one more try at RFC3530 internationalization.

On Wed, Aug 7, 2013 at 2:40 PM, Noveck, David <david.noveck@emc.com> wrote:
> I've been working with Tom Haynes and David Black to come up with an
> approach to internationalization for RFC3530bis that meets the IESG's
> objections and has a good chance of being approved.
>
> The approach to be taken is basically to get an internationalization
> description that matches what people have implemented.  I think it is
> pretty clear that does not match the stringprep-based approach taken in
> RFC3530, although we may have some issues proving that fact to the IESG,
> or more properly, giving Martin the information he needs to prove it to
> the IESG.

Back then the IETF didn't have all that much experience with I18N.
Even recently the PRECIS WG was nearly making serious mistakes.  I
think our experience qualifies as experience to learn from.  I think
the IESG will surely understand.

> The internationalization approach in the current RFC3530bis draft,
> draft-ietf-nfsv4-rfc3530bis-26, attempted to make the stringprep stuff
> non-normative and loosen it enough that current implementations fit within
> it.  Unfortunately, the IESG didn't like it, so we need something else.

We know from our experience with ZFS and NFS on top what is reasonable
and what is not w.r.t. filesystem object names:

 - it is reasonable to reject non-UTF-8

 - it is reasonable to be normalization- and/or case-insensitive --
i.e., do any normalization and mapping on lookup (or on hashing at
create time) but always return the original name on READDIR, with a
security considerations note about aliasing

 - it is probably reasonable to not normalize for anything

 - normalize-on-create and mappings of any kind (e.g., case) are
sometimes-desirable, sometimes-not features, and in particular on most
Unix systems (not including OS X w/ HFS+) normalization-on-create and
any mappings on create lead to problems, therefore:

 - it is reasonable to not do *any* mappings or normalization on
create -- this has to be a choice made with application compatibility
in mind

 - all of this on a per-filesystem basis (or even per-directory), NOT
on a per-fileserver basis
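
The "insensitive on lookup, preserving on READDIR" behaviour in the list
above can be sketched as follows.  This is only an illustration: the
`Directory` class and `lookup_key` helper are hypothetical names, and using
NFC plus `casefold()` as the comparison key is one possible policy, not what
any particular filesystem does.

```python
import unicodedata


def lookup_key(name: str, case_insensitive: bool = False) -> str:
    """Key used only for comparison; it is never stored as the name."""
    key = unicodedata.normalize("NFC", name)
    if case_insensitive:
        # Case folding can denormalize, so normalize again afterwards.
        key = unicodedata.normalize("NFC", key.casefold())
    return key


class Directory:
    """Hypothetical per-directory table: insensitive lookup, preserving READDIR."""

    def __init__(self, case_insensitive: bool = False):
        self.case_insensitive = case_insensitive
        self._entries = {}  # lookup key -> (original name, object)

    def create(self, name: str, obj) -> None:
        key = lookup_key(name, self.case_insensitive)
        if key in self._entries:
            raise FileExistsError(name)  # an alias of an existing name collides
        self._entries[key] = (name, obj)  # original form is kept untouched

    def lookup(self, name: str):
        return self._entries[lookup_key(name, self.case_insensitive)][1]

    def readdir(self):
        # Always hand back the names exactly as they were created.
        return [original for original, _ in self._entries.values()]
```

Because the policy is a constructor argument, it is chosen per directory
(or per filesystem), not per server.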

From experience in the SASL/PRECIS world we know that for *user* and
group names it's best to apply any and all normalization and mappings
on the server side.  The client need not do anything, though
normalizing doesn't hurt (but mappings, on the client side, *do*
hurt).

Actually, that applies to filesystem objects as well: the client
should do no mappings.

We really need to distinguish:

 - query strings (strings sent by the client to the server)

 - storage strings (what the filesystem/server stores)

 - display strings (what the user sees)

The preparation for query strings should be minimal; the identity
function will do, though the client MUST (but won't) convert between
non-UTF-8 and UTF-8, because a) the server SHOULD reject non-UTF-8,
and b) codeset aliasing sucks.
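
A minimal sketch of that split, with hypothetical function names
(`client_prepare`, `server_accepts`) and assuming the client knows its local
codeset: the client does codeset conversion and nothing else, and the server
only checks that what arrives is well-formed UTF-8.

```python
def client_prepare(raw: bytes, local_codeset: str) -> bytes:
    """Convert a name from the client's local codeset to UTF-8; nothing else.

    No normalization and no case mapping happen here.
    """
    return raw.decode(local_codeset).encode("utf-8")


def server_accepts(wire_name: bytes) -> bool:
    """Return True only if the bytes on the wire are well-formed UTF-8."""
    try:
        wire_name.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True
```

For example, a Latin-1 encoded name sent unconverted would be rejected, while
the same name passed through `client_prepare(name, "latin-1")` would be
accepted.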

The preparation for storage strings is and should be
filesystem-specific.  For example, HFS+ normalizes on create to NFD
and is case-insensitive on lookup (IIRC).  Nothing an RFC says can
change that -- we must not even try.  What we can do is hope to get
codeset conversions right by dictating the use of UTF-8 on the wire:
then clients and servers can convert to/from whatever they use
locally.  For example, on Solaris 11 FAT and other such filesystems
can convert automatically to UTF-16 and/or to various code pages.
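
To make the "storage preparation is filesystem-specific" point concrete,
here is a rough sketch of two storage policies behind the same UTF-8 wire
format.  The function names are hypothetical and both policies are heavily
simplified: plain Unicode NFD stands in for what HFS+ does, and UTF-16LE
stands in for a FAT-style on-disk encoding.

```python
import unicodedata


def store_nfd(wire_name: bytes) -> str:
    """HFS+-like policy (simplified): normalize to NFD at create time."""
    return unicodedata.normalize("NFD", wire_name.decode("utf-8"))


def store_utf16(wire_name: bytes) -> bytes:
    """FAT-like policy (simplified): keep the name as UTF-16LE on disk."""
    return wire_name.decode("utf-8").encode("utf-16-le")


def to_wire(stored) -> bytes:
    """Convert whatever was stored back to UTF-8 for the wire."""
    if isinstance(stored, bytes):  # the UTF-16LE case
        stored = stored.decode("utf-16-le")
    return stored.encode("utf-8")
```

The UTF-16 policy round-trips the original bytes, while the NFD policy can
hand back a different (canonically equivalent) byte string than the one the
client created, which is exactly the kind of behaviour an RFC cannot
legislate away.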

In practice clients cannot really do codeset conversions because the
only place that such conversions can be done in Unix-like OSes is in
the C run-time's system call stubs (the kernel doesn't know about
user-land locales, and above the system call stubs is the app, which
shouldn't have to do these conversions).  To my knowledge no C library
does that.  This can be addressed by running only in UTF-8 locales,
but that's not 100% satisfactory, just 95%.  Client implementors that
care enough will go and fix their user-land C run-times, though I
doubt any will.

> The proposed approach is to base the new RFC3530bis handling of
> internationalization on the internationalization treatment in
> draft-ietf-nfsv4-rfc3010bis-04, the last draft for rfc3530 before
> it was stringprep-ized.  I feel that this is simple enough that we
> can clearly make the case, with the implementers' help, that this
> is what current servers and clients do, and get RFC3530bis approved.

I've not read that.  Please see my proposal above.

Nico
--