Re: [nfsv4] one more try at RFC3530 internationalization.

"Noveck, David" <david.noveck@emc.com> Thu, 08 August 2013 18:02 UTC

From: "Noveck, David" <david.noveck@emc.com>
To: Nico Williams <nico@cryptonector.com>
Date: Thu, 08 Aug 2013 14:02:01 -0400
Cc: "nfsv4 list (nfsv4@ietf.org)" <nfsv4@ietf.org>
Subject: Re: [nfsv4] one more try at RFC3530 internationalization.

> Back then the IETF didn't have all that much experience with I18N.

Now it has and everybody is sadder and a few people are wiser.  

> Even recently the PRECIS WG was nearly making serious mistakes.  

But they didn't actually make them?  Sounds pretty good to me.

> I think our experience qualifies as experience to learn from.  

I think so but I don't expect the rest of the IETF internationalization community to agree, at least for a while.

> I think the IESG will surely understand.

I believe you're unduly optimistic.

>> The proposed approach is to base the new RFC3530bis handling of
>> internationalization on the internationalization treatment in
>> draft-ietf-nfsv4-rfc3010bis-04, the last draft for rfc3530 before it
>> was stringprep-ized.  I feel that this is simple enough that we can
>> clearly make the case, with the implementers' help, that this is what
>> current servers and clients do, and get RFC3530bis approved.

> I've not read that.  Please see my proposal above.

When we have a new draft-ietf-nfsv4-rfc3530bis incorporating the above, I'd ask you to read it and point out any cases in which the things you think are reasonable (and have presumably implemented) are not at least ALLOWED by that new draft.  That draft will allow many things that you and I both think are unreasonable, but our goal here is to allow all existing implementation behavior, so that Martin can confidently tell the IESG that rfc3530bis describes the actual implementations out there.




-----Original Message-----
From: Nico Williams [mailto:nico@cryptonector.com] 
Sent: Wednesday, August 07, 2013 6:26 PM
To: Noveck, David
Cc: nfsv4 list (nfsv4@ietf.org)
Subject: Re: [nfsv4] one more try at RFC3530 internationalization.

On Wed, Aug 7, 2013 at 2:40 PM, Noveck, David <david.noveck@emc.com> wrote:
> I've been working with Tom Haynes and David Black to come up with an
> approach to internationalization for RFC3530bis that meets the IESG's
> objections and has a good chance of being approved.
>
> The approach to be taken is basically to get an internationalization 
> description that matches what people have implemented.  I think it is 
> pretty clear that this does not match the stringprep-based approach taken
> in RFC3530, although we may have some issues proving that fact to the 
> IESG, or more properly, giving Martin the information he needs to 
> prove it to the IESG.

Back then the IETF didn't have all that much experience with I18N.
Even recently the PRECIS WG was nearly making serious mistakes.  I think our experience qualifies as experience to learn from.  I think the IESG will surely understand.

> The internationalization approach in the current RFC3530bis draft,
> draft-ietf-nfsv4-rfc3530bis-26, attempted to make the stringprep stuff
> non-normative and loosen it enough that current implementations fit within
> it.  Unfortunately, the IESG didn't like it, so we need something else.

We know from our experience with ZFS and NFS on top what is reasonable and what is not w.r.t. filesystem object names:

 - it is reasonable to reject non-UTF-8

 - it is reasonable to do normalization-insensitive and/or case-insensitive lookups, i.e., to apply any normalization and mapping on lookup (or when hashing at create time) but always return the original name on READDIR, with a security considerations note about aliasing

 - it is probably reasonable to not normalize for anything

 - normalization-on-create and mappings of any kind (e.g., case) are sometimes-desirable, sometimes-not features; in particular, on most Unix systems (not including OS X w/ HFS+), normalization-on-create and any mappings on create lead to problems, therefore:

 - it is reasonable to not do *any* mappings or normalization on create -- this has to be a choice made with application compatibility in mind

 - all of this on a per-filesystem basis (or even per-directory), NOT on a per-fileserver basis
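A minimal Python sketch of the lookup-side behavior in the list above, assuming a hypothetical in-memory Directory class with hypothetical DirPolicy flags (neither comes from any real server): names must be valid UTF-8, normalization and case mapping are applied only when computing the lookup key, and READDIR returns names exactly as they were created. The key function follows the Unicode canonical caseless-matching pattern NFD(casefold(NFD(x))); whether either flag is enabled is a per-filesystem (or per-directory) choice.

```python
import unicodedata
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DirPolicy:
    """Hypothetical per-filesystem (or per-directory) name policy."""
    normalization_insensitive: bool = True
    case_insensitive: bool = False


@dataclass
class Directory:
    policy: DirPolicy
    # Lookup key -> name exactly as created (never rewritten).
    _entries: Dict[str, str] = field(default_factory=dict, init=False)

    def _key(self, name: str) -> str:
        # Normalization and mapping affect only the lookup key.
        if self.policy.normalization_insensitive:
            name = unicodedata.normalize("NFD", name)
        if self.policy.case_insensitive:
            name = name.casefold()
        if self.policy.normalization_insensitive:
            name = unicodedata.normalize("NFD", name)
        return name

    def create(self, raw: bytes) -> None:
        name = raw.decode("utf-8")  # non-UTF-8 raises UnicodeDecodeError (rejected)
        key = self._key(name)
        if key in self._entries:
            raise FileExistsError(name)
        self._entries[key] = name  # store the original name, no mapping on create

    def lookup(self, raw: bytes) -> str:
        return self._entries[self._key(raw.decode("utf-8"))]

    def readdir(self) -> List[str]:
        # READDIR always returns the names as originally created.
        return list(self._entries.values())
```

Because two different byte strings can now reach the same entry, this is exactly the aliasing that a security considerations note would have to cover.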

From experience in the SASL/PRECIS world we know that for *user* and group names it's best to apply any and all normalization and mappings on the server side.  The client need not do anything, though normalizing doesn't hurt (but mappings, on the client side, *do* hurt).

Actually, that applies to filesystem objects as well: the client should do no mappings.
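A sketch of that split for user and group names, with hypothetical function names: the client sends the name as typed (optionally NFC-normalized, which is harmless), and the server applies all the mappings. NFKC plus Unicode case folding is used here only as a simplified stand-in for a PRECIS-style username profile; it is not that profile.

```python
import unicodedata


def client_prepare_principal(name: str) -> bytes:
    """Hypothetical client side: no mappings, only optional NFC, then UTF-8."""
    return unicodedata.normalize("NFC", name).encode("utf-8")


def server_prepare_principal(wire: bytes) -> str:
    """Hypothetical server side: all normalization and mapping happens here.

    NFKC + casefold is a simplified stand-in, not an actual PRECIS profile.
    """
    name = wire.decode("utf-8")                 # non-UTF-8 raises UnicodeDecodeError
    name = unicodedata.normalize("NFKC", name)  # e.g. fullwidth letters -> ASCII
    name = name.casefold()                      # case mapping, server side only
    return unicodedata.normalize("NFC", name)   # canonical form for comparison
```

The client leaves "ALICE" as "ALICE"; folding it to "alice" is left entirely to the server.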

We really need to distinguish:

 - query strings (strings sent by the client to the server)

 - storage strings (what the filesystem/server stores)

 - display strings (what the user sees)

The preparation for query strings should be minimal; the identity function will do, though the client MUST (but won't) convert between non-UTF-8 and UTF-8, because a) the server SHOULD reject non-UTF-8, and b) codeset aliasing sucks.
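The query-string rule above amounts to very little code. In this sketch (the function names are hypothetical), the client's only preparation is converting from its local codeset to UTF-8, with no normalization or mapping, and the server rejects anything that is not valid UTF-8. The ISO-8859-1 case shows the codeset-aliasing problem: the same name "café" is b"caf\xe9" in one codeset and b"caf\xc3\xa9" in UTF-8.

```python
def client_query_string(local_name: bytes, local_codeset: str) -> bytes:
    """Hypothetical client preparation: identity apart from codeset conversion."""
    # Decode from the local codeset and re-encode as UTF-8; no normalization
    # or mapping of any kind is applied.
    return local_name.decode(local_codeset).encode("utf-8")


def server_accepts(wire_name: bytes) -> bool:
    """Hypothetical server check: reject anything that is not valid UTF-8."""
    try:
        wire_name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

A decomposed name such as "cafe\u0301" passes through unchanged; any matching against "caf\u00e9" is the server's business, not the client's.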

The preparation for storage strings is and should be filesystem-specific.  For example, HFS+ normalizes on create to NFD and is case-insensitive on lookup (IIRC).  Nothing an RFC says can change that, and we must not even try.  What we can do is hope to get codeset conversions right by dictating the use of UTF-8 on the wire: then clients and servers can convert to/from whatever they use locally.  For example, on Solaris 11, FAT and other such filesystems can convert names automatically to UTF-16 and/or to various code pages.
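To illustrate that storage preparation belongs to each filesystem rather than to the protocol, here is a sketch with a hypothetical table of per-filesystem policies. The wire name is always UTF-8; each backend then converts it to its own storage form. The filesystem type names are made up, and the "hfsplus_like" entry only approximates HFS+, whose on-disk decomposition is a variant of NFD rather than plain NFD.

```python
import unicodedata
from typing import Callable, Dict

# Hypothetical storage-preparation policies keyed by filesystem type.
STORAGE_POLICIES: Dict[str, Callable[[str], bytes]] = {
    # Store exactly what arrived (no normalization, no mapping).
    "identity_utf8": lambda name: name.encode("utf-8"),
    # Approximation of HFS+: decompose on create.
    "hfsplus_like": lambda name: unicodedata.normalize("NFD", name).encode("utf-8"),
    # FAT-like backend that keeps long names as UTF-16LE.
    "utf16_fat_like": lambda name: name.encode("utf-16-le"),
}


def store_name(fs_type: str, wire_name: bytes) -> bytes:
    """Convert a UTF-8 wire name to the given filesystem's storage form."""
    name = wire_name.decode("utf-8")  # UTF-8 is the only wire encoding
    return STORAGE_POLICIES[fs_type](name)
```

Nothing on the wire changes between backends; only the server-side conversion does, which is the point of dictating UTF-8 on the wire.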

In practice clients cannot really do codeset conversions because the only place that such conversions can be done in Unix-like OSes is in the C run-time's system call stubs (the kernel doesn't know about user-land locales, and above the system call stubs is the app, which shouldn't have to do these conversions).  To my knowledge no C library does that.  This can be addressed by running only in UTF-8 locales, but that's not 100% satisfactory, just 95%.  Client implementors that care enough will go and fix their user-land C run-times, though I doubt any will.
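For concreteness, this is roughly what the missing C run-time conversion would do, written in Python with the kernel calls injected as fakes. It assumes the stubs know the process locale's codeset; make_stubs, kernel_open and kernel_readdir are hypothetical names, and as the message says, the author knows of no C library that actually does this.

```python
from typing import Callable, List, Tuple


def make_stubs(
    locale_codeset: str,
    kernel_open: Callable[[bytes], int],
    kernel_readdir: Callable[[bytes], List[bytes]],
) -> Tuple[Callable[[bytes], int], Callable[[bytes], List[bytes]]]:
    """Hypothetical user-land stubs converting locale codeset <-> UTF-8."""

    def open_stub(path: bytes) -> int:
        # Outgoing: locale bytes -> UTF-8 before entering the kernel.
        return kernel_open(path.decode(locale_codeset).encode("utf-8"))

    def readdir_stub(path: bytes) -> List[bytes]:
        utf8_path = path.decode(locale_codeset).encode("utf-8")
        # Incoming: UTF-8 names -> locale bytes. Names the locale cannot
        # represent raise UnicodeEncodeError; a real stub needs a policy here.
        return [n.decode("utf-8").encode(locale_codeset)
                for n in kernel_readdir(utf8_path)]

    return open_stub, readdir_stub
```

Note the asymmetry: any name the locale can express can be sent, but READDIR may return names the local codeset cannot represent, which is one reason running only in UTF-8 locales is the practical workaround.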

> The proposed approach is to base the new RFC3530bis handling of
> internationalization on the internationalization treatment in
> draft-ietf-nfsv4-rfc3010bis-04, the last draft for rfc3530 before it
> was stringprep-ized.  I feel that this is simple enough that we can
> clearly make the case, with the implementers' help, that this is what
> current servers and clients do, and get RFC3530bis approved.

I've not read that.  Please see my proposal above.

Nico
--