[nfsv4] Going forward on I18N in RFC3530 bis

<david.noveck@emc.com> Thu, 09 September 2010 22:35 UTC

David Black (the man behind NFSv4.2 :-) has asked me to summarize the
situation with regard to I18N in RFC3530 and the current plan for
handling it going forward in RFC3530bis.

---- First some pointers:

    The description of I18N is in chapter 11 of RFC3530, 
    page 122 of http://www.ietf.org/rfc/rfc3530.txt 

    The current draft replacement is in the latest draft
    of RFC3530bis, that is, in chapter 12 (pages 160-179)
    http://tools.ietf.org/id/draft-ietf-nfsv4-rfc3530bis-04.txt.
    This is pretty much a rewrite of chapter 12 of the 
    previous draft-03, so looking at the diff is not much help.

---- Background:

The basic problem with chapter 11 of RFC3530 is that it has almost no
relation to what has been actually implemented.  The current form of
chapter 11 reflects political pressures at the time of RFC approval
within the IETF to conform to the stringprep paradigm, and so it is
organized around that.  But implementations started without it, and
never were adjusted to conform to that model, for good reasons,
discussed below.

In the meantime, problems within stringprep have become manifest.  Even
more important is the fact that for the most important string type
subject to I18N issues, filename components, the stringprep-style
approach in its totality does not match the needs of NFSv4.  The issue
is that you can think of the server as a single thing (including the
server code and the file system you are talking to), in which case it
makes sense to define, in exquisite detail, character mapping and
repertoire rules, so as to provide interoperability down to the most
recondite character-handling details.

In fact, however, server implementations and file systems are separate
things.  One cannot enforce detailed character-handling rules on the
file systems, and if one tries, one unacceptably limits the file systems
that can be used.  And if one enforces those rules in the server, in
front of the file systems, one interferes with another major goal of
NFSv4: proper interoperability with other network file systems and with
local use of those file systems.  If the protocol imposes rules that are
not imposed locally, there may be valid files you can't get at over
NFSv4.

As a result, NFSv4, at least in this regard, is better described as a
protocol to pass names from the client to the remote server file system,
making as few modifications as we can.  In fact, this is what people
actually implemented, and it differs in a major way from what is
described in chapter 11 of RFC3530.  Thus the need for RFC3530bis to
describe the reality that clients and servers implement.

---- Changes:

This is a brief summary of the changes I introduced.  It is a high-level
summary and I may have forgotten a few things.

Re-organize the string types.  In RFC3530, these had been organized
around stringprep profiles, basically around whether strings are
case-sensitive, case-insensitive, or partially case-sensitive.  This
resulted in very strange conclusions, such as UTF-8 checking and
checking for characters outside Unicode 3.1 being applied to tags.

Tags are treated opaquely, with no UTF-8 checking, Unicode repertoire
checking, or normalization-related checking.

There is more clarity about the various sorts of strings.  In
particular, strings which, for various reasons, do not require
internationalization handling are explicitly called out.

Adopting IDNA handling for domains and servers and simply referencing
those docs for what is OK.  There is the issue of U-labels vs. A-labels.
We allow A-labels or UTF-8 strings whether canonicalized or not.  There
has been some discussion about changing that to U-labels only but that
will only be done if there is working group consensus.
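
To make the A-label/U-label point concrete, here is a minimal sketch of
accepting a domain name in either form and comparing on the A-label.
The function names are mine, not from the draft, and Python's built-in
"idna" codec implements the older IDNA2003 rules (RFC 3490), so treat
this as an illustration of the idea rather than of what the draft
specifies.

```python
def to_a_label(name: str) -> str:
    """Return the ASCII (A-label) form of a domain name given in either form.

    Assumption: the standard-library "idna" codec (IDNA2003) is used only
    because it needs no extra packages; the draft defers to the IDNA docs.
    """
    return name.encode("idna").decode("ascii").lower()


def same_domain(a: str, b: str) -> bool:
    """Compare two domain names regardless of which form each arrived in."""
    return to_a_label(a) == to_a_label(b)


# A UTF-8 name, the same name in decomposed (non-canonicalized) form,
# and the corresponding A-label all compare equal.
print(same_domain("b\u00fccher.example", "xn--bcher-kva.example"))
print(same_domain("bu\u0308cher.example", "b\u00fccher.example"))
```

Comparing on the A-label is one way to tolerate UTF-8 input that has
not been canonicalized; a U-labels-only rule would instead reject or
convert the ASCII form at the edge.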

Extensive discussion of the fact that our ability to legislate character
handling for file systems is limited.

Change UTF-8 requirement for filenames from MUST to SHOULD to match
NFSv4.1.
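
A small sketch of why this matters: a name created locally in a legacy
encoding is a valid file name for the file system but is not valid
UTF-8.  The helper names and the utf8_is_must flag are hypothetical and
exist only to contrast the two requirement levels.

```python
def is_valid_utf8(name: bytes) -> bool:
    """True if the component name decodes cleanly as UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def server_accepts(name: bytes, utf8_is_must: bool) -> bool:
    """Hypothetical server policy: under MUST, reject anything not UTF-8;
    under SHOULD, pass the name through to the file system unchanged."""
    return is_valid_utf8(name) or not utf8_is_must


# A name created locally in Latin-1; 0xE9 alone is not valid UTF-8.
legacy_name = "caf\u00e9".encode("latin-1")
print(server_accepts(legacy_name, utf8_is_must=True))
print(server_accepts(legacy_name, utf8_is_must=False))
```

Under the MUST policy the file exists locally but cannot be named over
the protocol, which is the interoperability problem described in the
background section.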

Get rid of requirement that everything be within Unicode 3.1.  Get rid
of requirements that large sets of characters within Unicode 3.1 be
rejected for various reasons. 

Get rid of requirement to map various characters.  SHOULD NOT do
mappings which are problematic for stringprep (German eszett mapped to
'ss', zero-width joiner and non-joiner characters mapped to nothing,
causing issues for Farsi) but MAY use the other stringprep mappings
(and by implication no mappings outside that set).
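
The two problem mappings can be seen with the stringprep tables that
ship in Python's standard library.  This is only a sketch:
stringprep_style_map is a made-up helper that applies RFC 3454 table
B.1 (mapped to nothing) and table B.3 (case folding without
normalization), not any particular NFSv4 profile.

```python
import stringprep


def stringprep_style_map(name: str) -> str:
    """Apply the B.1 and B.3 mapping tables from RFC 3454 to a name."""
    out = []
    for ch in name:
        if stringprep.in_table_b1(ch):
            # Table B.1: characters "commonly mapped to nothing",
            # which includes U+200C (ZWNJ) and U+200D (ZWJ).
            continue
        # Table B.3: case folding; eszett (U+00DF) becomes "ss".
        out.append(stringprep.map_table_b3(ch))
    return "".join(out)


# Two distinct German names collapse into one.
print(stringprep_style_map("stra\u00dfe") == stringprep_style_map("strasse"))

# A Farsi word written with ZWNJ collapses onto the spelling without it.
with_zwnj = "\u0645\u06cc\u200c\u062e\u0648\u0627\u0647\u0645"
without_zwnj = with_zwnj.replace("\u200c", "")
print(stringprep_style_map(with_zwnj) == stringprep_style_map(without_zwnj))
```

In both cases names that users regard as different become
indistinguishable after mapping, which is why these mappings are called
out as SHOULD NOT.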

New treatment of normalization.  Allow normalization-sensitive servers
(but warn of difficulties without saying SHOULD NOT), allow
servers/file-systems to choose to normalize to NFC or NFD (but not to
reject filenames in the "wrong" normalization, as was implied by
RFC3530), and also mention/allow for the first time
normalization-insensitive/normalization-preserving handling of names
(best choice, but no SHOULD because this is a big change to the file
system and thus not really the spec's business).
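
As an illustration of normalization-insensitive/normalization-preserving
handling, here is a toy in-memory directory.  The Directory class and
its methods are invented for this sketch; the choice of NFC as the
internal comparison key is an assumption, and a real file system would
do this in its own lookup path.

```python
import unicodedata


class Directory:
    """Toy directory: lookups ignore normalization, stored names keep it."""

    def __init__(self):
        # Maps the NFC form of a name to the name exactly as created.
        self._entries = {}

    def create(self, name: str) -> None:
        key = unicodedata.normalize("NFC", name)
        if key in self._entries:
            # A canonically equivalent name already exists.
            raise FileExistsError(name)
        self._entries[key] = name

    def lookup(self, name: str):
        """Return the stored name matching `name`, or None."""
        return self._entries.get(unicodedata.normalize("NFC", name))

    def readdir(self):
        """Return names as the creators supplied them (preserving)."""
        return list(self._entries.values())


d = Directory()
d.create("cafe\u0301")                          # created in decomposed form
print(d.lookup("caf\u00e9") == "cafe\u0301")    # found via the composed form
print(d.readdir() == ["cafe\u0301"])            # original form is returned
```

The client never has a name rejected or silently rewritten, yet two
canonically equivalent names cannot coexist in one directory.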

Discussion of how symlink text should be processed and where the
handling differs from file component names.

New treatment of user/group names.  Each domain establishes its own list
of these so there are no repertoire rules.  There is a discussion about
why you should match these based on canonical equivalence, but there is
no normalization-insensitive/normalization-preserving option for these
because it would require the fs to save 2 (or sometimes many more)
variants of the same user and group in the user and group attributes
and in ACLs.  Nobody is going to do that, nor should they.

I'm sure I've missed some things.  If you notice them, let me know, as
it would be good to maintain somewhere a summary of what was done in
this chapter.

---- Discussion on call:

The big issue discussed was whether we should wait for precis to finish
this up.

David Black came to the conclusion that since precis work was not
proceeding very fast, we should go ahead based on the current draft plus
working group comments, with the potential of an additional update
(RFC3530tris?) when the precis work is finished and can be applied to
NFSv4.

There were arguments about his use of the word "patch" and the probable
relative proportions of updates in RFC3530bis and its successor but no
fundamental disagreement on the basic approach.

I will be working on an update to chapter 12 that will go into a new
draft of rfc3530-bis, targeted at the Beijing deadline.  I may be able
to get it out earlier, but I hope people will have a chance to look at
the current draft and give me their comments.