Re: [nfsv4] Going forward on I18N in RFC3530 bis

<david.noveck@emc.com> Thu, 30 September 2010 01:46 UTC

Date: Wed, 29 Sep 2010 21:46:20 -0400
Message-ID: <BF3BB6D12298F54B89C8DCC1E4073D800276513B@CORPUSMX50A.corp.emc.com>
In-Reply-To: <7C4DFCE962635144B8FAE8CA11D0BF1E03D1AEDF02@MX14A.corp.emc.com>
References: <BF3BB6D12298F54B89C8DCC1E4073D8002664E38@CORPUSMX50A.corp.emc.com> <7C4DFCE962635144B8FAE8CA11D0BF1E03D1AEDD3D@MX14A.corp.emc.com> <7C4DFCE962635144B8FAE8CA11D0BF1E03D1AEDF02@MX14A.corp.emc.com>
From: david.noveck@emc.com
To: david.black@emc.com, nfsv4@ietf.org
Subject: Re: [nfsv4] Going forward on I18N in RFC3530 bis

A goblin ate the words in *September*?  If this starts happening in
August, we should really start getting worried about RFC integrity :-)

-----Original Message-----
From: Black, David 
Sent: Wednesday, September 22, 2010 6:06 PM
To: Noveck, David; nfsv4@ietf.org
Subject: RE: Going forward on I18N in RFC3530 bis

Correction to one of my comments ...

> For strings that SHOULD be UTF-8, but aren't, what's the protocol
> requirement?  I think the requirement is 8-bit clean (e.g., MUST NOT
> force the most significant octet to zero, unless the string MUST be
> ASCII).  That should be stated as part of the string classification.

That should be "MUST NOT force the most significant bit in each octet to
zero, unless the string MUST be ASCII" - some goblin ate a few words ...
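
To make the corrected wording concrete, here is a minimal Python sketch
of what "8-bit clean" means in practice.  The helper names
strip_high_bit and pass_through are illustrative assumptions, not
anything taken from the draft.  It shows that clearing the most
significant bit of each octet damages any non-ASCII string, whether or
not it is UTF-8, while plain ASCII is unaffected:

```python
def strip_high_bit(data: bytes) -> bytes:
    """A path that is NOT 8-bit clean: clears the top bit of every octet."""
    return bytes(b & 0x7F for b in data)


def pass_through(data: bytes) -> bytes:
    """An 8-bit-clean path: every octet is forwarded unchanged."""
    return bytes(data)


utf8_name = "café".encode("utf-8")       # b'caf\xc3\xa9'
latin1_name = "café".encode("latin-1")   # b'caf\xe9', not valid UTF-8

# The 8-bit-clean path preserves both strings exactly.
assert pass_through(utf8_name) == utf8_name
assert pass_through(latin1_name) == latin1_name

# Clearing the high bit silently turns both into different names.
assert strip_high_bit(utf8_name) == b"cafC)"
assert strip_high_bit(latin1_name) == b"cafi"

# Pure ASCII is the only case where masking is harmless.
assert strip_high_bit(b"plain-ascii") == b"plain-ascii"
```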

Thanks,
--David

> -----Original Message-----
> From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] On Behalf Of david.black@emc.com
> Sent: Wednesday, September 22, 2010 11:49 AM
> To: Noveck, David; nfsv4@ietf.org
> Subject: Re: [nfsv4] Going forward on I18N in RFC3530 bis
> 
> Dave,
> 
> I reviewed the i18n material in -04 (Section 12).  It looks fairly
> good, but the details are now beyond my level of i18n expertise.  I
> suggest that we get a real i18n expert to review this section in the
> next version of the draft - I have a couple of candidate reviewers in
> mind.  Many thanks for the extensive effort that has clearly gone
> into this.
> 
> I have one basic disagreement that should not come as a surprise ;-) ...
> 
> My current view of A-labels vs. U-labels is that I'm going to (try
> to) insist on no A-labels, *unless* there is important "running code"
> that depends on A-labels on the wire and that needs to be
> grandfathered.  A-labels exist because the DNS infrastructure is
> fundamentally ASCII.  Since NFSv4 is UTF-8 capable, A-labels on the
> wire are just plain wrong in principle, IMHO.  FWIW, I don't care
> whether it's possible to get the current A-label approach blessed by
> the IETF's i18n gurus.  This turns up in 12.6 as "MAY be in the form
> of an A-label".  My preference is that A-labels on the wire be "MUST
> NOT" - if there's important "running code", I might settle for
> "SHOULD NOT" with an explanation of the "running code" that requires
> ignoring that "SHOULD NOT" in order to keep that "running code" happy.
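
For readers less familiar with the two label forms, the following
Python sketch shows the same domain label as an A-label (ASCII, as DNS
needs it) and as a UTF-8 encoded U-label (what a UTF-8 capable protocol
could carry directly).  The label "bücher" is just an example.  Note
the assumption: Python's built-in "idna" codec implements the older
IDNA2003 rules, so this is only an approximation of current IDNA
processing, although the two agree for this label.

```python
u_label = "bücher"                      # U-label: Unicode form of the label

# A-label: ASCII-compatible encoding ("xn--" prefix plus Punycode).
# Assumption: the stdlib codec follows IDNA2003, not IDNA2008.
a_label = u_label.encode("idna")

# What a UTF-8 capable protocol could put on the wire instead.
utf8_on_wire = u_label.encode("utf-8")

print(a_label)        # b'xn--bcher-kva'
print(utf8_on_wire)   # b'b\xc3\xbccher'

# The two forms carry the same information: the A-label decodes back
# to the original U-label.
assert a_label.decode("idna") == u_label
assert utf8_on_wire.decode("utf-8") == u_label
```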
> 
> Comments:
> 
> For strings that SHOULD be UTF-8, but aren't, what's the protocol
> requirement?  I think the requirement is 8-bit clean (e.g., MUST NOT
> force the most significant octet to zero, unless the string MUST be
> ASCII).  That should be stated as part of the string classification.
> 
> The redefinition of "SHOULD" in 12.2.2 is an invitation to confusion.
> I suggest:
> 	SHOULD -> USHOULD, VSHOULD -> UVSHOULD & VMUST -> UVMUST
> plus use of capitalized SHOULD/MUST in defining these terms.
> 
> The first paragraph of 12.3 does not distinguish utf8_should strings
> from utf8val_should strings - the "SHOULD" requirement to return an
> error if the string is not UTF-8 conflicts with the statement that
> utf8_should strings are not checked for UTF-8 validity - I think that
> error return requirement applies only to utf8val_should strings.
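
The distinction argued for above can be sketched in Python as follows.
This is only an illustration of the proposed reading, not text from the
draft: the StringClass enum, the check_string function, and the choice
of NFS4ERR_INVAL as the error (borrowed from the 12.7.1.2 text quoted
later in this message) are all assumptions.

```python
from enum import Enum


class StringClass(Enum):
    UTF8_SHOULD = "utf8_should"        # SHOULD be UTF-8, but is not checked
    UTF8VAL_SHOULD = "utf8val_should"  # SHOULD be UTF-8, and is validated


def is_valid_utf8(data: bytes) -> bool:
    """True if the octets form a well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8")   # strict decoding rejects malformed input
        return True
    except UnicodeDecodeError:
        return False


def check_string(data: bytes, cls: StringClass) -> str:
    """Return a status name under the reading proposed above.

    Only utf8val_should strings are validated; utf8_should strings are
    passed through untouched even when they are not UTF-8.
    """
    if cls is StringClass.UTF8VAL_SHOULD and not is_valid_utf8(data):
        return "NFS4ERR_INVAL"   # assumed error code for illustration
    return "NFS4_OK"


bad = b"caf\xe9"   # Latin-1 octets, not valid UTF-8
assert check_string(bad, StringClass.UTF8_SHOULD) == "NFS4_OK"
assert check_string(bad, StringClass.UTF8VAL_SHOULD) == "NFS4ERR_INVAL"
```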
> 
> 12.4.2 suggests that NFSv4 supports hex-encoded text forms of IPv4
> addresses.  Is that correct and/or needed?  The usual textual form of
> IPv4 addresses is decimal encoding.
> 
> 12.7.1.2:
> 
>    However, in any of the following situations, file names have to be
>    treated as strings of characters and servers MUST return
>    NFS4ERR_INVAL when file names that are not in UTF-8 format:
> 
> Would "characters" -> "Unicode characters" be consistent with what
> was intended?  If so, that change would make the text clearer.  If
> not, I'm confused.
> 
> 12.7.1.3 uses lower-case "must" and "should".  Is that deliberate vs.
> upper-case?  In general, double-check all uses of lower-case "must"
> and "should" to make sure that they are intended.
> 
> 12.7.1.5.2 would be improved by examples of what clients should
> and/or should not do in order to improve interoperability with
> servers that do not handle normalization in the fashion that the
> client expects.
> 
> 12.7.2 - If link text is utf8_should, servers aren't supposed to
> check for valid UTF8.  Based on 12.2.3, it looks like link text is
> utf8val_should, for which this check is appropriate.
> 
> Nits:
> - Saw one instance of NFKC garbled to NKFC.
> 
> Thanks,
> --David
> 
> > -----Original Message-----
> > From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] On Behalf Of david.noveck@emc.com
> > Sent: Thursday, September 09, 2010 6:36 PM
> > To: nfsv4@ietf.org
> > Subject: [nfsv4] Going forward on I18N in RFC3530 bis
> >
> > David Black (the man behind NFSv4.2 :-) has asked me to summarize
> > the situation with regard to I18N in RFC3530 and the current plan
> > for handling it going forward in RFC3530bis.
> >
> > ---- First some pointers:
> >
> >     The description of I18N is in chapter 11 of RFC3530,
> >     page 122 of http://www.ietf.org/rfc/rfc3530.txt
> >
> >     The current draft replacement is in the latest draft
> >     of RFC3530bis, that is, in chapter 12 (pages 160-179)
> >     http://tools.ietf.org/id/draft-ietf-nfsv4-rfc3530bis-04.txt.
> >     This is pretty much a rewrite of chapter 12 of the
> >     previous draft-03, so looking at the diff is not much help.
> >
> > ---- Background:
> >
> > The basic problem with chapter 11 of RFC3530 is that it has almost
> > no relation to what has been actually implemented.  The current form
> > of chapter 11 reflects political pressures at the time of RFC
> > approval within the IETF to conform to the stringprep paradigm, and
> > so it is organized around that.  But implementations started without
> > it, and never were adjusted to conform to that model, for good
> > reasons, discussed below.
> >
> > In the meantime, problems within stringprep have become manifest.
> > Even more important is the fact that for the most important string
> > type subject to I18N issues, filename components, the
> > stringprep-style approach in its totality does not match the needs
> > of NFSv4.  The issue is that you can think of the server as a single
> > thing (including the server code and the file system you are talking
> > to), in which case it makes sense to define, in exquisite detail,
> > character mapping and repertoire rules, so as to provide
> > interoperability down to the most recondite character-handling
> > details.
> >
> > However, in fact, server implementations and file systems are
> > separate things.  One cannot enforce detailed character-handling
> > rules on the file systems, and if one does, one unacceptably limits
> > the file systems that one can use.  And if one enforces such rules
> > in front of the file systems, we interfere with another major goal
> > of NFSv4: proper interoperability with other network file systems
> > and with local use of those file systems.  If the protocol imposes
> > rules that are not imposed locally, there may be valid files you
> > can't get at over NFSv4.
> >
> > As a result, NFSv4, at least in this regard, is better described as
> > a protocol to pass names from the client to the remote server file
> > system, making as few modifications as we can.  In fact, this is
> > what people actually implemented and it differs in a major way from
> > what is described in chapter 11 of RFC3530.  Thus the need to
> > describe the reality that clients and servers implement in
> > RFC3530bis.
> >
> > ---- Changes:
> >
> > This is a brief summary of the changes I introduced.  It is a
> > high-level summary and I may have forgotten a few things.
> >
> > Re-organize the string types.  In RFC3530, these had been organized
> > around stringprep profiles, basically around whether strings are
> > case-sensitive or not, or partially case-sensitive.  This resulted
> > in very strange conclusions, such as UTF-8 checking and checking for
> > characters outside Unicode 3.1 being applied to tags.
> >
> > Tags are treated opaquely with no UTF-8 checking, Unicode repertoire
> > checking, or normalization-related checking.
> >
> > There is more clarity about various sorts of strings.  In
> > particular, strings which, for various reasons, do not require
> > internationalization handling are explicitly called out.
> >
> > Adopting IDNA handling for domains and servers and simply
> > referencing those docs for what is OK.  There is the issue of
> > U-labels vs. A-labels.  We allow A-labels or UTF-8 strings whether
> > canonicalized or not.  There has been some discussion about changing
> > that to U-labels only but that will only be done if there is working
> > group consensus.
> >
> > Extensive discussion of the fact that our ability to legislate
> > character handling for file systems is limited.
> >
> > Change UTF-8 requirement for filenames from MUST to SHOULD to match
> > NFSv4.1.
> >
> > Get rid of requirement that everything be within Unicode 3.1.  Get
> > rid of requirements that large sets of characters within Unicode 3.1
> > be rejected for various reasons.
> >
> > Get rid of requirement to map various characters.  SHOULD NOT do
> > mappings which are problematic for stringprep (German eszett mapped
> > to 'ss', zero-width joiner and non-joiner characters mapped to
> > nothing, causing issues for Farsi) but MAY use other stringprep
> > mappings (and by implication no mappings outside stringprep).
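
The two problematic mappings mentioned above can be observed with the
stringprep tables that ship in Python's standard library.  This is a
small sketch for illustration only: the fold helper and the sample
names are assumptions, and the code applies just tables B.1 and B.3
from RFC 3454 rather than any complete NFSv4 profile.

```python
import stringprep

ESZETT = "\u00df"   # German sharp s
ZWNJ = "\u200c"     # zero-width non-joiner
ZWJ = "\u200d"      # zero-width joiner

# Table B.3 (case folding) maps eszett to "ss".
assert stringprep.map_table_b3(ESZETT) == "ss"

# Table B.1 lists characters "commonly mapped to nothing".
assert stringprep.in_table_b1(ZWNJ)
assert stringprep.in_table_b1(ZWJ)


def fold(name: str) -> str:
    """Apply only the B.1 and B.3 mappings to a name (illustrative)."""
    return "".join(
        stringprep.map_table_b3(ch)
        for ch in name
        if not stringprep.in_table_b1(ch)
    )


# Two distinct German words collapse into one name after mapping.
assert fold("Ma\u00dfe") == fold("Masse") == "masse"

# A joiner that can matter to how text is written simply disappears.
assert fold("a" + ZWNJ + "b") == "ab"
```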
> >
> > New treatment of normalization.  Allow normalization-sensitive
> > servers (but warn of difficulties without saying SHOULD NOT), allow
> > servers/file-systems to choose to normalize to NFC or NFD (but not
> > reject filenames in the "wrong" normalization as was implied by
> > RFC3530), and also mention/allow for the first time
> > normalization-insensitive/normalization-preserving handling of names
> > (best choice but no SHOULD because this is a big change to the file
> > system and thus not really the spec's business).
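
As a rough illustration of what normalization-insensitive,
normalization-preserving handling means, the Python sketch below models
a directory that compares names by their NFC form but stores and
returns each name exactly as it was created.  The Directory class and
its method names are hypothetical and say nothing about how a real
file system would implement this.

```python
import unicodedata


class Directory:
    """Toy directory: lookups ignore normalization, stored names keep it."""

    def __init__(self):
        # NFC form of the name -> name exactly as the creator supplied it
        self._entries = {}

    @staticmethod
    def _key(name: str) -> str:
        return unicodedata.normalize("NFC", name)

    def create(self, name: str) -> bool:
        key = self._key(name)
        if key in self._entries:
            return False          # a canonically equivalent name exists
        self._entries[key] = name
        return True

    def lookup(self, name: str):
        return self._entries.get(self._key(name))

    def readdir(self):
        return list(self._entries.values())


nfc_name = "caf\u00e9"     # precomposed e-acute
nfd_name = "cafe\u0301"    # "e" followed by a combining acute accent

d = Directory()
assert d.create(nfd_name)              # created in NFD form
assert d.lookup(nfc_name) == nfd_name  # found via the NFC spelling
assert not d.create(nfc_name)          # equivalent name is a duplicate
assert d.readdir() == [nfd_name]       # original form is preserved
```

A normalization-sensitive server would instead treat the two spellings
as different files, and a normalizing server would hand back a form the
client may not have sent.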
> >
> > Discussion of how symlink text should be processed and where the
> > handling differs from file component names.
> >
> > New treatment of user/group names.  Each domain establishes its own
> > list of these so there are no repertoire rules.  There is a
> > discussion about why you should match these based on canonical
> > equivalence, but there is no n/i-n/p option for these because it
> > would require the fs to save 2 (or sometimes many more) variants of
> > that same user and group in the user and group attributes and in
> > ACLs.  Nobody is going to do that nor should they.
> >
> > I'm sure I've missed some things.  If you notice them, let me know,
> > as it would be good to maintain somewhere a summary of what was done
> > in this chapter.
> >
> > ---- Discussion on call:
> >
> > The big issue discussed was whether we should wait for precis to
> > finish this up.
> >
> > David Black came to the conclusion that since precis work was not
> > proceeding very fast, we should go ahead based on the current draft
> > plus working group comments, with the potential of an additional
> > update (RFC3530tris?) when the precis work is finished and can be
> > applied to NFSv4.
> >
> > There were arguments about his use of the word "patch" and the
> > probable relative proportions of updates in RFC3530bis and its
> > successor but no fundamental disagreement on the basic approach.
> >
> > I will be working on an update to chapter 12 that will go into a new
> > draft of rfc3530-bis, targeted at the Beijing deadline.  May be able
> > to get it out earlier but I hope people will have a chance to look
> > at the current draft and give me their comments.
> > _______________________________________________
> > nfsv4 mailing list
> > nfsv4@ietf.org
> > https://www.ietf.org/mailman/listinfo/nfsv4
> 