Re: [nfsv4] Going forward on I18N in RFC3530 bis

Correction to one of my comments ...

> For strings that SHOULD be UTF-8, but aren't, what's the protocol requirement?  I think the
> requirement is 8-bit clean (e.g., MUST NOT force the most significant octet to zero, unless the
> string MUST be ASCII).  That should be stated as part of the string classification.

That should be "MUST NOT force the most significant bit in each octet to zero, unless the string MUST be ASCII" - some goblin ate a few words ...
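
As an illustration only, here is a minimal Python sketch of the difference between 8-bit-clean handling and masking the most significant bit of each octet. It is not taken from the draft, and the function names pass_through and strip_high_bit are hypothetical.

```python
def pass_through(octets: bytes) -> bytes:
    """8-bit clean: every octet is returned unchanged."""
    return bytes(octets)


def strip_high_bit(octets: bytes) -> bytes:
    """NOT 8-bit clean: clears the most significant bit of each octet."""
    return bytes(b & 0x7F for b in octets)


# A Latin-1 encoded name: not valid UTF-8, but still a usable octet string.
name = "caf\u00e9".encode("latin-1")       # b'caf\xe9'

assert pass_through(name) == name          # octets preserved
assert strip_high_bit(name) == b"cafi"     # 0xE9 & 0x7F == 0x69 ('i')
assert strip_high_bit(b"cafe") == b"cafe"  # pure ASCII is unaffected
```

Masking only leaves a string intact when every octet is already ASCII, which is why the exception for strings that MUST be ASCII is harmless.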

Thanks,
--David

> -----Original Message-----
> From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] On Behalf Of david.black@emc.com
> Sent: Wednesday, September 22, 2010 11:49 AM
> To: Noveck, David; nfsv4@ietf.org
> Subject: Re: [nfsv4] Going forward on I18N in RFC3530 bis
> 
> Dave,
> 
> I reviewed the i18n material in -04 (Section 12).  It looks fairly good, but the details are now
> beyond my level of i18n expertise.  I suggest that we get a real i18n expert to review this section
> in the next version of the draft - I have a couple of candidate reviewers in mind.  Many thanks for
> the extensive effort that has clearly gone into this.
> 
> I have one basic disagreement that should not come as a surprise ;-) ...
> 
> My current view of A-labels vs. U-labels is that I'm going to (try to) insist on no A-labels,
> *unless* there is important "running code" that depends on A-labels on the wire and that needs to be
> grandfathered.  A-labels exist because the DNS infrastructure is fundamentally ASCII.  Since NFSv4
> is UTF-8 capable, A-labels on the wire are just plain wrong in principle, IMHO.  FWIW, I don't care
> whether it's possible to get the current A-label approach blessed by the IETF's i18n gurus.  This
> turns up in 12.6 as "MAY be in the form of an A-label".  My preference is that A-labels on the wire
> be "MUST NOT" - if there's important "running code", I might settle for "SHOULD NOT" with an
> explanation of the "running code" that requires ignoring that "SHOULD NOT" in order to keep that
> "running code" happy.
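
For readers less familiar with the two label forms, a small Python sketch of the same name as an A-label and as UTF-8 octets. It assumes Python's built-in "idna" codec, which implements the older IDNA 2003 rules rather than IDNA2008; the two agree for this particular label. The label itself is just an example.

```python
u_label = "b\u00fccher"               # U-label: the Unicode form of the name

a_label = u_label.encode("idna")      # ASCII-compatible form used in the DNS
utf8_form = u_label.encode("utf-8")   # octets a UTF-8-capable protocol can carry

print(a_label)     # b'xn--bcher-kva'
print(utf8_form)   # b'b\xc3\xbccher'

# The A-label decodes back to the same U-label.
assert a_label.decode("idna") == u_label
```

The A-label carries no information that the UTF-8 form lacks; it only re-encodes the name so that it fits in ASCII.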
> 
> Comments:
> 
> For strings that SHOULD be UTF-8, but aren't, what's the protocol requirement?  I think the
> requirement is 8-bit clean (e.g., MUST NOT force the most significant octet to zero, unless the
> string MUST be ASCII).  That should be stated as part of the string classification.
> 
> The redefinition of "SHOULD" in 12.2.2 is an invitation to confusion.  I suggest:
> 	SHOULD -> USHOULD, VSHOULD -> UVSHOULD & VMUST -> UVMUST
> plus use of capitalized SHOULD/MUST in defining these terms.
> 
> The first paragraph of 12.3 does not distinguish utf8_should strings from utf8val_should strings -
> the "SHOULD" requirement to return an error if the string is not UTF-8 conflicts with the statement
> that utf8_should strings are not checked for UTF-8 validity - I think that error return requirement
> applies only to utf8val_should strings.
> 
> 12.4.2 suggests that NFSv4 supports hex-encoded text forms of IPv4 addresses.  Is that correct
> and/or needed?  The usual textual form of IPv4 addresses is decimal encoding.
> 
> 12.7.1.2:
> 
>    However, in any of the following situations, file names have to be
>    treated as strings of characters and servers MUST return
>    NFS4ERR_INVAL when file names that are not in UTF-8 format:
> 
> Would "characters" -> "Unicode characters" be consistent with what was intended?  If so, that change
> would make the text clearer.  If not, I'm confused.
> 
> 12.7.1.3 uses lower-case "must" and "should".  Is that deliberate vs. upper-case?  In general,
> double-check all uses of lower-case "must" and "should" to make sure that they are intended.
> 
> 12.7.1.5.2 would be improved by examples of what clients should and/or should not do in order to
> improve interoperability with servers that do not handle normalization in the fashion that the
> client expects.
> 
> 12.7.2 - If link text is utf8_should, servers aren't supposed to check for valid UTF-8.  Based on
> 12.2.3, it looks like link text is utf8val_should, for which this check is appropriate.
> 
> Nits:
> - Saw one instance of NFKC garbled to NKFC.
> 
> Thanks,
> --David
> 
> > -----Original Message-----
> > From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] On Behalf Of david.noveck@emc.com
> > Sent: Thursday, September 09, 2010 6:36 PM
> > To: nfsv4@ietf.org
> > Subject: [nfsv4] Going forward on I18N in RFC3530 bis
> >
> > David Black (the man behind NFSv4.2 :-) has asked me to summarize the
> > situation with regard to I18N in RFC3530 and the current plan for
> > handling it going forward in RFC3530bis.
> >
> > ---- First some pointers:
> >
> >     The description of I18N is in chapter 11 of RFC3530,
> >     page 122 of http://www.ietf.org/rfc/rfc3530.txt
> >
> >     The current draft replacement is in the latest draft
> >     of RFC3530bis, that is, in chapter 12 (pages 160-179)
> >     http://tools.ietf.org/id/draft-ietf-nfsv4-rfc3530bis-04.txt.
> >     This is pretty much a rewrite of chapter 12 of the
> >     previous draft-03, so looking at the diff is not much help.
> >
> > ---- Background:
> >
> > The basic problem with chapter 11 of RFC3530 is that it has almost no
> > relation to what has been actually implemented.  The current form of
> > chapter 11 reflects political pressures at the time of RFC approval
> > within the IETF to conform to the stringprep paradigm, and so it is
> > organized around that.  But implementations started without it, and
> > never were adjusted to conform to that model, for good reasons,
> > discussed below.
> >
> > In the meantime, problems within stringprep have become manifest.  Even
> > more important is the fact that for the most important string type
> > subject to I18N issues, filename components, the stringprep-style
> > approach in its totality does not match the needs of NFSv4.  The issue
> > is that you can think of the server as a single thing (including the
> > server code and the file system you are talking to), in which case it
> > makes sense to define, in exquisite detail, character mapping and
> > repertoire rules, so as to provide interoperability down to the most
> > recondite character-handling details.
> >
> > However, server implementations and file systems are in fact separate
> > things.  One cannot enforce detailed character-handling rules on the
> > file systems, and if one tries, one unacceptably limits the file systems
> > that one can use.  And if one enforces such rules in front of the file
> > systems, one interferes with another major goal of NFSv4: proper
> > interoperability with other network file systems and with local use of
> > those file systems.  If the protocol imposes rules that are not imposed
> > locally, there may be valid files you can't get at over NFSv4.
> >
> > As a result, NFSv4, at least in this regard, is better described as a
> > protocol to pass names from the client to the remote server file system,
> > making as few modifications as we can.  In fact, this is what people
> > actually implemented, and it differs in a major way from what is
> > described in chapter 11 of RFC3530.  Thus the need to describe, in
> > RFC3530bis, the reality that clients and servers implement.
> >
> > ---- Changes:
> >
> > This is a brief summary of the changes I introduced.  It is a high-level
> > summary and I may have forgotten a few things.
> >
> > Re-organize the string types.  In RFC3530, these had been organized
> > around stringprep profiles, basically around whether strings are
> > case-sensitive or not, or partially case-sensitive.  This resulted in
> > very strange conclusions, such as UTF-8 checking and checking
> > for characters outside Unicode 3.1 being applied to tags.
> >
> > Tags are treated opaquely with no UTF-8 checking, Unicode repertoire
> > checking, or normalization-related checking.
> >
> > There is more clarity about various sorts of strings.  In particular,
> > strings which, for various reasons, do not require internationalization
> > handling are explicitly called out.
> >
> > Adopting IDNA handling for domains and servers and simply referencing
> > those docs for what is OK.  There is the issue of U-labels vs. A-labels.
> > We allow A-labels or UTF-8 strings whether canonicalized or not.  There
> > has been some discussion about changing that to U-labels only but that
> > will only be done if there is working group consensus.
> >
> > Extensive discussion of the fact that our ability to legislate character
> > handling for file systems is limited.
> >
> > Change UTF-8 requirement for filenames from MUST to SHOULD to match
> > NFSv4.1.
> >
> > Get rid of requirement that everything be within Unicode 3.1.  Get rid
> > of requirements that large sets of characters within Unicode 3.1 be
> > rejected for various reasons.
> >
> > Get rid of requirement to map various characters.  SHOULD NOT do
> > mappings which are problematic for stringprep (German eszett mapped to
> > 'ss', zero-width joiner and non-joiner characters mapped to nothing,
> > causing issues for Farsi) but MAY use other mappings in stringprep (and
> > by implication no mappings outside it).
> >
> > New treatment of normalization.  Allow normalization-sensitive servers
> > (but warn of difficulties without saying SHOULD NOT), allow
> > servers/file-systems to choose to normalize to NFC or NFD (but not reject
> > filenames in the "wrong" normalization as was implied by RFC3530), and also
> > mention/allow for the first time
> > normalization-insensitive/normalization-preserving handling of names
> > (best choice but no SHOULD because this is a big change to the file system
> > and thus not really the spec's business).
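
A rough Python sketch of what normalization-insensitive, normalization-preserving name handling could look like. It is an illustration under assumptions, not how any particular server or file system does it; the Directory class and its methods are hypothetical.

```python
import unicodedata

# The same visible name in two canonically equivalent forms.
nfc = unicodedata.normalize("NFC", "re\u0301sume\u0301")  # precomposed accents
nfd = unicodedata.normalize("NFD", nfc)                   # base + combining marks

assert nfc != nfd                                   # different code points
assert nfc.encode("utf-8") != nfd.encode("utf-8")   # different octets on the wire


class Directory:
    """Matches names by canonical equivalence but stores them as supplied."""

    def __init__(self):
        self._entries = {}  # NFC key -> name exactly as the creator sent it

    def create(self, name):
        self._entries.setdefault(unicodedata.normalize("NFC", name), name)

    def lookup(self, name):
        return self._entries.get(unicodedata.normalize("NFC", name))


d = Directory()
d.create(nfd)                  # client A creates the file using NFD
assert d.lookup(nfc) == nfd    # client B finds it using NFC; stored form is kept
```

A normalization-sensitive directory would instead treat nfc and nfd as two distinct names, which is the source of the difficulties the draft warns about.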
> >
> > Discussion of how symlink text should be processed and where the
> > handling differs from file component names.
> >
> > New treatment of user/group names.  Each domain establishes its own list
> > of these so there are no repertoire rules.  There is a discussion about
> > why you should match these based on canonical equivalence, but there is
> > no normalization-insensitive/normalization-preserving option for these
> > because it would require the file system to save 2 (or sometimes many
> > more) variants of the same user and group in the user and group
> > attributes and in ACLs.  Nobody is going to do that, nor should they.
> >
> > I'm sure I've missed some things.  If you notice them, let me know, as
> > it would be good to maintain somewhere a summary of what was done in
> > this chapter.
> >
> > ---- Discussion on call:
> >
> > The big issue discussed was whether we should wait for precis to finish
> > this up.
> >
> > David Black came to the conclusion that since precis work was not
> > proceeding very fast, we should go ahead based on the current draft plus
> > working group comments, with the potential of an additional update
> > (RFC3530tris?) when the precis work is finished and can be applied to
> > NFSv4.
> >
> > There were arguments about his use of the word "patch" and the probable
> > relative proportions of updates in RFC3530bis and its successor but no
> > fundamental disagreement on the basic approach.
> >
> > I will be working on an update to chapter 12 that will go into a new
> > draft of RFC3530bis, targeted at the Beijing deadline.  I may be able to
> > get it out earlier, but I hope people will have a chance to look at the
> > current draft and give me their comments.
> > _______________________________________________
> > nfsv4 mailing list
> > nfsv4@ietf.org
> > https://www.ietf.org/mailman/listinfo/nfsv4