Re: [nfsv4] Going forward on I18N in RFC3530 bis

I've addressed David's comments.  Thanks, again.

The status indications refer to my current private copy of the XML which
I'm proofreading and will send out pretty soon.

Reminding people that the draft deadline is 10/25 and I've established
two weeks before (10/11) as the time by which I expect comments so that
I can make sure they get into the next drafts of RFC3530bis.  Not saying
I can't negotiate 10/12 or 10/13 but let me know if you have stuff that
might be a bit late.

> For strings that SHOULD be UTF-8, but aren't, what's the 
> protocol requirement?  I think the requirement is 8-bit 
> clean (e.g., MUST NOT force the most significant bit of an 
> octet to zero, unless the string MUST be ASCII).  That should be 
> stated as part of the string classification.

Guy asks me.  What's the protocol requirement?  Lemme put it this way.

Hey man, I send you some bytes.  Don't mess with them.  Changing bits
'cause you feel like it?   Who do you think you are, man?  I send you my
bytes and you take care of 'em.  They ain't yours to screw around with.
There have been guys who tried to mess with my bytes and you haven't
seen them around lately, have you.

I'll put that in more spec-suitable language.  
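
Until then, here is a rough Python sketch of the difference between
leaving my bytes alone and messing with them.  The function names are
invented for illustration and are not from the draft.

```python
def store_opaque(name: bytes) -> bytes:
    """8-bit clean: hand back exactly the octets that were received."""
    return bytes(name)


def store_seven_bit(name: bytes) -> bytes:
    """What the requirement forbids: clearing the high bit of each octet."""
    return bytes(b & 0x7F for b in name)


# A Latin-1 encoded name: not valid UTF-8, but still the sender's bytes.
wire_name = "café".encode("latin-1")  # b'caf\xe9'

assert store_opaque(wire_name) == wire_name
assert store_seven_bit(wire_name) == b"cafi"  # 0xE9 lost its top bit
```

The second function is the guy you haven't seen around lately.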

> The redefinition of "SHOULD" in 12.2.2 is an invitation 
> to confusion.  I suggest:
>	SHOULD -> USHOULD, VSHOULD -> UVSHOULD & VMUST -> UVMUST
> plus use of capitalized SHOULD/MUST in defining these terms.

Done.

> The first paragraph of 12.3 does not distinguish utf8_should 
> strings from utf8val_should strings - the "SHOULD" requirement 
> to return an error if the string is not UTF-8 conflicts with 
> the statement that utf8_should strings are not checked for 
> UTF-8 validity - I think that error return requirement applies 
> only to utf8val_should strings.

Done.
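
As a sanity check on the distinction, here is a minimal Python sketch.
The enum and function names are hypothetical, and "reject" just stands
for "SHOULD return an error"; which error is up to the draft text.

```python
from enum import Enum


class StringClass(Enum):
    UTF8_SHOULD = "utf8_should"        # not checked for UTF-8 validity
    UTF8VAL_SHOULD = "utf8val_should"  # checked; invalid UTF-8 SHOULD get an error


def is_valid_utf8(data: bytes) -> bool:
    """True if the octets decode as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def should_reject(data: bytes, cls: StringClass) -> bool:
    """True when the server SHOULD return an error for this string."""
    if cls is StringClass.UTF8_SHOULD:
        return False  # passed through untouched, valid UTF-8 or not
    return not is_valid_utf8(data)
```

With the Latin-1 bytes b"caf\xe9", should_reject() is False for
utf8_should and True for utf8val_should.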

> 12.4.2 suggests that NFSv4 supports hex-encoded text forms 
> of IPv4 addresses.  Is that correct and/or needed?  The 
> usual textual form of IPv4 addresses is decimal encoding.

Fixed.  I wrote that when I was too tired, or perhaps it was after
somebody sent something that caused my head to explode :-)
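
For what it's worth, the decimal-versus-hex distinction is easy to show
in Python.  This only illustrates the two text forms using the standard
ipaddress module; the helper name is made up and the sketch makes no
claim about the exact wording now in 12.4.2.

```python
import ipaddress


def is_dotted_decimal_ipv4(text: str) -> bool:
    """True if text is an IPv4 address in the usual dotted-decimal form."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ipaddress.AddressValueError:
        return False


assert is_dotted_decimal_ipv4("192.0.2.1")
assert not is_dotted_decimal_ipv4("0xC0.0x00.0x02.0x01")  # hex octets
assert not is_dotted_decimal_ipv4("c0.00.02.01")          # bare hex digits
```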

> 12.7.1.2:
>
>    However, in any of the following situations, file names have to be
>   treated as strings of characters and servers MUST return
>   NFS4ERR_INVAL when file names that are not in UTF-8 format:
>
> Would "characters" -> "Unicode characters" be consistent 
> with what was intended?  If so, that change would make the 
> text clearer.  If not, I'm confused.

Fixed.

> 12.7.1.3 uses lower-case "must" and "should".  Is that 
> deliberate vs. upper-case?  In general, double-check all 
> uses of lower-case "must" and "should" to make sure that 
> they are intended.

It's a result of being a child of the 60's/70's.  It's the "Thou shalt
not 'shalt' and 'shalt not'" approach.

Fixed plus a few more "musts".

> 12.7.1.5.2 would be improved by examples of what clients 
> should and/or should not do in order to improve interoperability 
> with servers that do not handle normalization in the fashion that 
> the client expects.

I've added a lot of material.  This goes at least a considerable way
toward addressing the issue.  Tell me if you see anything missing.
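
To give the flavor of the kind of client behavior at issue, here is a
small Python sketch.  It is my own illustration, not text lifted from
the draft, and the function names are hypothetical.  The assumption is
a server that stores names in a different normalization form than the
one the client sent, so the client matches on canonical equivalence
rather than expecting byte-identical names back.

```python
import unicodedata
from typing import List, Optional


def canonically_equivalent(a: str, b: str) -> bool:
    """Compare two names after putting both in NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def find_created_name(sent: str, listing: List[str]) -> Optional[str]:
    """Find the directory entry matching the name the client sent,
    even if the server stored it in another normalization form."""
    for entry in listing:
        if canonically_equivalent(sent, entry):
            return entry
    return None


nfc = unicodedata.normalize("NFC", "re\u0301sume\u0301")  # precomposed
nfd = unicodedata.normalize("NFD", nfc)                    # decomposed

assert nfc.encode("utf-8") != nfd.encode("utf-8")  # different octets on the wire
assert find_created_name(nfc, ["notes.txt", nfd]) == nfd
```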

> 12.7.2 - If link text is utf8_should, servers aren't 
> supposed to check for valid UTF8.  Based on 12.2.3, it 
> looks like link text is utf8val_should, for which this 
> check is appropriate.

Fixed.

> Nits:
> - Saw one instance of NFKC garbled to NKFC 

Fixed.

-----Original Message-----
From: Black, David 
Sent: Wednesday, September 22, 2010 11:49 AM
To: Noveck, David; nfsv4@ietf.org
Subject: RE: Going forward on I18N in RFC3530 bis

Dave,

I reviewed the i18n material in -04 (Section 12).  It looks fairly good,
but the details are now beyond my level of i18n expertise.  I suggest
that we get a real i18n expert to review this section in the next
version of the draft - I have a couple of candidate reviewers in mind.
Many thanks for the extensive effort that has clearly gone into this.

I have one basic disagreement that should not come as a surprise ;-) ...

My current view of A-labels vs. U-labels is that I'm going to (try to)
insist on no A-labels, *unless* there is important "running code" that
depends on A-labels on the wire and that needs to be grandfathered.
A-labels exist because the DNS infrastructure is fundamentally ASCII.
Since NFSv4 is UTF-8 capable, A-labels on the wire are just plain wrong
in principle, IMHO.  FWIW, I don't care whether it's possible to get the
current A-label approach blessed by the IETF's i18n gurus.  This turns
up in 12.6 as "MAY be in the form of an A-label".  My preference is that
A-labels on the wire be "MUST NOT" - if there's important "running
code", I might settle for "SHOULD NOT" with an explanation of the
"running code" that requires ignoring that "SHOULD NOT" in order to keep
that "running code" happy.
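
To make the two forms concrete, a rough Python sketch.  The function
names are mine and purely illustrative, and Python's built-in "idna"
codec implements the older IDNA 2003 rules, which is enough to show the
shape of an A-label next to the UTF-8 form.

```python
def to_a_label(domain: str) -> bytes:
    """ASCII-compatible (A-label) form, via Python's built-in IDNA codec."""
    return domain.encode("idna")


def to_utf8_wire(domain: str) -> bytes:
    """U-label form as UTF-8 octets."""
    return domain.encode("utf-8")


def has_a_label(domain: bytes) -> bool:
    """True if any dot-separated label carries the 'xn--' ACE prefix."""
    return any(label.lower().startswith(b"xn--") for label in domain.split(b"."))


assert to_a_label("bücher.example") == b"xn--bcher-kva.example"
assert has_a_label(to_a_label("bücher.example"))
assert not has_a_label(to_utf8_wire("bücher.example"))
```

A receiver enforcing the "MUST NOT" I'm arguing for could use a check
along the lines of has_a_label() on incoming domain strings.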

Comments: 

For strings that SHOULD be UTF-8, but aren't, what's the protocol
requirement?  I think the requirement is 8-bit clean (e.g., MUST NOT
force the most significant bit of an octet to zero, unless the string MUST be
ASCII).  That should be stated as part of the string classification.

The redefinition of "SHOULD" in 12.2.2 is an invitation to confusion.  I
suggest:
	SHOULD -> USHOULD, VSHOULD -> UVSHOULD & VMUST -> UVMUST
plus use of capitalized SHOULD/MUST in defining these terms.

The first paragraph of 12.3 does not distinguish utf8_should strings
from utf8val_should strings - the "SHOULD" requirement to return an
error if the string is not UTF-8 conflicts with the statement that
utf8_should strings are not checked for UTF-8 validity - I think that
error return requirement applies only to utf8val_should strings.

12.4.2 suggests that NFSv4 supports hex-encoded text forms of IPv4
addresses.  Is that correct and/or needed?  The usual textual form of
IPv4 addresses is decimal encoding.

12.7.1.2:

   However, in any of the following situations, file names have to be
   treated as strings of characters and servers MUST return
   NFS4ERR_INVAL when file names that are not in UTF-8 format:

Would "characters" -> "Unicode characters" be consistent with what was
intended?  If so, that change would make the text clearer.  If not, I'm
confused.

12.7.1.3 uses lower-case "must" and "should".  Is that deliberate vs.
upper-case?  In general, double-check all uses of lower-case "must" and
"should" to make sure that they are intended.

12.7.1.5.2 would be improved by examples of what clients should and/or
should not do in order to improve interoperability with servers that do
not handle normalization in the fashion that the client expects.

12.7.2 - If link text is utf8_should, servers aren't supposed to check
for valid UTF8.  Based on 12.2.3, it looks like link text is
utf8val_should, for which this check is appropriate.

Nits:
- Saw one instance of NFKC garbled to NKFC.

Thanks,
--David

> -----Original Message-----
> From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] On Behalf Of david.noveck@emc.com
> Sent: Thursday, September 09, 2010 6:36 PM
> To: nfsv4@ietf.org
> Subject: [nfsv4] Going forward on I18N in RFC3530 bis
> 
> David Black (the man behind NFSv4.2 :-) has asked me to summarize the
> situation with regard to I18N in RFC3530 and the current plan about
> what to do about it going forward in handling it in RFC3530bis.
> 
> ---- First some pointers:
> 
>     The description of I18N is in chapter 11 of RFC3530,
>     page 122 of http://www.ietf.org/rfc/rfc3530.txt
> 
>     The current draft replacement is in the latest draft
>     of RFC3530bis, that is, in chapter 12 (pages 160-179)
>     http://tools.ietf.org/id/draft-ietf-nfsv4-rfc3530bis-04.txt.
>     This is pretty much a rewrite of chapter 12 of the
>     previous draft-03, so looking at the diff is not much help.
> 
> ---- Background:
> 
> The basic problem with chapter 11 of RFC3530 is that it has almost no
> relation to what has been actually implemented.  The current form of
> chapter 11 reflects political pressures at the time of RFC approval
> within the IETF to conform to the stringprep paradigm, and so it is
> organized around that.  But implementations started without it, and
> never were adjusted to conform to that model, for good reasons,
> discussed below.
> 
> In the meantime, problems within stringprep have become manifest.
> Even more important is the fact that for the most important string type
> subject to I18N issues, filename components, the stringprep-style
> approach in its totality does not match the needs of NFSv4.  The issue
> is that you can think of the server as a single thing (including the
> server code and the file system you are talking to) in which case it
> makes sense to define, in exquisite detail, character mapping, and
> repertoire rules, so as to provide interoperability down to the most
> recondite character-handling details.
> 
> However in fact, server implementations and file-systems are separate
> things and one cannot enforce detailed character handling rules on the
> file-systems and if one does one limits unacceptably the file systems
> that one can use.  And if one does that in front of the file systems,
> we interfere with another major goal of NFSv4, proper interoperability
> with other network file systems and with local use of those file systems.
> If the protocol imposes rules that are not imposed locally, there may be
> valid files you can't get at over NFSv4.
> 
> As a result, NFSv4, at least in this regard is better described as a
> protocol to pass names from the client to the remote server file
> system, making as few modifications as we can.  In fact, this is what people
> actually implemented and it differs in a major way from what is
> described in chapter 11 of RFC3530.  Thus the need to describe the
> reality that clients and servers implement in RFC3530bis.
> 
> ---- Changes:
> 
> This is a brief summary of the changes I introduced.  It is a
> high-level summary and I may have forgotten a few things.
> 
> Re-organize the string types.  In RFC3530, these had been organized
> around stringprep profiles, basically around whether strings were
> case-sensitive or not, or partially case-sensitive.  This resulted in
> very strange conclusions such as UTF-8 checking and checking
> for characters outside Unicode 3.1 being applied to tags.
> 
> Tags are treated opaquely with no UTF-8 checking, Unicode repertoire
> checking, or normalization-related checking.
> 
> There is more clarity about various sorts of strings.  In particular,
> strings which, for various reasons, do not require internationalization
> handling are explicitly called out.
> 
> Adopting IDNA handling for domains and servers and simply referencing
> those docs for what is OK.  There is the issue of U-labels vs.
> A-labels.  We allow A-labels or UTF-8 strings whether canonicalized or not.
> There has been some discussion about changing that to U-labels only but that
> will only be done if there is working group consensus.
> 
> Extensive discussion of the fact that our ability to legislate
> character handling for file systems is limited.
> 
> Change UTF-8 requirement for filenames from MUST to SHOULD to match
> NFSv4.1.
> 
> Get rid of requirement that everything be within Unicode 3.1.  Get rid
> of requirements that large sets of characters within Unicode 3.1 be
> rejected for various reasons.
> 
> Get rid of requirement to map various characters.  SHOULD NOT do
> mappings which are problematic for stringprep (German eszett mapped to
> 'ss', zero-width joiner and non-joiner characters mapped to nothing,
> causing issues for Farsi) but MAY use other mappings in that (and by
> implication no mappings outside it).
> 
> New treatment of normalization.  Allow normalization-sensitive servers
> (but warn of difficulties without saying SHOULD NOT), allow
> servers/file-systems to choose to normalize to NFC or NFD (but not reject
> filenames in the "wrong" normalization as was implied by RFC3530), and also
> mention/allow for the first time
> normalization-insensitive/normalization-preserving handling of names
> (best choice but no SHOULD because this is a big change to the file
> system and thus not really the spec's business).
> 
> Discussion of how symlink text should be processed and where the
> handling differs from file component names.
> 
> New treatment of user/group names.  Each domain establishes its own
> list of these so there are no repertoire rules.  There is a discussion
> about why you should match these based on canonical equivalence, but there
> is no n/i-n/p option for these because it would require fs to save 2 (or
> sometimes many more) variants of that same user and group in the user
> and group attributes and in ACLs.   Nobody is going to do that nor
> should they.
> 
> I'm sure I've missed some things.  If you notice them, let me know, as
> it would be good to maintain somewhere a summary of what was done in
> this chapter.
> 
> ---- Discussion on call:
> 
> The big issue discussed was whether we should wait for precis to
> finish this up.
> 
> David Black came to the conclusion that since precis work was not
> proceeding very fast, we should go ahead based on the current draft
> plus working group comments, with the potential of an additional update
> (RFC3530tris?) when the precis work is finished and can be applied to
> NFSv4.
> 
> There were arguments about his use of the word "patch" and the
> probable relative proportions of updates in RFC3530bis and its successor but no
> fundamental disagreement on the basic approach.
> 
> I will be working on an update to chapter 12 that will go into a new
> draft of rfc3530-bis, targeted at the Beijing deadline.  May be able
> to get it out earlier but I hope people will have a chance to look at the
> current draft and give me their comments.