Re: [nfsv4] Going forward on I18N in RFC3530 bis
<david.black@emc.com> Wed, 06 October 2010 17:20 UTC
From: david.black@emc.com
To: david.noveck@emc.com, nfsv4@ietf.org
Date: Wed, 06 Oct 2010 13:21:00 -0400
Message-ID: <7C4DFCE962635144B8FAE8CA11D0BF1E03D1BD0B99@MX14A.corp.emc.com>
In-Reply-To: <BF3BB6D12298F54B89C8DCC1E4073D80027DD09A@CORPUSMX50A.corp.emc.com>
Subject: Re: [nfsv4] Going forward on I18N in RFC3530 bis
Dave,

Sounds good - one minor comment ...

> > For strings that SHOULD be UTF-8, but aren't, what's the
> > protocol requirement? I think the requirement is 8-bit
> > clean (e.g., MUST NOT force the most significant octet to
> > zero, unless the string MUST be ASCII). That should be
> > stated as part of the string classification.
>
> Guy asks me. What's the protocol requirement? Lemme put it this way.
>
> Hey man, I send you some bytes. Don't mess with them. Changing bits 'cause you feel like it? Who
> do you think you are, man? I send you my bytes and you take care of 'em. They ain't yours to screw
> around with. There have been guys who tried to mess with my bytes and you haven't seen them around
> lately, have you.
>
> I'll put that in more spec-suitable language.

That looks like 8-bit clean to me, backed by a statement that such a string MUST NOT be assumed
to be 7-bit ASCII ;-).

Thanks,
--David

> -----Original Message-----
> From: Noveck, David
> Sent: Wednesday, October 06, 2010 1:11 PM
> To: Black, David; nfsv4@ietf.org
> Subject: RE: Going forward on I18N in RFC3530 bis
>
> I've addressed David's comments. Thanks, again.
>
> The status indications refer to my current private copy of the XML which I'm proofreading and will
> send out pretty soon.
>
> Reminding people that the draft deadline is 10/25 and I've established two weeks before (10/11) as
> the time by which I expect comments so that I can make sure they get into the next drafts of
> RFC3530bis. Not saying I can't negotiate 10/12 or 12/13 but let me know if you have stuff that
> might be a bit late.
>
> > For strings that SHOULD be UTF-8, but aren't, what's the
> > protocol requirement? I think the requirement is 8-bit
> > clean (e.g., MUST NOT force the most significant octet to
> > zero, unless the string MUST be ASCII). That should be
> > stated as part of the string classification.
>
> Guy asks me. What's the protocol requirement? Lemme put it this way.
>
> Hey man, I send you some bytes.
> Don't mess with them. Changing bits 'cause you feel like it? Who
> do you think you are, man? I send you my bytes and you take care of 'em. They ain't yours to screw
> around with. There have been guys who tried to mess with my bytes and you haven't seen them around
> lately, have you.
>
> I'll put that in more spec-suitable language.
>
> > The redefinition of "SHOULD" in 12.2.2 is an invitation
> > to confusion. I suggest:
> > SHOULD -> USHOULD, VSHOULD -> UVSHOULD & VMUST -> UVMUST
> > plus use of capitalized SHOULD/MUST in defining these terms.
>
> Done.
>
> > The first paragraph of 12.3 does not distinguish utf8_should
> > strings from utf8val_should strings - the "SHOULD" requirement
> > to return an error if the string is not UTF-8 conflicts with
> > the statement that utf8_should strings are not checked for
> > UTF-8 validity - I think that error return requirement applies
> > only to utf8val_should strings.
>
> Done.
>
> > 12.4.2 suggests that NFSv4 supports hex-encoded text forms
> > of IPv4 addresses. Is that correct and/or needed? The
> > usual textual form of IPv4 addresses is decimal encoding.
>
> Fixed. I wrote that when I was too tired, or perhaps it was after somebody sent something that
> caused my head to explode :-)
>
> > 12.7.1.2:
> >
> >    However, in any of the following situations, file names have to be
> >    treated as strings of characters and servers MUST return
> >    NFS4ERR_INVAL when file names that are not in UTF-8 format:
> >
> > Would "characters" -> "Unicode characters" be consistent
> > with what was intended? If so, that change would make the
> > text clearer. If not, I'm confused.
>
> Fixed.
>
> > 12.7.1.3 uses lower-case "must" and "should". Is that
> > deliberate vs. upper-case? In general, double-check all
> > uses of lower-case "must" and "should" to make sure that
> > they are intended.
>
> It's a result of being a child of the 60's/70's. It's the "Thou shalt not 'shalt' and 'shalt not'"
> approach.
>
> Fixed plus a few more "musts".
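Black's 8-bit-clean point and the 12.3 comment (the UTF-8 validity check applies only to utf8val_should strings) can be illustrated with a minimal Python sketch. The `StringClass` labels mirror the draft's utf8_should/utf8val_should terms, but they and `check_string` are hypothetical names, not protocol definitions; the NFS4ERR_INVAL value (22) is the one assigned in RFC 3530. The sketch never alters the received bytes, and `strip_high_bit` shows what an implementation that is not 8-bit clean would do to a UTF-8 name.

```python
from enum import Enum

NFS4_OK = 0
NFS4ERR_INVAL = 22  # status value as assigned in RFC 3530


class StringClass(Enum):
    # Hypothetical labels mirroring the draft's string classification.
    UTF8_SHOULD = "utf8_should"        # SHOULD be UTF-8; server does not check
    UTF8VAL_SHOULD = "utf8val_should"  # SHOULD be UTF-8; server checks validity


def check_string(data: bytes, cls: StringClass) -> int:
    """Return an NFSv4 status for a received string without modifying it."""
    if cls is StringClass.UTF8VAL_SHOULD:
        try:
            data.decode("utf-8")  # validity check only; the result is discarded
        except UnicodeDecodeError:
            return NFS4ERR_INVAL
    # utf8_should strings (and valid utf8val_should ones) are passed on as-is.
    return NFS4_OK


def strip_high_bit(data: bytes) -> bytes:
    """What an implementation that is NOT 8-bit clean would do: clear bit 7."""
    return bytes(b & 0x7F for b in data)


latin1_name = b"caf\xe9"            # "café" in ISO-8859-1, not valid UTF-8
utf8_name = "café".encode("utf-8")  # b"caf\xc3\xa9"

print(check_string(latin1_name, StringClass.UTF8_SHOULD))     # 0
print(check_string(latin1_name, StringClass.UTF8VAL_SHOULD))  # 22
print(strip_high_bit(utf8_name))                              # b'cafC)'
```

The last line is the "messing with my bytes" case: clearing the most significant bit turns the two-octet UTF-8 encoding of "é" into the unrelated ASCII characters "C)".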
>
> > 12.7.1.5.2 would be improved by examples of what clients
> > should and/or should not do in order to improve interoperability
> > with servers that do not handle normalization in the fashion that
> > the client expects.
>
> I've added a lot of material. This goes at least a considerable way to address the issue. Tell me if
> you see anything missing.
>
> > 12.7.2 - If link text is utf8_should, servers aren't
> > supposed to check for valid UTF8. Based on 12.2.3, it
> > looks like link text is utf8val_should, for which this
> > check is appropriate.
>
> Fixed.
>
> > Nits:
> > - Saw one instance of NFKC garbled to NKFC
>
> Fixed.
>
> -----Original Message-----
> From: Black, David
> Sent: Wednesday, September 22, 2010 11:49 AM
> To: Noveck, David; nfsv4@ietf.org
> Subject: RE: Going forward on I18N in RFC3530 bis
>
> Dave,
>
> I reviewed the i18n material in -04 (Section 12). It looks fairly good, but the details are now
> beyond my level of i18n expertise. I suggest that we get a real i18n expert to review this section
> in the next version of the draft - I have a couple of candidate reviewers in mind. Many thanks for
> the extensive effort that has clearly gone into this.
>
> I have one basic disagreement that should not come as a surprise ;-) ...
>
> My current view of A-labels vs. U-labels is that I'm going to (try to) insist on no A-labels,
> *unless* there is important "running code" that depends on A-labels on the wire and that needs to be
> grandfathered. A-labels exist because the DNS infrastructure is fundamentally ASCII. Since NFSv4
> is UTF-8 capable, A-labels on the wire are just plain wrong in principle, IMHO. FWIW, I don't care
> whether it's possible to get the current A-label approach blessed by the IETF's i18n gurus. This
> turns up in 12.6 as "MAY be in the form of an A-label".
> My preference is that A-labels on the wire
> be "MUST NOT" - if there's important "running code", I might settle for "SHOULD NOT" with an
> explanation of the "running code" that requires ignoring that "SHOULD NOT" in order to keep that
> "running code" happy.
>
> Comments:
>
> For strings that SHOULD be UTF-8, but aren't, what's the protocol requirement? I think the
> requirement is 8-bit clean (e.g., MUST NOT force the most significant octet to zero, unless the
> string MUST be ASCII). That should be stated as part of the string classification.
>
> The redefinition of "SHOULD" in 12.2.2 is an invitation to confusion. I suggest:
> SHOULD -> USHOULD, VSHOULD -> UVSHOULD & VMUST -> UVMUST
> plus use of capitalized SHOULD/MUST in defining these terms.
>
> The first paragraph of 12.3 does not distinguish utf8_should strings from utf8val_should strings -
> the "SHOULD" requirement to return an error if the string is not UTF-8 conflicts with the statement
> that utf8_should strings are not checked for UTF-8 validity - I think that error return requirement
> applies only to utf8val_should strings.
>
> 12.4.2 suggests that NFSv4 supports hex-encoded text forms of IPv4 addresses. Is that correct
> and/or needed? The usual textual form of IPv4 addresses is decimal encoding.
>
> 12.7.1.2:
>
>    However, in any of the following situations, file names have to be
>    treated as strings of characters and servers MUST return
>    NFS4ERR_INVAL when file names that are not in UTF-8 format:
>
> Would "characters" -> "Unicode characters" be consistent with what was intended? If so, that change
> would make the text clearer. If not, I'm confused.
>
> 12.7.1.3 uses lower-case "must" and "should". Is that deliberate vs. upper-case? In general,
> double-check all uses of lower-case "must" and "should" to make sure that they are intended.
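To make the A-label/U-label distinction concrete: a U-label can travel as ordinary UTF-8 on a UTF-8-capable protocol, while an A-label is the ASCII-compatible "xn--" form that exists for the DNS. The sketch below uses Python's built-in `idna` codec, which implements IDNA 2003 (RFC 3490) rather than the IDNA 2008 rules the draft would reference, so it only illustrates the two forms; `to_a_label` and `to_u_label` are hypothetical helper names.

```python
def to_a_label(domain: str) -> str:
    """Convert a domain made of U-labels to its ASCII (A-label) form."""
    return domain.encode("idna").decode("ascii")


def to_u_label(domain: str) -> str:
    """Convert an A-label domain back to its Unicode (U-label) form."""
    return domain.encode("ascii").decode("idna")


u_form = "bücher.example"
a_form = to_a_label(u_form)

print(a_form)              # xn--bcher-kva.example
print(to_u_label(a_form))  # bücher.example

# Under a "no A-labels on the wire" rule, the protocol would carry only
# the UTF-8 encoding of the U-label form:
wire_bytes = u_form.encode("utf-8")
print(wire_bytes)
```

Allowing both forms on the wire means every receiver must be prepared to convert A-labels back before comparing names, which is the extra work Black's "MUST NOT" would avoid.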
>
> 12.7.1.5.2 would be improved by examples of what clients should and/or should not do in order to
> improve interoperability with servers that do not handle normalization in the fashion that the
> client expects.
>
> 12.7.2 - If link text is utf8_should, servers aren't supposed to check for valid UTF8. Based on
> 12.2.3, it looks like link text is utf8val_should, for which this check is appropriate.
>
> Nits:
> - Saw one instance of NFKC garbled to NKFC.
>
> Thanks,
> --David
>
> > -----Original Message-----
> > From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] On Behalf Of david.noveck@emc.com
> > Sent: Thursday, September 09, 2010 6:36 PM
> > To: nfsv4@ietf.org
> > Subject: [nfsv4] Going forward on I18N in RFC3530 bis
> >
> > David Black (the man behind NFSv4.2 :-) has asked me to summarize the
> > situation with regard to I18N in RFC3530 and the current plan about what
> > to do about it going forward in handling it in RFC3530bis.
> >
> > ---- First some pointers:
> >
> > The description of I18N is in chapter 11 of RFC3530,
> > page 122 of http://www.ietf.org/rfc/rfc3530.txt
> >
> > The current draft replacement is in the latest draft
> > of RFC3530bis, that is, in chapter 12 (pages 160-179)
> > http://tools.ietf.org/id/draft-ietf-nfsv4-rfc3530bis-04.txt.
> > This is pretty much a rewrite of chapter 12 of the
> > previous draft-03, so looking at the diff is not much help.
> >
> > ---- Background:
> >
> > The basic problem with chapter 11 of RFC3530 is that it has almost no
> > relation to what has been actually implemented. The current form of
> > chapter 11 reflects political pressures at the time of RFC approval
> > within the IETF to conform to the stringprep paradigm, and so it is
> > organized around that. But implementations started without it, and
> > never were adjusted to conform to that model, for good reasons,
> > discussed below.
> >
> > In the meantime, problems within stringprep have become manifest.
> > Even more important is the fact that for the most important string type
> > subject to I18N issues, filename components, the stringprep-style
> > approach in its totality does not match the needs of NFSv4. The issue
> > is that you can think of the server as a single thing (including the
> > server code and the file system you are talking to), in which case it
> > makes sense to define, in exquisite detail, character mapping and
> > repertoire rules, so as to provide interoperability down to the most
> > recondite character-handling details.
> >
> > However, in fact, server implementations and file-systems are separate
> > things, and one cannot enforce detailed character handling rules on the
> > file-systems; if one does, one unacceptably limits the file systems
> > that one can use. And if one does that in front of the file systems, we
> > interfere with another major goal of NFSv4, proper interoperability with
> > other network file systems and with local use of those file systems. If
> > the protocol imposes rules that are not imposed locally, there may be
> > valid files you can't get at over NFSv4.
> >
> > As a result, NFSv4, at least in this regard, is better described as a
> > protocol to pass names from the client to the remote server file system,
> > making as few modifications as we can. In fact, this is what people
> > actually implemented, and it differs in a major way from what is
> > described in chapter 11 of RFC3530. Thus the need to describe the
> > reality that clients and servers implement in RFC3530bis.
> >
> > ---- Changes:
> >
> > This is a brief summary of the changes I introduced. It is a high-level
> > summary and I may have forgotten a few things.
> >
> > Re-organize the string types. In RFC3530, these had been organized
> > around stringprep profiles, basically according to whether strings were
> > case-sensitive or not, or partially case-sensitive.
> > This resulted in very strange conclusions, such as UTF-8 checking and
> > checking for characters outside Unicode 3.1 being applied to tags.
> >
> > Tags are treated opaquely, with no UTF-8 checking, Unicode repertoire
> > checking, or normalization-related checking.
> >
> > There is more clarity about various sorts of strings. In particular,
> > strings which, for various reasons, do not require internationalization
> > handling are explicitly called out.
> >
> > Adopting IDNA handling for domains and servers and simply referencing
> > those docs for what is OK. There is the issue of U-labels vs. A-labels.
> > We allow A-labels or UTF-8 strings whether canonicalized or not. There
> > has been some discussion about changing that to U-labels only, but that
> > will only be done if there is working group consensus.
> >
> > Extensive discussion of the fact that our ability to legislate character
> > handling for file systems is limited.
> >
> > Change UTF-8 requirement for filenames from MUST to SHOULD to match
> > NFSv4.1.
> >
> > Get rid of requirement that everything be within Unicode 3.1. Get rid
> > of requirements that large sets of characters within Unicode 3.1 be
> > rejected for various reasons.
> >
> > Get rid of requirement to map various characters. SHOULD NOT do
> > mappings which are problematic for stringprep (German eszett mapped to
> > 'ss', zero-width joiner and non-joiner characters mapped to nothing,
> > causing issues in Farsi) but MAY use other mappings in that (and by
> > implication no mappings outside it).
> >
> > New treatment of normalization.
> > Allow normalization-sensitive servers
> > (but warn of difficulties without saying SHOULD NOT), allow
> > servers/file-systems to choose to normalize to NFC or NFD (but not reject
> > filenames in the "wrong" normalization form, as was implied by RFC3530),
> > and also mention/allow for the first time
> > normalization-insensitive/normalization-preserving handling of names
> > (the best choice, but no SHOULD, because this is a big change to the file
> > system and thus not really the spec's business).
> >
> > Discussion of how symlink text should be processed and where the
> > handling differs from file component names.
> >
> > New treatment of user/group names. Each domain establishes its own list
> > of these, so there are no repertoire rules. There is a discussion about
> > why you should match these based on canonical equivalence, but there is
> > no n/i-n/p option for these because it would require the fs to save 2 (or
> > sometimes many more) variants of the same user and group in the user
> > and group attributes and in ACLs. Nobody is going to do that, nor
> > should they.
> >
> > I'm sure I've missed some things. If you notice them, let me know, as
> > it would be good to maintain somewhere a summary of what was done in
> > this chapter.
> >
> > ---- Discussion on call:
> >
> > The big issue discussed was whether we should wait for precis to finish
> > this up.
> >
> > David Black came to the conclusion that since precis work was not
> > proceeding very fast, we should go ahead based on the current draft plus
> > working group comments, with the potential of an additional update
> > (RFC3530tris?) when the precis work is finished and can be applied to
> > NFSv4.
> >
> > There were arguments about his use of the word "patch" and the probable
> > relative proportions of updates in RFC3530bis and its successor, but no
> > fundamental disagreement on the basic approach.
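The normalization-insensitive, normalization-preserving handling of names mentioned in the summary can be sketched as follows: lookups compare names after normalizing them to NFC, while each name is stored and returned exactly as the client created it. This is a minimal sketch assuming a hypothetical in-memory `Directory` class built on Python's `unicodedata`; it is not how any particular file system implements the idea, and NFC as the comparison key is an arbitrary choice (NFD would work equally well).

```python
import unicodedata
from typing import Dict, Optional


class Directory:
    """Hypothetical directory: normalization-insensitive lookups,
    normalization-preserving storage."""

    def __init__(self) -> None:
        # NFC form of each name -> the name exactly as it was created
        self._entries: Dict[str, str] = {}

    @staticmethod
    def _key(name: str) -> str:
        return unicodedata.normalize("NFC", name)

    def create(self, name: str) -> bool:
        """Add name; fail if a canonically equivalent name already exists."""
        key = self._key(name)
        if key in self._entries:
            return False
        self._entries[key] = name
        return True

    def lookup(self, name: str) -> Optional[str]:
        """Return the stored spelling of any canonically equivalent name."""
        return self._entries.get(self._key(name))


nfd_name = "cafe\u0301"  # 'e' followed by COMBINING ACUTE ACCENT (NFD)
nfc_name = "caf\u00e9"   # precomposed 'é' (NFC)

d = Directory()
d.create(nfd_name)
print(nfd_name == nfc_name)            # False: different code points
print(d.lookup(nfc_name) == nfd_name)  # True: found, original form kept
print(d.create(nfc_name))              # False: equivalent name exists
```

Contrast this with a server that normalizes on create: it would store "caf\u00e9" and hand that back in READDIR, so a client that created the NFD spelling would see a different byte string than it sent.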
> >
> > I will be working on an update to chapter 12 that will go into a new
> > draft of rfc3530-bis, targeted at the Beijing deadline. I may be able to
> > get it out earlier, but I hope people will have a chance to look at the
> > current draft and give me their comments.
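The summary's point that servers SHOULD NOT apply mappings that are problematic under stringprep (German eszett mapped to "ss", zero-width joiner and non-joiner mapped to nothing) can be seen with Python's standard `stringprep` module, which exposes the RFC 3454 tables. `stringprep_style_map` is a hypothetical helper that applies only table B.1 ("commonly mapped to nothing") and table B.2 (case folding); a real stringprep profile also has normalization, prohibition, and bidi steps. The sketch shows how names a user considers distinct collapse onto the same string.

```python
import stringprep


def stringprep_style_map(name: str) -> str:
    """Apply the RFC 3454 mapping tables B.1 and B.2 to each character."""
    out = []
    for ch in name:
        if stringprep.in_table_b1(ch):
            continue  # B.1 "mapped to nothing" includes U+200C and U+200D
        out.append(stringprep.map_table_b2(ch))  # B.2 case folding: "ß" -> "ss"
    return "".join(out)


# German: eszett is folded to "ss", so these two names collide.
print(stringprep_style_map("Straße"))   # strasse
print(stringprep_style_map("STRASSE"))  # strasse

# Persian "mi-khaham" ("I want") is written with a ZERO WIDTH NON-JOINER
# (U+200C) after the prefix; mapping it to nothing changes how it is spelled.
with_zwnj = "\u0645\u06cc\u200c\u062e\u0648\u0627\u0647\u0645"
without_zwnj = with_zwnj.replace("\u200c", "")
print(stringprep_style_map(with_zwnj) == without_zwnj)  # True
```

For a file system that stores names as given, either mapping could make an existing file unreachable or make two distinct files look like one, which is the reason given for dropping the required mappings.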