Re: [nfsv4] NFSv4 I18n string type summary
<david.black@emc.com> Sun, 01 August 2010 20:45 UTC
Date: Sun, 01 Aug 2010 16:46:09 -0400
Message-ID: <C2D311A6F086424F99E385949ECFEBCB03453E70@CORPUSMX80B.corp.emc.com>
In-Reply-To: <BF3BB6D12298F54B89C8DCC1E4073D8002053C28@CORPUSMX50A.corp.emc.com>
Thread-Topic: NFSv4 I18n string type summary
Thread-Index: AcsvYyyW7ZyAk2QIRvaZGoBVrUMKdACVP9+Q
References: <BF3BB6D12298F54B89C8DCC1E4073D8002053C28@CORPUSMX50A.corp.emc.com>
From: david.black@emc.com
To: Noveck_David@emc.com, nfsv4@ietf.org
Subject: Re: [nfsv4] NFSv4 I18n string type summary
List-Id: NFSv4 Working Group <nfsv4.ietf.org>
List-Archive: <http://www.ietf.org/mail-archive/web/nfsv4>

Dave,

Taking a quick pass through this:

- filenames (incl. component4 and linktext4) are "special" and we are
  going to have to do something NFSv4-specific in order to cope with
  physical filesystems that have different normalization rules. I would
  still watch what precis does (e.g., for what characters to prohibit;
  control characters are usually prohibited in file names, although we
  may be able to punt that to the underlying physical FS).
- domain names (both as server names and as part of a principal) are
  handled by pointing at IDNAbis, and figuring out representation. I
  still dislike A-labels on the wire in a protocol that is fully
  UTF8-capable (A-labels exist because DNS isn't).
- user names. I would watch what precis does here, as they're likely to
  do something useful for user names in general that we can point to.
  I would treat group names as a class of user names for i18n purposes.

Those three items may cover us, as:

- A principal is constructed from a user/group name plus a server name.
- If tags are never compared in the protocol, they can be described as
  "opaque".
- Stating that MIME types and svraddr4 are always ASCII removes i18n
  issues.

I still want to take another run at A-labels vs. U-labels, because the
current direction allows A-labels to show up in config files and
associated management software - IMHO, that sort of abuse of
administrators is just plain wrong when the protocol and related
technology support Unicode.

Thanks,
--David

> -----Original Message-----
> From: Noveck, David
> Sent: Thursday, July 29, 2010 5:16 PM
> To: Black, David; 'nfsv4@ietf.org'
> Subject: NFSv4 I18n string type summary
>
> David Black asked me to provide information on all the i18n-related
> types in NFSv4. It turns out that there is a section devoted to that
> very purpose in draft-ietf-nfsv4-rfc3530bis-04.txt. It is section 12.2
> and starts on page 164. The draft only expires on January 8, 2011 so is
> available until that time or until a revision is published (I'm
> guessing something like three months from now).
>
> I've reproduced the text (minus heading crap) below and will try to
> give people a brief summary of the summary here. It will not fit in
> 144 characters, however.
>
> There are a number of types of strings which are combinations of other
> types, such as users and groups, which are of the form
> principal@domain. There is some preliminary processing defined for
> such types that defines how to break these up into other types. These
> mixture types will not be further discussed here. See table 6 in
> section 12.2 below.
>
> Aside from these there are three classes of string types to deal with.
>
> First are those for which there is some existing internet
> specification on which handling can be based. Life is good.
>
> +-----------------+----------------------------------------------------+
> | Type            | Explanation and Discussion                         |
> +-----------------+----------------------------------------------------+
> | svrname4        | Server name as returned by server. Not sent by     |
> |                 | client, except in VERIFY/NVERIFY. Handled by       |
> |                 | IDNAbis rules with A-labels and UTF-8 strings      |
> |                 | all allowed.                                       |
> | prinsfx4        | Suffix part of principal, in the form of a domain  |
> |                 | name. Handled by IDNAbis rules with A-labels and   |
> |                 | UTF-8 strings all allowed.                         |
> +-----------------+----------------------------------------------------+
>
> Next are the more troublesome and ideologically contentious types,
> where the issue is basically that we are communicating with entities
> that exist and that we, in general, cannot change. In the case of
> component names and link text strings they are the file system, while
> in the case of the principal names, it is whatever mechanism the
> implementations have agreed upon to map between names and numeric ids.
> We are communicating with the file system, which has its own rules for
> name processing. This makes a directive approach, whereby the details
> of all string processing are to be specified, a non-starter. I perceive
> some reluctance to accept this fact and proceed in that light.
>
> +-----------------+----------------------------------------------------+
> | Type            | Explanation and Discussion                         |
> +-----------------+----------------------------------------------------+
> | component4      | Should be utf8 but clients may need to access file |
> |                 | systems with a different name structure, possibly  |
> |                 | including non-utf8 names.                          |
> | linktext4       | Should be utf8 since text may include name         |
> |                 | components. Because of the need to access existing |
> |                 | file systems, this check may be inhibited.         |
> | prinpfx4        | Must match one of a list of valid users or groups  |
> |                 | for that particular domain.                        |
> +-----------------+----------------------------------------------------+
>
> Finally, the following types, even though nominally UTF-8, have for
> various reasons no i18n-related processing defined in the protocol.
>
> +-----------------+----------------------------------------------------+
> | Type            | Explanation and Discussion                         |
> +-----------------+----------------------------------------------------+
> | comptag4        | Tags are debugging strings that are in the         |
> |                 | protocol for the purpose of helping protocol       |
> |                 | debugging. There is no specification as to what    |
> |                 | should be in them. They are defined as UTF-8 but   |
> |                 | are not validated for UTF-8 correctness by the     |
> |                 | receiver. What would you do if it is wrong?        |
> +-----------------+----------------------------------------------------+
> | fattr4_mimetype | All mime types are validated by comparing them     |
> |                 | against a fixed list, all of whose members consist |
> |                 | only of ascii characters, making other             |
> |                 | i18n-related processing unnecessary.               |
> +-----------------+----------------------------------------------------+
> | svraddr4        | Denotes something already validated as an IP       |
> |                 | address, whether IPv4 or IPv6, which means it has  |
> |                 | to be all ascii.                                   |
> +-----------------+----------------------------------------------------+
>
> ------------------------------------------------------------------------
>
> 12.2. String Type Overview
>
> 12.2.1. Overall String Class Divisions
>
>    NFS version 4 has to deal with a large set of different types of
>    strings and because of the different role of each,
>    internationalization issues will be different for each:
>
>    o  For some types of strings, the fundamental internationalization-
>       related decisions are the province of the file system or the
>       security-handling functions of the server and the protocol's job
>       is to establish the rules under which file systems and servers
>       are allowed to exercise this freedom, to avoid adding to
>       confusion.
>
>    o  In other cases, the fundamental internationalization issues are
>       the responsibility of other IETF groups and our job is simply to
>       reference those and perhaps make a few choices as to how they
>       are to be used (e.g. U-labels vs. A-labels).
>
>    o  There are also cases in which a string has a small amount of NFS
>       version 4 processing which results in one or more strings being
>       referred to one of the other categories.
>
>    We will divide strings to be dealt with into the following classes:
>
>    MIX  indicating that there is a small amount of preparatory
>       processing that either picks an appropriate mode of
>       internationalization handling or divides the string into a set
>       of (two) strings with a different mode of internationalization
>       handling for each. The details are discussed in the section
>       "Types with Pre-processing to Resolve Mixture Issues".
>
>    NIP  indicating that, for various reasons, there is no need for
>       internationalization-specific processing to be performed. The
>       specifics of the various string types handled in this way are
>       described in the section "String Types without
>       Internationalization Processing".
>
>    INET  indicating that the string needs to be processed in a fashion
>       that is governed by non-NFS-specific internet specifications.
>       The details are discussed in the section "Types with Processing
>       Defined by Other Internet Areas".
>
>    NFS  indicating that the string needs to be processed in a fashion
>       that is governed by NFSv4-specific considerations. The primary
>       focus is on enabling flexibility for the various file systems to
>       be accessed and is described in the section "String Types with
>       NFS-specific Processing".
>
> 12.2.2. Divisions by Typedef Parent types
>
>    There are a number of different string types within NFS version 4
>    and internationalization handling will be different for different
>    types of strings. Each of the types will be in one of four groups
>    based on the parent type that specifies the nature of its
>    relationship to utf8 and ascii.
>
>    utf8_should/SHOULD:  indicating that strings of this type should be
>       UTF-8 but clients and servers will not check for valid UTF-8
>       encoding.
>
>    utf8val_should/VSHOULD:  indicating that strings of this type
>       should be and generally will be in the form of the UTF-8
>       encoding of Unicode. Strings in most cases will be checked by
>       the server for valid UTF-8 but for certain file systems, such
>       checking may be inhibited.
>
>    utf8val_must/VMUST:  indicating that strings of this type must be
>       in the form of the UTF-8 encoding of Unicode. Strings will be
>       checked by the server for valid UTF-8 and the server should
>       ensure that when sent to the client, they are valid UTF-8.
>
>    ascii_must/ASCII:  indicating that strings of this type must be
>       pure ASCII, and thus automatically UTF-8. The processing of
>       these strings must ensure that they have only ASCII characters
>       but this need not be a separate step if any normally required
>       check for validity inherently assures that only ASCII characters
>       are present.
>
> 12.2.3. Individual Types and Their Handling
>
>    The first table outlines the handling for the primary string types,
>    i.e. those not derived as a prefix or a suffix from a mixture type.
>
> +-----------------+---------+-------+-------------------------------+
> | Type            | Parent  | Class | Explanation                   |
> +-----------------+---------+-------+-------------------------------+
> | comptag4        | SHOULD  | NIP   | Should be utf8 but no         |
> |                 |         |       | validation by server or       |
> |                 |         |       | client is to be done.         |
> | component4      | VSHOULD | NFS   | Should be utf8 but clients    |
> |                 |         |       | may need to access file       |
> |                 |         |       | systems with a different name |
> |                 |         |       | structure, e.g. file systems  |
> |                 |         |       | with non-utf8 names.          |
> | linktext4       | VSHOULD | NFS   | Should be utf8 since text may |
> |                 |         |       | include name components.      |
> |                 |         |       | Because of the need to access |
> |                 |         |       | existing file systems, this   |
> |                 |         |       | check may be inhibited.       |
> | fattr4_mimetype | ASCII   | NIP   | All mime types are ascii so   |
> |                 |         |       | no specific utf8 processing   |
> |                 |         |       | is required, given that you   |
> |                 |         |       | are comparing to that list.   |
> +-----------------+---------+-------+-------------------------------+
>
>                                Table 5
>
>    There are a number of string types that are compound in that they
>    may consist of multiple conjoined strings with different
>    utf8-related processing for each.
>
> +---------+--------+-------+----------------------------------------+
> | Type    | Parent | Class | Explanation                            |
> +---------+--------+-------+----------------------------------------+
> | prin4   | VMUST  | MIX   | Consists of two parts separated by an  |
> |         |        |       | at-sign, a prinpfx4 and a prinsfx4.    |
> |         |        |       | These are described in the next table. |
> | server4 | VMUST  | MIX   | Is either an IP address (svraddr4)     |
> |         |        |       | which has to be pure ascii or a server |
> |         |        |       | name (svrname4), which is described    |
> |         |        |       | immediately below.                     |
> +---------+--------+-------+----------------------------------------+
>
>                                Table 6
>
>    The last table describes the components of the compound types
>    described above.
>
> +----------+--------+-------+-----------------------------------------+
> | Type     | Parent | Class | Explanation                             |
> +----------+--------+-------+-----------------------------------------+
> | svraddr4 | ASCII  | NIP   | Server as IP address, whether IPv4 or   |
> |          |        |       | IPv6.                                   |
> | svrname4 | VMUST  | INET  | Server name as returned by server. Not  |
> |          |        |       | sent by client, except in               |
> |          |        |       | VERIFY/NVERIFY.                         |
> | prinsfx4 | VMUST  | INET  | Suffix part of principal, in the form   |
> |          |        |       | of a domain name.                       |
> | prinpfx4 | VMUST  | NFS   | Must match one of a list of valid users |
> |          |        |       | or groups for that particular domain.   |
> +----------+--------+-------+-----------------------------------------+
>
>                                Table 7
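
As an illustration only (nothing below appears in the draft or in
either message), the four parent types and the two compound types
described in the quoted section 12.2 can be sketched in Python. The
names Parent, accept, split_prin4 and classify_server4 are
hypothetical, as is the inhibit_check flag. Splitting at the last
at-sign and using Python's ipaddress module to recognise an address
are assumptions of this sketch, not rules taken from the draft.

```python
import ipaddress
from enum import Enum
from typing import Tuple


class Parent(Enum):
    """Parent types from section 12.2.2 (enum names are illustrative)."""
    SHOULD = "utf8_should"      # should be UTF-8, never checked
    VSHOULD = "utf8val_should"  # checked unless checking is inhibited
    VMUST = "utf8val_must"      # always checked
    ASCII = "ascii_must"        # must be pure ASCII


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def accept(data: bytes, parent: Parent, inhibit_check: bool = False) -> bool:
    """Apply the encoding check implied by a parent type.

    inhibit_check models a file system for which the VSHOULD check is
    switched off; it is an assumption of this sketch.
    """
    if parent is Parent.SHOULD:
        return True
    if parent is Parent.VSHOULD:
        return inhibit_check or is_valid_utf8(data)
    if parent is Parent.VMUST:
        return is_valid_utf8(data)
    return data.isascii()  # Parent.ASCII


def split_prin4(prin: str) -> Tuple[str, str]:
    """Split a prin4 into (prinpfx4, prinsfx4).

    Assumption: the split is made at the last at-sign.
    """
    prefix, sep, suffix = prin.rpartition("@")
    if not sep or not prefix or not suffix:
        raise ValueError("prin4 must have the form prefix@suffix")
    return prefix, suffix


def classify_server4(server: str) -> str:
    """Return 'svraddr4' for an IPv4/IPv6 literal, else 'svrname4'."""
    try:
        ipaddress.ip_address(server)
        return "svraddr4"
    except ValueError:
        return "svrname4"
```

In this sketch a comptag4 would be passed with Parent.SHOULD and so is
never rejected, while a prinsfx4 or svrname4 would still need the
IDNAbis handling discussed above, which is deliberately left out.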