Re: [nfsv4] NFSv4 I18n string type summary

<david.black@emc.com> Sun, 01 August 2010 20:45 UTC

Date: Sun, 01 Aug 2010 16:46:09 -0400
Message-ID: <C2D311A6F086424F99E385949ECFEBCB03453E70@CORPUSMX80B.corp.emc.com>
In-Reply-To: <BF3BB6D12298F54B89C8DCC1E4073D8002053C28@CORPUSMX50A.corp.emc.com>
From: david.black@emc.com
To: Noveck_David@emc.com, nfsv4@ietf.org
Subject: Re: [nfsv4] NFSv4 I18n string type summary

Dave,

Taking a quick pass through this:
- filenames (incl. component4 and linktext4) are "special" and we are going to have to
	do something NFSv4-specific in order to cope with physical filesystems that have
	different normalization rules.  I would still watch what precis does (e.g., for
	what characters to prohibit - e.g., control characters are usually prohibited in
	file names, although we may be able to punt that to the underlying physical FS).
- domain names (both as server names and as part of a principal) are handled by pointing
	at IDNAbis, and figuring out representation.  I still dislike A-labels on the wire
	in a protocol that is fully UTF8-capable (A-labels exist because DNS isn't).
- user names.  I would watch what precis does here, as they're likely to do something
	useful for user names in general that we can point to.  I would treat group names
	as a class of user names for i18n purposes.
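
To make the normalization concern concrete, here is a minimal Python sketch (not taken from the draft) of two file names that look identical but differ on the wire, plus a control-character check of the kind mentioned above. The sample names and the helper names same_name and has_control_chars are illustrative assumptions only; nothing here says which normalization form, if any, NFSv4 should use.

```python
import unicodedata

# Two spellings of the same visible file name (hypothetical example).
composed = "caf\u00e9.txt"      # precomposed U+00E9, as NFC would produce
decomposed = "cafe\u0301.txt"   # "e" + combining acute U+0301, as NFD would produce


def same_name(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two names after applying one normalization form to both."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


def has_control_chars(name: str) -> bool:
    """True if the name contains a character in Unicode category Cc."""
    return any(unicodedata.category(ch) == "Cc" for ch in name)


# The raw strings, and therefore their UTF-8 encodings, differ.
print(composed == decomposed)
print(composed.encode("utf-8") == decomposed.encode("utf-8"))
# They compare equal only once both sides use the same normalization form.
print(same_name(composed, decomposed))
print(has_control_chars("bad\x07name"))
```

A server backed by a file system that stores one form and a client that sends the other would see these as different names unless one side normalizes, which is the interoperability problem described in the first bullet.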

Those three items may cover us, as:
- A principal is constructed from a user/group name plus a server name.
- If tags are never compared in the protocol, they can be described as "opaque".
- Stating that MIME types and svraddr4 are always ASCII removes i18n issues.

I still want to take another run at A-labels vs. U-labels, because the
current direction allows A-labels to show up in config files and
associated management software - IMHO, that sort of abuse of
administrators is just plain wrong when the protocol and related
technology support Unicode.
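
For readers who have not seen the two forms side by side, the following Python sketch shows what an administrator would find in a config file in each case. It is an illustration under stated assumptions: the host name is made up, the helpers is_a_label and to_u_labels are hypothetical, and Python's built-in "idna" codec implements the older IDNA2003 rules rather than IDNAbis, so it is used here only to show the shape of the labels.

```python
# Hypothetical host name used only for illustration ("bücher.example").
u_label_name = "b\u00fccher.example"

# NOTE: the built-in "idna" codec follows IDNA2003 (RFC 3490), not IDNAbis.
a_label_name = u_label_name.encode("idna").decode("ascii")


def is_a_label(label: str) -> bool:
    """True if a single DNS label carries the ACE prefix "xn--"."""
    return label.lower().startswith("xn--")


def to_u_labels(ascii_name: str) -> str:
    """Convert A-labels in an all-ASCII dotted name back to Unicode."""
    return ascii_name.encode("ascii").decode("idna")


print(a_label_name)                # the ASCII-compatible (A-label) form
print(to_u_labels(a_label_name))   # the Unicode (U-label) form
```

Software that accepts both forms could apply a conversion like to_u_labels before displaying or storing a name, so that the "xn--" form stays confined to the places that actually need it.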

Thanks,
--David


> -----Original Message-----
> From: Noveck, David
> Sent: Thursday, July 29, 2010 5:16 PM
> To: Black, David; 'nfsv4@ietf.org'
> Subject: NFSv4 I18n string type summary
> 
> David Black asked me to provide information on all the i18n-related types in NFSv4.  It turns
> out that there is a section devoted to that very purpose in draft-ietf-nfsv4-rfc3530bis-04.txt.
> It is section 12.2 and starts on page 164.  The draft only expires on January 8, 2011 so is
> available until that time or until a revision is published (I'm guessing something like three
> months from now).
> 
> I've reproduced the text (minus heading crap) below and will try to give people a brief summary
> of the summary here.  It will not fit in 144 characters, however.
> 
> There are a number of types of strings which are combinations of other types, such as users and
> groups, which are of the form principal@domain.  There is some preliminary processing defined for
> such types that defines how to break these up into other types.  These mixture types will not
> be further discussed here.  See table 6 in section 12.2 below.
> 
> Aside from these there are three classes of string types to deal with.
> 
> First are those for which there is some existing internet specification on which handling can
> be based.  Life is good.
> 
> +-----------------+----------------------------------------------------+
> | Type            | Explanation and Discussion                         |
> +-----------------+----------------------------------------------------+
> | svrname4        | Server name as returned by server.  Not sent by    |
> |                 | client, except in VERIFY/NVERIFY.  Handled by      |
> |                 | IDNAbis rules with A-labels and UTF-8 strings      |
> |                 | all allowed.                                       |
> | prinsfx4        | Suffix part of principal, in the form of a domain  |
> |                 | name.  Handled by IDNAbis rules with A-labels and  |
> |                 | UTF-8 strings all allowed.                         |
> +-----------------+----------------------------------------------------+
> 
> Next are the more troublesome and ideologically contentious types where the issue is basically that we
> are communicating with entities that exist and that we, in general, cannot change.  In the case of
> component names and link text strings they are the file system, while in the case of the principal
> names, it is whatever mechanism the implementations have agreed upon to map between names and numeric
> ids.  We are communicating with the file system which has its own rules for name processing.  This
> makes a directive approach whereby the details of all string processing are to be specified a
> non-starter.  I perceive some reluctance to accept this fact and proceed in that light.
> 
> +-----------------+----------------------------------------------------+
> | Type            | Explanation and Discussion                         |
> +-----------------+----------------------------------------------------+
> | component4      | Should be utf8 but clients may need to access file |
> |                 | systems with a different name structure, possibly  |
> |                 | including non-utf8 names.                          |
> | linktext4       | Should be utf8 since text may include name         |
> |                 | components. Because of the need to access existing |
> |                 | file systems, this check may be inhibited.         |
> | prinpfx4        | Must match one of a list of valid users or groups  |
> |                 | for that particular domain.                        |
> +-----------------+----------------------------------------------------+
> 
> Finally, the following types, even though nominally UTF-8, have for various reasons no i18n-related
> processing defined in the protocol.
> 
> +-----------------+----------------------------------------------------+
> | Type            | Explanation and Discussion                         |
> +-----------------+----------------------------------------------------+
> | comptag4        | Tags are debugging strings that are in the protocol|
> |                 | for the purpose of helping protocol debugging.     |
> |                 | There is no specification as to what should be in  |
> |                 | them.  They are defined as UTF-8 but are not       |
> |                 | validated for UTF-8 correctness by the receiver.   |
> |                 | What would you do if it is wrong?                  |
> +-----------------+----------------------------------------------------+
> | fattr4_mimetype | All mime types are validated by comparing them     |
> |                 | against a fixed list, all of whose members consist |
> |                 | only of ascii characters, making other i18n-related|
> |                 | processing unnecessary.                            |
> +-----------------+----------------------------------------------------+
> | svraddr4        | Denotes something already validated as an IP       |
> |                 | address, whether IPv4 or IPv6, which means it has  |
> |                 | to be all ascii.                                   |
> +-----------------+----------------------------------------------------+
> 
> 
> 
> ------------------------------------------------------------------------
> 
> 12.2.  String Type Overview
> 
> 12.2.1.  Overall String Class Divisions
> 
>    NFS version 4 has to deal with a large set of different types of
>    strings and, because of the different role of each,
>    internationalization issues will be different for each:
> 
>    o  For some types of strings, the fundamental
>       internationalization-related decisions are the province of the
>       file system or the security-handling functions of the server and
>       the protocol's job is to establish the rules under which file
>       systems and servers are allowed to exercise this freedom, to
>       avoid adding to confusion.
> 
>    o  In other cases, the fundamental internationalization issues are
>       the responsibility of other IETF groups and our job is simply to
>       reference those and perhaps make a few choices as to how they are
>       to be used (e.g., U-labels vs. A-labels).
> 
>    o  There are also cases in which a string has a small amount of NFS
>       version 4 processing which results in one or more strings being
>       referred to one of the other categories.
> 
>    We will divide strings to be dealt with into the following classes:
> 
>    MIX  indicating that there is a small amount of preparatory processing
>       that either picks an appropriate mode of internationalization
>       handling or divides the string into a set of (two) strings with a
>       different mode of internationalization handling for each.  The
>       details are discussed in the section "Types with Pre-processing to
>       Resolve Mixture Issues".
> 
>    NIP  indicating that, for various reasons, there is no need for
>       internationalization-specific processing to be performed.  The
>       specifics of the various string types handled in this way are
>       described in the section "String Types without
>       Internationalization Processing".
> 
>    INET  indicating that the string needs to be processed in a fashion
>       that is governed by non-NFS-specific internet specifications.  The
>       details are discussed in the section "Types with Processing
>       Defined by Other Internet Areas".
> 
>    NFS  indicating that the string needs to be processed in a fashion
>       that is governed by NFSv4-specific considerations.  The primary
>       focus is on enabling flexibility for the various file systems to
>       be accessed and is described in the section "String Types with
>       NFS-specific Processing".
> 
> 12.2.2.  Divisions by Typedef Parent types
> 
>    There are a number of different string types within NFS version 4 and
>    internationalization handling will be different for different types
>    of strings.  Each of the types will be in one of four groups based on
>    the parent type that specifies the nature of its relationship to utf8
>    and ascii.
> 
>    utf8_should/SHOULD:  indicating that strings of this type should be
>       UTF-8 but clients and servers will not check for valid UTF-8
>       encoding.
> 
>    utf8val_should/VSHOULD:  indicating that strings of this type should
>       be and generally will be in the form of the UTF-8 encoding of
>       Unicode.  Strings in most cases will be checked by the server for
>       valid UTF-8 but for certain file systems, such checking may be
>       inhibited.
> 
>    utf8val_must/VMUST:  indicating that strings of this type must be in
>       the form of the UTF-8 encoding of Unicode.  Strings will be
>       checked by the server for valid UTF-8 and the server should ensure
>       that when sent to the client, they are valid UTF-8.
> 
>    ascii_must/ASCII:  indicating that strings of this type must be pure
>       ASCII, and thus automatically UTF-8.  The processing of these
>       strings must ensure that they have only ASCII characters but
>       this need not be a separate step if any normally required check
>       for validity inherently assures that only ASCII characters are
>       present.
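
The four parent types above amount to four validation policies. The Python sketch below is one possible reading of them, not code from the draft: the enum Parent, the function accept, and the inhibit_check flag (standing in for "such checking may be inhibited") are all assumed names chosen for illustration.

```python
from enum import Enum


class Parent(Enum):
    SHOULD = "utf8_should"        # should be UTF-8, never checked
    VSHOULD = "utf8val_should"    # checked unless checking is inhibited
    VMUST = "utf8val_must"        # always checked
    ASCII = "ascii_must"          # must be pure ASCII


def is_valid_utf8(data: bytes) -> bool:
    """True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def accept(data: bytes, parent: Parent, inhibit_check: bool = False) -> bool:
    """Apply the validation policy implied by the string's parent type."""
    if parent is Parent.SHOULD:
        return True                                   # no validation at all
    if parent is Parent.VSHOULD:
        return inhibit_check or is_valid_utf8(data)   # check may be skipped
    if parent is Parent.VMUST:
        return is_valid_utf8(data)                    # check is mandatory
    return data.isascii()                             # Parent.ASCII


print(accept(b"\xff", Parent.SHOULD))
print(accept(b"\xff", Parent.VSHOULD))
print(accept(b"\xff", Parent.VSHOULD, inhibit_check=True))
print(accept(b"text/plain", Parent.ASCII))
```

For the ASCII case the draft notes that a separate step is unnecessary when another required check already guarantees ASCII; the explicit isascii() call here is only the simplest stand-in for that guarantee.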
> 
> 12.2.3.  Individual Types and Their Handling
> 
>    The first table outlines the handling for the primary string types,
>    i.e. those not derived as a prefix or a suffix from a mixture type.
> 
>    +-----------------+---------+-------+-------------------------------+
>    | Type            | Parent  | Class | Explanation                   |
>    +-----------------+---------+-------+-------------------------------+
>    | comptag4        | SHOULD  | NIP   | Should be utf8 but no         |
>    |                 |         |       | validation by server or       |
>    |                 |         |       | client is to be done.         |
>    | component4      | VSHOULD | NFS   | Should be utf8 but clients    |
>    |                 |         |       | may need to access file       |
>    |                 |         |       | systems with a different name |
>    |                 |         |       | structure, possibly including |
>    |                 |         |       | non-utf8 names.               |
>    | linktext4       | VSHOULD | NFS   | Should be utf8 since text may |
>    |                 |         |       | include name components.      |
>    |                 |         |       | Because of the need to access |
>    |                 |         |       | existing file systems, this   |
>    |                 |         |       | check may be inhibited.       |
>    | fattr4_mimetype | ASCII   | NIP   | All mime types are ascii so   |
>    |                 |         |       | no specific utf8 processing   |
>    |                 |         |       | is required, given that you   |
>    |                 |         |       | are comparing to that list.   |
>    +-----------------+---------+-------+-------------------------------+
> 
>                                   Table 5
> 
>    There are a number of string types that are compound in that they may
>    consist of multiple conjoined strings with different utf8-related
>    processing for each.
> 
>    +---------+--------+-------+----------------------------------------+
>    | Type    | Parent | Class | Explanation                            |
>    +---------+--------+-------+----------------------------------------+
>    | prin4   | VMUST  | MIX   | Consists of two parts separated by an  |
>    |         |        |       | at-sign, a prinpfx4 and a prinsfx4.    |
>    |         |        |       | These are described in the next table. |
>    | server4 | VMUST  | MIX   | Is either an IP address (svraddr4)     |
>    |         |        |       | which has to be pure ascii or a server |
>    |         |        |       | name svrname4, which is described      |
>    |         |        |       | immediately below.                     |
>    +---------+--------+-------+----------------------------------------+
> 
>                                   Table 6
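
The "small amount of preparatory processing" for the two MIX types can be sketched in a few lines of Python. This is an illustration under assumptions, not the draft's algorithm: the function names split_prin4 and classify_server4 are hypothetical, splitting on the last at-sign is a choice made here for the sketch, and treating "anything that is not an IP literal" as a svrname4 is a simplification.

```python
import ipaddress
from typing import Tuple


def split_prin4(prin: str) -> Tuple[str, str]:
    """Split "prefix@suffix" into (prinpfx4, prinsfx4).

    Assumption: the split is made at the LAST at-sign.
    """
    prefix, sep, suffix = prin.rpartition("@")
    if not sep or not prefix or not suffix:
        raise ValueError("prin4 must have the form prefix@suffix")
    return prefix, suffix


def classify_server4(server: str) -> str:
    """Return "svraddr4" for an IPv4/IPv6 literal, otherwise "svrname4"."""
    try:
        ipaddress.ip_address(server)
        return "svraddr4"
    except ValueError:
        return "svrname4"


print(split_prin4("alice@example.com"))
print(classify_server4("192.0.2.1"))
print(classify_server4("2001:db8::1"))
print(classify_server4("nfs.example.com"))
```

After this step each piece can be handed to the processing for its own class: the prinsfx4 and svrname4 parts to the IDNAbis-based handling, the prinpfx4 part to the NFS-specific user/group matching, and the svraddr4 part to no i18n processing at all.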
> 
>    The last table describes the components of the compound types
>    described above.
> 
>    +----------+--------+-------+-----------------------------------------+
>    | Type     | Parent | Class | Explanation                             |
>    +----------+--------+-------+-----------------------------------------+
>    | svraddr4 | ASCII  | NIP   | Server as IP address, whether IPv4 or   |
>    |          |        |       | IPv6.                                   |
>    | svrname4 | VMUST  | INET  | Server name as returned by server.  Not |
>    |          |        |       | sent by client, except in               |
>    |          |        |       | VERIFY/NVERIFY.                         |
>    | prinsfx4 | VMUST  | INET  | Suffix part of principal, in the form   |
>    |          |        |       | of a domain name.                       |
>    | prinpfx4 | VMUST  | NFS   | Must match one of a list of valid users |
>    |          |        |       | or groups for that particular domain.   |
>    +----------+--------+-------+-----------------------------------------+
> 
>                                   Table 7