Re: [nfsv4] Chapter 12 for next rfc3530bis

"Everhart, Craig" <Craig.Everhart@netapp.com> Thu, 14 October 2010 15:56 UTC

From: "Everhart, Craig" <Craig.Everhart@netapp.com>
To: "J. Bruce Fields" <bfields@fieldses.org>, <david.noveck@emc.com>
Cc: nfsv4@ietf.org
Subject: Re: [nfsv4] Chapter 12 for next rfc3530bis

> From: J. Bruce Fields [mailto:bfields@fieldses.org]
> 
> On Thu, Oct 14, 2010 at 09:58:19AM -0400, david.noveck@emc.com wrote:
> > First of all, if it is non-UTF8, it is non-ASCII.  That way we only have
> > one predicate to deal with.  Assuming that in the past Cyrillic, Hebrew,
> > Nagari, etc. names have not been allowed (Why can't I have a Linear-B
> > username :-) this only includes one-byte ascii and extended ascii
> > characters.  If there are extended-ascii characters, then you have
> > something that is not utf8, but the procedure to convert it to utf8 is
> > very simple.  You move around some of the bits and come up with the
> > two-byte utf8 equivalent of the extended ascii characters.  The ascii
> > characters you copy over straight.
> >
> > Why can't I just say "Problem solved"?  Am I missing something?  I
> > understand that "Mission accomplished" is out of bounds.
> 
> So I think that's the:
> 
> > 	- Attempt to guess the encoding and map to utf-8 before
> > 	  returning.
> 
> option.
> 
> My understanding is that converting to utf8 is not so simple: you have
> to know which of the multiple possible extended ascii character sets
> you're converting from.  /etc/passwd doesn't necessarily come with that
> information.
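
To make the two quoted positions concrete, here is a minimal sketch in
Python.  It assumes the "extended ascii" in question is ISO 8859-1
(Latin-1), which is the only case where the bit-shuffling David
describes yields the right answer.  The sample bytes and the function
names latin1_to_utf8 and decode_candidates are hypothetical, chosen only
for illustration.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Convert bytes to UTF-8, ASSUMING they are ISO 8859-1 (Latin-1)."""
    out = bytearray()
    for b in data:
        if b < 0x80:
            # Plain ASCII is copied over unchanged.
            out.append(b)
        else:
            # 0x80-0xFF becomes the two-byte form 110000xx 10xxxxxx.
            out.append(0xC0 | (b >> 6))
            out.append(0x80 | (b & 0x3F))
    return bytes(out)


def decode_candidates(data: bytes,
                      encodings=("latin-1", "iso8859_5", "cp437")) -> dict:
    """Decode the same bytes under several single-byte character sets."""
    return {enc: data.decode(enc) for enc in encodings}


# Hypothetical username bytes as they might sit in /etc/passwd.
raw = b"J\xf6rg"
print(latin1_to_utf8(raw))
print(decode_candidates(raw))
```

The conversion itself is mechanical, but decode_candidates shows the
catch: byte 0xF6 stands for a different character in each of the three
character sets, so the two-byte result is only correct if the Latin-1
assumption happens to hold.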

Is it optional somehow?  Is there a place where it might be found?

How would I go about creating a non-USASCII username?  Am I being
irresponsible if, having done so and not having provided the information
about how to interpret it, I connect it up to a distributed file system
that uses utf-8 file attributes?  Particularly if all the clients of
that file system don't know a priori what special interpretation your
client is using for its attributes?

Is it possible to document /etc/passwd as being in UTF-8 format, and
whenever entries are found not to be valid UTF-8, NFS generates errors?
This is basically Bruce's choice 1.  Why wouldn't that be a reasonable
choice, other than inertia?
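
A minimal sketch of that choice in Python, assuming the server has the
raw username bytes in hand.  The names BadOwnerName and
owner_from_passwd are hypothetical and do not come from any real NFS
implementation.

```python
class BadOwnerName(ValueError):
    """Hypothetical error raised when a stored name is not valid UTF-8."""


def owner_from_passwd(raw_name: bytes) -> str:
    """Return the name if it is valid UTF-8, otherwise raise an error."""
    try:
        # Strict decoding rejects stray high bytes, truncated sequences
        # and overlong forms, so anything returned here is valid UTF-8.
        return raw_name.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise BadOwnerName(f"owner name is not valid UTF-8: {raw_name!r}") from exc
```

ASCII-only names pass unchanged, since ASCII is a subset of UTF-8; only
names containing bytes that do not form valid UTF-8 sequences would
produce the error.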

Is there a documentable way to turn this into Bruce's choice 2 without
heuristics?
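
One heuristic-free shape this could take, sketched in Python: the
administrator declares the encoding of the password database and the
server converts from exactly that encoding, with no guessing.  The
declared_encoding parameter and the function name owner_to_utf8 are
assumptions made for this sketch; no such setting is known to exist.

```python
def owner_to_utf8(raw_name: bytes, declared_encoding: str = "utf-8") -> bytes:
    """Convert a stored name to UTF-8 using an administrator-declared encoding.

    declared_encoding is a hypothetical per-server setting.  Decoding is
    strict, so bytes that are invalid in the declared encoding raise
    UnicodeDecodeError instead of being guessed at.
    """
    text = raw_name.decode(declared_encoding, errors="strict")
    return text.encode("utf-8")


# Example: a site that has declared its /etc/passwd to be Latin-1.
print(owner_to_utf8(b"J\xf6rg", "latin-1"))
```

With the default of "utf-8" this degenerates into the validate-or-error
behavior of choice 1, so the declaration only matters for sites that
actually hold non-UTF-8 names.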
 
> > I think there is a philosophical problem underlying this issue.  If you
> > treat what is returned by the lookup as an array of bytes, you have all
> > these problems with non-utf8-ness.  But if you treat it as an array of
> > characters,
> 
> I can't assume that whatever password database is in use knows about
> encodings--so I really am stuck with just an array of bytes.
> 
> --b.
> 
> > then the mapping to UTF-8 is clear.  But, the idea that it
> > is a simplification to treat everything as an array of bytes is very
> > prevalent.  It is true that it seems to avoid all that I18N nastiness,
> > with that being left for (unlucky) others to deal with.  But that isn't
> > the philosophy of NFSv4 (for good reason), although for file name
> > components we have an accommodation to allow non-utf8 to work.  But I
> > can't see a similar change for user and group names.  They must be UTF-8
> > and the only way to provide UTF-8 depends on understanding the string as
> > an array of characters in some known code and converting it to UTF-8.
> >
> > -----Original Message-----
> > From: J. Bruce Fields [mailto:bfields@fieldses.org]
> > Sent: Tuesday, October 12, 2010 3:06 PM
> > To: Noveck, David
> > Cc: Black, David; jlentini@netapp.com; nfsv4@ietf.org
> > Subject: Re: [nfsv4] Chapter 12 for next rfc3530bis
> >
> > On Tue, Oct 12, 2010 at 02:47:40PM -0400, david.noveck@emc.com wrote:
> > > You as the server can enforce it by simply checking whether it is UTF-8
> > > and returning INVAL if it isn't.
> >
> > Alas, no, if I get a getattr for the owner of a file, find the file is
> > owned by uid 8569, look up 8569 and it's somebody non-ascii and
> > non-utf8, my choices are:
> >
> > 	- Return an error, or claim not to support the owner attribute
> > 	  on this file.
> > 	- Attempt to guess the encoding and map to utf-8 before
> > 	  returning.
> > 	- Just return the username as it is.
> >
> > And if I pick 3 (and it's what the code would do now, and seems the
> > least of evils), and the client tries to send the same name back to me
> > in a setattr, I can't see returning INVAL.
> >
> > I could try to avoid getting into the whole situation in the first place
> > by forbidding those names.  That's not really up to me.
> >
> > In the end I guess this just means I'll be violating the spec in some
> > (probably very unlikely?) cases.  Solaris too I bet, if it has
> > historically allowed such usernames.  It won't keep me up at night.
> >
> > --b.
> >
> > > The client is going to have some understanding of what it is being
> > > presented with and certainly can map an o-umlaut for example, to the
> > > UTF-8 version of that which is two bytes.  It is supposed to send UTF-8.
> > >
> > >
> > > -----Original Message-----
> > > From: J. Bruce Fields [mailto:bfields@fieldses.org]
> > > Sent: Tuesday, October 12, 2010 2:29 PM
> > > To: Noveck, David
> > > Cc: Black, David; jlentini@netapp.com; nfsv4@ietf.org
> > > Subject: Re: [nfsv4] Chapter 12 for next rfc3530bis
> > >
> > > On Tue, Oct 12, 2010 at 02:12:55PM -0400, david.noveck@emc.com wrote:
> > > > On Tue, Oct 12, 2010 at 01:07:59PM -0400, bfields wrote:
> > > > > I just want to make sure I'm getting into the business of mapping
> > > >
> > > > (I meant to have a "not" there!)
> > > >
> > > > > non-ascii non-utf8 usernames....
> > > >
> > > > UVMUST says it MUST be UTF-8.  So if you get into the business of
> > > > mapping non-ascii non-utf8, you are non-compliant and you have only
> > > > yourself to blame for having to map stuff that you aren't supposed to
> > > > have accepted in the first place.
> > >
> > > I have some control over nfsd, but none over useradd.  If there are
> > > people out there with /etc/passwd's containing non-utf8 non-ascii
> > > usernames then the only way I'd see to enforce a MUST of utf-8 would be
> > > by taking a stab at what encoding they're using and then mapping to and
> > > from utf-8.  No thanks!
> > >
> > > Hm, but non-ascii usernames can't be used in email addresses, can they?
> > > So maybe we'll never see them in practice.
> > >
> > > --b.
> > >
> >