Re: [nfsv4] Chapter 12 for next rfc3530bis

"J. Bruce Fields" <bfields@fieldses.org> Thu, 14 October 2010 14:41 UTC

Date: Thu, 14 Oct 2010 10:42:38 -0400
From: "J. Bruce Fields" <bfields@fieldses.org>
To: david.noveck@emc.com
Message-ID: <20101014144238.GI24146@fieldses.org>
Cc: nfsv4@ietf.org
Subject: Re: [nfsv4] Chapter 12 for next rfc3530bis

On Thu, Oct 14, 2010 at 09:58:19AM -0400, david.noveck@emc.com wrote:
> First of all, if it is non-UTF8, it is non-ASCII.  That way we only have
> one predicate to deal with.  Assuming that in the past Cyrillic, Hebrew,
> Nagari, etc. names have not been allowed (Why can't I have a Linear-B
> username :-) this only includes one-byte ascii and extended ascii
> characters.  If there are extended-ascii characters, then you have
> something that is not utf8, but the procedure to convert it to utf8 is
> very simple.  You move around some of the bits and come up with the two-byte
> utf8 equivalent of the extended ascii characters.  The ascii characters
> you copy over straight.
> 
> Why can't I just say "Problem solved"?  Am I missing something?  I
> understand that "Mission accomplished" is out of bounds. 

So I think that's the:

> 	- Attempt to guess the encoding and map to utf-8 before
> 	  returning.

option.

My understanding is that converting to utf8 is not so simple: you have
to know which of the multiple possible extended ascii character sets
you're converting from.  /etc/passwd doesn't necessarily come with that
information.
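To make the disagreement concrete: the bit shuffle described above is exactly right if the bytes are ISO-8859-1 (Latin-1), because Latin-1 byte values equal their Unicode code points. Nothing in the raw bytes says they are Latin-1, though. A minimal Python sketch; the helper name `latin1_to_utf8` and the sample username are illustrative only, not taken from any NFS implementation:

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Convert ISO-8859-1 bytes to UTF-8 by hand.

    Assumes the input really is Latin-1, so each byte value is
    already the Unicode code point.
    """
    out = bytearray()
    for b in raw:
        if b < 0x80:
            # Plain ASCII is copied over unchanged.
            out.append(b)
        else:
            # 0x80-0xFF needs two UTF-8 bytes: 110xxxxx 10xxxxxx.
            out.append(0xC0 | (b >> 6))
            out.append(0x80 | (b & 0x3F))
    return bytes(out)


# A hypothetical /etc/passwd username containing one byte above 0x7F.
name = b"j\xf6rg"

print(latin1_to_utf8(name))  # b'j\xc3\xb6rg', i.e. "jörg" if the bytes were Latin-1

# The very same bytes read under other 8-bit character sets:
for charset in ("latin-1", "iso8859_7", "koi8_r"):
    print(charset, name.decode(charset))
```

The conversion itself is mechanical; picking `latin-1` rather than `iso8859_7` or `koi8_r` is the guess, and that is the information the password database does not supply.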

> I think there is a philosophical problem underlying this issue.  If you
> treat what is returned by the lookup as an array of bytes, you have all
> these problems with non-utf8-ness.  But if you treat it as an array of
> characters,

I can't assume that whatever password database is in use knows about
encodings, so I really am stuck with just an array of bytes.
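With only a byte array, the one test a server can apply is whether those bytes happen to decode as UTF-8, which is what the "check whether it is UTF-8 and return INVAL" suggestion quoted further down amounts to. A sketch under that assumption (the function name is hypothetical); note that passing the check does not prove the bytes were meant as UTF-8:

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as strict UTF-8."""
    try:
        raw.decode("utf-8")  # default errors="strict" raises on bad input
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(b"bob"))      # True: ASCII is a subset of UTF-8
print(is_valid_utf8(b"j\xf6rg"))  # False: 0xF6 never appears in valid UTF-8
# Latin-1 text "Ã¶" is the byte pair C3 B6, which is also UTF-8 for "ö".
print(is_valid_utf8("Ã¶".encode("latin-1")))  # True
```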

--b.

> then the mapping to UTF-8 is clear.  But, the idea that it
> is a simplification to treat everything as an array of bytes is very
> prevalent.  It is true that it seems to avoid all that I18N nastiness,
> with that being left for (unlucky) others to deal with.  But that isn't
> the philosophy of NFSv4 (for good reason), although for file name
> components we have an accommodation to allow non-utf8 to work.  But I
> can't see a similar change for user and group names.  They must be UTF-8
> and the only way to provide UTF-8 depends on understanding the string as
> an array of characters in some known code and converting it to UTF-8.
> 
> -----Original Message-----
> From: J. Bruce Fields [mailto:bfields@fieldses.org] 
> Sent: Tuesday, October 12, 2010 3:06 PM
> To: Noveck, David
> Cc: Black, David; jlentini@netapp.com; nfsv4@ietf.org
> Subject: Re: [nfsv4] Chapter 12 for next rfc3530bis
> 
> On Tue, Oct 12, 2010 at 02:47:40PM -0400, david.noveck@emc.com wrote:
> > You as the server can enforce it by simply checking whether it is UTF-8
> > and returning INVAL if it isn't.
> 
> Alas, no, if I get a getattr for the owner of a file, find the file is
> owned by uid 8569, look up 8569 and it's somebody non-ascii and
> non-utf8, my choices are:
> 
> 	- Return an error, or claim not to support the owner attribute
> 	  on this file.
> 	- Attempt to guess the encoding and map to utf-8 before
> 	  returning.
> 	- Just return the username as it is.
> 
> And if I pick 3 (and it's what the code would do now, and seems the
> least of evils), and the client tries to send the same name back to me
> in a setattr, I can't see returning INVAL.
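The three server choices listed above can be sketched as follows. This is a hypothetical illustration, not nfsd code: the `Policy` names, `encode_owner`, the `OwnerNotRepresentable` exception, and the use of Latin-1 as the guessed legacy charset are all assumptions.

```python
from enum import Enum


class Policy(Enum):
    ERROR = 1        # option 1: refuse to return the owner
    GUESS = 2        # option 2: guess a legacy charset, map to UTF-8
    PASSTHROUGH = 3  # option 3: return the username bytes as stored


class OwnerNotRepresentable(Exception):
    """Stands in for returning an NFS error for the owner attribute."""


def encode_owner(raw: bytes, policy: Policy, guess: str = "latin-1") -> bytes:
    """Produce the owner string for a GETATTR reply from a local username."""
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8 (including plain ASCII): nothing to decide
    except UnicodeDecodeError:
        pass

    if policy is Policy.ERROR:
        raise OwnerNotRepresentable(raw)
    if policy is Policy.GUESS:
        # Only correct if the guess matches what useradd actually stored.
        return raw.decode(guess).encode("utf-8")
    return raw  # PASSTHROUGH: non-UTF-8 bytes go out on the wire
```

With `Policy.PASSTHROUGH`, a client that echoes the name back in a SETATTR sends the same non-UTF-8 bytes, which is why rejecting them with NFS4ERR_INVAL at that point would be inconsistent.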
> 
> I could try to avoid getting into the whole situation in the first place
> by forbidding those names.  That's not really up to me.
> 
> In the end I guess this just means I'll be violating the spec in some
> (probably very unlikely?) cases.  Solaris too I bet, if it has
> historically allowed such usernames.  It won't keep me up at night.
> 
> --b.
> 
> > The client is going to have some understanding of what it is being
> > presented with and certainly can map an o-umlaut, for example, to the
> > UTF-8 version of that, which is two bytes.  It is supposed to send UTF-8.
> > 
> > 
> > -----Original Message-----
> > From: J. Bruce Fields [mailto:bfields@fieldses.org] 
> > Sent: Tuesday, October 12, 2010 2:29 PM
> > To: Noveck, David
> > Cc: Black, David; jlentini@netapp.com; nfsv4@ietf.org
> > Subject: Re: [nfsv4] Chapter 12 for next rfc3530bis
> > 
> > On Tue, Oct 12, 2010 at 02:12:55PM -0400, david.noveck@emc.com wrote:
> > > On Tue, Oct 12, 2010 at 01:07:59PM -0400, bfields wrote:
> > > > I just want to make sure I'm getting into the business of mapping
> > > 
> > > (I meant to have a "not" there!)
> > > 
> > > > non-ascii non-utf8 usernames....
> > > 
> > > UVMUST says it MUST be UTF-8.  So if you get into the business of
> > > mapping non-ascii non-utf8, you are non-compliant and you have only
> > > yourself to blame for having to map stuff that you aren't supposed to
> > > have accepted in the first place.
> > 
> > I have some control over nfsd, but none over useradd.  If there are
> > people out there with /etc/passwd's containing non-utf8 non-ascii
> > usernames then the only way I'd see to enforce a MUST of utf-8 would be
> > by taking a stab at what encoding they're using and then mapping to and
> > from utf-8.  No thanks!
> > 
> > Hm, but non-ascii usernames can't be used in email addresses, can they?
> > So maybe we'll never see them in practice.
> > 
> > --b.
> > 
>