Re: [nfsv4] Chapter 12 for next rfc3530bis

"J. Bruce Fields" <bfields@fieldses.org> Thu, 14 October 2010 18:00 UTC

Date: Thu, 14 Oct 2010 14:01:19 -0400
From: "J. Bruce Fields" <bfields@fieldses.org>
To: david.noveck@emc.com
Cc: nfsv4@ietf.org
Subject: Re: [nfsv4] Chapter 12 for next rfc3530bis

On Thu, Oct 14, 2010 at 12:53:18PM -0400, david.noveck@emc.com wrote:
> > Changing this situation will require passing charset 
> > info on the wire, and that will be awful, IMHO. 
> 
> A) It is awful in my opinion, as well.

And it doesn't help, since the problematic case is that of a preexisting
non-UTF8 username whose encoding we *don't know*.
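
To make the ambiguity concrete, here is a minimal Python sketch. The
username bytes are hypothetical, not taken from any real passwd file, and
the list of candidate charsets is only an illustrative assumption. The
same bytes decode to different characters depending on which ISO 8859
charset is assumed, so the resulting UTF-8 differs too:

```python
# Hypothetical username bytes as they might sit in a passwd database.
raw = b"j\xf6rg"

# The bytes are not valid UTF-8 as they stand.
try:
    raw.decode("utf-8")
    is_utf8 = True
except UnicodeDecodeError:
    is_utf8 = False

# Candidate charsets chosen only for illustration.
candidates = ["iso-8859-1", "iso-8859-5", "iso-8859-7"]
decoded = {name: raw.decode(name) for name in candidates}

for name, text in decoded.items():
    # ascii() keeps the output printable regardless of terminal encoding.
    print(name, ascii(text), text.encode("utf-8"))
```

Every one of those decodings succeeds, so nothing in the bytes themselves
says which one is right.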

> B) That isn't the NFSv4 protocol.  We have a design for this stuff and
> it is to encode it in UTF-8 and we're not changing it, except if it
> absolutely cannot work.

My conclusion for now is: a) I'll probably never get a bug report about
this since presumably email and lots of other things break if you try
it, b) in the unlikely case I do get a complaint, I'm happy to tell the
administrator they're out of spec, and c) this is something that drives
everyone who touches it (including, obviously, me) nuts, too much time
has already been spent on it, and I regret bringing it up, ok, ok, ok!
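
For an administrator who wants to find out whether they are affected, a
rough sketch of the check might look like the following. This is not what
nfsd or any idmapping code does; the function name and the sample data are
made up for illustration, and it only tests whether each name is valid
UTF-8, which says nothing about what encoding a failing name actually
uses:

```python
from typing import List


def non_utf8_names(passwd_bytes: bytes) -> List[bytes]:
    """Return the usernames (first colon-separated field) that are not valid UTF-8."""
    bad = []
    for line in passwd_bytes.splitlines():
        if not line or line.startswith(b"#"):
            continue  # skip blank lines and comments
        name = line.split(b":", 1)[0]
        try:
            name.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(name)
    return bad


# Hypothetical passwd-style data: one ASCII name, one name in an unknown
# 8-bit encoding, and one name that is already UTF-8.
sample = (
    b"root:x:0:0:root:/root:/bin/sh\n"
    b"j\xf6rg:x:8569:8569::/home/a:/bin/sh\n"
    b"j\xc3\xb6rg:x:8570:8570::/home/b:/bin/sh\n"
)
print(non_utf8_names(sample))
```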

> In the past when the issue of converting strings to UTF-8 according to
> locale was raised, it was for name components and Linux implementations
> would not agree to do anything like that.

Well, and that nobody else actually attempted that sort of conversion
(e.g., making open() locale-aware) either--it was never just Linux.

--b.

> There was an objection to
> doing this kind of conversion in the kernel, as not appropriate to the
> kernel environment.  I believe Linus was quite firm on the matter.
> 
> In the case of this lookup function I'm guessing that this happens
> outside the kernel.  If that's the case, conversion based on the locale
> to UTF-8 could happen before the data was returned for the lookup and
> the kernel string conversion issue would not arise.
> 
> -----Original Message-----
> From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] On Behalf
> Of david.black@emc.com
> Sent: Thursday, October 14, 2010 12:02 PM
> To: bfields@fieldses.org
> Cc: nfsv4@ietf.org
> Subject: Re: [nfsv4] Chapter 12 for next rfc3530bis
> 
> Hi Bruce,
> 
> > So I think that's the:
> > 
> > > 	- Attempt to guess the encoding and map to utf-8 before
> > > 	  returning.
> > 
> > option.
> > 
> > My understanding is that converting to utf8 is not so simple: you have
> > to know which of the multiple possible extended ascii character sets
> > you're converting from.  /etc/passwd doesn't necessarily come with
> > that information.
> 
> First of all, more than a "guess" is required, and some system help may
> well be needed.  To take a specific example, if /etc/passwd contains
> usernames in multiple incompatible ISO 8859 (extended ASCII) character
> sets (http://www.terena.org/activities/multiling/ml-docs/iso-8859.html),
> somebody has seriously screwed up the system administration - IMHO, the
> "system administrator" component is probably defective and should be
> replaced ;-).  Assuming a competent system administrator, there should
> be at most one ISO 8859 charset in use and there should be a system
> locale from which it's possible to determine which one.  In contrast,
> trying to "guess" in isolation which 8859 charset is in use is
> dangerous.
> 
> As should be clear from prior discussion, NFSv4 is UTF-8 on the wire,
> which rules out:
> 
> > > 	- Just return the username as it is.
> 
> Changing this situation will require passing charset info on the wire,
> and that will be awful, IMHO.
> 
> Converting other charsets to/from UTF-8 can in general require locale
> info, not just knowing which charset, because the 8859 extended code
> point space is very limited by comparison to Unicode.  For an
> eye-opening related example see RFC 5895's discussion of converting
> capital "I" to lower case for Turkish
> (http://datatracker.ietf.org/doc/rfc5895/).
> 
> Thanks,
> --David
> 
> 
> > -----Original Message-----
> > From: J. Bruce Fields [mailto:bfields@fieldses.org]
> > Sent: Thursday, October 14, 2010 10:43 AM
> > To: Noveck, David
> > Cc: Black, David; jlentini@netapp.com; nfsv4@ietf.org
> > Subject: Re: [nfsv4] Chapter 12 for next rfc3530bis
> > 
> > On Thu, Oct 14, 2010 at 09:58:19AM -0400, david.noveck@emc.com wrote:
> > > First of all, if it is non-UTF8, it is non-ASCII.  That way we only
> > > have one predicate to deal with.  Assuming that in the past Cyrillic,
> > > Hebrew, Nagari, etc. names have not been allowed (Why can't I have a
> > > Linear-B username :-) this only includes one-byte ascii and extended
> > > ascii characters.  If there are extended-ascii characters, then you
> > > have something that is not utf8, but the procedure to convert it to
> > > utf8 is very simple.  You move around some of the bits and come up
> > > with the two-byte utf8 equivalent of the extended ascii characters.
> > > The ascii characters you copy over straight.
> > >
> > > Why can't I just say "Problem solved"?  Am I missing something?  I
> > > understand that "Mission accomplished" is out of bounds.
> > 
> > So I think that's the:
> > 
> > > 	- Attempt to guess the encoding and map to utf-8 before
> > > 	  returning.
> > 
> > option.
> > 
> > My understanding is that converting to utf8 is not so simple: you have
> > to know which of the multiple possible extended ascii character sets
> > you're converting from.  /etc/passwd doesn't necessarily come with
> > that information.
> > 
> > > I think there is a philosophical problem underlying this issue.  If
> > > you treat what is returned by the lookup as an array of bytes, you
> > > have all these problems with non-utf8-ness.  But if you treat it as
> > > an array of characters,
> > 
> > I can't assume that whatever password database is in use knows about
> > encodings--so I really am stuck with just an array of bytes.
> > 
> > --b.
> > 
> > > then the mapping to UTF-8 is clear.  But, the idea that it
> > > is a simplification to treat everything as an array of bytes is very
> > > prevalent.  It is true that it seems to avoid all that I18N
> > > nastiness, with that being left for (unlucky) others to deal with.
> > > But that isn't the philosophy of NFSv4 (for good reason), although
> > > for file name components we have an accommodation to allow non-utf8
> > > to work.  But I can't see a similar change for user and group names.
> > > They must be UTF-8 and the only way to provide UTF-8 depends on
> > > understanding the string as an array of characters in some known
> > > code and converting it to UTF-8.
> > >
> > > -----Original Message-----
> > > From: J. Bruce Fields [mailto:bfields@fieldses.org]
> > > Sent: Tuesday, October 12, 2010 3:06 PM
> > > To: Noveck, David
> > > Cc: Black, David; jlentini@netapp.com; nfsv4@ietf.org
> > > Subject: Re: [nfsv4] Chapter 12 for next rfc3530bis
> > >
> > > On Tue, Oct 12, 2010 at 02:47:40PM -0400, david.noveck@emc.com wrote:
> > > > You as the server can enforce it by simply checking whether it is
> > > > UTF-8 and returning INVAL if it isn't.
> > >
> > > Alas, no, if I get a getattr for the owner of a file, find the file
> > > is owned by uid 8569, look up 8569 and it's somebody non-ascii and
> > > non-utf8, my choices are:
> > >
> > > 	- Return an error, or claim not to support the owner attribute
> > > 	  on this file.
> > > 	- Attempt to guess the encoding and map to utf-8 before
> > > 	  returning.
> > > 	- Just return the username as it is.
> > >
> > > And if I pick 3 (and it's what the code would do now, and seems the
> > > least of evils), and the client tries to send the same name back to
> > > me in a setattr, I can't see returning INVAL.
> > >
> > > I could try to avoid getting into the whole situation in the first
> > > place by forbidding those names.  That's not really up to me.
> > >
> > > In the end I guess this just means I'll be violating the spec in
> > > some (probably very unlikely?) cases.  Solaris too I bet, if it has
> > > historically allowed such usernames.  It won't keep me up at night.
> > >
> > > --b.
> > >
> > > > The client is going to have some understanding of what it is being
> > > > presented with and certainly can map an o-umlaut for example, to
> > > > the UTF-8 version of that which is two bytes.  It is supposed to
> > > > send UTF-8.
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: J. Bruce Fields [mailto:bfields@fieldses.org]
> > > > Sent: Tuesday, October 12, 2010 2:29 PM
> > > > To: Noveck, David
> > > > Cc: Black, David; jlentini@netapp.com; nfsv4@ietf.org
> > > > Subject: Re: [nfsv4] Chapter 12 for next rfc3530bis
> > > >
> > > > On Tue, Oct 12, 2010 at 02:12:55PM -0400, david.noveck@emc.com wrote:
> > > > > On Tue, Oct 12, 2010 at 01:07:59PM -0400, bfields wrote:
> > > > > > I just want to make sure I'm getting into the business of mapping
> > > > >
> > > > > (I meant to have a "not" there!)
> > > > >
> > > > > > non-ascii non-utf8 usernames....
> > > > >
> > > > > UVMUST says it MUST be UTF-8.  So if you get into the business
> > > > > of mapping non-ascii non-utf8, you are non-compliant and you have
> > > > > only yourself to blame for having to map stuff that you aren't
> > > > > supposed to have accepted in the first place.
> > > >
> > > > I have some control over nfsd, but none over useradd.  If there
> > > > are people out there with /etc/passwd's containing non-utf8
> > > > non-ascii usernames then the only way I'd see to enforce a MUST of
> > > > utf-8 would be by taking a stab at what encoding they're using and
> > > > then mapping to and from utf-8.  No thanks!
> > > >
> > > > Hm, but non-ascii usernames can't be used in email addresses, can
> > > > they?  So maybe we'll never see them in practice.
> > > >
> > > > --b.
> > > >
> > >
> 