Re: [nfsv4] Chapter 12 for next rfc3530bis

<david.noveck@emc.com> Thu, 14 October 2010 13:57 UTC

Date: Thu, 14 Oct 2010 09:58:19 -0400
Message-ID: <BF3BB6D12298F54B89C8DCC1E4073D8002851625@CORPUSMX50A.corp.emc.com>
In-Reply-To: <20101012190552.GB25673@fieldses.org>
References: <BF3BB6D12298F54B89C8DCC1E4073D80027DD8CC@CORPUSMX50A.corp.emc.com> <20101012170759.GA22495@fieldses.org> <20101012170936.GB22495@fieldses.org> <BF3BB6D12298F54B89C8DCC1E4073D80027DDA78@CORPUSMX50A.corp.emc.com> <20101012182853.GA25673@fieldses.org> <BF3BB6D12298F54B89C8DCC1E4073D80027DDA91@CORPUSMX50A.corp.emc.com> <20101012190552.GB25673@fieldses.org>
From: david.noveck@emc.com
To: bfields@fieldses.org
Cc: nfsv4@ietf.org
Subject: Re: [nfsv4] Chapter 12 for next rfc3530bis

First of all, if it is non-UTF8, it is non-ASCII.  That way we only have
one predicate to deal with.  Assuming that in the past Cyrillic, Hebrew,
Nagari, etc. names have not been allowed (Why can't I have a Linear-B
username :-) this only includes one-byte ascii and extended ascii
characters.  If there are extended-ascii characters, then you have
something that is not utf8, but the procedure to convert it to utf8 is
very simple.  You move around some of the bits and come up with the
two-byte utf8 equivalent of the extended ascii characters.  The ascii
characters you copy over straight.
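
A minimal sketch of that bit shuffling, assuming "extended ascii" here
means ISO 8859-1 (Latin-1), where each byte value equals its Unicode
code point.  The function name is made up for illustration; other
one-byte code pages would need a lookup table instead.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Convert ISO 8859-1 bytes to UTF-8 by rearranging bits.

    Assumption: the input really is ISO 8859-1, so byte value == code point.
    """
    out = bytearray()
    for b in raw:
        if b < 0x80:
            # Plain ASCII is copied over unchanged.
            out.append(b)
        else:
            # Code points 0x80..0xFF become two bytes:
            # 110000xx 10xxxxxx (top 2 bits, then low 6 bits).
            out.append(0xC0 | (b >> 6))
            out.append(0x80 | (b & 0x3F))
    return bytes(out)


converted = latin1_to_utf8(b"j\xf6rg")  # 0xF6 is o-umlaut in Latin-1
```

The result agrees with what the standard codec machinery produces for
the same input, i.e. raw.decode("latin-1").encode("utf-8").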

Why can't I just say "Problem solved"?  Am I missing something?  I
understand that "Mission accomplished" is out of bounds. 

I think there is a philosophical problem underlying this issue.  If you
treat what is returned by the lookup as an array of bytes, you have all
these problems with non-utf8-ness.  But if you treat it as an array of
characters, then the mapping to UTF-8 is clear.  But, the idea that it
is a simplification to treat everything as an array of bytes is very
prevalent.  It is true that it seems to avoid all that I18N nastiness,
with that being left for (unlucky) others to deal with.  But that isn't
the philosophy of NFSv4 (for good reason), although for file name
components we have an accommodation to allow non-utf8 to work.  But I
can't see a similar change for user and group names.  They must be UTF-8
and the only way to provide UTF-8 depends on understanding the string as
an array of characters in some known code and converting it to UTF-8.
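
To illustrate the "known code" part, here is a rough sketch using
Python's built-in codecs.  The helper name and the sample bytes are
hypothetical; the point is only that the same byte array becomes a
different UTF-8 string depending on which code it is understood to be
in.

```python
def name_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Treat raw as characters in source_encoding and re-encode as UTF-8."""
    text = raw.decode(source_encoding)  # bytes -> characters
    return text.encode("utf-8")         # characters -> UTF-8 bytes


raw = b"j\xf6rg"  # hypothetical name as stored by the local system

# Same bytes, two different assumptions about the code in use.
as_latin1 = name_to_utf8(raw, "latin-1")
as_koi8r = name_to_utf8(raw, "koi8-r")
```

Both calls succeed, but they yield different UTF-8 names, so the
conversion is only well defined once the source code is known.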

-----Original Message-----
From: J. Bruce Fields [mailto:bfields@fieldses.org] 
Sent: Tuesday, October 12, 2010 3:06 PM
To: Noveck, David
Cc: Black, David; jlentini@netapp.com; nfsv4@ietf.org
Subject: Re: [nfsv4] Chapter 12 for next rfc3530bis

On Tue, Oct 12, 2010 at 02:47:40PM -0400, david.noveck@emc.com wrote:
> You as the server can enforce it by simply checking whether it is UTF-8
> and returning INVAL if it isn't.

Alas, no, if I get a getattr for the owner of a file, find the file is
owned by uid 8569, look up 8569 and it's somebody non-ascii and
non-utf8, my choices are:

	- Return an error, or claim not to support the owner attribute
	  on this file.
	- Attempt to guess the encoding and map to utf-8 before
	  returning.
	- Just return the username as it is.

And if I pick 3 (and it's what the code would do now, and seems the
least of evils), and the client tries to send the same name back to me
in a setattr, I can't see returning INVAL.
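
The three choices above can be sketched roughly as follows.  This is
not nfsd code: the Policy enum, the function names, and the Latin-1
default guess are all assumptions made up for illustration.

```python
from enum import Enum
from typing import Optional


class Policy(Enum):
    REFUSE = 1        # choice 1: error / attribute not supported
    GUESS = 2         # choice 2: guess the encoding, map to UTF-8
    PASS_THROUGH = 3  # choice 3: return the name as it is


def is_utf8(raw: bytes) -> bool:
    """True if raw is well-formed UTF-8 (pure ASCII always is)."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def owner_for_getattr(raw: bytes, policy: Policy,
                      guessed_encoding: str = "latin-1") -> Optional[bytes]:
    """Return the owner name to send, or None if the server refuses."""
    if is_utf8(raw):
        return raw  # nothing to decide for ASCII or valid UTF-8 names
    if policy is Policy.REFUSE:
        return None
    if policy is Policy.GUESS:
        # The guess may be wrong; undecodable bytes become U+FFFD.
        return raw.decode(guessed_encoding, errors="replace").encode("utf-8")
    return raw  # PASS_THROUGH: non-UTF-8 bytes go out unchanged


name = b"j\xf6rg"  # hypothetical non-ASCII, non-UTF-8 passwd entry
```

With PASS_THROUGH the client gets back bytes that fail is_utf8, which
is exactly the name it would then send in a setattr.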

I could try to avoid getting into the whole situation in the first place
by forbidding those names.  That's not really up to me.

In the end I guess this just means I'll be violating the spec in some
(probably very unlikely?) cases.  Solaris too I bet, if it has
historically allowed such usernames.  It won't keep me up at night.

--b.

> The client is going to have some understanding of what it is being
> presented with and certainly can map an o-umlaut for example, to the
> UTF-8 version of that which is two bytes.  It is supposed to send UTF-8.
> 
> 
> -----Original Message-----
> From: J. Bruce Fields [mailto:bfields@fieldses.org] 
> Sent: Tuesday, October 12, 2010 2:29 PM
> To: Noveck, David
> Cc: Black, David; jlentini@netapp.com; nfsv4@ietf.org
> Subject: Re: [nfsv4] Chapter 12 for next rfc3530bis
> 
> On Tue, Oct 12, 2010 at 02:12:55PM -0400, david.noveck@emc.com wrote:
> > On Tue, Oct 12, 2010 at 01:07:59PM -0400, bfields wrote:
> > > I just want to make sure I'm getting into the business of mapping
> > 
> > (I meant to have a "not" there!)
> > 
> > > non-ascii non-utf8 usernames....
> > 
> > UVMUST says it MUST be UTF-8.  So if you get into the business of
> > mapping non-ascii non-utf8, you are non-compliant and you have only
> > yourself to blame for having to map stuff that you aren't supposed to
> > have accepted in the first place. 
> 
> I have some control over nfsd, but none over useradd.  If there are
> people out there with /etc/passwd's containing non-utf8 non-ascii
> usernames then the only way I'd see to enforce a MUST of utf-8 would be
> by taking a stab at what encoding they're using and then mapping to and
> from utf-8.  No thanks!
> 
> Hm, but non-ascii usernames can't be used in email addresses, can they?
> So maybe we'll never see them in practice.
> 
> --b.
>