Re: [nfsv4] Chapter 12 for next rfc3530bis

<david.noveck@emc.com> Thu, 14 October 2010 16:52 UTC

Date: Thu, 14 Oct 2010 12:53:18 -0400
From: david.noveck@emc.com
To: david.black@emc.com, bfields@fieldses.org
Cc: nfsv4@ietf.org
Subject: Re: [nfsv4] Chapter 12 for next rfc3530bis

> Changing this situation will require passing charset 
> info on the wire, and that will be awful, IMHO. 

A) It is awful in my opinion, as well.
B) That isn't the NFSv4 protocol.  We have a design for this stuff, which
is to encode it in UTF-8, and we're not changing it unless it
absolutely cannot work.

In the past when the issue of converting strings to UTF-8 according to
locale was raised, it was for name components and Linux implementations
would not agree to do anything like that.  There was an objection to
doing this kind of conversion in the kernel, as not appropriate to the
kernel environment.  I believe Linus was quite firm on the matter.

In the case of this lookup function I'm guessing that this happens
outside the kernel.  If that's the case, conversion based on the locale
to UTF-8 could happen before the data was returned for the lookup and
the kernel string conversion issue would not arise.
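
A minimal sketch of that user-space conversion in Python, assuming the
charset named by the system locale really does describe the bytes held in
the password database. The function name `owner_to_utf8` and its `charset`
parameter are hypothetical and are not taken from any NFS implementation.

```python
def owner_to_utf8(raw: bytes, charset: str) -> bytes:
    """Return the UTF-8 form of a user name stored as `raw` bytes.

    `charset` is assumed to come from the system locale (for example
    from locale.getpreferredencoding()).  That the locale matches the
    stored bytes is an assumption of this sketch, not a guarantee.
    """
    # Interpret the bytes as characters in the given charset, then
    # re-encode those characters as UTF-8.  Bytes that are not valid in
    # `charset` raise UnicodeDecodeError instead of being guessed at.
    return raw.decode(charset).encode("utf-8")


# ASCII names pass through unchanged; an ISO 8859-1 o-umlaut (0xF6)
# becomes the two-byte UTF-8 sequence 0xC3 0xB6.
print(owner_to_utf8(b"bfields", "iso-8859-1"))
print(owner_to_utf8(b"j\xf6rg", "iso-8859-1"))
```

If the locale charset is already UTF-8, the decode step acts as a validity
check, so a name that is neither ASCII nor UTF-8 is rejected rather than
passed along.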

-----Original Message-----
From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] On Behalf
Of david.black@emc.com
Sent: Thursday, October 14, 2010 12:02 PM
To: bfields@fieldses.org
Cc: nfsv4@ietf.org
Subject: Re: [nfsv4] Chapter 12 for next rfc3530bis

Hi Bruce,

> So I think that's the:
> 
> > 	- Attempt to guess the encoding and map to utf-8 before
> > 	  returning.
> 
> option.
> 
> My understanding is that converting to utf8 is not so simple: you have
> to know which of the multiple possible extended ascii character sets
> you're converting from.  /etc/passwd doesn't necessarily come with that
> information.

First of all, more than a "guess" is required, and some system help may
well be needed.  To take a specific example, if /etc/passwd contains
usernames in multiple incompatible ISO 8859 (extended ASCII) character
sets (http://www.terena.org/activities/multiling/ml-docs/iso-8859.html)
somebody has seriously screwed up the system administration - IMHO, the
"system administrator" component is probably defective and should be
replaced ;-).  Assuming a competent system administrator, there should
be at most one ISO 8859 charset in use and there should be a system
locale from which it's possible to determine which one.  In contrast,
trying to "guess" in isolation which 8859 charset is in use is
dangerous.
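
To make the danger concrete, here is a small Python sketch that decodes one
byte string under three different ISO 8859 parts. The byte string is a
made-up user name, chosen only for illustration.

```python
raw = b"j\xe4rvi"  # hypothetical user name bytes as found in /etc/passwd

# The same bytes read under three ISO 8859 parts give three different names.
as_latin1 = raw.decode("iso-8859-1")    # 0xE4 is U+00E4 (a with diaeresis)
as_cyrillic = raw.decode("iso-8859-5")  # 0xE4 is U+0444 (Cyrillic ef)
as_greek = raw.decode("iso-8859-7")     # 0xE4 is U+03B4 (Greek delta)

# Each interpretation yields a different UTF-8 string on the wire.
for name in (as_latin1, as_cyrillic, as_greek):
    print(ascii(name), name.encode("utf-8"))
```

All three decodes succeed, so nothing in the bytes themselves says which
answer is right; only outside knowledge such as the system locale can.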

As should be clear from prior discussion, NFSv4 is UTF-8 on the wire,
which rules out:

> > 	- Just return the username as it is.

Changing this situation will require passing charset info on the wire,
and that will be awful, IMHO.

Converting other charsets to/from UTF-8 can in general require locale
info, not just knowing which charset, because the 8859 extended code
point space is very limited by comparison to Unicode.  For an
eye-opening related example see RFC 5895's discussion of converting
capital "I" to lower case for Turkish
(http://datatracker.ietf.org/doc/rfc5895/)
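
A short Python sketch of the Turkish case, for illustration only. The helper
`lower_tr` is hypothetical; it hard-codes the Turkish rule that a real
implementation would take from locale data.

```python
def lower_tr(s: str) -> str:
    """Lower-case `s` using the Turkish rule for I (hypothetical helper)."""
    # Turkish pairs dotless with dotless and dotted with dotted:
    #   U+0049 (I) -> U+0131 (dotless i)
    #   U+0130 (capital I with dot above) -> U+0069 (i)
    # Both replacements happen before the generic lower() call.
    return s.replace("I", "\u0131").replace("\u0130", "i").lower()


# Python's str.lower() is locale-independent and applies the default rule.
default_result = "I".lower()
turkish_result = lower_tr("I")
print(ascii(default_result), ascii(turkish_result))
```

The same capital letter has two different lower-case forms depending on the
locale, which knowing the charset alone does not reveal.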

Thanks,
--David


> -----Original Message-----
> From: J. Bruce Fields [mailto:bfields@fieldses.org]
> Sent: Thursday, October 14, 2010 10:43 AM
> To: Noveck, David
> Cc: Black, David; jlentini@netapp.com; nfsv4@ietf.org
> Subject: Re: [nfsv4] Chapter 12 for next rfc3530bis
> 
> On Thu, Oct 14, 2010 at 09:58:19AM -0400, david.noveck@emc.com wrote:
> > First of all, if it is non-UTF8, it is non-ASCII.  That way we only have
> > one predicate to deal with.  Assuming that in the past Cyrillic, Hebrew,
> > Nagari, etc. names have not been allowed (Why can't I have a Linear-B
> > username :-) this only includes one-byte ascii and extended ascii
> > characters.  If there are extended-ascii characters, then you have
> > something that is not utf8, but the procedure to convert it to utf8 is
> > very simple.  You move around some of the bits and come up with the two-byte
> > utf8 equivalent of the extended ascii characters.  The ascii characters
> > you copy over straight.
> >
> > Why can't I just say "Problem solved"?  Am I missing something?  I
> > understand that "Mission accomplished" is out of bounds.
> 
> So I think that's the:
> 
> > 	- Attempt to guess the encoding and map to utf-8 before
> > 	  returning.
> 
> option.
> 
> My understanding is that converting to utf8 is not so simple: you have
> to know which of the multiple possible extended ascii character sets
> you're converting from.  /etc/passwd doesn't necessarily come with that
> information.
> 
> > I think there is a philosophical problem underlying this issue.  If you
> > treat what is returned by the lookup as an array of bytes, you have all
> > these problems with non-utf8-ness.  But if you treat it as an array of
> > characters,
> 
> I can't assume that whatever password database is in use knows about
> encodings--so I really am stuck with just an array of bytes.
> 
> --b.
> 
> > then the mapping to UTF-8 is clear.  But, the idea that it
> > is a simplification to treat everything as an array of bytes is very
> > prevalent.  It is true that it seems to avoid all that I18N nastiness,
> > with that being left for (unlucky) others to deal with.  But that isn't
> > the philosophy of NFSv4 (for good reason), although for file name
> > components we have an accommodation to allow non-utf8 to work.  But I
> > can't see a similar change for user and group names.  They must be UTF-8
> > and the only way to provide UTF-8 depends on understanding the string as
> > an array of characters in some known code and converting it to UTF-8.
> >
> > -----Original Message-----
> > From: J. Bruce Fields [mailto:bfields@fieldses.org]
> > Sent: Tuesday, October 12, 2010 3:06 PM
> > To: Noveck, David
> > Cc: Black, David; jlentini@netapp.com; nfsv4@ietf.org
> > Subject: Re: [nfsv4] Chapter 12 for next rfc3530bis
> >
> > On Tue, Oct 12, 2010 at 02:47:40PM -0400, david.noveck@emc.com wrote:
> > > You as the server can enforce it by simply checking whether it is UTF-8
> > > and returning INVAL if it isn't.
> >
> > Alas, no, if I get a getattr for the owner of a file, find the file is
> > owned by uid 8569, look up 8569 and it's somebody non-ascii and
> > non-utf8, my choices are:
> >
> > 	- Return an error, or claim not to support the owner attribute
> > 	  on this file.
> > 	- Attempt to guess the encoding and map to utf-8 before
> > 	  returning.
> > 	- Just return the username as it is.
> >
> > And if I pick 3 (and it's what the code would do now, and seems the
> > least of evils), and the client tries to send the same name back to me
> > in a setattr, I can't see returning INVAL.
> >
> > I could try to avoid getting into the whole situation in the first place
> > by forbidding those names.  That's not really up to me.
> >
> > In the end I guess this just means I'll be violating the spec in some
> > (probably very unlikely?) cases.  Solaris too I bet, if it has
> > historically allowed such usernames.  It won't keep me up at night.
> >
> > --b.
> >
> > > The client is going to have some understanding of what it is being
> > > presented with and certainly can map an o-umlaut, for example, to the
> > > UTF-8 version of that, which is two bytes.  It is supposed to send UTF-8.
> > >
> > >
> > > -----Original Message-----
> > > From: J. Bruce Fields [mailto:bfields@fieldses.org]
> > > Sent: Tuesday, October 12, 2010 2:29 PM
> > > To: Noveck, David
> > > Cc: Black, David; jlentini@netapp.com; nfsv4@ietf.org
> > > Subject: Re: [nfsv4] Chapter 12 for next rfc3530bis
> > >
> > > On Tue, Oct 12, 2010 at 02:12:55PM -0400, david.noveck@emc.com wrote:
> > > > On Tue, Oct 12, 2010 at 01:07:59PM -0400, bfields wrote:
> > > > > I just want to make sure I'm getting into the business of mapping
> > > >
> > > > (I meant to have a "not" there!)
> > > >
> > > > > non-ascii non-utf8 usernames....
> > > >
> > > > UVMUST says it MUST be UTF-8.  So if you get into the business of
> > > > mapping non-ascii non-utf8, you are non-compliant and you have only
> > > > yourself to blame for having to map stuff that you aren't supposed to
> > > > have accepted in the first place.
> > >
> > > I have some control over nfsd, but none over useradd.  If there are
> > > people out there with /etc/passwd's containing non-utf8 non-ascii
> > > usernames then the only way I'd see to enforce a MUST of utf-8 would be
> > > by taking a stab at what encoding they're using and then mapping to and
> > > from utf-8.  No thanks!
> > >
> > > Hm, but non-ascii usernames can't be used in email addresses, can they?
> > > So maybe we'll never see them in practice.
> > >
> > > --b.
> > >
> >
