Re: [nfsv4] Chapter 12 for next rfc3530bis

<david.black@emc.com> Thu, 14 October 2010 16:01 UTC

From: david.black@emc.com
To: bfields@fieldses.org
Date: Thu, 14 Oct 2010 12:01:52 -0400
Cc: nfsv4@ietf.org
Subject: Re: [nfsv4] Chapter 12 for next rfc3530bis

Hi Bruce,

> So I think that's the:
> 
> > 	- Attempt to guess the encoding and map to utf-8 before
> > 	  returning.
> 
> option.
> 
> My understanding is that converting to utf8 is not so simple: you have
> to know which of the multiple possible extended ascii character sets
> you're converting from.  /etc/passwd doesn't necessarily come with that
> information.

First of all, more than a "guess" is required, and some system help may well be needed.  To take a specific example, if /etc/passwd contains usernames in multiple incompatible ISO 8859 (extended ASCII) character sets (http://www.terena.org/activities/multiling/ml-docs/iso-8859.html) somebody has seriously screwed up the system administration - IMHO, the "system administrator" component is probably defective and should be replaced ;-).  Assuming a competent system administrator, there should be at most one ISO 8859 charset in use and there should be a system locale from which it's possible to determine which one.  In contrast, trying to "guess" in isolation which 8859 charset is in use is dangerous.
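
A minimal Python sketch of why the charset has to be known rather than guessed, assuming a Unix-like host. The "move some bits around" procedure suggested further down in the thread is exactly the ISO 8859-1 (Latin-1) to UTF-8 encoding, because only in 8859-1 does each byte value equal its Unicode code point. Applied to bytes that are really another 8859 part, it silently produces a different username. The function names are hypothetical, and system_charset() assumes the administrator's locale names the single charset in use.

```python
import locale


def latin1_to_utf8_by_bit_shuffling(raw: bytes) -> bytes:
    """Hand-rolled "move some bits around" conversion.

    Correct ONLY if `raw` is ISO 8859-1, where each byte value equals its
    Unicode code point, so every byte >= 0x80 becomes two UTF-8 bytes.
    """
    out = bytearray()
    for b in raw:
        if b < 0x80:
            out.append(b)                  # ASCII is copied over unchanged
        else:
            out.append(0xC0 | (b >> 6))    # lead byte 110000xx: top 2 bits of b
            out.append(0x80 | (b & 0x3F))  # continuation 10xxxxxx: low 6 bits
    return bytes(out)


def to_utf8(raw: bytes, charset: str) -> bytes:
    """Transcode from a KNOWN charset (e.g. "iso8859-5") to UTF-8.

    May raise UnicodeDecodeError if a byte is undefined in that charset.
    """
    return raw.decode(charset).encode("utf-8")


def system_charset() -> str:
    """Charset named by the process locale (hypothetical helper, Unix-only).

    Assumes a competently administered host with one 8859 part in use.
    """
    locale.setlocale(locale.LC_CTYPE, "")
    return locale.nl_langinfo(locale.CODESET)
```

For example, the single byte 0xE6 becomes U+00E6 (ae ligature) via the bit-shuffling path, but U+0446 (Cyrillic tse) when the locale says ISO 8859-5: same bytes in /etc/passwd, different user name on the wire.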

As should be clear from prior discussion, NFSv4 is UTF-8 on the wire, which rules out:

> > 	- Just return the username as it is.

Changing this situation will require passing charset info on the wire, and that will be awful, IMHO.
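
Returning the username as-is puts whatever bytes /etc/passwd holds onto the wire. A server that wants to stay within the UTF-8 requirement would at minimum have to check well-formedness before returning a name. A minimal Python sketch (the helper name and sample names are hypothetical); note the last sample, where passing the check still says nothing about which charset the bytes came from.

```python
def is_valid_utf8(name: bytes) -> bool:
    """True if `name` is well-formed UTF-8.

    Python's strict UTF-8 decoder rejects overlong forms and encoded
    surrogates, so this serves as a wire-side well-formedness check.
    """
    try:
        name.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Latin-1 "j<o-umlaut>rg" taken straight from a hypothetical /etc/passwd.
latin1_name = b"j\xf6rg"
# The same name properly encoded as UTF-8 (the o-umlaut takes two bytes).
utf8_name = "j\u00f6rg".encode("utf-8")
# Latin-1 text "\u00c3\u00a9" is byte-for-byte identical to UTF-8 "\u00e9",
# so validation alone cannot identify the source charset.
ambiguous = b"\xc3\xa9"

for label, raw in [("latin1", latin1_name), ("utf8", utf8_name), ("ambiguous", ambiguous)]:
    print(label, is_valid_utf8(raw))
```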

Converting other charsets to/from UTF-8 can in general require locale info, not just knowing which charset, because the 8859 extended code point space is very limited by comparison to Unicode.  For an eye-opening related example, see RFC 5895's discussion of converting capital "I" to lower case for Turkish (http://datatracker.ietf.org/doc/rfc5895/).
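
The Turkish case is easy to reproduce. In this minimal Python sketch, str.lower() applies the locale-independent Unicode mapping, so capital I becomes i and capital dotted I (U+0130) becomes i plus a combining dot above (U+0307); a Turkish-aware mapping gives dotless i (U+0131) and plain i instead. turkish_lower() is a hypothetical, simplified helper, not a library function. The results even differ in UTF-8 byte length, so the bytes on the wire depend on the locale.

```python
def default_lower(s: str) -> str:
    """Locale-independent Unicode lowercasing (what str.lower() does)."""
    return s.lower()


def turkish_lower(s: str) -> str:
    """Simplified Turkish/Azerbaijani lowercasing (hypothetical helper).

    Capital I without a dot lowercases to dotless i (U+0131), and capital I
    with a dot (U+0130) lowercases to plain i. The context-sensitive rules
    in Unicode's SpecialCasing.txt are ignored.
    """
    return s.replace("\u0130", "i").replace("I", "\u0131").lower()


for word in ["ISTANBUL", "\u0130ZM\u0130R"]:
    print(ascii(word), "->", ascii(default_lower(word)), "vs", ascii(turkish_lower(word)))
```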

Thanks,
--David


> -----Original Message-----
> From: J. Bruce Fields [mailto:bfields@fieldses.org]
> Sent: Thursday, October 14, 2010 10:43 AM
> To: Noveck, David
> Cc: Black, David; jlentini@netapp.com; nfsv4@ietf.org
> Subject: Re: [nfsv4] Chapter 12 for next rfc3530bis
> 
> On Thu, Oct 14, 2010 at 09:58:19AM -0400, david.noveck@emc.com wrote:
> > First of all, if it is non-UTF8, it is non-ASCII.  That way we only have
> > one predicate to deal with.  Assuming that in the past Cyrillic, Hebrew,
> > Nagari, etc. names have not been allowed (Why can't I have a Linear-B
> > username :-) this only includes one-byte ascii and extended ascii
> > characters.  If there are extended-ascii characters, then you have
> > something that is not utf8, but the procedure to convert it to utf8 is
> > very simple.  You move around some of the bits and come up with the two-byte
> > utf8 equivalent of the extended ascii characters.  The ascii characters
> > you copy over straight.
> >
> > Why can't I just say "Problem solved"?  Am I missing something?  I
> > understand that "Mission accomplished" is out of bounds.
> 
> So I think that's the:
> 
> > 	- Attempt to guess the encoding and map to utf-8 before
> > 	  returning.
> 
> option.
> 
> My understanding is that converting to utf8 is not so simple: you have
> to know which of the multiple possible extended ascii character sets
> you're converting from.  /etc/passwd doesn't necessarily come with that
> information.
> 
> > I think there is a philosophical problem underlying this issue.  If you
> > treat what is returned by the lookup as an array of bytes, you have all
> > these problems with non-utf8-ness.  But if you treat it as an array of
> > characters,
> 
> I can't assume that whatever password database I'm using knows about
> encodings--so I really am stuck with just an array of bytes.
> 
> --b.
> 
> > then the mapping to UTF-8 is clear.  But, the idea that it
> > is a simplification to treat everything as an array of bytes is very
> > prevalent.  It is true that it seems to avoid all that I18N nastiness,
> > with that being left for (unlucky) others to deal with.  But that isn't
> > the philosophy of NFSv4 (for good reason), although for file name
> > components we have an accommodation to allow non-utf8 to work.  But I
> > can't see a similar change for user and group names.  They must be UTF-8
> > and the only way to provide UTF-8 depends on understanding the string as
> > an array of characters in some known code and converting it to UTF-8.
> >
> > -----Original Message-----
> > From: J. Bruce Fields [mailto:bfields@fieldses.org]
> > Sent: Tuesday, October 12, 2010 3:06 PM
> > To: Noveck, David
> > Cc: Black, David; jlentini@netapp.com; nfsv4@ietf.org
> > Subject: Re: [nfsv4] Chapter 12 for next rfc3530bis
> >
> > On Tue, Oct 12, 2010 at 02:47:40PM -0400, david.noveck@emc.com wrote:
> > > You as the server can enforce it by simply checking whether it is
> > > UTF-8 and returning INVAL if it isn't.
> >
> > Alas, no, if I get a getattr for the owner of a file, find the file is
> > owned by uid 8569, look up 8569 and it's somebody non-ascii and
> > non-utf8, my choices are:
> >
> > 	- Return an error, or claim not to support the owner attribute
> > 	  on this file.
> > 	- Attempt to guess the encoding and map to utf-8 before
> > 	  returning.
> > 	- Just return the username as it is.
> >
> > And if I pick 3 (and it's what the code would do now, and seems the
> > least of evils), and the client tries to send the same name back to me
> > in a setattr, I can't see returning INVAL.
> >
> > I could try to avoid getting into the whole situation in the first place
> > by forbidding those names.  That's not really up to me.
> >
> > In the end I guess this just means I'll be violating the spec in some
> > (probably very unlikely?) cases.  Solaris too I bet, if it has
> > historically allowed such usernames.  It won't keep me up at night.
> >
> > --b.
> >
> > > The client is going to have some understanding of what it is being
> > > presented with and certainly can map an o-umlaut, for example, to the
> > > UTF-8 version of that, which is two bytes.  It is supposed to send
> > > UTF-8.
> > >
> > >
> > > -----Original Message-----
> > > From: J. Bruce Fields [mailto:bfields@fieldses.org]
> > > Sent: Tuesday, October 12, 2010 2:29 PM
> > > To: Noveck, David
> > > Cc: Black, David; jlentini@netapp.com; nfsv4@ietf.org
> > > Subject: Re: [nfsv4] Chapter 12 for next rfc3530bis
> > >
> > > On Tue, Oct 12, 2010 at 02:12:55PM -0400, david.noveck@emc.com wrote:
> > > > On Tue, Oct 12, 2010 at 01:07:59PM -0400, bfields wrote:
> > > > > I just want to make sure I'm getting into the business of mapping
> > > >
> > > > (I meant to have a "not" there!)
> > > >
> > > > > non-ascii non-utf8 usernames....
> > > >
> > > > UVMUST says it MUST be UTF-8.  So if you get into the business of
> > > > mapping non-ascii non-utf8, you are non-compliant and you have only
> > > > yourself to blame for having to map stuff that you aren't supposed
> > > > to have accepted in the first place.
> > >
> > > I have some control over nfsd, but none over useradd.  If there are
> > > people out there with /etc/passwd's containing non-utf8 non-ascii
> > > usernames then the only way I'd see to enforce a MUST of utf-8 would
> > > be by taking a stab at what encoding they're using and then mapping to
> > > and from utf-8.  No thanks!
> > >
> > > Hm, but non-ascii usernames can't be used in email addresses, can they?
> > > So maybe we'll never see them in practice.
> > >
> > > --b.
> > >
> >