[ldapext] Case sensitivity summary

Andrew Findlay <andrew.findlay@skills-1st.co.uk> Fri, 04 December 2015 12:43 UTC

Date: Fri, 4 Dec 2015 12:38:10 +0000
From: Andrew Findlay <andrew.findlay@skills-1st.co.uk>
To: LDAP Extensions list <ldapext@ietf.org>
Archived-At: <http://mailarchive.ietf.org/arch/msg/ldapext/mYR34WLd7rU9qVNiPI4k0tq6iNI>

The more I look at this problem the more complex it gets, so it is time for a
summary.

Mark Bannister points out that case is a hard problem. In fact, if we consider
all the locales and cultures that a standard must serve, it is *impossible* to
do case-folding 'right' for everyone at once.

> https://news.ycombinator.com/item?id=8876722
> http://pubs.opengroup.org/onlinepubs/000095399/xrat/xbd_chap04.html#tag_01_04_06
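
As a small illustration of why no single folding rule suits everyone, here is
a sketch in Python. The helper name round_trips is made up for this example.
Python's str methods apply the default Unicode case mappings with no locale
tailoring, so this shows only part of the problem: Turkish users would also
expect a plain "I" to lower-case to a dotless i, which these methods never do.

```python
# Round-tripping case is not lossless, even within Western scripts.
sharp_s = "\u00df"    # LATIN SMALL LETTER SHARP S
dotted_i = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE


def round_trips(s: str) -> bool:
    """True if upper-casing then lower-casing gives back the original string."""
    return s.upper().lower() == s


# One character becomes two when upper-cased.
upper_sharp_s = sharp_s.upper()

# Lower-casing yields "i" followed by U+0307 COMBINING DOT ABOVE.
lower_dotted_i = dotted_i.lower()

print(round_trips("andrew"), round_trips(sharp_s))
print(len(upper_sharp_s), len(lower_dotted_i))
```

Plain ASCII names survive the round trip; the sharp s does not, and both
special cases change the length of the string.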

On the other hand, most computer users in Western cultures have become
used to a large degree of case-insensitivity and this has been rolled
into a lot of standards. There are several reasons for this, for example:

1)	History: until about 1985 there were a lot of I/O devices that
	could not handle lower case at all, so when lower-case became
	available it had to be used in a backwards-compatible way.
	Indeed, some banking systems still seem to be upper-case only.

2)	In some cultures there are no hard-and-fast rules for how certain
	name forms should be capitalised, so searches have to allow for
	this common ambiguity.

3)	Laziness. This might be considered a virtue.

Like it or not, all of the computing systems we are working with started
life with a very strong Western cultural bias. In fact most of them have
a bias towards the English language as used in North America.

This means that Western non-technical users expect case-insensitive behaviour
wherever that is possible. They have to be reminded that passwords are
(sometimes) case-sensitive and that the Caps Lock key is a hazard.

The case-sensitive/insensitive problem is not just Unix vs Windows.
For most end-users the ever-growing range of web-based apps is far more
important. Many of those use e-mail addresses as usernames, and
assume that such things are always case-insensitive. The standards allow
case-sensitive names to the left of the '@' but in practice no mail admin
in their right mind would implement case-sensitive addresses. At least
in this case we have a limited character set where case rules are well
defined!

We are not going to define a standard that forces everyone to do everything
'right' while providing interworking of differing systems. It is simply
not possible.

What we might aspire to is a standard that is general enough to be useful,
which is internally self-consistent, and which comes with some guidelines
to help people change their systems to work with it without breaking too
many things.

Charlie said:

> Well, in a cleanly integrated environment, I'd expect to see most
> users' Microsoft SamAccountName and POSIX uid be identical lower-cased
> strings less than 20 characters long.

... and later:

> Normal user accounts being
> created today would be unlikely to differ from each other only in
> case.  Just old stuff and unique hacks.

That seems like a very sensible target for systems serving predominantly
Western user-bases. To make it really safe you have to disallow all
non-ASCII characters, but that can be relaxed if the name form is carefully
controlled at account-creation time and the users are told to always
enter it exactly as defined.
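
A check of that sort at account-creation time might look like the sketch
below. It is only an illustration: the function name is_policy_compliant and
the constant MAX_LEN are invented here, and the rules are just the three
mentioned above (lower-case, ASCII only, fewer than 20 characters). A real
site policy would probably restrict the character set further.

```python
MAX_LEN = 19  # "less than 20 characters long"


def is_policy_compliant(name: str) -> bool:
    """Return True if name is non-empty, short, pure ASCII and lower-case."""
    return (
        0 < len(name) <= MAX_LEN
        and name.isascii()          # disallow all non-ASCII characters
        and name == name.lower()    # stored form is already lower-case
    )


for candidate in ["afindlay", "AFindlay", "j\u00f6rg", "a" * 20]:
    print(repr(candidate), is_policy_compliant(candidate))
```

Only the first candidate passes; the others fail on case, on a non-ASCII
character, and on length respectively.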

> I just don't see any significant advantages to using a single naming
> attribute shared with other systems.   Why bother?   It's all pain, no
> gain.  Keeping case-sensitive and case-insensitive versions of user
> identifiers is easier and gives better results.  All software on the
> local node will perform as expected, and no OS documentation needs to
> be rewritten.

If we were just trying to serve Unix and Windows that might be true,
but the problem is wider. A unified account namespace for all systems
is a worthwhile target for any organisation, but every system added to
the mix brings its own quirks and limitations. Catering for those is
more in the realm of best-practice advice than standards definition.

> More importantly, *nix tools and system utilities are going to make
> case-sensitive comparisons of usernames internally, so if your name
> service daemons aren't case-sensitive as well, *nix-based systems are
> likely to be subtly broken.  Comparisons aren't restricted to the LDAP
> service host, they happen on the local OS too - including in
> site-developed code that was built to documented standards.

While true, the amount of trouble caused by this can be considerably
reduced by following a strict username allocation policy such as the one
proposed above.

Dhairesh said:

> So I agree with earlier comments that changing usernames that differ
> only in cases is a onetime pain that can be borne during the transition
> to LDAP from NIS.
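
To size that one-time pain before a migration, the existing names can be
scanned for entries that differ only in case. The following is a rough sketch,
not a migration tool: find_case_collisions is a hypothetical helper, the input
is assumed to be a plain list of names already extracted from the NIS maps,
and str.lower() is used as the folding rule, which is only safe for ASCII
names.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Group names by lower-cased form; keep groups with distinct spellings."""
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    # Exact duplicates are not collisions, so compare distinct spellings only.
    return {key: found for key, found in groups.items() if len(set(found)) > 1}


existing = ["mark", "Mark", "andrew", "MARK", "charlie"]
print(find_case_collisions(existing))
```

Each group in the result is a set of accounts that would need renaming
before the data could live under a case-insensitive attribute.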

Overall I think we are heading for something like this:

1)	Usernames and groupnames should use the caseIgnoreMatch equality
	matching rule

2)	Usernames and groupnames should preferably be stored in lower-case in
	cultures where that has meaning. Remember that LDAP is case-preserving
	even for case-insensitive attributes.

3)	Systems that are internally case-sensitive should take extra care
	when using data from LDAP.
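
For point 3, one way a case-sensitive client could take that extra care is
to reduce every name to a single canonical form before comparing it with
anything held locally. The sketch below assumes ASCII-only names, and the
helpers canonical and same_account are invented for the example. Lower-casing
is only an approximation of caseIgnoreMatch, which also prepares strings in
other ways (for example in its handling of spaces).

```python
def canonical(name: str) -> str:
    """Canonical form used for every local comparison (ASCII names assumed)."""
    return name.lower()


def same_account(a: str, b: str) -> bool:
    """Compare two names the way a case-insensitive directory roughly would."""
    return canonical(a) == canonical(b)


local_owner = "afindlay"   # name as recorded by the case-sensitive local system
returned = "AFindlay"      # stored spelling handed back by the directory

print(returned == local_owner)              # naive comparison
print(same_account(returned, local_owner))  # comparison on canonical forms
```

The naive comparison treats the two spellings as different users, while the
canonical comparison agrees with what the directory would say.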

I know this conflicts with Mark's wish to be able to import existing
NIS data unchanged to cope with weird existing situations, but I see that
as a special case that may be better handled in another way.

Note that the above discussion *only* applies to usernames and groupnames.
Unix-specific mappings such as automount maps are a different issue.

Andrew
-- 
-----------------------------------------------------------------------
|                 From Andrew Findlay, Skills 1st Ltd                 |
| Consultant in large-scale systems, networks, and directory services |
|     http://www.skills-1st.co.uk/                +44 1628 782565     |
-----------------------------------------------------------------------