Re: Sorting names

Martijn Koster <m.koster@nexor.co.uk> Thu, 20 October 1994 11:09 UTC

Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa01069; 20 Oct 94 7:09 EDT
Received: from CNRI.Reston.VA.US by IETF.CNRI.Reston.VA.US id aa01065; 20 Oct 94 7:09 EDT
Received: from mocha.Bunyip.Com by CNRI.Reston.VA.US id aa03065; 20 Oct 94 7:09 EDT
Received: by mocha.bunyip.com (5.65a/IDA-1.4.2b/CC-Guru-2b) id AA14673 on Thu, 20 Oct 94 06:45:44 -0400
Received: from sifon.CC.McGill.CA by mocha.bunyip.com with SMTP (5.65a/IDA-1.4.2b/CC-Guru-2b) id AA14669 (mail destined for /usr/lib/sendmail -odq -oi -fiafa-request iafa-out) on Thu, 20 Oct 94 06:45:36 -0400
Received: from lancaster.nexor.co.uk (lancaster.nexor.co.uk [128.243.6.3]) by sifon.CC.McGill.CA (8.6.9/8.6.6) with ESMTP id GAA06502 for <iafa@cc.mcgill.ca>; Thu, 20 Oct 1994 06:44:27 -0400
Message-Id: <199410201044.GAA06502@sifon.CC.McGill.CA>
Received: from nexor.co.uk (actually host victor.nexor.co.uk) by lancaster.nexor.co.uk with SMTP (PP); Thu, 20 Oct 1994 11:44:32 +0100
To: Sally Hambridge <sallyh@ludwig.intel.com>
Cc: iafa@cc.mcgill.ca
Subject: Re: Sorting names
In-Reply-To: Your message of "Wed, 19 Oct 1994 11:23:09 PDT." <9410191823.AA05358@Ludwig.intel.com>
Date: Thu, 20 Oct 1994 11:44:20 +0100
Sender: ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Martijn Koster <m.koster@nexor.co.uk>

Sally Hambridge <sallyh@Ludwig.intel.com> writes:

> This is a challenge.  How do I persuade you that sorting is important
> when you're not convinced it is.  Hmmmm.

I guess I should have been more specific :-) I think that sorting by
author name is of less utility than it appears; I'll try to explain.

> If they get a lot of hits back ... they will want to organize the
> templates in a way to make browsing through them easier.

> This may be one of several ways:  By reverse date order; by an index
> weighting scheme (as Wais does), or by author name.  

If they know something about the author name, they can reduce the
results by specifying something like "author=*oster", even if the data
is unsorted. If their software doesn't allow this they can do it
themselves, in which case ordering helps only if they happen to know
the prefix of the surname, and the sorting is done properly. In this
case further searching is far more useful.
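
As a rough illustration of the "author=*oster" idea, here is a minimal
sketch in Python that filters unsorted records with a shell-style
wildcard. The record layout, the field name "author", the titles, and
the helper filter_by_author are all assumptions made up for this
example; they are not part of any template specification.

```python
from fnmatch import fnmatchcase

# Hypothetical, deliberately unsorted records; field names are assumptions.
records = [
    {"title": "Document C", "author": "Martijn Koster"},
    {"title": "Document A", "author": "Sally Hambridge"},
    {"title": "Document B", "author": "George Ferguson"},
]


def filter_by_author(records, pattern):
    """Return the records whose author matches a shell-style wildcard.

    Matching is case-insensitive; the input order is preserved, so the
    data does not need to be sorted beforehand.
    """
    pattern = pattern.lower()
    return [r for r in records if fnmatchcase(r["author"].lower(), pattern)]


matches = filter_by_author(records, "*oster")
for record in matches:
    print(record["author"], "-", record["title"])
```

The point of the sketch is only that the match works on unsorted data
and does not depend on knowing the prefix of the surname.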

The only other advantage of sorting is that you hopefully end up with
multiple publications from the same person together, which means you
can skip that name if it's not what you want, or you may assume some
similarity between their publications.

So yes, sorting by author is useful but in my mind far less useful
than other methods such as further searching, weighted indices,
keywords etc. I think the Sorted by Author requirement exists mainly
because non-electronic systems have no alternatives. But I won't go
into info-system psychology theory (as I know nothing about it :-)

My dislike of "Last name, First name" is that:
- not everyone will do it, so you cannot rely on it (problem)
- people will use inconsistent conventions with fancy names (problem)
- You end up with "This was written by Koster, Martijn",
  which I think is ugly. (opinion)

> Try this:  if we don't allow sorting of these names, they will *NEVER*
> be sortable.  If in the future people decide they need this ability, we
> will not be able to offer it without wild code convolutions.
> If your objection is that this *LOOKS* bad, then I say I favor ugly
> templates over ugly code.

I'm not against allowing them to be sorted at all, but I'm wary of
it causing confusion and interoperability problems.

> I don't mean to rant, but I do think this needs to change.

OK. Just as an observation, X.500 does this by separating the concepts
of a "Common Name" and "Surname" and has attributes for both. That I
don't like either :-) Then they got into arguments about the fact that
the concept of a surname is rather oriented to western cultures :-)
Interestingly enough they didn't consider sortability too important as
you can do it all with searches.


George Ferguson (ferguson@cs.rochester.edu) writes:

| I know, nobody knows that I'm even on this list. 

This list is full of surprises :-)

| BibTeX, which allows the following possibilities, that can easily be parsed:
| 
|         First von Last
|         von Last, First
|         von Last, Jr, First
| 
| where "First", "Last", and "Jr" are possibly empty sets of tokens and
| "von" is a possibly empty set of uncapitalized tokens. "Last" is
| nonempty if any of them are. So
|         Charles Louis Xavier Joseph de la Vallee Poussin
| has four tokens in the First part, two in the von, two in the last and
| an empty Jr.
| 
| The BibTeX supplementary docs state:
|     You may always use the first form; you shouldn't if there's a
|     "Jr" part, or the "Last" part has multiple tokens but there's
|     no "von" part.
| 
| So either wording corresponding to these observations belongs in the
| document, or input needs to be restricted to the third one, above,
| since of course these things need to be sortable by name.

Now this I like a lot; it seems well thought out, addresses fancy
names, still looks sensible in format 1, and can be parsed into the
other formats. Sally, would this address your requirement? George,
can you give me some reference to those BibTeX docs?
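
To make the parsing claim concrete, here is a minimal sketch in Python
of how the three forms quoted above might be split into parts and then
used as a sort key. It is a simplification based only on the
description in George's message, not BibTeX's actual algorithm: "von"
is taken to be a run of uncapitalized tokens, the final token of the
surname portion always goes to "Last", and braces or other special
cases are ignored. The names parse_name, split_von_last, and sort_key
are made up for this example.

```python
def split_von_last(tokens):
    """Split tokens into (von, last).

    von is the leading run of uncapitalized tokens; the final token is
    always kept for last, so last is nonempty whenever tokens is.
    """
    i = 0
    while i < len(tokens) - 1 and tokens[i][0].islower():
        i += 1
    return tokens[:i], tokens[i:]


def parse_name(name):
    """Parse 'First von Last', 'von Last, First' or 'von Last, Jr, First'."""
    parts = [part.split() for part in name.split(",")]
    if len(parts) == 1:
        tokens = parts[0]
        # First is the leading run of capitalized tokens, never the final one.
        i = 0
        while i < len(tokens) - 1 and not tokens[i][0].islower():
            i += 1
        first, jr = tokens[:i], []
        von, last = split_von_last(tokens[i:])
    elif len(parts) == 2:
        von, last = split_von_last(parts[0])
        jr, first = [], parts[1]
    elif len(parts) == 3:
        von, last = split_von_last(parts[0])
        jr, first = parts[1], parts[2]
    else:
        raise ValueError("too many commas in name: %r" % name)
    return {"first": first, "von": von, "last": last, "jr": jr}


def sort_key(name):
    """Sort by Last, then First, ignoring case (von is ignored here)."""
    parsed = parse_name(name)
    return (
        " ".join(parsed["last"]).lower(),
        " ".join(parsed["first"]).lower(),
    )


example = parse_name("Charles Louis Xavier Joseph de la Vallee Poussin")
names = ["Martijn Koster", "Hambridge, Sally", "George Ferguson"]
ordered = sorted(names, key=sort_key)
```

With this sketch the example name splits into four First tokens, two
von tokens, two Last tokens and an empty Jr, matching the breakdown in
the quoted message, and names entered in either form 1 or form 2 sort
together by surname. Whether the von part should take part in the
ordering is a separate decision; it is left out of the key here.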

-- Martijn
__________
Internet: m.koster@nexor.co.uk
X-400: C=GB; A= ; P=Nexor; O=Nexor; S=koster; I=M
X-500: c=GB@o=NEXOR Ltd@cn=Martijn Koster
WWW: http://web.nexor.co.uk/mak/mak.html