Re: [iola-conversion-tool] Multi-part names not displaying properly on http://datatracker.ietf.org/wg/

Henrik Levkowetz <henrik@levkowetz.com> Thu, 01 March 2012 12:09 UTC

Message-ID: <4F4F6709.7010603@levkowetz.com>
Date: Thu, 01 Mar 2012 13:09:45 +0100
From: Henrik Levkowetz <henrik@levkowetz.com>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2
MIME-Version: 1.0
To: Cindy Morgan <cmorgan@amsl.com>
References: <24BD5CE3-41A6-4964-A609-6C86D667662E@amsl.com>
In-Reply-To: <24BD5CE3-41A6-4964-A609-6C86D667662E@amsl.com>
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 8bit
Cc: iola-conversion-tool@ietf.org
Subject: Re: [iola-conversion-tool] Multi-part names not displaying properly on http://datatracker.ietf.org/wg/
Precedence: list

Hi,

I've entered this into the tracker as ticket #783.

On 2012-02-29 18:00 Cindy Morgan said:
> > We have several WG chairs who have multi-part names.  While it looks like their entire names are being displayed on the WG charter pages, the  master WG list at http://datatracker.ietf.org/wg/ is cutting out the middle parts of those names.
> > 
> > Some examples: 
> > 
> > "Francois Faucheur" on http://datatracker.ietf.org/wg/
> > "Francois Le Faucheur" on http://datatracker.ietf.org/wg/cdni/charter/
> > 
> > "Jamal Salim" on http://datatracker.ietf.org/wg/
> > "Jamal Hadi Salim" on http://datatracker.ietf.org/wg/forces/charter/
> > 
> > "Gunter Velde" on http://datatracker.ietf.org/wg/
> > "Gunter Van de Velde" on http://datatracker.ietf.org/wg/opsec/charter/
> > 
> > (There may be--and probably are--more, but these are the ones that jump out at me on first glance.  Please let me know if you need a more thorough audit.)

Thanks for noticing this.

(The following has also been added to the ticket:)

A first fix has been applied, but refinement of the names in the database will be needed.

The situation is this:

The old model treated all names as if they fit an Anglo-Saxon pattern, which
lets you split a name into a 'first', a 'middle' and a 'last' part.  This
works well for some names, but not so well for others.

As an example, take Spanish names (Mexican names are slightly different,
again).  For a comprehensive description, check
  http://en.wikipedia.org/wiki/Spanish_names .

Here's a simplified take: a Spanish name has given names and a surname.  The
first given name is sometimes composed of two words ('Juan Pablo'); that is
not a first and a middle name, but the first name.  The surname has two parts,
composed from the father's first surname and the mother's first surname.  If
the name is shortened, for daily work or when addressed by surname alone, for
instance, the _first_ surname is used, not the last:
  "A man named José Antonio Gómez Iglesias would normally be addressed as Señor
   Gómez instead of Señor Iglesias." (from the article).
Many people in daily (email) correspondence use only the patronymic surname
(something I became very aware of when working with our Yaco developer, "Emilio
A. Sánchez López", who uses almost, but not quite, consistently "Emilio A.
Sánchez" for his emails).

If we try to force this into the legacy fields, it comes out wrong one way or
another -- either the double names will always be used, if both are put into
the surname field, or only one will ever be recognized, if only the patronymic
surname is entered.

The new database starts out by not assuming that it knows best how a name should
be split.  Instead, it has one utf-8 field and one ascii field for the preferred
presentation name, another ascii field for a shortened name, and any number of
aliases for alternative forms of the name, maybe containing titles, honorifics,
or other variations like both surnames for someone with a Spanish name who has
a preferred presentation using only the patronymic surname.  It puts name
splitting, for where it may be needed, into code, where it can be updated and
refined.
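
To make the shape of such a record concrete, here is a minimal sketch in
Python.  It is an illustration only: the class name, the field names and the
sample values are assumptions made up for this example, not the actual
datatracker schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    """Hypothetical person record; names of fields are illustrative only."""
    name: str                      # preferred presentation name, UTF-8
    ascii: str                     # preferred presentation name, ASCII
    ascii_short: str = ""          # optional shortened ASCII form
    aliases: List[str] = field(default_factory=list)  # alternative forms

    def matches(self, text: str) -> bool:
        """True if text equals any stored form of the name, ignoring case."""
        known = [self.name, self.ascii, self.ascii_short, *self.aliases]
        return text.casefold() in {k.casefold() for k in known if k}


# Sample values are invented, based on the name quoted from the article.
person = Person(
    name="José Antonio Gómez",
    ascii="Jose Antonio Gomez",
    ascii_short="J. Gomez",
    aliases=["José Antonio Gómez Iglesias"],
)
```

The preferred form is stored as-is and never reassembled from parts; the
form with both surnames is still found through the alias.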

Now, there is a lot of variation in preferences here, and the conversion from
the old database was maybe a bit too simplified, with the outcome that in a
number of cases the names will have to be adjusted, so that the preferred name
is indicated, and alternative forms are entered as aliases.

There are actually quite few places in the datatracker where we *need* to split
out the friendly name, the formal address surname, etc., but there is code
to do that, which clearly also needs refinement.  But as long as our usage
doesn't normally need that, we should be OK with the preferred name in utf-8
and ascii, with programmatic extraction of parts.

Ole and I have previously discussed what I mention above, and have also touched
on the possibility that name splitting code may need a hints field (e.g.,
'Spanish', 'Arabic', etc.) -- that is a refinement we can add if it turns out
that it's needed to resolve name splitting properly.
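
As a rough sketch of what hint-driven splitting could look like (in Python;
the function name, the return shape and the 'Spanish' rule are assumptions
for illustration, not the existing splitting code):

```python
def split_name(name, hint=None):
    """Return (given, surname, address_surname) for a presentation name.

    Naive heuristic: without a hint the last word is taken as the surname.
    With hint 'Spanish' and at least three words, the last two words are
    assumed to be the paternal and maternal surnames, and the first of
    them is the one used when addressing the person by surname alone.
    """
    parts = name.split()
    if len(parts) < 2:
        # A single-word name: nothing to split.
        return name, "", name
    if hint == "Spanish" and len(parts) >= 3:
        given = " ".join(parts[:-2])
        surname = " ".join(parts[-2:])
        return given, surname, parts[-2]
    return " ".join(parts[:-1]), parts[-1], parts[-1]


full = "José Antonio Gómez Iglesias"
with_hint = split_name(full, hint="Spanish")
without_hint = split_name(full)
```

With the hint, the address surname comes out as 'Gómez'; without it, as
'Iglesias'.  Note that a hint alone is not enough: a three-word name which
has already been shortened to one surname would be mis-split by this rule,
which is one reason to keep the preferred form and the aliases as stored
data.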

Currently (after my first fix) the code which produces the page
(http://datatracker.ietf.org/wg/) combines first, middle and last, but it
should transition to using the preferred name as soon as names with prefix and
suffix parts which should not normally be displayed have been modified to have
the forms with prefix/suffix as aliases, and the preferred display form has
been adjusted to work in this context.
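
In rough Python terms, the interim behaviour and the intended end state might
look like the sketch below.  The function names are hypothetical, and how the
example names were actually distributed over the legacy fields is an
assumption.

```python
def legacy_display(first, middle, last):
    """Interim approach: join whichever legacy parts are non-empty."""
    return " ".join(p for p in (first, middle, last) if p)


def display_name(preferred, first="", middle="", last=""):
    """Intended end state: use the preferred name when one is set,
    falling back to the combined legacy parts otherwise."""
    return preferred or legacy_display(first, middle, last)


# Assumes 'Van de' had been stored in the middle-name field.
combined = legacy_display("Gunter", "Van de", "Velde")
```

Dropping the middle part from the join gives the truncated form reported
for the WG list page.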


Best regards,

	Henrik
