[rfc-i] References

duerst at it.aoyama.ac.jp ( Martin J. Dürst ) Mon, 07 March 2016 10:27 UTC

From: duerst at it.aoyama.ac.jp (Martin J. Dürst)
Date: Mon, 7 Mar 2016 19:27:53 +0900
Subject: [rfc-i] References
In-Reply-To: <98505FF6-DD0E-4884-8B70-B500C960C97D@vpnc.org>
References: <6FA9944B-DBB6-45E5-810B-0A3417B3ACB9@cisco.com> <59D97957-0B24-478A-8EA5-02AF6B023EB0@vpnc.org> <56CCF29F.8080200@rfc-editor.org> <98505FF6-DD0E-4884-8B70-B500C960C97D@vpnc.org>
Message-ID: <56DD57A9.9090006@it.aoyama.ac.jp>

Trying to tie some loose ends; if they are already tied up, please ignore.

On 2016/02/24 09:16, Paul Hoffman wrote:
> On 23 Feb 2016, at 16:00, Heather Flanagan (RFC Series Editor) wrote:
>
>> For your example, it would be pretty simple:
>>
>> P. Hoffman
>> ?x?mple Corp.
>>
>> See draft-iab-rfc-nonascii-00.txt, Section 3.2.
>> "  Person names may appear in several places within an RFC.  In all
>>  cases, valid Unicode is required.  For names that include characters
>>  outside of the Unicode Latin and Latin Extended script,

There's no such thing as "Latin Extended script". This should change to
e.g. "outside of the Unicode Latin script" or "outside of the Unicode 
Latin blocks".

>>  an author-
>>  provided, ASCII-only identifier is required to assist in search and
>>  indexing of the document."
>
> Good catch, but I gave a bad example. How would you propose that the
> display of organization names be for:
>
> <author initials="P." surname="Hoffman" fullname="Paul Hoffman">
> <organization ascii="Example Corp.">???? Corp.</organization>
> </author>
>
> Your text above says the ASCII-only identifier "is required", but Joe's
> top-level question is "how are these things rendered in the output
> formats?".

My understanding (from a usability/desirability point of view) would be 
as follows:

If the 'original' is in all-ASCII or all-Latin, then render just that.
If the 'original' contains non-Latin, then render that, followed by an 
ASCII/Latin fallback in parentheses.

In the 'author's address' section, potentially list not only the Latin 
fallback, but also the ASCII-only fallback.

If all/many items in a single location are in non-Latin/non-ASCII, then 
group them. If it's only individual items, don't repeat unnecessarily.
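
As a rough sketch, the two basic rules above (render the original alone 
if it is all-Latin, otherwise follow it with the fallback in 
parentheses) could look like this in Python. The function name 
render_name and its parameters are made up for illustration and are not 
part of the draft or of any tool; whether the original is Latin is 
simply passed in by the caller, and the non-Latin sample string is a 
stand-in, not the name from the example in this thread.

```python
from typing import Optional


def render_name(original: str, fallback: Optional[str], original_is_latin: bool) -> str:
    """Render a name following the two basic rules (hypothetical helper).

    - all-ASCII or all-Latin original: render just the original
    - non-Latin original: render it, followed by the fallback in parentheses
    """
    if original_is_latin or not fallback:
        # Nothing to add: the original is readable as-is, or no fallback exists.
        return original
    return f"{original} ({fallback})"


# All-Latin original: the fallback (if any) is not shown.
print(render_name("Example Corp.", None, True))
# Non-Latin original (stand-in string): the fallback follows in parentheses.
print(render_name("例示 Corp.", "Example Corp.", False))
```

The grouping rule and the extra ASCII-only fallback in the 'author's 
address' section are deliberately left out of this sketch.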

So for your example above, it should be something like (suitably 
right-aligned):
P. Hoffman
???? Corp. (Example Corp.)

Or if the actual Romanized name is indeed "?x?mple Corp.", then it 
should be:
P. Hoffman
???? Corp. (?x?mple Corp.)

For a case such as:
<author fullname="??" asciiFullname="English Name" asciiInitials="E." 
asciiSurname="Name"/>

it should be:
?? (E. Name)

and for this one:
<author initials="?." asciiInitials="N." surname="One" fullname="?o One" 
asciiFullname="No One"/>
it should be:
?. One
(and the ASCII should only appear internally in XML, or in the 
ASCII-only version (if that's still part of the plan)).

It gets more complicated if we have somebody whose name is, let's say,
??, but who writes their name as ?o ?ne in Latin (and No One in ASCII).

Such a situation may be rare, because people usually go all the way to 
ASCII when they Romanize their name. The earliest needs I can imagine 
are, e.g., pinyin (the 'standard' way to write Chinese in Latin script 
these days), which uses some diacritics, or somebody from North Africa 
who primarily uses Arabic script and has French accents in their 
Romanized name.

It may be that we don't have the necessary attributes to handle such a 
situation. It may also be that we need some logic to check whether a 
given string is Latin or not (NOT ASCII or not, which is trivial). But 
probably not; we may be able to use the presence/absence of some type of 
attributes.
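
To illustrate the difference between the two checks, here is a minimal 
Python sketch. The ASCII test is a one-liner; the Latin test below is 
only a heuristic of my own, not something from the draft: Python's 
standard unicodedata module does not expose the Unicode Script property, 
so the sketch looks at character names instead and treats any letter 
whose name starts with "LATIN " as Latin. Non-letters (spaces, digits, 
punctuation, combining marks) are treated as script-neutral. The function 
names and sample strings are made up for illustration.

```python
import unicodedata


def is_ascii(text: str) -> bool:
    """The trivial check: every character is in the ASCII range."""
    return text.isascii()


def is_latin(text: str) -> bool:
    """Heuristic check that every letter in text is a Latin-script letter.

    Assumption: a letter counts as Latin if its Unicode character name
    starts with "LATIN ". A real implementation would use the Script
    property from the Unicode Character Database instead.
    """
    for ch in text:
        if not ch.isalpha():
            # Spaces, digits, punctuation and combining marks are ignored.
            continue
        if not unicodedata.name(ch, "").startswith("LATIN "):
            return False
    return True


for sample in ["Example Corp.", "Martin J. Dürst", "Nǐ hǎo", "例示 Corp."]:
    print(sample, is_ascii(sample), is_latin(sample))
```

A name with diacritics such as "Dürst" fails the ASCII check but passes 
the Latin one, which is exactly the case where only the Latin check 
gives the desired rendering.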

Regards,   Martin.