Re: [precis] local case mapping

Peter Saint-Andre <stpeter@stpeter.im> Tue, 08 October 2013 19:55 UTC

Message-ID: <5254632F.6060106@stpeter.im>
Date: Tue, 08 Oct 2013 13:55:27 -0600
From: Peter Saint-Andre <stpeter@stpeter.im>
To: Andrew Sullivan <ajs@anvilwalrusden.com>
References: <5227A979.7050403@stpeter.im> <E0DDC70E-DF8C-4163-8ED5-4ADA115DDB72@kmd.keio.ac.jp> <4C8248EF-51BD-4736-A930-E2FEE610EC03@kmd.keio.ac.jp> <20131005031746.GC38902@mx1.yitter.info>
In-Reply-To: <20131005031746.GC38902@mx1.yitter.info>
Cc: precis@ietf.org
Subject: Re: [precis] local case mapping
List-Id: Preparation and Comparison of Internationalized Strings <precis.ietf.org>

Hi Andrew, thanks for helping to move things forward.

On 10/4/13 9:17 PM, Andrew Sullivan wrote:
> Dear colleagues,
> 
> I reviewed draft-ietf-precis-mappings-03 today, at long last.  I
> apologise for being so late.

No worries. Better that we get it right than that we finish it quickly.
Or maybe that's just a convenient excuse. :-)

> I put together a number of incoherent questions about the case folding
> stuff, but fortunately I re-read the mailing list archives on this
> topic before posting a long message.  I agree with Peter: I find this
> section of the document very confusing, and I think it may be wrong.
> In particular …
> 
> On Thu, Sep 19, 2013 at 04:39:12PM +0900, Takahiro Nemoto wrote:
> 
>> Considering the maintenance and preservation of the document, 
>> leaving it the way it is now is not a bad idea.
> 
> …I am pretty sure it shouldn't be left the way it is.
> 
> It seems to me that our principle generally needs to be that Unicode
> is the thing we use, and if Unicode is broken it's Not Our Problem.
> So we should figure out how to say, "Do the Unicode-y right thing
> here," and then put that in.  I especially don't want to get into
> specifying special language-specific tables ourselves: we don't have
> the expertise, I think.
> 
> I can't think of any better suggested text than what Peter already
> sent, so I think that's the right direction.  In the unlikely event
> something clearer comes to me in the night, I promise to write it
> down.

I suggested text to clear up the first paragraph. However, my message
merely asked the key question without answering it: what are we
trying to accomplish here?

As I noted, Appendix B.1 simply matches the Language-Sensitive Mappings
from the SpecialCasing.txt file in the Unicode Character Database. If
that's *all* we're trying to accomplish, then we could simply say "apply
the Language-Sensitive Mappings in SpecialCasing.txt".
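
If that is all we mean, an implementation mostly needs to pull the entries tagged with a given language out of SpecialCasing.txt. The Python sketch below assumes the file's semicolon-separated layout (code; lower; title; upper; conditions; # comment). SAMPLE is an abridged, hand-copied excerpt and parse_special_casing() is a hypothetical helper, not part of any library. Even these entries are not all context-free: the Turkish rule for "I" also carries the Not_Before_Dot condition.

```python
# SAMPLE: abridged, hand-copied excerpt in SpecialCasing.txt format.
SAMPLE = """\
00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S
03A3; 03C2; 03A3; 03A3; Final_Sigma; # GREEK CAPITAL LETTER SIGMA
0130; 0069; 0130; 0130; tr; # LATIN CAPITAL LETTER I WITH DOT ABOVE
0049; 0131; 0049; 0049; tr Not_Before_Dot; # LATIN CAPITAL LETTER I
0049; 0131; 0049; 0049; az Not_Before_Dot; # LATIN CAPITAL LETTER I
"""

def parse_special_casing(text, lang):
    """Hypothetical helper: collect the lowercase mappings tagged with `lang`.

    Returns (unconditional, conditional). `unconditional` maps a character to
    its lowercase string; `conditional` lists (char, lowercase, [conditions])
    for entries that also carry a context condition such as Not_Before_Dot.
    """
    unconditional, conditional = {}, []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        if len(fields) < 5 or not fields[4]:
            continue  # no condition list: not a language-sensitive entry
        code, lower, conditions = fields[0], fields[1], fields[4].split()
        if lang not in conditions:
            continue  # entry belongs to another language or is context-only
        src = chr(int(code, 16))
        dst = "".join(chr(int(cp, 16)) for cp in lower.split())
        context = [c for c in conditions if c != lang]
        if context:
            conditional.append((src, dst, context))
        else:
            unconditional[src] = dst
    return unconditional, conditional
```

For "tr" this yields the unconditional mapping U+0130 to "i" plus the conditional mapping of "I" to U+0131; a full implementation would still have to evaluate conditions such as Not_Before_Dot against the surrounding text.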

However, I get the sense that we're actually trying to accomplish more,
e.g., applying at least the context-sensitive mapping for Greek final
sigma. In my example, a nickname of "ΦΙΛΟΣ ΜΟΙ" would be lowercased
to "φιλος μοι" (with a Greek final sigma, which is correct in Greek) and
not to "φιλοσ μοι" (with a Greek medial sigma, which is incorrect in Greek).
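
The final-sigma rule corresponds to the Final_Sigma condition in SpecialCasing.txt: capital sigma lowercases to U+03C2 when it is preceded by a cased letter (skipping case-ignorable characters) and not followed by one. The Python sketch below spells that check out. is_cased() and is_case_ignorable() are hypothetical approximations built on General_Category, not the exact Unicode Cased and Case_Ignorable properties.

```python
import unicodedata

CAPITAL_SIGMA, FINAL_SIGMA, SMALL_SIGMA = "\u03a3", "\u03c2", "\u03c3"

def is_cased(ch: str) -> bool:
    # Approximation: treat upper-, lower- and titlecase letters as "cased".
    return unicodedata.category(ch) in ("Lu", "Ll", "Lt")

def is_case_ignorable(ch: str) -> bool:
    # Rough approximation of Case_Ignorable: nonspacing/enclosing marks,
    # format characters, modifiers, plus a few word-internal punctuation marks.
    return (unicodedata.category(ch) in ("Mn", "Me", "Cf", "Lm", "Sk")
            or ch in "'.:\u00b7\u2019")

def lower_with_final_sigma(s: str) -> str:
    out = []
    for i, ch in enumerate(s):
        if ch != CAPITAL_SIGMA:
            out.append(ch.lower())
            continue
        # Look backwards past case-ignorable characters for a cased letter.
        j = i - 1
        while j >= 0 and is_case_ignorable(s[j]):
            j -= 1
        preceded = j >= 0 and is_cased(s[j])
        # Look forwards the same way; a following cased letter blocks Final_Sigma.
        k = i + 1
        while k < len(s) and is_case_ignorable(s[k]):
            k += 1
        followed = k < len(s) and is_cased(s[k])
        out.append(FINAL_SIGMA if preceded and not followed else SMALL_SIGMA)
    return "".join(out)
```

Here lower_with_final_sigma("ΦΙΛΟΣ ΜΟΙ") gives "φιλος μοι", as does CPython's built-in str.lower(), which applies this condition itself. By contrast, "ΦΙΛΟΣ ΜΟΙ".casefold() gives "φιλοσ μοι", because case folding maps every sigma to U+03C3.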

It's also not clear to me whether we have a position on full case folding
vs. simple case folding (e.g., whether "ẞ" (U+1E9E) folds to "ss" or to
"ß" (U+00DF)). It seems to me that we might want to recommend a consistent
approach here to improve interoperability.
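
The distinction is encoded in CaseFolding.txt itself. Each line carries a status: C entries are shared by both kinds of folding, S entries are used only for simple folding, F entries only for full folding, and T entries are the Turkic special cases. The Python sketch below builds both tables from an abridged, hand-copied excerpt. FOLDING_SAMPLE, parse_case_folding() and fold() are hypothetical names used only for illustration; Python's own str.casefold() implements full folding.

```python
# FOLDING_SAMPLE: abridged excerpt in CaseFolding.txt format (code; status; mapping; # name).
FOLDING_SAMPLE = """\
0041; C; 0061; # LATIN CAPITAL LETTER A
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
03A3; C; 03C3; # GREEK CAPITAL LETTER SIGMA
1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S
1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S
"""

def parse_case_folding(text: str):
    """Return (simple, full) folding tables built from CaseFolding.txt-style lines."""
    simple, full = {}, {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        code, status, mapping = (f.strip() for f in line.split(";")[:3])
        src = chr(int(code, 16))
        dst = "".join(chr(int(cp, 16)) for cp in mapping.split())
        if status in ("C", "S"):  # simple folding: always one code point
            simple[src] = dst
        if status in ("C", "F"):  # full folding: may expand to several
            full[src] = dst
    return simple, full

def fold(s: str, table: dict) -> str:
    # Characters without an entry fold to themselves.
    return "".join(table.get(ch, ch) for ch in s)
```

With these tables, fold("ẞ", simple) gives "ß" while fold("ẞ", full) gives "ss", matching "ẞ".casefold().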

So IMHO one approach would be:

1. Apply the language-sensitive mappings from SpecialCasing.txt
2. Apply the context-sensitive (i.e., "language-insensitive") mappings
from SpecialCasing.txt
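
The two steps might look roughly like this in Python. map_case() is a hypothetical helper. Step 1 hard-codes only the Turkish/Azeri rules and simplifies their context conditions (it looks only at the immediately following character rather than skipping intervening combining marks). Step 2 relies on CPython's str.lower() already applying the context-sensitive Final_Sigma rule along with the unconditional SpecialCasing.txt mappings.

```python
from typing import Optional

def map_case(s: str, lang: Optional[str] = None) -> str:
    # Step 1: language-sensitive mappings (abridged: Turkish and Azeri only).
    if lang in ("tr", "az"):
        s = s.replace("I\u0307", "i")  # I + COMBINING DOT ABOVE -> i (After_I)
        s = s.replace("\u0130", "i")   # CAPITAL I WITH DOT ABOVE -> i
        s = s.replace("I", "\u0131")   # any other I -> dotless i (Not_Before_Dot)
    # Step 2: language-insensitive mappings, including the context-sensitive
    # Final_Sigma rule, which CPython's str.lower() applies itself.
    return s.lower()
```

Without a language tag, "DİYARBAKIR" lowercases to "di̇yarbakir" (an "i" followed by U+0307 COMBINING DOT ABOVE), whereas map_case("DİYARBAKIR", "tr") gives "diyarbakır". That is one reason the language-sensitive step has to run first.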

I'm still not sure what to do about full vs. simple case mapping,
but I see no strong reason to prefer simple case mapping, because I don't
see a problem with our algorithm producing two characters (e.g.,
"ss") where the input had one.

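If the mapped strings are only ever compared with each other, a two-character result simply makes more pairs match. A minimal sketch, assuming comparison is done on Python's str.casefold() output (fold_equal() is a hypothetical helper):

```python
def fold_equal(a: str, b: str) -> bool:
    # Full case folding may expand one character into two ("ß" -> "ss"),
    # so compare the folded strings as wholes, not position by position.
    return a.casefold() == b.casefold()
```

Under full folding, "STRAẞE", "Straße" and "STRASSE" all compare equal. Under simple folding, "STRAẞE" would fold to "straße" and would not match "strasse". Code that assumes the mapping preserves string length (for example, by reusing character offsets) is where the expansion would need care.
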
Peter

-- 
Peter Saint-Andre
https://stpeter.im/