Re: [precis] toLower() vs. toCaseFold()

Martin J. Dürst <duerst@it.aoyama.ac.jp> Fri, 06 May 2016 06:54 UTC

To: Peter Saint-Andre <stpeter@stpeter.im>, "precis@ietf.org" <precis@ietf.org>
References: <572A7AF9.3050903@stpeter.im>
From: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
Organization: Aoyama Gakuin University
Message-ID: <bdb9e334-ec43-4bb1-16fd-0f2264018414@it.aoyama.ac.jp>
Date: Fri, 06 May 2016 15:54:02 +0900
Archived-At: <http://mailarchive.ietf.org/arch/msg/precis/uTzRfCN8w9LIe8bFVcJ-1Z8D_0Q>

Hello Peter,

On 2016/05/05 07:43, Peter Saint-Andre wrote:

> I suggested that we add some text about this to 7564bis. Here is a
> proposed paragraph for insertion in §5.2.3 ("Case-Mapping Rule"):
>
>    The Unicode toCaseFold() operation defined by the Unicode Default
>    Case Folding algorithm is most appropriate when an application needs
>    to compare two strings.  When an application merely wishes to convert
>    uppercase and titlecase code points to the lowercase equivalents
>    while preserving lowercase code points, the Unicode toLower()
>    operation is more appropriate and is less likely to violate the
>    "Principle of Least Astonishment".  Therefore, application developers
>    are advised to carefully consider whether they truly need to use the
>    toCaseFold() operation in a given situation, or whether the toLower()
>    operation would be more appropriate than the toCaseFold() operation.
>
> Suggestions for improvement are welcome, especially from John. (E.g., we
> might want to more explicitly call out comparison vs. other contexts in
> the normative text elsewhere in §5.2.3).

I think 'compare' should be changed to 'search'. Searching is the
prototypical use case for toCaseFold().
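
To make the distinction concrete, here is a minimal sketch in Python. It uses Python's built-in str.lower() and str.casefold() as stand-ins for the Unicode toLower() and toCaseFold() operations; they are not necessarily identical to what a PRECIS profile specifies, and the helper names are hypothetical, chosen only for illustration.

```python
# Illustrative sketch only: str.lower() and str.casefold() approximate the
# Unicode toLower() and toCaseFold() operations discussed in this thread.

def lower_for_display(s: str) -> str:
    """Convert uppercase/titlecase characters to lowercase, leaving
    characters that are already lowercase (such as U+00DF) unchanged."""
    return s.lower()


def contains_caseless(haystack: str, needle: str) -> bool:
    """Caseless search: fold both strings, then look for the folded needle."""
    return needle.casefold() in haystack.casefold()


if __name__ == "__main__":
    # Lowercasing preserves the sharp s; case folding expands it to "ss".
    print(lower_for_display("Straße"))
    print("Straße".casefold())
    # A search for "STRASSE" only succeeds when both sides are case-folded.
    print(contains_caseless("Hauptstraße 5", "STRASSE"))
    print("STRASSE".lower() in "Hauptstraße 5".lower())
```

Lowercasing leaves the stored string looking the way a user would expect, while folding gives up that fidelity in exchange for matching more variants, which is why it fits searching better than display.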

Also, the "Therefore" sentence is somewhat convoluted; it is unclear
which alternative the text prefers. If we want to put the two
alternatives on an equal footing (i.e., make sure the application
designer thinks carefully), then a more parallel sentence structure,
avoiding words such as "carefully", "truly", and "would", would be more
appropriate. What about:

                                        Therefore, application developers
    are advised to consider whether toCaseFold() or toLower() is more
    appropriate.

Regards,   Martin.