Re: [precis] toLower() vs. toCaseFold()

Peter Saint-Andre <stpeter@stpeter.im> Sat, 03 September 2016 22:53 UTC

To: John C Klensin <john-ietf@jck.com>, "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, precis@ietf.org
References: <6F0075DBF071EB43A3F97F73@JcK-HP8200.jck.com>
From: Peter Saint-Andre <stpeter@stpeter.im>
Message-ID: <a77ec70f-ad2d-4772-d76f-c98d89e5da14@stpeter.im>
Date: Sat, 03 Sep 2016 16:53:44 -0600
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Thunderbird/45.2.0
MIME-Version: 1.0
In-Reply-To: <6F0075DBF071EB43A3F97F73@JcK-HP8200.jck.com>
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Transfer-Encoding: 8bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/precis/IDg2VPznWmUC84jPypFL1QENdRo>
Subject: Re: [precis] toLower() vs. toCaseFold()

On 5/6/16 8:40 PM, John C Klensin wrote:
> (sorry... earlier copy sent from wrong address)

I'm sorry that it has taken me 4 months to reply to this message! :(

> --On Friday, May 06, 2016 15:54 +0900 "Martin J. Dürst"
> <duerst@it.aoyama.ac.jp> wrote:
>
>> Hello Peter,
>>
>> On 2016/05/05 07:43, Peter Saint-Andre wrote:
>>
>>> I suggested that we add some text about this to 7564bis. Here
>>> is a proposed paragraph for insertion in §5.2.3
>>> ("Case-Mapping Rule"):
>>>
>>>    The Unicode toCaseFold() operation defined by the Unicode
>>>    Default Case Folding algorithm is most appropriate when an
>>>    application needs to compare two strings.  When an
>>>    application merely wishes to convert uppercase and
>>>    titlecase code points to the lowercase equivalents while
>>>    preserving lowercase code points, the Unicode toLower()
>>>    operation is more appropriate and is less likely to
>>>    violate the "Principle of Least Astonishment".  Therefore,
>>>    application developers are advised to carefully consider
>>>    whether they truly need to use the toCaseFold() operation
>>>    in a given situation, or whether the toLower() operation
>>>    would be more appropriate than the toCaseFold() operation.
>>>
>>> Suggestions for improvement are welcome, especially from
>>> John. (E.g., we might want to more explicitly call out
>>> comparison vs. other contexts in the normative text elsewhere
>>> in §5.2.3).
>>
>> I think 'compare' should be changed to 'search'. That's the
>> prototypical use case for CaseFold.
>
> Hmm.  If we have to choose, I think I prefer "compare".  I just
> looked at the subsections on "Default Case Folding" and "Default
> Caseless Matching" in Section 3.13 of TUS 8.0 and it says a lot
> about comparison and nothing about search.   Recommended
> compromise:  Make the relevant sentence fragment read "most
> appropriate when an application needs to compare two strings
> such as in search operations."

+1
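
To make the comparison case concrete, here is a minimal sketch in Python. It assumes that Python's built-in str.casefold() and str.lower() are reasonable stand-ins for toCaseFold() and toLower(); the helper names are hypothetical, and no Unicode normalization is applied, so this is only an illustration, not what the PRECIS documents specify.

```python
def caseless_equal(a: str, b: str) -> bool:
    """Compare two strings after case folding (hypothetical helper)."""
    return a.casefold() == b.casefold()


def lowercase_equal(a: str, b: str) -> bool:
    """Compare two strings after lowercasing (hypothetical helper)."""
    return a.lower() == b.lower()


# U+00DF (sharp s) folds to "ss" but is already lowercase,
# so the two approaches disagree on this pair.
print(caseless_equal("Stra\u00dfe", "STRASSE"))   # True
print(lowercase_equal("Stra\u00dfe", "STRASSE"))  # False
```

In this sketch, case folding finds the match that a search would probably want, while lowercasing does not.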

> I'd still prefer to denounce toCaseFold completely, especially
> where identifiers are concerned.  It just has far too much
> potential for being destructive and creating false results
> (either positive or negative) when the language context is
> unknown.  People/designers/implementers who are not prepared to
> understand those issues and their implications should really not
> be using the thing.
>
>> Also, the language in the "Therefore" sentence is somewhat
>> convoluted. It's unclear which alternative this text prefers.
>> I suggest that if we want to put the two alternatives on an
>> equal footing (i.e. make sure the application designer thinks
>> carefully), then a more parallel sentence structure, avoiding
>> words such as "carefully", "truly", and "would", would be more
>> appropriate. What about:
>>
>>     Therefore, application developers are advised to carefully
>>     consider whether toCaseFold() or toLower() is more
>>     appropriate.
>
> For the reasons above, I'm not sure that an even footing is
> appropriate.  I'd rather have the guidance be closer to "use
> toLowerCase, which your users are likely to understand, unless
> you need CaseFolding for some particular reason and understand
> its implications"

That is indeed more consistent with the concerns you have expressed 
within the working group.
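
As a rough illustration of why lowercasing is less surprising to users, the following Python sketch shows input that is already lowercase being left alone by lowercasing but altered by case folding. It again assumes that str.lower() and str.casefold() approximate toLower() and toCaseFold(); the function name is hypothetical.

```python
def lower_and_fold(s: str) -> tuple[str, str]:
    """Return (lowercased, case-folded) forms of s (hypothetical helper)."""
    return s.lower(), s.casefold()


# Both inputs are already lowercase: sharp s (U+00DF) and
# Greek final sigma (U+03C2).
for sample in ("wei\u00df", "\u03c2"):
    lowered, folded = lower_and_fold(sample)
    # Lowercasing preserves the input; case folding rewrites it.
    print(sample == lowered, sample == folded)  # True False
```

An identifier stored in case-folded form would therefore differ from what the user typed, even though the user typed only lowercase characters.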

Going in that direction will require some adjustments to 7613bis and
7700bis, but the changes should be straightforward. I will make them
before publishing the -02 versions (hopefully today).

Peter