[precis] toLower() vs. toCaseFold()

Peter Saint-Andre <stpeter@stpeter.im> Wed, 04 May 2016 22:43 UTC

To: "precis@ietf.org" <precis@ietf.org>
From: Peter Saint-Andre <stpeter@stpeter.im>
Message-ID: <572A7AF9.3050903@stpeter.im>
Date: Wed, 04 May 2016 16:43:05 -0600
Archived-At: <http://mailarchive.ietf.org/arch/msg/precis/T9FJw0Z8CXTPKpcqlyqtGt1nHsM>
Subject: [precis] toLower() vs. toCaseFold()

In a previous thread, we (mostly John) discussed the relative 
desirability of using toLower() vs. toCaseFold(); see for instance:

https://www.ietf.org/mail-archive/web/precis/current/msg01158.html
https://www.ietf.org/mail-archive/web/precis/current/msg01159.html

I suggested that we add some text about this to 7564bis. Here is a 
proposed paragraph for insertion in §5.2.3 ("Case-Mapping Rule"):

    The Unicode toCaseFold() operation defined by the Unicode Default
    Case Folding algorithm is most appropriate when an application needs
    to compare two strings.  When an application merely wishes to convert
    uppercase and titlecase code points to the lowercase equivalents
    while preserving lowercase code points, the Unicode toLower()
    operation is more appropriate and is less likely to violate the
    "Principle of Least Astonishment".  Therefore, application developers
    are advised to consider carefully whether a given situation truly
    requires the toCaseFold() operation, or whether the toLower()
    operation would be more appropriate.
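
To make the difference concrete, here is a rough sketch in Python. It 
assumes that Python's str.lower() and str.casefold() are acceptable 
stand-ins for the Unicode toLower() and toCaseFold() operations (they 
are not locale-tailored). The helper names are hypothetical, and the 
normalization step that PRECIS profiles also apply is left out:

```python
# Sketch only: str.lower() and str.casefold() are assumed stand-ins for
# the Unicode toLower() and toCaseFold() operations.

def to_lower(s: str) -> str:
    """Map uppercase and titlecase code points to lowercase."""
    return s.lower()


def to_case_fold(s: str) -> str:
    """Apply full case folding, intended for caseless comparison."""
    return s.casefold()


def caseless_equal(a: str, b: str) -> bool:
    """Compare two strings after case folding (normalization omitted)."""
    return to_case_fold(a) == to_case_fold(b)


name = "Straße"

# toLower() leaves the already-lowercase sharp s alone.
print(to_lower(name))      # straße

# toCaseFold() rewrites it to "ss", which may surprise a user who only
# expected the capital letter to change.
print(to_case_fold(name))  # strasse

# For comparison, though, folding is what makes these two match.
print(caseless_equal("STRASSE", name))           # True
print(to_lower("STRASSE") == to_lower(name))     # False
```

The folded form is useful as a comparison key but surprising as a 
stored or displayed value, which is the distinction the paragraph is 
trying to draw.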

Suggestions for improvement are welcome, especially from John. (E.g., we 
might want to call out comparison vs. other contexts more explicitly in 
the normative text elsewhere in §5.2.3.)

Thanks,

Peter