[precis] HasCompat()

Peter Saint-Andre <stpeter@stpeter.im> Sun, 12 February 2017 21:48 UTC

To: "precis@ietf.org" <precis@ietf.org>
From: Peter Saint-Andre <stpeter@stpeter.im>
Message-ID: <3b1a3932-da73-2ccb-115f-f5da4964ad13@stpeter.im>
Date: Sun, 12 Feb 2017 14:48:13 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/precis/0yBWgkssCaXiIQwXHaz7eFMWRDA>
Subject: [precis] HasCompat()

In an off-list conversation, John Klensin pointed out to me that there 
could be confusion about the definition of the HasCompat() category from 
Section 9.17 of RFC 7564 and of draft-ietf-precis-7564bis-04.

I can't speak for my co-author Marc Blanchet, but I've always considered 
HasCompat to apply in a "unidirectional" way to the input characters. 
For instance, if we have three code points P0, P1, and P2 such that 
NFKC(P1P2) = P0P0, then the HasCompat() category is assigned to P1 and 
P2 but not to P0. That is, P1 and P2 are decomposed and then recomposed 
in a lossy way because we can't tell from the output string P0P0 what 
the input string was, and there is no way to determine all the characters 
that could be decomposed and recomposed into P0P0. It seems that the 
current text might be a bit confusing (as I understand what John wrote, 
the term "has a compatibility equivalent" could be taken to apply to P0 
in this example), so I will try to make it clearer.
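
To make the "unidirectional" reading concrete, here is a minimal sketch in 
Python. It assumes the per-code-point test toNFKC(cp) != cp, which is my 
reading of Section 9.17, and it uses whatever Unicode version ships with 
the local Python build. The helper name has_compat() and the choice of 
U+FB01 as the example are illustrative assumptions only; a single ligature 
stands in for the P1/P2 inputs, and its NFKC output stands in for P0.

```python
import unicodedata


def has_compat(cp: str) -> bool:
    """Return True if a single code point is changed by NFKC.

    Hypothetical helper: the category is computed on the *input* code
    point only, never on the code points that NFKC produces from it.
    """
    if len(cp) != 1:
        raise ValueError("expected exactly one code point")
    return unicodedata.normalize("NFKC", cp) != cp


# U+FB01 LATIN SMALL LIGATURE FI maps to the two letters "fi" under NFKC.
ligature = "\ufb01"
output = unicodedata.normalize("NFKC", ligature)

print(has_compat(ligature))             # True: the input is in HasCompat
print([has_compat(c) for c in output])  # [False, False]: the outputs are not
```

Nothing in the output "fi" records that it came from the ligature, which 
is the lossy, one-way behavior described above.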

Furthermore, John pointed out that the HasCompat() categorization for a 
given input string could potentially change across Unicode versions 
(e.g., if the input string includes a precomposed character that was 
added in a recent version of Unicode). Although I'm not sure if this is 
unavoidable, it does seem that we need to at least mention the potential 
instability of this category.
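
As one rough illustration of that instability, the sketch below compares 
the same test against two Unicode databases available in Python: the 
frozen 3.2.0 data exposed as unicodedata.ucd_3_2_0 and the version bundled 
with the running interpreter. The helper name has_compat_by_version() is 
hypothetical, and U+1D2C (MODIFIER LETTER CAPITAL A, added in Unicode 4.0) 
is only an example of a code point that was unassigned in the older 
version; it is not necessarily the case John had in mind.

```python
import unicodedata


def has_compat_by_version(cp: str) -> dict:
    """Map Unicode version -> whether NFKC changes the given code point.

    Hypothetical helper comparing the frozen 3.2.0 database with the
    database bundled with the running Python interpreter.
    """
    if len(cp) != 1:
        raise ValueError("expected exactly one code point")
    old = unicodedata.ucd_3_2_0
    return {
        old.unidata_version: old.normalize("NFKC", cp) != cp,
        unicodedata.unidata_version: unicodedata.normalize("NFKC", cp) != cp,
    }


# U+1D2C has a compatibility decomposition to "A" in current Unicode,
# but was unassigned in Unicode 3.2.0.
for version, flagged in has_compat_by_version("\u1d2c").items():
    print(version, flagged)
```

An implementation pinned to the older data would not flag this code 
point, while one using newer data would, so two deployments could 
disagree about the category of the same input.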

Peter