Re: [precis] names and usernames

Peter Saint-Andre <> Mon, 27 February 2017 00:21 UTC

To: John C Klensin <>,
References: <> <2F562E0E75615D28FB8474A8@PSB> <>
From: Peter Saint-Andre <>
Message-ID: <>
Date: Sun, 26 Feb 2017 17:21:42 -0700
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:45.0) Gecko/20100101 Thunderbird/45.7.1
MIME-Version: 1.0
In-Reply-To: <>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 8bit
Archived-At: <>
Subject: Re: [precis] names and usernames
List-Id: Preparation and Comparison of Internationalized Strings <>

I'm still waiting to hear from experts with knowledge of Indic and
eastern Arabic scripts. However, I have been corresponding off-list with
someone who has knowledge of the second issue I raised...

>>> Second, apparently some Chinese family names 

According to my correspondent, the challenge is representation of some
given names, not family names (e.g., legislation in Taiwan stipulates
that a given name can include any character that has ever appeared in a
dictionary, even dictionaries published hundreds of years ago).

>>> are typically
>>> written (especially outside the People's Republic of China)
>>> using characters that the Unicode Consortium assigns to
>>> non-BMP code points 

John, forgive my ignorance, but it seems to me that the plane is
irrelevant here: in PRECIS we base decisions on code point properties.
Thus, for instance, any code point whose Unicode general category is
"Lo" (other letter) is allowed in the PRECIS IdentifierClass (per
Section 9.1 of RFC 7564), regardless of the plane. As an example, a code
point like U+2F804 (CJK COMPATIBILITY IDEOGRAPH-2F804) would be allowed,
even though it is in the Supplementary Ideographic Plane.
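
To make the property-based view concrete, here is a minimal sketch using
Python's standard unicodedata module. It is not the PRECIS derivation
algorithm; the helper name describe is hypothetical and exists only for
illustration. It reports a code point's general category and its plane
as two independent facts, plus whether the code point is unchanged by
NFKC, which is the question the HasCompat rule in RFC 7564 asks.

```python
import unicodedata


def describe(cp: int) -> dict:
    """Report a few Unicode properties of one code point (illustrative helper)."""
    ch = chr(cp)
    return {
        # Character name from the Unicode Character Database, if any.
        "name": unicodedata.name(ch, "<unnamed>"),
        # General category, e.g. "Lo" for other letter.
        "category": unicodedata.category(ch),
        # Plane number: 0 is the BMP, 2 is the Supplementary Ideographic Plane.
        "plane": cp >> 16,
        # True when NFKC normalization leaves the code point unchanged.
        "nfkc_stable": unicodedata.normalize("NFKC", ch) == ch,
    }


info = describe(0x2F804)
for key, value in info.items():
    print(f"{key}: {value}")
```

The category and the plane come out of separate lookups, so a rule
written in terms of the category never needs to consult the plane. The
nfkc_stable field is included because the general category is only one
of several inputs to the full RFC 7564 derivation.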

>>> or assigns in the BMP but as
>>> compatibility decomposable characters (and thus disallowed by
>>> RFC 7564 in the IdentifierClass).

My correspondent said it should be fine to disallow compatibility
decomposable characters such as U+328A (CIRCLED IDEOGRAPH MOON) because,
according to him, they would not be used in given or family names.
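
As a rough way to see what "compatibility decomposable" means for a
given code point, the following sketch reads the decomposition field
from Python's unicodedata module. The function name
compat_decomposition is hypothetical, and this is only a lookup of the
raw Unicode Character Database field, not an implementation of the
PRECIS rules.

```python
import unicodedata


def compat_decomposition(cp: int):
    """Return (tag, mapped string) if cp has a compatibility decomposition.

    Returns None when the code point has no decomposition at all, or only
    a canonical one (canonical mappings carry no <tag> in the UCD field).
    """
    fields = unicodedata.decomposition(chr(cp)).split()
    if not fields or not fields[0].startswith("<"):
        return None
    tag = fields[0]
    # Remaining fields are hexadecimal code points of the mapped string.
    mapped = "".join(chr(int(f, 16)) for f in fields[1:])
    return tag, mapped


result = compat_decomposition(0x328A)
print(result)
```

For U+328A the tag identifies the mapping as a circled form of an
ordinary ideograph, which fits the correspondent's point that such
characters are presentation variants rather than characters one would
expect in a name.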

All of this is second-hand, so take it with a grain of salt.