Re: [precis] names and usernames

Florian Zeitz, Mon, 13 February 2017 03:15 UTC

On 13.02.2017 at 00:29, Peter Saint-Andre wrote:
> John Klensin has brought to my attention that it is currently impossible
> to represent some people's names in PRECIS usernames because some of the
> relevant Unicode code points are disallowed by the IdentifierClass
> defined in RFC 7564 (and thus by the UsernameCaseMapped and
> UsernameCasePreserved profiles defined in RFC 7613).
> First, RFC 7564 disallows "default ignorable" code points in the
> IdentifierClass. However, as I understand it some of these code points
> are needed to represent characters in names that might be desirable to
> people living within communities that use Indic script and eastern
> Arabic script (e.g., Persian and writing systems derived from Persian).
> In particular, the Unicode Standard specifies that ZWJ and ZWNJ are
> "default ignorable" and it seems that these code points are especially
> important in this context.
I'd have to look at it in more detail, but that assessment seems wrong
to me.
Algorithmically, we check for JoinControl before
PrecisIgnorableProperties, which makes ZWJ and ZWNJ CONTEXTJ rather
than DISALLOWED.
That allows them to occur after a virama and where they break a cursive
connection. I'm not sure those are the only cases that John is
concerned about, but as I understand it they are not generally disallowed.
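
To illustrate why the order of the checks matters, here is a minimal
Python sketch. It models only the two rules mentioned above, in the
order RFC 7564 lists them; all other rules are omitted. The names
classify, follows_virama, and SAMPLE_DEFAULT_IGNORABLE are hypothetical,
and the sample set is a hand-picked subset of the default ignorable code
points, not the full Unicode property. Only the "after virama" half of
the contextual rule is shown, because the joining-type check needs data
that Python's unicodedata module does not expose.

```python
import unicodedata

# Hypothetical, hand-picked sample of default ignorable code points.
# The real property is far larger (see Unicode's DerivedCoreProperties.txt).
SAMPLE_DEFAULT_IGNORABLE = {0x00AD, 0x200C, 0x200D, 0x2060}

# Code points with the Join_Control property: ZWNJ and ZWJ.
JOIN_CONTROL = {0x200C, 0x200D}

# Only the two rules discussed here; the first matching rule wins.
RULES = [
    ("JoinControl", lambda cp: cp in JOIN_CONTROL, "CONTEXTJ"),
    ("PrecisIgnorableProperties",
     lambda cp: cp in SAMPLE_DEFAULT_IGNORABLE, "DISALLOWED"),
]


def classify(cp):
    """Return the result of the first matching rule, or None if neither applies."""
    for _name, matches, result in RULES:
        if matches(cp):
            return result
    return None  # would fall through to rules omitted from this sketch


def follows_virama(text, i):
    """True if the character before index i has combining class 9 (Virama)."""
    return i > 0 and unicodedata.combining(text[i - 1]) == 9
```

With this ordering, classify(0x200D) yields CONTEXTJ even though ZWJ is
also in the default ignorable sample, while classify(0x00AD) (soft
hyphen) yields DISALLOWED. Swapping the two entries in RULES would make
ZWJ and ZWNJ DISALLOWED, which is the outcome the textual description
alone might suggest.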

That said, I always found it a bit unsettling that it is virtually
impossible to determine the algorithmic result from the textual
description of what is and isn't allowed.

Florian Zeitz