[precis] Emoji, Names and Normalisation

Daniel Oaks <daniel@danieloaks.net> Fri, 17 March 2017 01:53 UTC

To: precis@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/precis/YmH4Gw-HzM2ljTI06JFtdk3ukAA>

Hey everyone,

I work on the IRC chat protocol. Specifically, right now I'm working on
proper Unicode support, including writing the casefolding specs that would be
required to allow it.

My current solution is based on PRECIS, but I'm running into an issue and
not exactly sure how to solve it.

Essentially, we need to casefold both 'nicknames' (the usernames that clients
are referred to by) and 'channel names' (chat room names). We would much
prefer to use an *IdentifierClass* profile. Using a single profile for both
name types is also strongly preferred, for implementation simplicity and for
other protocol reasons (we can use a different profile for each if necessary,
but we would much rather stick to one).
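To make "a single profile for both name types" concrete, here is a rough Python sketch of the mapping steps UsernameCaseMapped applies (width mapping, case mapping, NFC), used identically for nicknames and channel names. It is an approximation under stated assumptions, not a conforming PRECIS implementation: `map_name` is a hypothetical helper, case mapping is approximated with Python's `str.casefold()` (the exact operation is defined by the PRECIS specifications and may differ for some characters), and the directionality rule and IdentifierClass validity check are skipped.

```python
import unicodedata


def map_name(name: str) -> str:
    """Approximate UsernameCaseMapped's mapping steps for an IRC name.

    Hypothetical helper: width mapping, case mapping and NFC only; the
    directionality rule and IdentifierClass validity check are omitted.
    """
    # 1. Width mapping: replace fullwidth/halfwidth characters with their
    #    decomposition mapping (e.g. U+FF23 FULLWIDTH C -> U+0043 C).
    mapped = []
    for ch in name:
        decomp = unicodedata.decomposition(ch)
        if decomp.startswith(("<wide>", "<narrow>")):
            mapped.extend(chr(int(cp, 16)) for cp in decomp.split()[1:])
        else:
            mapped.append(ch)
    name = "".join(mapped)

    # 2. Case mapping, approximated here with full Unicode case folding.
    name = name.casefold()

    # 3. Normalisation Form C, so canonically equivalent names compare equal.
    return unicodedata.normalize("NFC", name)


# The same function serves both nicknames and channel names.
print(map_name("DanielOaks"))                  # danieloaks
print(map_name("#\uff23\uff48\uff41\uff4e"))   # #chan (fullwidth input)
```

Two names would then be treated as the same nick or channel when their mapped forms are equal.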

The only existing profile that matches that description right now is
UsernameCaseMapped. While it does everything we want for nicknames, it
disallows emoji, which some services already knowingly allow in channel
names.
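The restriction comes from the IdentifierClass itself, which accepts letters, digits and printable ASCII but not the Unicode Symbols categories where most emoji are classified. The Python sketch below is a simplified approximation of that check, assuming only the LetterDigits categories (Ll, Lu, Lo, Nd, Lm, Mn, Mc) and ASCII7 (U+0021..U+007E) from RFC 7564; `roughly_identifier_class` is a hypothetical helper that ignores HasCompat, contextual rules and the exceptions tables, so a real PRECIS library will reject some strings it accepts.

```python
import unicodedata

# PRECIS "LetterDigits" general categories (RFC 7564, Section 9.1).
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def roughly_identifier_class(name: str) -> bool:
    """Simplified IdentifierClass check (hypothetical helper).

    Ignores HasCompat, contextual rules and the exceptions tables, so it
    accepts some strings that a real PRECIS implementation would reject.
    """
    if not name:
        return False  # treat empty names as invalid
    for ch in name:
        if 0x21 <= ord(ch) <= 0x7E:
            continue  # ASCII7: printable ASCII, excluding space
        if unicodedata.category(ch) in LETTER_DIGITS:
            continue
        return False  # e.g. Symbols (So), where most emoji are classified
    return True


print(roughly_identifier_class("#chan"))        # True
print(roughly_identifier_class("#caf\u00e9"))   # True
print(roughly_identifier_class("#chan\u2615"))  # False: U+2615 is category So
```

A channel-name profile that allowed emoji would need to admit at least some of the So category, which is exactly what the IdentifierClass rules out.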

I haven't dived deep into Unicode and normalisation, but is there a way for
an *IdentifierClass* profile to allow and appropriately normalise emoji? If
so, would the best thing for us to do be to create our own profile for IRC
(channel) names? I'm wary of doing so, given the advice against profile
proliferation in RFC 7564, Section 5.1
<https://tools.ietf.org/html/rfc7564#section-5.1>, but with this restriction
it's difficult for us to adopt an *IdentifierClass* profile without creating
our own.

Any advice on what we should do here would be much appreciated. Thanks for
the work you've all done so far!

Daniel Oakley