Re: [precis] Emoji, Names and Normalisation

John C Klensin <john-ietf@jck.com> Sat, 18 March 2017 20:08 UTC

Date: Sat, 18 Mar 2017 16:07:54 -0400
From: John C Klensin <john-ietf@jck.com>
To: Daniel Oaks <daniel@danieloaks.net>
cc: precis@ietf.org
Message-ID: <DEC1FB3BF31BFA0B97598F2E@PSB>
In-Reply-To: <CALmuJGcQg_dRuciG-aNs9z055eHZTP1qN8d1t2WMEP+dOX5Grw@mail.gmail.com>
References: <CALmuJGcQg_dRuciG-aNs9z055eHZTP1qN8d1t2WMEP+dOX5Grw@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/precis/S1tKV-5NVie0PeYF-c5tCOxoloM>
Subject: Re: [precis] Emoji, Names and Normalisation
List-Id: Preparation and Comparison of Internationalized Strings <precis.ietf.org>

Daniel,

(top post)

Two very quick comments/suggestions:

(1) You could do a lot worse than using a PRECIS IdentifierClass
profile, if only because it would give you consistency with
other applications based on IETF specs. Regardless of the number
of things users and designers "want" (often including a
plentiful supply of ponies), one thing I've learned from far
more years of experience with users and user interfaces is that,
in the last analysis, the dominant desire is for consistency and
predictability.  It also turns out that such consistency and
predictability make things more secure, so that is a win across
the board.

Where possible, I also prefer to avoid case folding in
identifiers.  Yes, users "want" it and we generally assume that
it is important, but remember that Unix and closely-related
systems don't do that and most people stopped complaining years
ago.  Avoiding it gets more important with non-ASCII strings
because there are some language and/or locality issues that
lead, for a small number of edge cases, to behavior that users
who haven't studied and accepted the rules think is
inconsistent with their expectations.   If you must do case
conversions, UsernameCaseMapped is at least as good as anything
else you might choose and has just about the same consistency
properties as other standardized IdentifierClass profiles.
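
As a rough illustration of the edge cases mentioned above, here is a
minimal Python sketch. It is not an implementation of
UsernameCaseMapped or of any PRECIS profile; the helper names
fold_for_comparison and same_identifier are made up for this example,
and the sketch only applies Python's locale-independent str.casefold()
followed by NFC. It shows that default folding matches some pairs users
expect to match and misses others that speakers of a particular
language would expect to match.

```python
import unicodedata


def fold_for_comparison(name: str) -> str:
    """Hypothetical helper: Unicode default case folding, then NFC.

    This is NOT a PRECIS profile; it only shows the locale-independent
    folding behavior that the edge cases come from.
    """
    return unicodedata.normalize("NFC", name.casefold())


def same_identifier(a: str, b: str) -> bool:
    """True when two names collide after folding."""
    return fold_for_comparison(a) == fold_for_comparison(b)


if __name__ == "__main__":
    pairs = [
        # German sharp s (U+00DF) folds to "ss", so these collide.
        ("Stra\u00dfe", "STRASSE"),
        # Greek final sigma (U+03C2) folds to plain sigma, so these collide.
        ("\u03a3\u0391\u03a3", "\u03c3\u03b1\u03c2"),
        # Turkish dotless i (U+0131) does not fold to "i", so no match,
        # although a Turkish speaker would treat it as the lowercase of "I".
        ("I", "\u0131"),
        # Capital I with dot above (U+0130) folds to "i" + U+0307, not "i".
        ("\u0130", "i"),
    ]
    for a, b in pairs:
        print(f"{a!r} vs {b!r}: {same_identifier(a, b)}")
```

The first two pairs compare equal and the last two do not, which is the
kind of result that surprises users who have not studied the rules.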

(2) As much as people want them, emoji are really not suitable
for use in identifiers.   Maybe they will be some years from now
but, at present, the names assigned by Unicode are not generally
accepted and used, the icons used for a given one are
significantly inconsistent across systems, there is no agreement
about matching rules or normalization (especially in the context
of modifiers), and so on.  If you (or the IETF) were to invent
emoji normalization, someone else (such as the Unicode
Consortium, but they aren't the only candidate) may come along,
create their own version, and _really_ confuse your users.
FWIW, the Unicode Consortium seems to agree about unsuitability
in identifiers -- some issues with domain names notwithstanding,
their identifier recommendations (UAX #31: Unicode Identifier
and Pattern Syntax) do not allow emoji in identifiers.
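
To make the two points above a little more concrete, here is a small
Python sketch. It rests on two assumptions that should be read as
such: Python's str.isidentifier() is used only as a rough stand-in for
the UAX #31 rules (Python's identifier syntax is derived from them but
is not identical), and NFKC is used as an example of an existing
normalization form, not as a proposal for emoji matching. The names
is_identifier_like and nfkc are invented for the example.

```python
import unicodedata

THUMBS_UP = "\U0001F44D"          # thumbs up sign
MEDIUM_SKIN_TONE = "\U0001F3FD"   # emoji modifier Fitzpatrick type-4
HEAVY_HEART = "\u2764"            # heavy black heart
VS16 = "\uFE0F"                   # variation selector-16 (emoji presentation)


def is_identifier_like(name: str) -> bool:
    """Rough proxy for UAX #31: Python's own identifier check."""
    return name.isidentifier()


def nfkc(text: str) -> str:
    """Apply Unicode NFKC normalization."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    # Letters, including non-ASCII ones, pass; emoji do not.
    print(is_identifier_like("caf\u00e9"))           # accented letter
    print(is_identifier_like("room\U0001F600"))      # trailing emoji

    # Existing normalization forms leave modifiers and variation
    # selectors in place, so visually related emoji stay distinct.
    print(nfkc(THUMBS_UP + MEDIUM_SKIN_TONE) == nfkc(THUMBS_UP))
    print(nfkc(HEAVY_HEART + VS16) == nfkc(HEAVY_HEART))
```

Whether a thumbs-up with a skin-tone modifier should match the plain
one is exactly the sort of matching rule on which there is no
agreement, so any answer chosen by one protocol is a local invention.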

This topic has been discussed extensively during the last few
weeks on the IDNA-Update mailing list (archives at
http://www.alvestrand.no/pipermail/idna-update/, at least for
now).  While parts of the discussion are specific to domain
names, you can learn a lot more there if you are curious.

    john




--On Friday, March 17, 2017 11:52 +1000 Daniel Oaks
<daniel@danieloaks.net> wrote:

> Hey everyone,
> 
> I do work with the IRC chat protocol. Specifically, right now
> I'm doing work around allowing proper Unicode support, and
> writing the casefolding specs that would be required to allow
> that.
> 
> My current solution is based on PRECIS, but I'm running into
> an issue and not exactly sure how to solve it.
> 
> Essentially, we need to casefold 'nicknames' (usernames that
> clients are referred to by) and 'channel names' (chat
> room names). It would be much preferred to use an
> *IdentifierClass* profile. Using a single profile for both
> name types is also much preferred for reasons of implementation
> simplicity and for other protocol reasons (while we can have
> different ones for both if it's necessary, sticking to a
> single one would be much preferred).
> 
> The only real profile out there which matches that description
> right now is UsernameCaseMapped, which, while it does everything
> we want for nicknames, disallows emoji in channel names
> (which some services have already knowingly allowed).
> 
> I haven't dived deep into Unicode and normalisation, but would
> there be a way for an *IdentifierClass* profile to allow and
> appropriately normalise emoji? If so, would the best thing for
> us to do here be to actually create our own profile for IRC
> (channel) names? I'm wary of doing so seeing the advice
> against profile proliferation here
> <https://tools.ietf.org/html/rfc7564#section-5.1>, but given
> the restriction it's difficult for us to adopt an
> *IdentifierClass* profile for this without creating our own.
> 
> Any advice on what we should do here would be much
> appreciated. Thanks for the work you've all done so far!
> 
> Regards,
> Daniel Oakley