Re: [precis] Emoji, Names and Normalisation

Daniel Oaks <daniel@danieloaks.net> Sun, 19 March 2017 10:26 UTC

In-Reply-To: <DEC1FB3BF31BFA0B97598F2E@PSB>
References: <CALmuJGcQg_dRuciG-aNs9z055eHZTP1qN8d1t2WMEP+dOX5Grw@mail.gmail.com> <DEC1FB3BF31BFA0B97598F2E@PSB>
From: Daniel Oaks <daniel@danieloaks.net>
Date: Sun, 19 Mar 2017 20:26:31 +1000
Message-ID: <CALmuJGemwuJo4FUptQN42fGhvbMkdGYPx-uSpRbu9+bG70TBwA@mail.gmail.com>
To: John C Klensin <john-ietf@jck.com>
Cc: precis@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/precis/QzR8PfoR0Snps56ZFK_v5PT8PJg>
Subject: Re: [precis] Emoji, Names and Normalisation

Hi John,

Thanks very much for your comments.

I can definitely see what you mean regarding the issues surrounding
casefolding emoji -- thanks for spelling them out so clearly and providing
the relevant links, I've certainly got some reading to do.

Regardless of emoji, the expanded range of characters we can use thanks to
adopting an IdentifierClass profile makes it more than worthwhile anyway.
Looking at the issues surrounding this, I'll likely go forward with the
UsernameCaseMapped profile. If all else fails, we can always revisit this
later on, if/when a consistent strategy for casefolding emoji is adopted
across the industry.
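
For illustration only, here is a rough Python sketch of what a
UsernameCaseMapped-style fold could look like for nicknames and channel
names. It is a simplification under stated assumptions, not a conforming
PRECIS implementation: the helper names (map_width, is_allowed, fold_name)
are made up for this example, str.lower() stands in for the profile's case
mapping rule, and the contextual rules, the bidi rule, and the finer
IdentifierClass exclusions are all left out. It does show why an emoji in a
channel name gets rejected: it is a symbol, not a letter or digit.

```python
import unicodedata

# General categories treated as letters/digits in this simplified sketch.
ALLOWED_CATEGORIES = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def map_width(text: str) -> str:
    """Replace fullwidth/halfwidth characters with their decomposition mappings."""
    out = []
    for ch in text:
        decomp = unicodedata.decomposition(ch)
        if decomp.startswith("<wide>") or decomp.startswith("<narrow>"):
            # The decomposition looks like "<wide> 0041": tag, then hex code points.
            out.append("".join(chr(int(cp, 16)) for cp in decomp.split()[1:]))
        else:
            out.append(ch)
    return "".join(out)


def is_allowed(ch: str) -> bool:
    """Printable ASCII plus letters, digits, and combining marks (simplified)."""
    if "\u0021" <= ch <= "\u007e":
        return True
    return unicodedata.category(ch) in ALLOWED_CATEGORIES


def fold_name(name: str) -> str:
    """Width-map, lowercase, and NFC-normalise a name, then check each code point."""
    folded = unicodedata.normalize("NFC", map_width(name).lower())
    if not folded:
        raise ValueError("empty name")
    for ch in folded:
        if not is_allowed(ch):
            raise ValueError(f"disallowed code point U+{ord(ch):04X}")
    return folded
```

With this sketch, fold_name("DanielOaks") and fold_name("#Chat") fold to
lowercase, while a channel name containing an emoji raises ValueError.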

I can understand your point about case conversions, particularly in newer
systems with the sort of audience that'd expect that. In this particular
protocol, not doing case conversion would go against many years of
established behaviour for both software and users, so we'll probably stick
with the folding for now. However, I appreciate the view and I'll
definitely keep it in mind going forward.

Thanks again for your reply, I have a much clearer idea of where to go with
this.

Regards,
Daniel Oakley

On 19 March 2017 at 06:07, John C Klensin <john-ietf@jck.com> wrote:

> Daniel,
>
> (top post)
>
> Two very quick comments/ suggestions:
>
> (1) You could do a lot worse than using a PRECIS IdentifierClass
> profile, if only because it would give you consistency with
> other applications based on IETF specs. Regardless of the number
> of things users and designers "want" (often including a
> plentiful supply of ponies), one thing I've learned from far
> more years of experience with users and user interfaces is that,
> in the last analysis, the dominant desire is for consistency and
> predictability.  It also turns out that that consistency and
> predictability makes things more secure, so that is a win across
> the board.
>
> Where possible, I also prefer to avoid case folding in
> identifiers.  Yes, users "want" it and we generally assume that
> it is important, but remember that Unix and closely-related
> systems don't do that and most people stopped complaining years
> ago.  Avoiding it gets more important with non-ASCII strings
> because there are some language and/or locality issues that
> lead, for a small number of edge cases, to behavior that users
> who haven't studied and accepted the rules think is
> inconsistent with their expectations.   If you must do case
> conversions, UsernameCaseMapped is at least as good as anything
> else you might choose and has just about the same consistency
> properties as other standardized IdentifierClass profiles.
>
> (2) As much as people want them, emoji are really not suitable
> for use in identifiers.   Maybe they will be some years from now
> but, at present, the names assigned by Unicode are not generally
> accepted and used, the icons used for a given one are
> significantly inconsistent across systems, there is no agreement
> about matching rules or normalization (especially in the context
> of modifiers), and so on.  If you (or the IETF) were to invent
> emoji normalization, someone else (such as the Unicode
> Consortium, but they aren't the only candidate) may come along,
> create their own version, and _really_ confuse your users.
> FWIW, the Unicode Consortium seems to agree about unsuitability
> in identifiers -- some issues with domain names notwithstanding,
> their identifier recommendations (UAX #31: Unicode Identifier
> and Pattern Syntax) do not allow emoji in identifiers.
>
> This topic has been discussed extensively during the last few
> weeks on the IDNA-Update mailing list (archives at
> http://www.alvestrand.no/pipermail/idna-update/, at least for
> now).  While parts of the discussion are specific to domain
> names, you can learn a lot more there if you are curious.
>
>     john
>
>
>
>
> --On Friday, March 17, 2017 11:52 +1000 Daniel Oaks
> <daniel@danieloaks.net> wrote:
>
> > Hey everyone,
> >
> > I do work with the IRC chat protocol. Specifically, right now
> > I'm doing work around allowing proper Unicode support, and
> > writing the casefolding specs that would be required to allow
> > that.
> >
> > My current solution is based on PRECIS, but I'm running into
> > an issue and not exactly sure how to solve it.
> >
> > Essentially, we need to casefold both 'nicknames' (usernames
> > that clients are referred to by) and 'channel names' (chat
> > room names). It would be much preferred to use an
> > *IdentifierClass* profile. Using a single profile for both
> > name types is also much preferred for reasons of implementation
> > simplicity and for other protocol reasons (while we can have
> > different ones for both if it's necessary, sticking to a
> > single one would be much preferred).
> >
> > The only real profile out there which matches that description
> > right now is UsernameCaseMapped, which, while it does
> > everything we want for nicknames, disallows emoji in channel
> > names (which some services have already knowingly allowed).
> >
> > I haven't dived deep into Unicode and normalisation, but would
> > there be a way for an *IdentifierClass* profile to allow and
> > appropriately normalise emoji? If so, would the best thing for
> > us to do here be to actually create our own profile for IRC
> > (channel) names? I'm wary of doing so seeing the advice
> > against profile proliferation here
> > <https://tools.ietf.org/html/rfc7564#section-5.1>, but given
> > the restriction it's difficult for us to adopt an
> > *IdentifierClass* profile for this without creating our own.
> >
> > Any advice on what we should do here would be much
> > appreciated. Thanks for the work you've all done so far!
> >
> > Regards,
> > Daniel Oakley
>