Re: [precis] Emoji, Names and Normalisation
John C Klensin <john-ietf@jck.com> Sat, 18 March 2017 20:08 UTC
Date: Sat, 18 Mar 2017 16:07:54 -0400
From: John C Klensin <john-ietf@jck.com>
To: Daniel Oaks <daniel@danieloaks.net>
cc: precis@ietf.org
Message-ID: <DEC1FB3BF31BFA0B97598F2E@PSB>
In-Reply-To: <CALmuJGcQg_dRuciG-aNs9z055eHZTP1qN8d1t2WMEP+dOX5Grw@mail.gmail.com>
References: <CALmuJGcQg_dRuciG-aNs9z055eHZTP1qN8d1t2WMEP+dOX5Grw@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/precis/S1tKV-5NVie0PeYF-c5tCOxoloM>
Subject: Re: [precis] Emoji, Names and Normalisation

Daniel,

(top post)

Two very quick comments/suggestions:

(1) You could do a lot worse than using a PRECIS IdentifierClass
profile, if only because it would give you consistency with
other applications based on IETF specs. Regardless of the
number of things users and designers "want" (often including a
plentiful supply of ponies), one thing I've learned from far
more years of experience with users and user interfaces is
that, in the last analysis, the dominant desire is for
consistency and predictability. It also turns out that
consistency and predictability make things more secure, so
that is a win across the board.

Where possible, I also prefer to avoid case folding in
identifiers. Yes, users "want" it and we generally assume that
it is important, but remember that Unix and closely-related
systems don't do that and most people stopped complaining years
ago. Avoiding it gets more important with non-ASCII strings
because there are some language and/or locality issues that
lead, for a small number of edge cases, to behavior that users
who haven't studied and accepted the rules think is
inconsistent with their expectations. If you must do case
conversions, UsernameCaseMapped is at least as good as anything
else you might choose and has just about the same consistency
properties as other standardized IdentifierClass profiles.

(2) As much as people want them, emoji are really not suitable
for use in identifiers. Maybe they will be some years from now
but, at present, the names assigned by Unicode are not
generally accepted and used, the icons used for a given one are
significantly inconsistent across systems, there is no
agreement about matching rules or normalization (especially in
the context of modifiers), and so on. If you (or the IETF)
were to invent emoji normalization, someone else (such as the
Unicode Consortium, but they aren't the only candidate) may
come along, create their own version, and _really_ confuse your
users.

FWIW, the Unicode Consortium seems to agree about unsuitability
in identifiers -- some issues with domain names
notwithstanding, their identifier recommendations (UAX #31:
Unicode Identifier and Pattern Syntax) do not allow emoji in
identifiers.

This topic has been discussed extensively during the last few
weeks on the IDNA-Update mailing list (archives at
http://www.alvestrand.no/pipermail/idna-update/, at least for
now). While parts of the discussion are specific to domain
names, you can learn a lot more there if you are curious.

    john


--On Friday, March 17, 2017 11:52 +1000 Daniel Oaks
<daniel@danieloaks.net> wrote:

> Hey everyone,
>
> I do work with the IRC chat protocol. Specifically, right now
> I'm doing work around allowing proper Unicode support, and
> writing the casefolding specs that would be required to allow
> that.
>
> My current solution is based on PRECIS, but I'm running into
> an issue and not exactly sure how to solve it.
>
> Essentially, we need to casefold 'nicknames' (usernames that
> clients are referred to by), and for 'channel names' (chat
> room names). It would be much preferred to use an
> *IdentifierClass* profile. Using a single profile for both
> name types is also much preferred for reasons of implementation
> simplicity and for other protocol reasons (while we can have
> different ones for both if it's necessary, sticking to a
> single one would be much preferred).
>
> The only real profile out there which matches that description
> right now is UsernameCaseMapped, which while does everything
> we want to for nicknames, disallows emoji in channel names
> (which some services have already knowingly allowed).
>
> I haven't dived deep into Unicode and normalisation, but would
> there be a way for an *IdentifierClass* profile to allow and
> appropriately normalise emoji? If so, would the best thing for
> us to do here be to actually create our own profile for IRC
> (channel) names? I'm wary of doing so seeing the advice
> against profile proliferation here
> <https://tools.ietf.org/html/rfc7564#section-5.1>, but given
> the restriction it's difficult for us to adopt an
> *IdentifierClass* profile for this without creating our own.
>
> Any advice on what we should do here would be much
> appreciated. Thanks for the work you've all done so far!
>
> Regards,
> Daniel Oakley
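
For readers who want to see roughly why a case-mapped
IdentifierClass profile such as UsernameCaseMapped ends up
rejecting emoji, here is a minimal Python sketch. It is only an
approximation and not a conforming PRECIS implementation: the
function names and the category set are assumptions made for
illustration, and it skips width mapping, the exception tables,
contextual rules and the bidi rule. It case-folds, applies NFC,
and then accepts only printable ASCII plus letter, digit and
combining-mark code points, which leaves emoji (general category
"So") outside the allowed set.

```python
import unicodedata

# Assumption: a rough stand-in for the letter/digit categories that an
# IdentifierClass-style profile admits. The real PRECIS derivation has
# more rules (exceptions, context rules, bidi) that are omitted here.
LETTER_DIGIT_CATEGORIES = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def is_identifier_like(ch: str) -> bool:
    """Return True for printable ASCII or a letter/digit/mark code point."""
    if 0x21 <= ord(ch) <= 0x7E:
        return True
    return unicodedata.category(ch) in LETTER_DIGIT_CATEGORIES


def fold_name(name: str) -> str:
    """Hypothetical helper: case-fold, normalize to NFC, then validate."""
    folded = unicodedata.normalize("NFC", name.casefold())
    rejected = [ch for ch in folded if not is_identifier_like(ch)]
    if rejected:
        listing = ", ".join(f"U+{ord(ch):04X}" for ch in rejected)
        raise ValueError(f"disallowed code points: {listing}")
    return folded


if __name__ == "__main__":
    print(fold_name("#Chat"))
    try:
        fold_name("chat\U0001F600")
    except ValueError as err:
        print("rejected:", err)
```

A name containing an emoji fails validation instead of being
silently altered, which matches the "consistency and
predictability" argument in the reply: two clients applying the
same rules either agree on the folded name or both refuse it.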
- [precis] Emoji, Names and Normalisation Daniel Oaks
- Re: [precis] Emoji, Names and Normalisation John C Klensin
- Re: [precis] Emoji, Names and Normalisation Daniel Oaks
- Re: [precis] Emoji, Names and Normalisation Andrew Sullivan