Re: [I18ndir] Study Group on Use of Emoji as Second Level Domain

John C Klensin <john-ietf@jck.com> Fri, 08 March 2019 17:08 UTC

Date: Fri, 08 Mar 2019 12:07:50 -0500
From: John C Klensin <john-ietf@jck.com>
To: Marc Blanchet <marc.blanchet@viagenie.ca>
cc: IETF I18N Directorate <i18ndir@ietf.org>
Message-ID: <BB2C46B8E7989AFD9925C55A@PSB>
In-Reply-To: <132AD5F9-EFAD-4A26-B439-D55AC5D92634@viagenie.ca>
References: <1d07e7ef-7c2f-e98a-4ff8-a1de5a8102dc@it.aoyama.ac.jp> <8893E807E58D89AEDAB37E6B@PSB> <132AD5F9-EFAD-4A26-B439-D55AC5D92634@viagenie.ca>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/rZi4-suV0LjAVVgVGyHgr6ltK_o>
Subject: Re: [I18ndir] Study Group on Use of Emoji as Second Level Domain

Marc,

I was trying to be brief in the hope of having time to look
carefully at Harald's note today but, to clarify...

--On Friday, March 8, 2019 09:59 -0500 Marc Blanchet
<marc.blanchet@viagenie.ca> wrote:

>> (3) I've looked briefly at UTR#46 for Unicode 12
>> (https://www.unicode.org/reports/tr46/ and
>> https://unicode.org/Public/idna/12.0.0/IdnaMappingTable.txt)
>> and it still allows emoji (or at least a considerable number
>> of them -- I haven't spotted any exceptions).  Because
>> (except for earlier emoticons) they are prohibited by IDNA2003
> 
> not sure that IDNA2003 prohibited them specifically, because at
> that time we were using Unicode 3.2 as the base and emojis
> did not exist. And for all « Unassigned » code points there
> was some direction (store/query), but it was essentially
> underspecified.

One can certainly call it "underspecified", but it seems to me
that the following are/were perfectly clear in IDNA2003, at
least for strings accepted by registries for delegation and
storage in the DNS, which is, AFAIK, the only issue before
ICANN.   By definition, such a string is a "stored string" (see
Section 4(1) of RFC 3490 and Section 7.2 of RFC 3454
(Stringprep)).  The latter includes the statement "stored
strings MUST NOT contain unassigned code points, ...", which, to
me, is about as close to "prohibited" as the IETF gets.
Obviously that might be different if StringPrep (and possibly
NamePrep) had been updated after Unicode 3.2, but they weren't,
so any claim for validity under IDNA2003 takes us into the very
strange territory of local interpretations and extrapolations of
what the IDNA2003 tables might have been with speculative
updates.  That, in turn, leads us down the path to many
alleged alternative IDNA standards (not just IDNA2003, UTR#46,
and IDNA2008, but many, many, versions and interpretations of
what an IDNA2003bis might look like) -- a subject about which
Patrik, SSAC, and others have written at length (independent of
my concerns and, if you like, ranting on the subject).
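For readers who want to check the stored-string point mechanically, here is a minimal Python sketch. It assumes only the standard-library `stringprep` module, whose `in_table_a1()` reports whether a character is in Stringprep table A.1 (code points unassigned in Unicode 3.2). The helper names are illustrative, not from any RFC, and the sketch checks only the unassigned-code-point rule, not the rest of Nameprep (mapping, normalization, prohibited output, bidi).

```python
import stringprep


def unassigned_in_unicode_3_2(label: str) -> list[str]:
    """List code points in `label` that Stringprep table A.1 marks as
    unassigned in Unicode 3.2 (the repertoire IDNA2003 is frozen to)."""
    return [f"U+{ord(ch):04X}" for ch in label if stringprep.in_table_a1(ch)]


def passes_stored_string_unassigned_rule(label: str) -> bool:
    """RFC 3454 stored-string rule: no unassigned code points allowed.
    Only this one rule is checked; the rest of Nameprep is omitted."""
    return not unassigned_in_unicode_3_2(label)


if __name__ == "__main__":
    # Plain ASCII, U+263A WHITE SMILING FACE, and U+1F600 GRINNING FACE.
    for label in ["example", "\u263A", "\U0001F600"]:
        print(repr(label),
              passes_stored_string_unassigned_rule(label),
              unassigned_in_unicode_3_2(label))
```

U+263A, one of the earlier emoticons, was assigned long before Unicode 3.2 and passes; U+1F600 was added in a much later Unicode version, so under the 3.2 tables it is unassigned and a stored string containing it fails.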

>> as well as
>> IDNA2008, there should be no further pretense that it is about
>> "transition".  Instead, it is a third, orthogonal, standard.
 
> The Unicode IDNA mapping tables, such as
> https://unicode.org/Public/idna/12.0.0/IdnaMappingTable.txt
> but also all versions back to 6.X (see
> https://unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt),
> actually make emojis « valid » for the purposes of TR46. So
> this is not really news.

Yes.  I didn't mean to imply that it was new.  I have been
hoping that, with each recent version of UTR#46 and its tables,
and with additional evidence about the unsuitability of emoji
for identifiers (including, fwiw, their exclusion by UAX#31 [1]),
the decision to treat emoji as valid in UTR#46 would be
reversed or at least supplemented there by a strong cautionary
note.  While I gather that the stability rules would prevent an
actual change from "valid" to "disallowed" without some very
fancy footwork, nothing I know of would prevent such a warning.
Those changes haven't happened.   I suppose I might have
inserted "still" in my comment, but the more UTR#46 is pushed as
an alternate or supplemental standard relative to IDNA2008, the
more I think we (and ICANN, etc.) need to be aware of the
divergence.
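The divergence can be made concrete. IdnaMappingTable.txt lines have the form `code point(s) ; status ; mapping ; IDNA2008 status # comment`, and, as I understand the format, an `NV8` in the last field marks characters that UTS #46 treats as valid but IDNA2008 does not. A hedged Python sketch of a lookup follows; the `SAMPLE` lines are illustrative entries written in that format, not copied from any particular version of the file, and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    start: int
    end: int
    status: str               # e.g. "valid", "mapped", "disallowed"
    idna2008: Optional[str]   # e.g. "NV8" when UTS #46 and IDNA2008 differ


def parse_line(line: str) -> Optional[Entry]:
    """Parse one line in the IdnaMappingTable.txt layout; None for blanks/comments."""
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    fields = [f.strip() for f in data.split(";")]
    if ".." in fields[0]:
        lo, hi = fields[0].split("..")
    else:
        lo = hi = fields[0]
    idna2008 = fields[3] if len(fields) > 3 and fields[3] else None
    return Entry(int(lo, 16), int(hi, 16), fields[1], idna2008)


def lookup(table: list[Entry], ch: str) -> Optional[Entry]:
    """Linear search for the entry covering the character's code point."""
    cp = ord(ch)
    for entry in table:
        if entry.start <= cp <= entry.end:
            return entry
    return None


# Illustrative lines only, in the table's format; consult the real file
# for a given Unicode version before relying on any specific range.
SAMPLE = """\
0041          ; mapped     ; 0061        # LATIN CAPITAL LETTER A
0061..007A    ; valid                    # a..z
1F600..1F64F  ; valid      ;      ; NV8  # Emoticons block
"""

TABLE = [e for e in map(parse_line, SAMPLE.splitlines()) if e is not None]


def diverges_from_idna2008(ch: str) -> bool:
    """True if the entry is valid for UTS #46 but carries an IDNA2008 flag."""
    entry = lookup(TABLE, ch)
    return entry is not None and entry.status == "valid" and entry.idna2008 is not None
```

A linear scan keeps the sketch short; a real consumer of the full table would more likely sort the ranges and use `bisect`.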

best, 
   john

[1] Yes, I'm aware of the claim that different (and less
restrictive) rules are appropriate for IDNs relative to UAX#31
"identifiers" because the latter are about programming language
identifiers, but I just don't buy it.  It would probably be
reasonable if one believes that DNS names are not identifiers
(or intended for use as identifiers) but are, instead, value or
vanity tokens to be bought and sold without concern about
identifier usability.   If one considers DNS names as
identifiers for use by end users, many of whom may have little
knowledge of what is going on, then there is a strong case that
rules for what is allowed in those identifiers should be more
restrictive, not less, than rules for identifiers to be used by
programmers who are specialists in programming languages.
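Python offers a convenient check on the UAX#31 point: its identifier syntax (PEP 3131) is defined in terms of the UAX #31 XID_Start and XID_Continue properties, plus the underscore as a start character. The sketch below assumes Python 3 and uses only `str.isidentifier()` and `unicodedata.name()`; the wrapper name is hypothetical. It reflects programming-language identifiers, which are exactly the kind of identifier the footnote compares DNS names against.

```python
import unicodedata


def is_uax31_style_identifier(s: str) -> bool:
    """Hypothetical wrapper: Python's identifier test, which is built on
    UAX #31 XID_Start / XID_Continue (plus "_" as a start character)."""
    return s.isidentifier()


def describe(s: str) -> str:
    """Show the string, the identifier verdict, and the character names."""
    names = ", ".join(unicodedata.name(c, f"U+{ord(c):04X}") for c in s)
    return f"{s!r}: identifier={is_uax31_style_identifier(s)} ({names})"


if __name__ == "__main__":
    # Latin with a diacritic, Cyrillic, an emoji, and an older emoticon.
    for sample in ["caf\u00e9", "\u0434\u043e\u043c", "\U0001F600", "\u263A"]:
        print(describe(sample))
```

The Latin and Cyrillic samples are accepted; both U+1F600 and U+263A are symbols (general category So) and are rejected.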