Re: [I18nrp] [art] Use Unicode if Using Unicode?

John C Klensin <john-ietf@jck.com> Thu, 11 October 2018 02:01 UTC

Shawn,

I'm confused about what you are suggesting, so let me clarify
where I'm confused and then hope you can enlighten me...

--On Wednesday, October 10, 2018 19:02 +0000 Shawn Steele
<Shawn.Steele=40microsoft.com@dmarc.ietf.org> wrote:

> The draft states "It further suggests for the IETF a path
> forward regarding ensuring IDNA2008 follows the evolution of
> the Unicode Standard" and "this document requests that IANA
> update the tables to Unicode 11."

> Each Unicode version creates a data file with information from
> applying the IDNA2008 rules that can be used for IDNA mapping
> algorithms.  (Indeed, that's where Windows gets the data
> from). 

Taking a half-step back, IDNA imposes two requirements for new
versions of Unicode, both of which are addressed by the present
draft.  One is that the changes in the new version be examined
to ensure that nothing new has been done that requires changes
to IDNA (most likely adding exceptions or rules for particular
code points) or careful explanation somewhere.  No one I know of
expected that provision would be exercised very often but, for a
variety of reasons, the consensus was that it was an important
safeguard.  The other is that the IDNA ruleset be run against
the characters and properties in the Unicode Character Database
to produce a table reflecting the combination of that version of
Unicode and IDNA, as modified, at that time.  That table was
(and is) to be stored with IANA but, while we expect it to be
checked carefully for accuracy, it is not actually authoritative
-- only the rules and categories specified by IDNA (and the
Unicode properties used to support them) are.
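
As a rough sketch of what "running the ruleset against the
Unicode Character Database" means, the Python below applies a
deliberately incomplete subset of the RFC 5892 category tests.
It assumes Python's unicodedata module as a stand-in for the
UCD, so the result depends on whichever Unicode version that
Python build ships.  The function names are hypothetical, and
the Exceptions, BackwardCompatible, JoinControl, OldHangulJamo,
and Ignorable* rules are omitted, so the output is NOT the IANA
table.

```python
import unicodedata

# General categories that the LetterDigits rule accepts.
LETTER_DIGIT_CATEGORIES = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def is_noncharacter(cp: int) -> bool:
    # U+FDD0..U+FDEF plus the last two code points of every plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_ldh(cp: int) -> bool:
    # Hyphen, ASCII digits, and lowercase ASCII letters.
    return cp == 0x2D or 0x30 <= cp <= 0x39 or 0x61 <= cp <= 0x7A


def is_unstable(ch: str) -> bool:
    # Unstable: the character changes under NFKC(casefold(NFKC(ch))).
    nfkc = unicodedata.normalize("NFKC", ch)
    return ch != unicodedata.normalize("NFKC", nfkc.casefold())


def derive(cp: int) -> str:
    """Simplified derived property value for one code point.

    Omitted rules (Exceptions, contextual rules, and others) mean
    some results differ from the real IDNA2008 tables.
    """
    ch = chr(cp)
    category = unicodedata.category(ch)
    if category == "Cn" and not is_noncharacter(cp):
        return "UNASSIGNED"
    if is_ldh(cp):
        return "PVALID"
    if is_unstable(ch):
        return "DISALLOWED"
    if category in LETTER_DIGIT_CATEGORIES:
        return "PVALID"
    return "DISALLOWED"


def derive_table(first: int, last: int) -> dict:
    # The table is only a cached view of rules plus properties.
    return {cp: derive(cp) for cp in range(first, last + 1)}


if __name__ == "__main__":
    print("Unicode version:", unicodedata.unidata_version)
    for cp, value in derive_table(0x41, 0x46).items():
        print(f"U+{cp:04X} {value}")
```

The table is recomputed from the rules and the property data
each time, which is the sense in which only the rules and
properties, not the stored table, are authoritative.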

The discussion in RFC 5895 notwithstanding, one of the
critically important properties of IDNA2008 is that U-labels and
A-labels are duals: one can get from one to the other and back
without any loss of information.  That reversibility holds in
only some cases for Unicode normalization (especially with
compatibility normalization) or case folding, much less for
other mapping scenarios.  So I don't understand what sort of
"mapping" you are talking about.
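
A minimal sketch of that distinction, assuming Python's
standard-library punycode codec and unicodedata module stand in
for a real IDNA2008 implementation (the helper names are
hypothetical and no IDNA2008 validity checks are performed):

```python
import unicodedata

ACE_PREFIX = "xn--"


def to_a_label(u_label: str) -> str:
    # Encode a label that is assumed to be a valid U-label already.
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    # Decode the Punycode part that follows the ACE prefix.
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


# U-label <-> A-label: the round trip loses no information.
u_label = "b\u00fccher"
a_label = to_a_label(u_label)
assert to_u_label(a_label) == u_label

# Case folding is many-to-one: two distinct inputs, one output,
# so the original string cannot be recovered from the result.
assert "Stra\u00dfe".casefold() == "strasse".casefold()

# Compatibility normalization is many-to-one as well: the
# U+FB01 ligature and the two letters "fi" become the same string.
assert unicodedata.normalize("NFKC", "\ufb01le") == "file"
```

The first assertion is the duality property; the last two show
mappings that discard information and therefore have no inverse.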

The only Unicode-created IDN data files I'm aware of are those
associated with the UTS#46 effort.  Because UTS#46 makes
recommendations that are inconsistent with IDNA2008, if
Microsoft is using those tables, its usage is non-conforming to
IDNA2008.  I certainly cannot prevent Microsoft from doing that
(and wouldn't try), but it would not be consistent
with general interoperability of IDNs or what is known elsewhere
as Universal Acceptance of those domain names.

> If the goal is to "follow the evolution of the Unicode
> Standard" and the Unicode Standard is providing data that
> conforms to the IDNA rules, then why not just point directly
> to the Unicode derived tables?

The simplest answer to your question is that, unless I've missed
something, the conditional is false: the Unicode Standard is not
providing data that conforms to the IDNA2008 rules.  Instead,
the data they are providing is more like one of the creative
IDNA2003-IDNA2008 hybrids to which Patrik refers.

   best,
     john