Re: [I18nrp] [art] Use Unicode if Using Unicode?

Shawn Steele <Shawn.Steele@microsoft.com> Thu, 11 October 2018 07:48 UTC

From: Shawn Steele <Shawn.Steele@microsoft.com>
To: John C Klensin <john-ietf@jck.com>, "art@ietf.org" <art@ietf.org>, "i18nrp@ietf.org" <i18nrp@ietf.org>
Date: Thu, 11 Oct 2018 07:48:38 +0000
Message-ID: <MW2PR2101MB0908D4D3EB13FFAA07AD610682E10@MW2PR2101MB0908.namprd21.prod.outlook.com>
References: <MW2PR2101MB0908F009734817997508274282E00@MW2PR2101MB0908.namprd21.prod.outlook.com> <FB4FE0D631E6F6D4C72B19A1@PSB>
In-Reply-To: <FB4FE0D631E6F6D4C72B19A1@PSB>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18nrp/7RigZj3R9uytNLpSXmCn_0zKiTE>
Subject: Re: [I18nrp] [art] Use Unicode if Using Unicode?

The file Unicode generates can be used either with the UTS46 compatibility characters or without them. To use your terms, it provides both data that conforms to the IDNA2008 rules and hybrid versions. If you select the appropriate entries when reading IdnaMappingTable.txt, you should get exactly the table expected. In particular, I believe you'd want to exclude the characters tagged NV8 and XV8.
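To make the selection concrete, here is a minimal Python sketch of reading the mapping table and keeping only "valid" entries, optionally dropping those tagged NV8 or XV8. It assumes the UTS46 line layout `code_point(s) ; status [; mapping [; IDNA2008_status]] # comment`; the function names and the sample lines are illustrative, not taken from any Unicode or IETF tool. It does not attempt to handle deviation characters or the contextual rules, so treat it as an approximation of the IDNA2008 set, not a replacement for the IANA table.

```python
from typing import Iterable, List, Tuple

def parse_range(field: str) -> Tuple[int, int]:
    """Turn '0061' or '0061..007A' into an inclusive (start, end) pair."""
    if ".." in field:
        start, end = field.split("..")
        return int(start, 16), int(end, 16)
    cp = int(field, 16)
    return cp, cp

def valid_ranges(lines: Iterable[str], exclude_uts46_only: bool = True) -> List[Tuple[int, int]]:
    """Collect ranges whose UTS46 status is 'valid'.

    When exclude_uts46_only is True, entries whose fourth field is NV8 or XV8
    (valid under UTS46 but not under IDNA2008) are skipped.
    """
    result = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop trailing comments
        if not data:
            continue                          # blank or comment-only line
        fields = [f.strip() for f in data.split(";")]
        status = fields[1]
        idna2008_status = fields[3] if len(fields) > 3 else ""
        if status != "valid":
            continue
        if exclude_uts46_only and idna2008_status in ("NV8", "XV8"):
            continue
        result.append(parse_range(fields[0]))
    return result

# Illustrative lines in the IdnaMappingTable.txt layout (hypothetical sample).
SAMPLE = """\
0041          ; mapped                 ; 0061         # 1.1  LATIN CAPITAL LETTER A
0061..007A    ; valid                                  # 1.1  LATIN SMALL LETTER A..LATIN SMALL LETTER Z
00A1..00A7    ; valid                  ;      ; NV8    # 1.1  INVERTED EXCLAMATION MARK..SECTION SIGN
19DA          ; valid                  ;      ; XV8    # 5.2  NEW TAI LUE THAM DIGIT ONE
"""

print(valid_ranges(SAMPLE.splitlines()))         # [(97, 122)]
print(valid_ranges(SAMPLE.splitlines(), False))  # [(97, 122), (161, 167), (6618, 6618)]
```

Passing `exclude_uts46_only=False` yields the hybrid (UTS46) view; the default yields the stricter view described above.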

I'm unaware of other differences. I could also be in error; I haven't actually done an exhaustive comparison of the IdnaMappingTable.txt data against datasets generated from the rules.
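An exhaustive comparison of that kind amounts to expanding both datasets into code-point sets and reporting what each side has that the other lacks. The Python sketch below assumes both sources have already been reduced to inclusive `(start, end)` ranges; `expand`, `compare`, and the toy inputs are hypothetical. A fuller check would compare the per-code-point status (for example PVALID versus CONTEXTJ), not just membership.

```python
from typing import Iterable, List, Set, Tuple

Range = Tuple[int, int]

def expand(ranges: Iterable[Range]) -> Set[int]:
    """Expand inclusive (start, end) ranges into a set of code points."""
    return {cp for start, end in ranges for cp in range(start, end + 1)}

def compare(from_mapping_table: Iterable[Range],
            from_rules: Iterable[Range]) -> Tuple[List[int], List[int]]:
    """Return (only in mapping table, only in rules-derived data), sorted."""
    a, b = expand(from_mapping_table), expand(from_rules)
    return sorted(a - b), sorted(b - a)

# Toy inputs, chosen only to show the output shape.
only_table, only_rules = compare([(0x61, 0x7A)], [(0x61, 0x79)])
print([hex(cp) for cp in only_table])  # ['0x7a']
print(only_rules)                      # []
```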

The IdnaMappingTable.txt file also isn't "authoritative"; however, since it's generated from the Unicode data and the rules in the RFCs, it should be as accurate as any other reference.

Unicode is aware of the importance of the stability of the valid character set, so the best time to review and vet the changes in each new Unicode version would probably be during the Unicode process, before release. After a Unicode version is released, any disagreements would lead to problematic confusion (as we've seen before).

It seems to me that using IdnaMappingTable.txt, perhaps with guidance as to proper usage, would reduce delays in adopting new characters.

-Shawn

-----Original Message-----
From: John C Klensin <john-ietf@jck.com> 
Sent: Wednesday, October 10, 2018 7:01 PM
To: Shawn Steele <Shawn.Steele@microsoft.com>; art@ietf.org; i18nrp@ietf.org
Subject: Re: [art] Use Unicode if Using Unicode?

Shawn,

I'm confused about what you are suggesting, so let me clarify where I'm confused and then hope you can enlighten me...

--On Wednesday, October 10, 2018 19:02 +0000 Shawn Steele <Shawn.Steele=40microsoft.com@dmarc.ietf.org> wrote:

> The draft states "It further suggests for the IETF a path forward 
> regarding ensuring IDNA2008 follows the evolution of the Unicode 
> Standard" and "this document requests that IANA update the tables to 
> Unicode 11."

> Each Unicode version creates a data file with information from 
> applying the IDNA2008 rules that can be used for IDNA mapping 
> algorithms.  (Indeed, that's where Windows gets the data from).

Taking a half-step back, IDNA imposes two requirements for new versions of Unicode, both of which are addressed by the present draft. One is that the changes in the new version be examined to ensure that nothing new has been done that requires changes to IDNA (most likely adding exceptions or rules for particular code points) or careful explanation somewhere. No one I know of expected that provision to be exercised very often but, for a variety of reasons, the consensus was that it was an important safeguard. The other is that the IDNA ruleset be run against the characters and properties in the Unicode Character Database to produce a table reflecting the combination of that version of Unicode and IDNA, as modified, at that time. That table was (and is) to be stored with IANA but, while we expect it to be checked carefully for accuracy, it is not actually authoritative; only the rules and categories specified by IDNA (and the Unicode properties used to support them) are.

The discussion in RFC 5895 notwithstanding, one of the critically important properties of IDNA2008 is that U-labels and A-labels are duals: one can get from one to the other and back without any loss of information.  That reversibility is true in only some cases for Unicode normalization (especially with compatibility normalization) or case folding, much less for other mapping scenarios.  So I don't understand what sort of "mapping" you are talking about.
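The duality point can be illustrated with a small Python sketch that contrasts the lossless U-label/A-label round trip with lossy compatibility normalization and case folding. It uses only Python's built-in `punycode` codec (RFC 3492) plus the standard `unicodedata` module; the helpers `to_a_label` and `to_u_label` are hypothetical and perform none of the IDNA2008 validity checks, so this is illustration only, not a conforming implementation. Python's separate `idna` codec implements IDNA2003 and is deliberately avoided here.

```python
import unicodedata

ACE_PREFIX = "xn--"

def to_a_label(u_label: str) -> str:
    """Encode a label with Punycode and the ACE prefix (no validity checks)."""
    if u_label.isascii():
        return u_label  # pure-ASCII labels are left as they are
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")

def to_u_label(a_label: str) -> str:
    """Reverse to_a_label: strip the ACE prefix and decode the Punycode."""
    if a_label.startswith(ACE_PREFIX):
        return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return a_label

a = to_a_label("bücher")
print(a)              # xn--bcher-kva
print(to_u_label(a))  # bücher  (round trip, nothing lost)

# By contrast, compatibility normalization and case folding are many-to-one,
# so the original string cannot be recovered from the result.
print(unicodedata.normalize("NFKC", "\ufb01") == unicodedata.normalize("NFKC", "fi"))  # True
print("Bücher".casefold() == "bücher".casefold())  # True
```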

The only Unicode-created IDN data files I'm aware of are those associated with the UTS#46 effort. Because UTS#46 makes recommendations that are inconsistent with IDNA2008, if Microsoft is using those tables, its usage does not conform to IDNA2008. I certainly cannot prevent Microsoft from doing that (and wouldn't try), but it would not be consistent with general interoperability of IDNs or what is known elsewhere as Universal Acceptance of those domain names.

> If the goal is to "follow the evolution of the Unicode Standard" and 
> the Unicode Standard is providing data that conforms to the IDNA 
> rules, then why not just point directly to the Unicode derived tables?

The simplest answer to your question is that, unless I've missed something, the conditional is false: the Unicode Standard is not providing data that conforms to the IDNA2008 rules. Instead, the data they are providing is more like one of the creative IDNA2003-IDNA2008 hybrids to which Patrik refers.

   best,
     john