Re: [DNSOP] DNSOP Digest, Vol 125, Issue 31

"Patrik Fältström " <paf@frobbit.se> Thu, 13 April 2017 19:59 UTC

From: Patrik Fältström <paf@frobbit.se>
To: Paul Vixie <paul@redbarn.org>
Cc: dnsop@ietf.org
Date: Thu, 13 Apr 2017 21:59:00 +0200
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/Gl6HRgIuPSp7majRX74O4AYUjMc>

On 13 Apr 2017, at 21:16, Paul Vixie wrote:

> i think we can't use something in a standard that doesn't have a
> canonical format. if there is a F(name) that produces a minimal I18N
> result which is the same as any other F(name) that would look the same
> or mean the same if written, then we should outlaw on-the-wire strings
> that are not in that canonical format. if there is no such F(name) then
> we should not use I18N in DNS, punycode or no.

There are several issues here and I am sorry to say that trying to make I18N as simple as one f(x) will fool all of us.

The IETF tried via IDNA2003 to create this f(x) for any Unicode string x.

This was problematic for various reasons, and I am not going to go through all the arguments again.

The IETF then developed IDNA2008, which is a function that takes any Unicode string x and says whether x is valid for use as a domain name when it is in its ACE-encoded form, the A-label, as opposed to the U-label, which is the same string in a standard Unicode encoding form.

Because of this, we do not have one F(x) but instead three functions:

1. Mapping from what the end user types, copies and pastes, etc., to a string that is to be used "as a domain name". This mapping is very locale dependent and includes things like case folding. It is recommended that Unicode normalisation and similar steps be included here.

2. The function that says whether a Unicode string can be used as a domain name or not; this is the core of IDNA2008. This function simply must be, as you say, something we all agree on, or it will not be predictable whether a given string can be used. It is independent of locale, of Unicode version, and of similar things.

3. The function that converts from a U-label (the string in a standard Unicode encoding, normally UTF-8) to an A-label (the ACE encoding of the string, starting with 'xn--'), and its inverse.

It is very important (as we saw with IDNA2003) that (2) is global and (3) is separate, and that there is a 1:1 mapping between A-label and U-label so that it is possible to convert between the two forms.
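
As a minimal sketch of (3) and the 1:1 round trip, assuming the label has already passed (2): the function names below are hypothetical, and the code uses only the raw "punycode" codec from the Python standard library. It deliberately avoids Python's built-in "idna" codec, which implements the IDNA2003 rules including their mapping step.

```python
# Sketch of function (3): U-label <-> A-label for a single label.
# Assumption: the input has already been accepted by function (2);
# no IDNA2008 validity checks are performed here.

ACE_PREFIX = "xn--"


def to_a_label(u_label: str) -> str:
    """Convert a U-label to its A-label form (hypothetical helper)."""
    if u_label.isascii():
        # Pure ASCII labels are used as they are, with no ACE prefix.
        return u_label
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Convert an A-label back to its U-label form (hypothetical helper)."""
    if not a_label.lower().startswith(ACE_PREFIX):
        # Not an ACE-encoded label, so there is nothing to decode.
        return a_label
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


if __name__ == "__main__":
    a = to_a_label("bücher")
    print(a)              # xn--bcher-kva
    print(to_u_label(a))  # bücher
```

The point of the sketch is only that (3) is a mechanical, locale-independent encoding step: whether the string is allowed at all is decided elsewhere, in (2).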

Regarding (1), yes, it would be good if we could use one function, but given that case mapping, for example, is locale dependent, it is very hard to come up with one mapping function. We have heard many different examples of why this is a hard problem.
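
One commonly cited example is the Turkish dotted and dotless i. The sketch below is only an illustration: lower_for_locale is a hypothetical helper, its two-entry Turkish/Azerbaijani table is an assumption made for this example, and it is not a mapping defined by IDNA2008.

```python
# Sketch: why one global case mapping for step (1) is hard.
# Assumption: locale is a plain language tag such as "en" or "tr";
# only the Turkish/Azerbaijani dotted/dotless i rule is modelled.


def lower_for_locale(text: str, locale: str) -> str:
    """Lowercase text, applying the Turkish i rules for tr/az (hypothetical)."""
    if locale in ("tr", "az"):
        # In these locales, capital I lowercases to dotless ı (U+0131)
        # and capital İ (U+0130) lowercases to plain i.
        text = text.replace("\u0130", "i").replace("I", "\u0131")
    return text.lower()


if __name__ == "__main__":
    print(lower_for_locale("TITLE", "en"))  # title
    print(lower_for_locale("TITLE", "tr"))  # tıtle
```

The same typed input gives two different strings depending on locale, so a single mapping function cannot be right for every user.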

So I claim your F(x) exists for (2) and (3), but not for (1). And that is where I18N is hard. Very hard.

   Patrik