Re: [DNSOP] Status of IDNA

Andrew Sullivan <ajs@anvilwalrusden.com> Wed, 12 April 2017 23:18 UTC

Date: Wed, 12 Apr 2017 19:18:04 -0400
From: Andrew Sullivan <ajs@anvilwalrusden.com>
To: dnsop@ietf.org
Message-ID: <20170412231803.GC6085@anvilwalrusden.com>
References: <d087f7d0-5def-8700-c1f3-f0fb53adf698@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <d087f7d0-5def-8700-c1f3-f0fb53adf698@redhat.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/j-Ez00WfjTiJ260ulPWGlBZvuJk>
Subject: Re: [DNSOP] Status of IDNA

Accidentally sent this just to Florian.  Fixing that.

On Wed, Apr 12, 2017 at 01:36:49PM +0200, Florian Weimer wrote:
> What's the current standardization status of IDNA?

It's complicated.

> As far as I can tell, a lot of vendors are still stuck with the original
> IDNA standard (IDNA2003).

Well, not exactly.  IDNA 2003 is defined only for Unicode 3.2, which
nobody has used for a long time.  So strictly speaking, what most of
those vendors are doing is undefined, since there's no way to know
what version of Unicode you have installed.  But practically, of
course, this is what's happening.
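
To make the Unicode 3.2 point concrete, here is a small sketch in
Python.  It assumes CPython, whose standard library happens to ship a
frozen Unicode 3.2 database (unicodedata.ucd_3_2_0) alongside the
current one.  The choice of U+1E9E (capital sharp s, added in Unicode
5.1) is mine, purely for illustration.

```python
import unicodedata

# U+1E9E LATIN CAPITAL LETTER SHARP S was added in Unicode 5.1,
# long after the Unicode 3.2 repertoire that IDNA2003 is defined on.
CAPITAL_SHARP_S = "\u1e9e"


def category_in_unicode_3_2(ch: str) -> str:
    """General category of ch according to the frozen Unicode 3.2 data."""
    return unicodedata.ucd_3_2_0.category(ch)


def category_in_current_unicode(ch: str) -> str:
    """General category of ch according to this interpreter's Unicode data."""
    return unicodedata.category(ch)


print("Unicode 3.2 category:", category_in_unicode_3_2(CAPITAL_SHARP_S))
print("Current category:    ", category_in_current_unicode(CAPITAL_SHARP_S))
print("Interpreter's Unicode version:", unicodedata.unidata_version)
```

The code point is unassigned ("Cn") in the 3.2 data and an uppercase
letter ("Lu") in current data, so IDNA2003 as written has nothing to
say about it.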

> There are three or more competing successors,
> IDNA2008 as standardized by the IETF (without any tweaks), the Unicode IDNA
> standard TS46 (<http://www.unicode.org/reports/tr46/>, which is configurable
> and allegedly compatible with IETF IDNA2008, but is not because it yields
> different results than IETF IDNA2008), and the Mozilla/DENIC IDNA
> implementation in Firefox/Thunderbird
> (<https://bugzilla.mozilla.org/show_bug.cgi?id=479520> and other sources).

UTS#46 is allegedly a transitional technology that is supposed to do
mapping so that things that did work in IDNA2003 but that won't under
IDNA2008 will continue to work.  As a practical matter, it does not
appear to have a mechanism by which the transition can be brought to
an end, so it's hard to see it as an actual transition mechanism.
It's also hard to understand why it has so many characters marked as
"valid" that are not valid under any version of IDNA, including many
emojis.  

> These aren't compatible.  You can see this by visiting
> <https://www.buße.de/> in different browsers.

Yes.

> Is there an ongoing effort to reconcile application behavior?  Different
> TLDs appear to expect different IDNA implementations.

ICANN's rules require IDNA2008.  There is a new version of the
guidelines out -- I can't recall whether the public comment period has
closed.
Many ccTLDs voluntarily conform to the guidelines also, but ICANN
can't compel it.  And of course ICANN has no control of other things
down the tree.

Part of the trouble comes because the WHATWG is apparently opposed to
IDNA2008, partly on the grounds that it makes some domain names that
were valid under 2003 invalid.

> One practical problem with IDNA2003 is that it prevents having Hebrew domain
> names containing ASCII digits.  Any of the IDNA2008 variants mentioned above
> will fix that, I think, but it's difficult to pick a variant to implement.
> And I certainly don't want to implement per-TLD policies.

Another practical problem with IDNA2003 is that it loses information
round trip: Unicode can be sent into the algorithm, turned into
Punycode, and when it is turned back into Unicode the result is not
the same.  ß is mapped, for instance, so you get back "ss" whatever
you put in.  This issue is one of the main things that IDNA2008 fixes,
in my opinion: every valid A-label matches exactly one valid U-label,
and anything that does not have this property is not a valid A- or
U-label.
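
A rough sketch of the round-trip loss, in Python.  It assumes
CPython's built-in "idna" codec, which implements the IDNA2003 rules
(RFC 3490 with Nameprep), and uses the built-in "punycode" codec only
to show that the encoding itself is reversible.  The helper names are
mine, and the second function does not do IDNA2008 validation; it only
shows that a label left unmapped survives the trip.

```python
def idna2003_round_trip(name: str) -> str:
    """Run a domain name through IDNA2003 ToASCII and back via ToUnicode."""
    return name.encode("idna").decode("idna")


def punycode_round_trip(label: str) -> str:
    """Encode a single label with raw Punycode (no mapping step) and decode it."""
    return label.encode("punycode").decode("punycode")


original = "buße.de"

# Nameprep maps ß to "ss" before encoding, so the original spelling is lost.
print(original, "->", idna2003_round_trip(original))

# Without the mapping step the label comes back unchanged.
print("buße", "->", punycode_round_trip("buße"))
```

The first call gives back "busse.de" rather than "buße.de"; the second
gives back "buße" unchanged.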

There are problems with IDNA that have prevented it being updated for
recent versions of Unicode.  Asmus Freytag, John Klensin, and I are
working on a draft to try to fix that.

Best regards,

A

-- 
Andrew Sullivan
ajs@anvilwalrusden.com