Re: [DNSOP] Status of IDNA

Andrew Sullivan <ajs@anvilwalrusden.com> Thu, 13 April 2017 11:33 UTC

Date: Thu, 13 Apr 2017 07:33:13 -0400
From: Andrew Sullivan <ajs@anvilwalrusden.com>
To: dnsop@ietf.org
Message-ID: <20170413113312.GA6422@mx4.yitter.info>
References: <d087f7d0-5def-8700-c1f3-f0fb53adf698@redhat.com> <20170412154713.GX25754@mournblade.imrryr.org> <20170412234541.GE6085@mx4.yitter.info> <20170413023935.GA25754@mournblade.imrryr.org>
In-Reply-To: <20170413023935.GA25754@mournblade.imrryr.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/d6YgmRb1OGhoTvJrC9jvGZCQ9uI>
Subject: Re: [DNSOP] Status of IDNA

On Thu, Apr 13, 2017 at 02:39:36AM +0000, Viktor Dukhovni wrote:
> 
> Well, IIRC they sensibly converged on a case-folded normal form
> that ensures that https://Духовный.org maps to the same underlying
> wire-form domain as https://духовный.org, i.e. both result in
> queries for xn--b1adqpd3ao5c.org.  AFAIK, those would generally be
> different domains under IDNA2008.

They would be different because the first of them contains an
uppercase letter, and uppercase letters are DISALLOWED under
IDNA2008.  But everyone knew, when making IDNA2008, that removing
the case mapping from the protocol meant that clients needed to do it
before starting.  That's what https://tools.ietf.org/html/rfc5895 was
all about (a document that could have moved faster if some
participants had collaborated more enthusiastically instead of, well,
going away and making their own protocol).
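
As a rough illustration of what "clients do the mapping before
starting" means, here is a minimal Python sketch.  It applies only two
of the steps that RFC 5895 describes (lowercasing and NFC) and then
Punycode-encodes each label.  The function names are made up for this
example, and the sketch performs none of the IDNA2008 validity checks
(PVALID, contextual rules, bidi), so treat it as a demonstration of the
ordering and not as an implementation.

```python
import unicodedata


def map_label(label: str) -> str:
    """Hypothetical local mapping step: lowercase, then normalize to NFC.

    Only a subset of the RFC 5895 mapping; width mapping and the
    ideographic full stop are deliberately left out.
    """
    return unicodedata.normalize("NFC", label.lower())


def to_a_label(label: str) -> str:
    """Punycode-encode a label that has already been mapped.

    No IDNA2008 validity checking is done here (an assumption made to
    keep the sketch short).
    """
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def to_wire_name(name: str) -> str:
    """Map first, then encode, label by label."""
    return ".".join(to_a_label(map_label(part)) for part in name.split("."))


print(to_wire_name("Духовный.org"))  # xn--b1adqpd3ao5c.org
print(to_wire_name("духовный.org"))  # xn--b1adqpd3ao5c.org
```

With the mapping done up front, both spellings arrive at the same
A-label; without it, the capitalized form would simply be rejected.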

One of the problems we had with IDNA2003 was that the protocol did the
caseFold operation.  The difficulty there was that there was no way to
pay attention to locale or other information that might tell you the
right thing to do, because caseFold is not nearly as simple as ASCII
always pretended it was.  IDNA2008's answer was to kick this problem
out of the protocol and into user agents, which were supposed to do
this in a sensible way.  If UTS#46 had restricted itself to that kind
of job, it could well have been an enormous contribution to the
practical use of IDNs.  Unfortunately, it didn't do just that.
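
A few well-known cases show why a single locale-blind fold inside the
protocol was uncomfortable.  The Python sketch below uses only built-in
string methods; the helper lower_for_language is hypothetical and
handles nothing but the Turkish/Azeri dotted and dotless i, purely to
show where a user agent could apply knowledge the protocol never had.

```python
sharp_s = "\u00df"      # LATIN SMALL LETTER SHARP S
final_sigma = "\u03c2"  # GREEK SMALL LETTER FINAL SIGMA
dotted_cap_i = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE

# Case folding and lowercasing are different operations.
print(sharp_s.lower() == sharp_s)           # True: lowercasing leaves it alone
print(sharp_s.casefold())                   # ss
print(final_sigma.casefold() == "\u03c3")   # True: folds to the non-final sigma

# Locale-blind lowercasing of the dotted capital I yields two code points.
print(len(dotted_cap_i.lower()))            # 2 ("i" plus COMBINING DOT ABOVE)


def lower_for_language(text: str, lang: str) -> str:
    """Hypothetical locale-aware lowercasing (illustration only).

    Only the Turkish/Azeri i rules are handled; everything else falls
    through to the default Unicode lowercasing.
    """
    if lang in ("tr", "az"):
        text = text.replace("\u0130", "i").replace("I", "\u0131")
    return text.lower()


print(lower_for_language("TITLE", "en"))    # title
print(lower_for_language("TITLE", "tr"))    # tıtle
```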

> While it is true that UTS#46 maps <U+1F4A9>.org to xn--ls8h.org, (see

In the way I'm using the term, that's not mapping, that's re-encoding.
The idea of "mapping" is to substitute in some more or less
predictable way some set of Unicode code points for some other set of
Unicode code points.  Then you can run the resulting final string
through the IDNA2008 algorithm.  
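
The distinction can be shown in a few lines of Python, using the
built-in "punycode" codec.  This is only a sketch of the terminology:
re-encoding changes the representation of the same code point and is
reversible, whereas mapping replaces one code point with a different
one.

```python
poo = "\U0001F4A9"

# Re-encoding: the same code point, written in an ASCII-compatible form.
encoded = "xn--" + poo.encode("punycode").decode("ascii")
print(encoded)                      # xn--ls8h

# Decoding gives back exactly the code point we started with.
decoded = encoded[len("xn--"):].encode("ascii").decode("punycode")
print(decoded == poo)               # True

# Mapping: one Unicode code point is substituted for another.
upper_de = "\u0414"                 # CYRILLIC CAPITAL LETTER DE
mapped = upper_de.lower()
print(f"U+{ord(upper_de):04X} -> U+{ord(mapped):04X}")  # U+0414 -> U+0434
```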

The problem with treating U+1F4A9 as an acceptable character for an
identifier is not the character in itself -- maybe it is fine on its
own -- but that it is part of a class of characters that do not have
normalizations and are not letters or digits.  The IETF took seriously
the advice
from UTC that we should use the stable categories that UTC had
invented, and derive our properties from those.  We did so, and no
emojis are in the categories that we used for the derivation.  It is
therefore more than a little frustrating to see the same UTC now
recommending that such characters be used in identifiers on the
network.  Such use is particularly bad with emojis because they have
no normalization either, and they interact in some ways with ZWNJ.
They provide a completely new playground for attackers to use in
phishing and so on, and we already have _enough_ trouble with that
without inventing new ways to cause ourselves grief.
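
To make the derivation argument concrete, here is a heavily simplified
Python sketch of how a property falls out of the Unicode categories.
It implements only three of the RFC 5892 rules (the LDH set, the
"Unstable" test, and the LetterDigits general categories) and ignores
the exceptions table, unassigned code points, ignorable properties and
the contextual rules, so its answers are indicative only.  The name
rough_idna2008_class is invented for this example.

```python
import unicodedata

# General categories that RFC 5892 groups as "LetterDigits".
LETTER_DIGIT_CATEGORIES = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}
LDH = set("-0123456789abcdefghijklmnopqrstuvwxyz")


def is_unstable(ch: str) -> bool:
    """True if NFKC(casefold(NFKC(ch))) differs from ch."""
    folded = unicodedata.normalize("NFKC", ch).casefold()
    return unicodedata.normalize("NFKC", folded) != ch


def rough_idna2008_class(ch: str) -> str:
    """Simplified classification; many RFC 5892 rules are omitted."""
    if ch in LDH:
        return "PVALID"
    if is_unstable(ch):
        return "DISALLOWED"
    if unicodedata.category(ch) in LETTER_DIGIT_CATEGORIES:
        return "PVALID"
    return "DISALLOWED"


poo = "\U0001F4A9"
print(unicodedata.category(poo))                   # So
print(unicodedata.normalize("NFKC", poo) == poo)   # True: nothing to normalize
print(rough_idna2008_class(poo))                   # DISALLOWED
print(rough_idna2008_class("\u0434"))              # PVALID (lowercase de)
print(rough_idna2008_class("\u0414"))              # DISALLOWED (uppercase de)
```

The emoji is excluded by the final rule because its category, So, is
not in the LetterDigits set, while the uppercase letter is excluded
earlier by the stability test.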

Anyway, I doubt very much that DNSOP is the list where this ought to
be discussed (idna-update is still an open list, as is the IAB's
i18n-discuss list even though the program is closed, and precis is
still an active WG last I checked).  But any sentence about
internationalization that involves the concept "just do _x_" is, I
think, already too naïve.

Best regards,

A

-- 
Andrew Sullivan
ajs@anvilwalrusden.com