Re: [I18nrp] Mappings for IDNA2008 ?

"Asmus Freytag (c)" <> Tue, 05 February 2019 00:51 UTC

To: John Levine <>,
References: <20190205002555.1AFBA200DC1DB1@ary.qy>
From: "Asmus Freytag (c)" <>
Message-ID: <>
Date: Mon, 4 Feb 2019 16:51:36 -0800
In-Reply-To: <20190205002555.1AFBA200DC1DB1@ary.qy>
List-Id: Internationalization Review Procedures <>

On 2/4/2019 4:25 PM, John Levine wrote:
> In article <> you write:
>>> Are there any published IDNA2008 mappings?  As far as I can tell,
>>> everyone uses one from UTS46 by default, and it's not very good.
>> Examples of its badness, please.
> As I understand it:
> If you speak Turkish, the case folding is wrong.

The mapping says to take U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE to
U+0069 U+0307 (which is lowercase i + COMBINING DOT ABOVE).
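
To make the sequence concrete, here is a minimal Python sketch. Note
that Python's built-in lower() is not the UTS46 table; it only happens
to produce the same two code points for this character, so treat it as
an illustration rather than as the mapping itself.

```python
import unicodedata

capital_i_dot = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE

# Full Unicode lowercasing expands this one character to two code points.
lowered = capital_i_dot.lower()

codepoints = [f"U+{ord(ch):04X}" for ch in lowered]
names = [unicodedata.name(ch) for ch in lowered]

# NFC cannot recombine the pair: there is no precomposed
# "small i with dot above" for it to compose into.
nfc = unicodedata.normalize("NFC", lowered)

print(codepoints, names, nfc == lowered)
```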

This is formally correct, because the sequence looks precisely like "i", 
but probably not helpful unless Turkish IDNs use that convention over 
just using lowercase 'i'.

In my personal view no registries should support any soft-dotted letter 
followed by dot above as that leads to an immediate homoglyph issue. So, 
yes, that mapping is indeed questionable.
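
A registry-side check for that pattern could look roughly like the
sketch below. The function name and the small SOFT_DOTTED_SAMPLE set are
my own illustrative assumptions; a real check would use the full
Soft_Dotted property from the Unicode Character Database and would also
have to deal with other combining marks sitting between the letter and
the dot.

```python
# Hand-picked subset of Soft_Dotted characters (assumption: sample only,
# not the complete property from the Unicode Character Database).
SOFT_DOTTED_SAMPLE = {
    "\u0069",  # LATIN SMALL LETTER I
    "\u006A",  # LATIN SMALL LETTER J
    "\u0456",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    "\u0458",  # CYRILLIC SMALL LETTER JE
}
COMBINING_DOT_ABOVE = "\u0307"


def has_soft_dotted_plus_dot(label: str) -> bool:
    """Return True if a sampled soft-dotted letter is directly followed
    by U+0307 anywhere in the label (simplified, adjacent-only check)."""
    return any(
        prev in SOFT_DOTTED_SAMPLE and cur == COMBINING_DOT_ABOVE
        for prev, cur in zip(label, label[1:])
    )


print(has_soft_dotted_plus_dot("i\u0307stanbul"))  # mapped form of U+0130
print(has_soft_dotted_plus_dot("istanbul"))        # plain lowercase i
```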

> If you speak Persian, the joiners are wrong.

The Root Zone will be like IDNA2003 here, disallowing the joiners. Yes, 
stuff looks "wrong" without them in some languages - worse for Sinhala
(used in Sri Lanka) than for Persian, by the way.

This one is a no-win situation. Some zones may support these, others 
won't - what is a generic mapping to do?
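
To illustrate why a single generic answer is hard, here is a small
Python sketch in which the same label passes or fails depending on a
per-zone policy flag. The function and the flag are hypothetical, and
the sketch deliberately ignores the contextual rules that IDNA2008
attaches to the joiners (RFC 5892); it only shows the policy split.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"   # ZERO WIDTH JOINER
JOINERS = {ZWNJ, ZWJ}


def label_allowed(label: str, allow_joiners: bool) -> bool:
    """Hypothetical zone policy: accept or reject labels with joiners.

    A zone that allows joiners would still need the contextual checks
    from RFC 5892; those are omitted in this sketch.
    """
    if allow_joiners:
        return True
    return not any(ch in JOINERS for ch in label)


# A Persian word that is conventionally written with a ZWNJ inside it.
persian = "\u0645\u06CC\u200C\u062E\u0648\u0627\u0647\u0645"

# What a mapping would produce if it simply dropped the joiners.
without_joiners = "".join(ch for ch in persian if ch not in JOINERS)

print(label_allowed(persian, allow_joiners=True))
print(label_allowed(persian, allow_joiners=False))
print(len(persian) - len(without_joiners))
```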

> If you speak Arabic, the mapping or lack thereof between
> ASCII and Arabic digits is often wrong.
> If you speak Chinese, the whole thing is wrong because Chinese users
> expect their ASCII pinyin to be turned into Chinese.

That can't be a unique mapping. I imagine it would be more like the T9 
phone input.

Many, many Chinese characters share the same pronunciation (and 
therefore same pinyin).

And pinyin isn't ASCII - it has accents.
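
A toy Python sketch of both points, using the syllable "ma". The
four-entry table is purely illustrative and understates the problem,
since each toned syllable itself corresponds to many characters.

```python
import unicodedata

# Toy table (assumption: one sample character per tone, nothing more).
TONED = {
    "m\u0101": "妈",  # ma, first tone
    "m\u00E1": "麻",  # ma, second tone
    "m\u01CE": "马",  # ma, third tone
    "m\u00E0": "骂",  # ma, fourth tone
}


def strip_tones(syllable: str) -> str:
    """Drop combining marks after NFD, leaving the ASCII base letters."""
    decomposed = unicodedata.normalize("NFD", syllable)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Properly written pinyin is not ASCII.
all_non_ascii = all(not s.isascii() for s in TONED)

# Once the tone marks are stripped, one ASCII string has many candidates.
candidates = {}
for syllable, hanzi in TONED.items():
    candidates.setdefault(strip_tones(syllable), []).append(hanzi)

print(all_non_ascii, candidates)
```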

> IDNA 2008 said very clearly that good mappings depend on the user's
> context, with the language being a large part of that context.

None of these examples, except the first one using U+0307, strike me as
"bad" -- only as not tailored.

I was hoping you had more examples where the mapping choices are 
inappropriate for most (or even all) languages.