Re: [Idna-update] Expiration impending: <draft-klensin-idna-rfc5891bis-01.txt>

John C Klensin <john-ietf@jck.com> Thu, 08 March 2018 17:11 UTC

Date: Thu, 08 Mar 2018 12:11:48 -0500
From: John C Klensin <john-ietf@jck.com>
To: idna-update@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/idna-update/zx4C2qDocWdxAquXr47LWD55eoI>

One more thought about something it may be helpful to remind
ourselves of periodically.

One of the notions floated when IDNs were first being discussed
and raised a few times after that was to simply ban combining
forms of all types.  The theory was that there was no
entitlement to write any character form (e.g., any "word") in
DNS labels and that those labels were all about mnemonics and
nothing but mnemonics.  On that theory, a "no combining
sequences" rule would eliminate most issues about normalization
and grapheme clusters and make everything a lot easier to
explain to implementers and others who wanted to conform but
were not willing to make the investment to actually understand
all of the complex issues and rules with which we are now
dealing.

If that were the rule and someone really, really, wanted a
grapheme that could only be formed in Unicode with a combining
sequence, it would be up to them to convince the Unicode
Consortium that their favorite character (grapheme) needed to be
added to Unicode as a single code point.   However hard that
might be, it would not be our problem.

FWIW, a "nothing but precombined characters" rule is essentially
the recommendation for Arabic IDNs in RFC 5564 and, I
understand, in the emerging Arabic script rules for the root
zone.

We didn't go down that path, not only because of impassioned
pleas for some of the character forms that might be excluded but
also because of precisely the assurances that arguably led to
the non-decomposing characters thread: that, except under very
unusual circumstances, no new code points would be added to
Unicode if there were already a combining sequence in the same
script that could reasonably substitute for them and that, when
those circumstances did occur, the new code points would
decompose to those sequences.

I don't suggest that we try to reverse that decision at this
point.   I assume that, if nothing else, it would just be too
disruptive.  However, it is worth pointing out that a "no
combining sequences" rule would eliminate the non-decomposing
character problem and at least a few other potential spoofing
and related cases.   It might also be worth examining as a
guideline or advice for registries who are interested in
raising the safety level of what they allow to be registered
without having to understand the underlying issues more deeply.
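
As a rough illustration of what such a registry-side guideline
might look like in practice, the sketch below normalizes a
candidate label to NFC and rejects it if any combining mark
survives.  It is only a sketch under stated assumptions: the
function name is hypothetical, and treating "combining" as
Unicode General Category M (Mn, Mc, Me) is an assumption made
here, not something any IDNA document specifies.  It is not a
substitute for the IDNA2008 validity checks.

```python
import unicodedata

# Assumption: "combining form" is taken to mean any code point whose
# General Category is Mn, Mc, or Me.  A registry might choose a
# narrower or broader definition.
COMBINING_CATEGORIES = ("Mn", "Mc", "Me")


def has_combining_sequence(label: str) -> bool:
    """Return True if the NFC form of `label` still contains a combining mark.

    Hypothetical helper: sequences that NFC composes into a single
    precomposed code point pass; anything that can only be written with
    a combining mark is flagged.
    """
    normalized = unicodedata.normalize("NFC", label)
    return any(
        unicodedata.category(ch) in COMBINING_CATEGORIES for ch in normalized
    )


if __name__ == "__main__":
    for candidate in ("caf\u00e9", "cafe\u0301", "q\u0301"):
        verdict = "reject" if has_combining_sequence(candidate) else "accept"
        print(ascii(candidate), verdict)
```

Normalizing first means that a label typed as a base letter plus
a combining mark is still accepted when Unicode has a precomposed
equivalent; only sequences with no precomposed form are refused,
which matches the "nothing but precombined characters" idea
above.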

best,
    john