Re: [Idna-update] Expiration impending: <draft-klensin-idna-rfc5891bis-01.txt>

John C Klensin <john-ietf@jck.com> Thu, 08 March 2018 19:11 UTC

Date: Thu, 08 Mar 2018 14:11:21 -0500
From: John C Klensin <john-ietf@jck.com>
To: Andrew Sullivan <ajs@anvilwalrusden.com>, idna-update@ietf.org
Message-ID: <2D4E04E4B3BB56404560C142@PSB>
In-Reply-To: <20180308174703.q3bffw7anrvjwzym@mx4.yitter.info>
References: <C4FBCF12821031786F472AA2@PSB> <20180308174703.q3bffw7anrvjwzym@mx4.yitter.info>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idna-update/4vkxx8tNm5XwLzRptcIGKa0WGaM>


--On Thursday, March 8, 2018 12:47 -0500 Andrew Sullivan
<ajs@anvilwalrusden.com> wrote:

> On Thu, Mar 08, 2018 at 12:11:48PM -0500, John C Klensin wrote:
>> If that were the rule and someone really, really, wanted a
>> grapheme that could only be formed in Unicode with a combining
>> sequence, it would be up to them to convince the Unicode
>> Consortium that their favorite character (grapheme) needed to
>> be added to Unicode as a single code point.   However hard
>> that might be, it would not be our problem.
> 
> I thought the argument at the time was not just that it was
> safe, but that some characters (in conceptual sense that
> Unicode uses it) were necessarily made of combining
> characters.  This was also I guess the reason that the two
> joiners ended up in under context rules, I think, yes?

Yes.  The joiners normally (and pre-emoji) had more to do with
what would ordinarily be considered phrase or sentence
construction than with character formation, but, yes, they were
included because of strong arguments that the use of some
scripts (at least with some languages) was impossible without
them.  The distinction is, IMO, worth making only because it
would have been plausible to prohibit combining sequences and
allow ZWJ and ZWNJ (with or without contextual conditions)
anyway -- they are really separate issues.

> Moreover, there was the principle that we didn't want to make
> restrictions low in the tree even if we thought they were
> mostly a bad idea near the top (this was literally the
> reasoning for including hieroglyphs as I recall).

Certainly it was a key part of the reasoning for allowing
archaic scripts.  Whether hieroglyphs were the critical issue or
merely a common example probably depends on who you ask.

I should stress that I'm not at all convinced that we made the
wrong decision about combining sequences, nor that we would have
made a different decision had we had access to better and more
accurate information.   I would feel better now if we could look
back at the decision and its consequences and say "there was an
informed discussion based on full information and we decided to
do it and accept the consequential risks" -- just as we did
with, e.g., the characters that were classified as valid (PVALID
or CONTEXTx) in IDNA2008 but mapped to something or discarded in
IDNA2003.

On the other hand, the same kind of thinking that argues for a
"troublesome characters" list (at least as I've understood it)
-- more "you SHOULD NOT use these unless you are extra-sure you
know what you are doing and what the implications might be" than
any sort of absolute prohibition -- might imply a similar
statement about combining sequences.  In other words, whether
they are prohibited by the requirement that putative labels be
in NFC form or not (and many are), one should not be using them
without really good reason.  Not a protocol prohibition, but an
extra layer of protection against edge cases and some
potentially-malicious behavior.
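
To make the NFC distinction above concrete, here is a minimal
sketch (Python standard library only) that reports whether a
putative label is already in NFC and which combining marks
survive normalization.  The function name combining_report is
purely illustrative and not taken from any IDNA specification or
implementation, and this is in no way a complete IDNA2008
validity check.

```python
import unicodedata


def combining_report(label):
    """Return (is_nfc, marks) for a putative label.

    is_nfc -- True if the label is unchanged by NFC normalization.
    marks  -- code points of combining marks (general category M*)
              that remain after NFC, i.e. sequences that have no
              precomposed form in Unicode.
    """
    nfc = unicodedata.normalize("NFC", label)
    is_nfc = (nfc == label)
    marks = [
        "U+%04X" % ord(ch)
        for ch in nfc
        if unicodedata.category(ch).startswith("M")
    ]
    return is_nfc, marks


# "e" + COMBINING ACUTE ACCENT has a precomposed form (U+00E9), so
# the NFC requirement already excludes the decomposed spelling.
print(combining_report("e\u0301"))

# "q" + COMBINING ACUTE ACCENT has no precomposed form, so the
# sequence is in NFC as written and the mark remains in the label.
print(combining_report("q\u0301"))
```

The second case is the kind of sequence that the NFC requirement
does not catch and to which a "use only with really good reason"
caution would apply.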

best,
    john