Re: [Idna-update] IDNA and combining sequences (was: Re: Expiration impending: <draft-klensin-idna-rfc5891bis-01.txt>)

John C Klensin <john-ietf@jck.com> Sun, 11 March 2018 14:45 UTC

Date: Sun, 11 Mar 2018 10:44:56 -0400
From: John C Klensin <john-ietf@jck.com>
To: John Levine <johnl@taugh.com>
cc: idna-update@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/idna-update/JQRg4gGvnaWgkDlFFTPmiu0pvXk>


--On Saturday, 10 March, 2018 13:29 -0500 John Levine
<johnl@taugh.com> wrote:

> In article <D26CE952D968BBEC0AB96A76@PSB> you write:
>> The latter would probably quickly reach the point that
>> attempts to apply the LGR to labels at the second level and
>> beyond would encourage non-compliance and daring ICANN to do
>> anything about it.     
> 
> I'm confused.  Every gTLD and many ccTLDS have language and
> script rules that limit what 2LDs they will accept.  When I
> was looking at gTLD applications, the rules all looked pretty
> reasonable, give or take what you think is a reasonable
> approach to handling variants.
> 
> What am I missing?  I realize that below the 2LDs anything
> goes.

This may not be worth belaboring, but you are missing or, more
likely, not looking hard enough at, at least four things:

(1) Some of those rules have, at least historically, been no
more restrictive than "what IDNA2008 allows" and even (although
the new Board resolutions may gradually change that too) "what
IDNA2003 allows".  And, fwiw, lists of permitted code points
have been common; actual stated/specific rules much less so.
IIRC, the latter were not even provided for in the initial
request for "character" (code point and code point sequence)
lists.

(2) Those tables started out as indications of intention, and
advice to others about what they might do or think about.  They
were not even intended as constraints on what the registering
zone was going to do.  Even though it was generally assumed that
a registry would not allow labels that did not conform to the
tables it registered, that was never a requirement that anyone
expected to be enforced, at least up to the time those tables got
entangled with the new gTLD program.

(3) Perhaps in part because the tables that were registered with
IANA were optional, not "rules", there is a history of names
being registered as 2LDs that do not conform to the published
tables.  The distinction becomes important the moment the term
"variant" is mentioned, even as a blocking mechanism, because
even a simple FCFS rule such as "if you have 'google' registered
as an SLD, 'goog1e' is not allowed" is a rule with which an
allowed character list won't help.  That is actually an area
in which something could, in principle, be done immediately and
without any need to sort out the IETF review issues: ICANN could
update the general guidelines with advice to block additional
registrations in a given zone that reflect that sort of
relationship, using whatever information can be gleaned from the
LGR process, Asmus's efforts, and whatever else we know and not
excluding any scripts (as the example and dozens like it
indicate, this is not just an IDN problem).  I note that
variations on this idea have been proposed, several times, by
Andrew and others -- I'm not inventing anything in this note.
For those of you who are spending the next several days in
Puerto Rico, that is probably a worthy discussion topic.  Going
into that discussion, you should, however, remember that rules
that are ultimately based on FCFS -- preference for the first
party to register a name from a set -- have had a rough history in
ICANN, setting off either claims of unfairness or unpleasant
"land rush" behavior or both.
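
To make the 'google'/'goog1e' point concrete, here is a minimal
sketch of why an allowed-character list cannot express a blocking
rule.  Everything in it is an assumption made for illustration: the
two-entry CONFUSABLE map, the ALLOWED set, and the Zone class are
hypothetical and are not taken from any registry's tables or from
the LGR work.  A real rule set would need far richer data.

```python
# Toy illustration only: CONFUSABLE and ALLOWED are hypothetical,
# not derived from any published table or LGR.
CONFUSABLE = {"1": "l", "0": "o"}
ALLOWED = set("abcdefghijklmnopqrstuvwxyz0123456789-")


def in_allowed_list(label):
    """Per-label check: every character is on the permitted list."""
    return all(ch in ALLOWED for ch in label)


def skeleton(label):
    """Fold each character to a representative of its confusable set."""
    return "".join(CONFUSABLE.get(ch, ch) for ch in label.lower())


class Zone:
    """First-come-first-served blocking within a single zone."""

    def __init__(self):
        self._by_skeleton = {}

    def register(self, label):
        if not in_allowed_list(label):
            return False
        key = skeleton(label)
        if key in self._by_skeleton:
            # Blocked: an earlier registration has the same skeleton.
            return False
        self._by_skeleton[key] = label
        return True
```

Both labels pass in_allowed_list() on their own; only the check
against what is already registered in the zone rejects the second
one, which is the part a code point list by itself cannot do.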

(4) I don't know if you are "missing" it or not, but we need to
keep in mind that efforts to promote the use of DNAME records in
the root, at the second level, and elsewhere, have potential bad
side-effects.  If I have a zone "foo" at some level of the DNS
with an associated set of rules or policies, and nodes owned by
"bar" and "foobar" with DNAME records pointing at "foo", it is
not clear whether there is a reasonable expectation that foo's
rules will apply if bar and foobar can have their own rules or
no rules at all.  It is fairly clear, at least to me, what the
rules should be if the DNS RRSETs for foo, bar, and foobar all
exist at the same level in the same zone, i.e., with no
arrangements like

   $ORIGIN zork.
   foo IN ...
   foobar.baz IN DNAME foo.zork.

but, as I trust everyone reading this understands, there is no
such requirement on DNAME -- there can, in principle, be a
record with RRTYPE DNAME pointing to "foo" from any zone and any
level of the DNS tree (and, btw, with any owner/registrant).
The rather unfortunate relationship between DNAME records in
arbitrary zones and DNSSEC doesn't help much here.
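
For readers who want the redirection spelled out, the following is
a rough sketch of DNAME-style suffix substitution on names.  It is
not a resolver: the function name dname_rewrite and its plain
string handling are assumptions made for illustration, and length
limits, escaping, and DNSSEC are all ignored.  The names are the
ones from the fragment above, with "foobar.baz" expanded under
$ORIGIN zork.

```python
def dname_rewrite(qname, owner, target):
    """Replace the `owner` suffix of `qname` with `target`.

    Returns None when qname is not strictly below owner, since a
    DNAME redirects the subtree under its owner name, not the
    owner name itself.
    """
    q = qname.lower().rstrip(".").split(".")
    o = owner.lower().rstrip(".").split(".")
    if len(q) <= len(o) or q[-len(o):] != o:
        return None
    prefix = q[:-len(o)]
    t = target.lower().rstrip(".").split(".")
    return ".".join(prefix + t) + "."


# A name under foobar.baz.zork. ends up under foo.zork., whatever
# rules (if any) were applied when foobar.baz.zork. was created.
print(dname_rewrite("www.foobar.baz.zork.",
                    "foobar.baz.zork.", "foo.zork."))
```

Nothing in the substitution depends on where the owner name lives,
so the same call works for an owner in an unrelated zone with an
unrelated registrant, which is exactly the situation in which
foo's rules cannot be assumed to carry over.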

That is, again, one of those "if we had known then what we know
now" bits of history: when DNAME was first designed and
deployed, it was intended for temporary, transitional, use and
DNSSEC was not even something that people were dreaming about.
Its evolution into a variant mechanism that is intended to be
stable (even if not "permanent") suggests the need for some
additional constraints but, even if we could introduce a new
RRTYPE with those constraints, we almost certainly could not
make DNAME go away.

best,
   john