Re: [Idna-update] Expiration impending: <draft-klensin-idna-rfc5891bis-01.txt>

John C Klensin <john-ietf@jck.com> Fri, 09 March 2018 17:13 UTC

Date: Fri, 09 Mar 2018 12:13:06 -0500
From: John C Klensin <john-ietf@jck.com>
To: Andrew Sullivan <ajs@anvilwalrusden.com>
cc: idna-update@ietf.org
Message-ID: <C6A42A1D62802037838AC6A0@PSB>
In-Reply-To: <20180308193744.rkyyz3omuxyd7ehg@mx4.yitter.info>
References: <C4FBCF12821031786F472AA2@PSB> <20180308174703.q3bffw7anrvjwzym@mx4.yitter.info> <2D4E04E4B3BB56404560C142@PSB> <20180308193744.rkyyz3omuxyd7ehg@mx4.yitter.info>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idna-update/iboDX1cU1Hd1TWjFuAuUpfEYU5Y>


--On Thursday, March 8, 2018 14:37 -0500 Andrew Sullivan
<ajs@anvilwalrusden.com> wrote:

> On Thu, Mar 08, 2018 at 02:11:21PM -0500, John C Klensin wrote:
>> 
>> On the other hand, the same kind of thinking  that argues for
>> a "troublesome characters" list (at least as I've understood
>> it) -- more "you SHOULD NOT use these unless you are
>> extra-sure you know what you are doing and what the
>> implications might be" than any sort of absolute prohibition
>> -- might imply a similar statement about combining sequences.
> 
> That actually doesn't seem like a terrible idea to me.  Of
> course, that was _supposed_ to be the point of the general
> exhortation to have a clue about the code points you were
> putting in a zone.  And that doesn't seem to have panned out
> :-/

Well, and this may also apply to Asmus's recent notes (including
a more nuanced view of combining sequences): I'm trying to think
in terms of, and perhaps adjust, a distinction that goes back at
least to the earliest discussions of IDNs and IDN policies.   The
distinction is arguably closely related to the difference
between RFCs 1034/1035 and a number of subsequent "this is how we
are actually going to do this" statements, of which RFC 1591 may
be a high point.   The version that might be relevant today is
that there are three (or four) levels or layers of IDNA-related
rules affecting what domain names can be registered:

(0) Nothing gets to violate the fundamental architecture of the
DNS, e.g., one doesn't get to have two labels that compare equal
in the same zone and it really is a tree.   By and large, we
expect this set of rules to be self-enforcing: it is hard to
even think about ways to break those rules and still have
anything work (although we've seen some speculation and wishful
thinking about aliases that would interleave branches of the
tree and that would therefore come close).

(1) The IDNA protocol collection specifies, at a fundamental
level, what is and is not allowed, which operations work and
how, and what is and is not valid.  Those are not just
registration-time requirements.  IDNA2008 specifies checks to be
done at lookup time, and we expect systems looking up names to
reject at least a subset of invalid strings before looking them
up.   There is some implementation flexibility there: while we
don't require, e.g., contextual checks at lookup time, I don't
imagine anyone would severely criticize an implementation for
rejecting outright a string that didn't conform to those rules
because, if lookup of such a string succeeded, the application
would be safe in concluding either that something was broken or
that something malicious was likely going on.  That is true even
of IDNA2003, which effectively has no lookup-time requirements.
To take an extreme example, suppose an implementation were given
a label in ACE (presumably Punycode-encoded) form; decided to
convert it back to native characters so that it could generate
native-character error messages if needed; discovered that it
wouldn't convert, or that it converted to something that
required mapping; and then generated an error message rather
than trying to look the label up.  I don't imagine anyone would
complain (other than perhaps a registrant who had bought such a
string in its Punycode-encoded form, something many registries
won't allow).
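The extreme IDNA2003 case above can be sketched in Python, purely as an illustration and not as anything the IDNA documents prescribe. The sketch assumes Python's standard-library `encodings.idna` module (which implements the IDNA2003 `nameprep` and `ToASCII` operations) and the built-in `punycode` codec; the function name `screen_ace_label` and the choice to raise `ValueError` are hypothetical. It converts an ACE label back to native characters and refuses it before any lookup if the Punycode does not decode, if the decoded string would be changed by Nameprep mapping, or if re-encoding does not reproduce the original label.

```python
import encodings.idna  # stdlib IDNA2003 helpers: nameprep, ToASCII, ToUnicode

ACE_PREFIX = "xn--"


def screen_ace_label(label: str) -> str:
    """Hypothetical pre-lookup screen for a single DNS label (IDNA2003 rules).

    Returns the native-character form of an ACE label (or the label
    unchanged if it is not an ACE label) and raises ValueError when the
    label should be rejected instead of being looked up.
    """
    ace = label.lower()  # ASCII case is not significant in DNS labels
    if not ace.startswith(ACE_PREFIX):
        return label  # not an ACE label; nothing to decode

    # Convert back to native characters (e.g. for native-character error messages).
    try:
        native = ace[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    except UnicodeError as exc:
        raise ValueError(f"{label!r} does not decode as Punycode") from exc

    try:
        prepped = encodings.idna.nameprep(native)  # IDNA2003 mapping step
        reencoded = encodings.idna.ToASCII(native).decode("ascii")
    except UnicodeError as exc:
        raise ValueError(f"{label!r} decodes to an invalid IDNA2003 label") from exc

    if prepped != native:
        # The decoded string would need mapping (e.g. case folding, ß -> ss).
        raise ValueError(f"{label!r} decodes to {native!r}, which requires mapping")
    if reencoded != ace:
        # Decoding and re-encoding does not reproduce the original ACE label.
        raise ValueError(f"{label!r} does not round-trip (got {reencoded!r})")
    return native
```

For example, `screen_ace_label("xn--mnchen-3ya")` returns `"münchen"`, while a label that decodes to `"straße"` is refused because IDNA2003 Nameprep maps ß to "ss". The sketch deliberately follows IDNA2003 because that is the case the paragraph describes; an IDNA2008 check would need different rules (ß, for instance, is a valid character in IDNA2008 rather than one that is mapped).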

The important thing about these protocol requirements is that no
sane person should expect the problem domain name or label to
actually work, or work predictably, for what we normally
consider DNS functions such as doing name to address
translation.  If someone were buying labels as trophies in the
hope that their trophy value would increase over time, that
would be another matter, but I'm not sure that, from a protocol
standpoint, we really need to care.

(2) Advice, guidelines, or policy requirements applicable to a
broad variety of domains.  Unless we are going to have chaos,
such guidelines must allow only a subset of what is allowed
under the protocol limitations above.  For zones that are well
and carefully run, with name use and the best interests of
Internet users in mind, it is reasonable to expect that
well-designed guidelines will be followed, at least unless the
zone administrators understand things well enough to carefully
allow exceptions.   The meta-rule that zones are required to
understand what they are doing is one such guideline and, at
least to a first approximation, I think most registries
(including zones below the second level that are using IDNs)
try to follow it.   For those who are not inclined (for business
or other reasons) to follow such guidelines, the guidelines are
only as good as the existence of someone who is willing and able
to enforce them or otherwise discourage bad behavior.

What has not panned out is the assumption that ICANN would make
at least some serious effort to either persuade or induce
registries, even second-level entities under contracted TLD
registries, to follow those guidelines, especially with
regard to taking active responsibility for what they are doing.
But that doesn't make establishing the guidelines a bad idea.
I'm not aware of it having happened yet with IDNs, but there is
precedent for people who are harmed by someone else's bad
behavior to cite violation of widely-accepted guidelines as
evidence that the behavior was bad or negligent.

(3) The rules that are used by the operator/administrator of a
particular registry to determine what can or will be allowed in
that zone.  Noting that the rules for the root (including the
LGR rules but not limited to them, at least yet) are just a
special case of this category, all that is necessary for
applying and enforcing the rules is that the registry decide to
do so... and that the chosen rules be sufficiently acceptable to
whatever community(ies) are in a position to hold the registry
accountable that they neither turn into a source of never-ending
strife nor change often enough to cause perceived instability.

I think some of our discussions have confused the first two
categories, (0) and (1), with the last two, (2) and (3), and then
gone on to confuse the third (general guidelines for the second
level and below) with the fourth (registry-specific rules),
assuming, for example, that what is suitable for one registry
should be applicable, either as a guideline or as a protocol
change, to the others.

best,
   john