Re: [Idna-update] Expiration impending: <draft-klensin-idna-rfc5891bis-01.txt>

Asmus Freytag <asmusf@ix.netcom.com> Sun, 11 March 2018 07:28 UTC

To: idna-update@ietf.org
References: <C4FBCF12821031786F472AA2@PSB> <20180308174703.q3bffw7anrvjwzym@mx4.yitter.info> <2D4E04E4B3BB56404560C142@PSB> <20180308193744.rkyyz3omuxyd7ehg@mx4.yitter.info> <C6A42A1D62802037838AC6A0@PSB>
From: Asmus Freytag <asmusf@ix.netcom.com>
Message-ID: <ef4f380c-c4f5-0fed-e02b-f7a12f6d74f6@ix.netcom.com>
Date: Sat, 10 Mar 2018 23:28:51 -0800
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0
MIME-Version: 1.0
In-Reply-To: <C6A42A1D62802037838AC6A0@PSB>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idna-update/wkI2nv-53hDdyCz_lY9fuHTTZk8>
Subject: Re: [Idna-update] Expiration impending: <draft-klensin-idna-rfc5891bis-01.txt>

On 3/9/2018 9:13 AM, John C Klensin wrote:
>
> --On Thursday, March 8, 2018 14:37 -0500 Andrew Sullivan
> <ajs@anvilwalrusden.com> wrote:
>
>> On Thu, Mar 08, 2018 at 02:11:21PM -0500, John C Klensin wrote:
>>> On the other hand, the same kind of thinking that argues for
>>> a "troublesome characters" list (at least as I've understood
>>> it) -- more "you SHOULD NOT use these unless you are
>>> extra-sure you know what you are doing and what the
>>> implications might be" than any sort of absolute prohibition
>>> -- might imply a similar statement about combining sequences.
>> That actually doesn't seem like a terrible idea to me.  Of
>> course, that was _supposed_ to be the point of the general
>> exhortation to have a clue about the code points you were
>> putting in a zone.  And that doesn't seem to have panned out
>> :-/
> Well, and this may apply to Asmus's recent notes (including a
> more nuanced view of combining sequences) too, I'm trying to
> think in terms of, and maybe adjust, a distinction that goes
> back to at least the earliest discussions of IDNs and IDN
> policies.
... eliding discussion of elements of the protocol and stipulating for the
remainder of the discussion that protocol and its requirements are assumed
to be met.
>
> (2) Advice, guidelines, or policy requirements applicable to a
> broad variety of domains.

We may not need to define precisely what the limits of that variety are,
but we should note that zones where all labels are owned by the same entity
may already be restrictive in other ways, whereas zones that are not only
"public" but also support users of more than one script (and, by extension,
multiple languages) are perhaps most in need of these.

> Unless we are going to have chaos,
> such guidelines must allow only a subset of what is allowed
> under the protocol limitations above.
Agreed.
> For zones that are well
> and carefully run, with name use and the best interests of
> Internet users in mind, it is reasonable to expect that
> well-designed guidelines will be followed, at least unless the
> zone administrators understand things well enough to carefully
> allow exceptions.

These zones will take active steps to mitigate issues caused by the fact
that IDNs necessarily reflect the writing systems involved, including
their comparative complexity vis-a-vis the basic Latin alphabet.

>   The meta-rule that zones are required to
> understand what they are doing is one such guideline and, at
> least to a first approximation, I think most registries
> (including zones below the second level that are using IDNs)
> try.

Not my impression at all. For anything other than very simple scripts,
the IDN tables that can be easily inspected (because they are available
online) do not reflect a deeper understanding of the writing system,
beyond the most basic limitation of the repertoire to a given script.

> For those who are not inclined (for business or other
> reasons), to follow such guidelines, the guidelines are only as
> good as the existence of someone who is willing and able to
> enforce them or otherwise discourage bad behavior.

"willing" registries might benefit from more detailed guidelines that would
give them a metric of how successful their "attempts" turned out to be.
I think at least some of them might be surprised at the result.
>
> What has not panned out is the assumption that ICANN would make
> at least some serious effort to either persuade or induce
> registries, even second-level entities under contracted TLD
> registries, to follow those guidelines, especially with
> regard to taking active responsibility for what they are doing.

This is a dead letter as long as no guidelines exist that establish
meaningful metrics for telling successful mitigation apart from simply
copying some basically insufficient IDN table from some other zone.
> But that doesn't make establishing the guidelines a bad idea.
> I'm not aware of it having happened yet with IDNs, but there is
> precedent for people who are harmed by someone else's bad
> behavior to cite violation of widely-accepted guidelines as
> evidence that the behavior was bad or negligent.

This would be a beneficial use of guidelines that actually cover meaningful
best practice.
>
> (3) The rules that are used by the operator/ administrator of a
> particular registry to determine what can or will be allowed in
> that zone.

Note that for many zones what is allowed is not static, but depends on
what is already delegated. The rules effectively describe what is allowed
next, given a status quo.

This approach is useful (or should be understood as useful) for the
majority of zones: in all zones registrants compete for available labels,
but in some zones a delegated label prevents not only the same label but
also certain related labels from being registered.

Such related labels are called variants -- but the use of the term
variants does not necessarily imply that more than one label gets to
be allocated. On the contrary, our experience in the Root Zone has shown
that blocked variants are by far the more common and useful tool.
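
As a rough illustration of blocked variants, here is a minimal Python
sketch. It treats two labels as variants of each other when they reduce to
the same index label. The variant sets, the names (VARIANT_SETS,
index_label, Zone) and the sample labels are hypothetical and are not taken
from any published LGR; a real LGR would also carry variant dispositions and
context rules.

```python
# Hypothetical variant sets: each tuple lists code points treated as
# variants of one another (Latin letter, Cyrillic look-alike).
VARIANT_SETS = [
    ("a", "\u0430"),  # LATIN SMALL LETTER A, CYRILLIC SMALL LETTER A
    ("o", "\u043e"),  # LATIN SMALL LETTER O, CYRILLIC SMALL LETTER O
    ("e", "\u0435"),  # LATIN SMALL LETTER E, CYRILLIC SMALL LETTER IE
]

# Map every code point to the smallest member of its variant set.
CANONICAL = {cp: min(vset) for vset in VARIANT_SETS for cp in vset}


def index_label(label: str) -> str:
    """Replace each code point by the representative of its variant set."""
    return "".join(CANONICAL.get(cp, cp) for cp in label)


class Zone:
    """Toy registry in which a delegated label blocks all of its variants."""

    def __init__(self) -> None:
        self._delegated = {}  # index label -> label actually delegated

    def blocked_by(self, label: str):
        """Return the delegated label that blocks `label`, or None."""
        return self._delegated.get(index_label(label))

    def register(self, label: str) -> bool:
        """Delegate `label` unless it or one of its variants is taken."""
        if self.blocked_by(label) is not None:
            return False
        self._delegated[index_label(label)] = label
        return True
```

In this sketch, what is allowed next depends on what is already delegated:
once "zone" is registered, a mixed-script look-alike of it is refused, even
though only one label was ever allocated.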

>   Noting that the rules for the root (including the
> LGR rules but not limited to them, at least yet) are just a
> special case of this category, all that is necessary to
> application and enforcement of the rules is that the registry
> decide to do so... and that the rules that are chosen be
> sufficiently acceptable to whatever community(ies) are in a
> position to hold the registry accountable that they don't either
> turn into a source of never-ending strife or change often enough
> to cause perceived instability.

Again, the Root Zone project is informative.

So far, a single proposed script LGR has proven contentious -- with the
wider community objecting to it being too permissive, rather than the other
way around. As this has not reached the stage where an LGR has been
submitted, the community may yet come to some agreement.

Generally, the drafting panels appear to have been well anchored in their
respective communities and their submitted proposals perform well when
tested against lists of putative labels while mitigating known issues.

>
> I think some of our discussions have confused the first two and
> last two categories and then gone on to confuse the third
> (general guidelines for the second level and below) and fourth
> (registry-specific rules), assuming, for example, that what is
> suitable for one registry should be applicable, either as a
> guideline or as a protocol change, to the others.

Very cogent observation.

I'm certainly keenly aware of the difference between the RZ-LGR project,
which effectively results in registry-specific rules (for the root), and
general guidelines that can be used to define registry-specific rules on
other levels.

The RZ-LGR is purposefully limited to the modern-use subset of modern-use
scripts. (It also excludes hyphen and digits.) It further makes the
assumption that the zone is shared by users of multiple scripts -- and users
of "all" languages, at least those that are widely written for everyday
purposes.

The individual script LGRs are also accompanied by very detailed
descriptions and references to source materials.

That allows them to serve as examples or starting points for defining
registry-specific rules for other zones. The closer these zones match the
other assumptions for the RZ project, the closer to the RZ-LGR their eventual
rules would be expected to end up - unless the RZ-LGR erred in how to
account for the risks inherent in multi-script/multi-zone domains.

This simplifies the task of the guideline writer, without reducing the task
to simply imposing the RZ-LGR unexamined everywhere.

If I were to be asked by a registry to develop policy for the second level,
I might proceed as follows:

(0) Start with the RZ-LGRs for all the scripts to be supported in the new
    zone.

(1) Add all digits and the hyphen.
(1a) Mitigate the issue of homoglyph digits (Arabic) by making them blocked
     in-script variants.
(1b) Mitigate the issue of some digits being homoglyphs of letters by making
     these blocked in-script variants.

(2) Retain cross-script variants from the RZ-LGR for all scripts in the new
    zone.

(3) Optional: if a feature (restriction) in the RZ-LGR is documented as
    being motivated by the need to be particularly restrictive for the root,
    investigate the cost/benefit of removing the restriction. (If in doubt,
    keep the restriction.)

(4) If the zone is limited to certain languages, remove features
    (restrictions) documented as being necessary in a multilingual zone.
(4a) Investigate the cost/benefit of adding language-specific support for
     any language that isn't well supported due to RZ restrictions.

(5) Make sure the LGR follows guidelines on combining marks (like the ones
    we discussed under separate cover).

(6) Make sure the LGR follows guidelines on repertoire (TBD).

(7) Make sure WLE rules and context rules continue to work if the repertoire
    is expanded.
(7a) It may be appropriate to relax or tighten some of these rules if
     the mix of languages to be supported requires that or benefits from it.

and so on.
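
Steps (0), (1) and (1a) above could be sketched roughly as follows, in
Python. Everything here is a placeholder for illustration: RZ_REPERTOIRE is
a tiny made-up stand-in for real RZ-LGR data, the function names are
hypothetical, and pairing all ten Arabic-Indic digits with their Extended
Arabic-Indic counterparts is a simplification (which pairs are really
homoglyphs is a decision for whoever writes the LGR).

```python
# Step (0): hypothetical per-script repertoires standing in for RZ-LGR data.
RZ_REPERTOIRE = {
    "Latn": set("abcdefghijklmnopqrstuvwxyz"),
    "Arab": {"\u0628", "\u062a", "\u0646"},  # BEH, TEH, NOON (sample only)
}

ASCII_DIGITS = [chr(cp) for cp in range(0x30, 0x3A)]                # 0-9
ARABIC_INDIC = [chr(cp) for cp in range(0x0660, 0x066A)]            # U+0660..U+0669
EXTENDED_ARABIC_INDIC = [chr(cp) for cp in range(0x06F0, 0x06FA)]   # U+06F0..U+06F9


def build_zone_repertoire(scripts):
    """Steps (0) and (1): union of script repertoires plus digits and hyphen."""
    repertoire = set()
    for script in scripts:
        repertoire |= RZ_REPERTOIRE[script]
    repertoire |= set(ASCII_DIGITS) | {"-"}
    if "Arab" in scripts:
        # Assumption: a zone supporting Arabic also admits both digit sets.
        repertoire |= set(ARABIC_INDIC) | set(EXTENDED_ARABIC_INDIC)
    return repertoire


def blocked_digit_variants(scripts):
    """Step (1a): make the two Arabic digit sets blocked variants of each other."""
    variants = {}
    if "Arab" in scripts:
        for a, b in zip(ARABIC_INDIC, EXTENDED_ARABIC_INDIC):
            # Record the mapping in both directions so it is symmetric.
            variants[(a, b)] = "blocked"
            variants[(b, a)] = "blocked"
    return variants
```

The later steps (removing root-only restrictions, language-specific
additions, WLE and context rules) depend on the documentation of each script
LGR and are not attempted here.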

The guidelines would effectively focus on well-understood modern-use scripts,
because that's what is addressed by the RZ-LGR. Because (most) historical
scripts are supremely ill-understood and have not had the benefit of any
deeper analysis for IDN purposes, a general guideline would strongly
recommend against including them.

Some lesser-used and/or emerging scripts (with modern user communities)
would benefit from their communities following the RZ-LGR procedure
with appropriate adaptations.

Some very limited zones, e.g. Cyrillic only, or Polynesian only, might come
under pressure to support code points for characters that look like
punctuation marks; these code points were not ruled out in IDNA, but
are considered deeply troublesome (not least by RFC 6912) -- actual
guidelines would have to be written so as to settle the question whether
"secure" LGRs should always eschew them, or whether limited exceptions
are to be seen as justified.

The issues facing the various scripts are diverse enough that trying to
write fully general guidelines becomes either meaningless (too vague) or
bewilderingly complex in no time. That's the reason I keep coming back to
the use of the RZ-LGR as a non-binding starting point.

A./