RE: Possible BofF question -- I18n (was: Re: Possible OBF question -- I18n)

John C Klensin <john-ietf@jck.com> Fri, 01 June 2018 06:17 UTC

Date: Fri, 01 Jun 2018 02:17:04 -0400
From: John C Klensin <john-ietf@jck.com>
To: Larry Masinter <masinter@adobe.com>, Nico Williams <nico@cryptonector.com>, Peter Saint-Andre <stpeter@mozilla.com>
cc: John Levine <johnl@taugh.com>, ietf@ietf.org, Patrik Fältström <paf@frobbit.se>
Subject: RE: Possible BofF question -- I18n (was: Re: Possible OBF question -- I18n)

Larry,

Other than disabusing people of ideas that this is easy, I
really do not want to talk about protocol details or solutions.
I want to talk about the procedural issue of how we move
forward, given that we have an interesting array of symptoms,
e.g., 

* drafts have been initiated to address specific and important
problems, or even to start identifying them in depth, and the
IESG (fwiw, I believe correctly but others may not agree) has
been unwilling to address them because there has not appeared to
be sufficient in-depth expertise and energy/commitment to form
a working group.

* the IAB put together an I18N Program effort to try to get on
top of some of the issues.  I would claim it did some useful
work in its early days, but the IAB shut it down a year ago
after concluding that it wasn't doing anything, had not done
anything for some time, and was showing no signs of that
changing (IAB, if you don't like that summary, suggest another
one, but I think that is reasonably accurate).  That decision
was announced, rather than being discussed with the
community or even with the set of Program members who were not
on the IAB, something the IAB clearly has the right to do.

* As far as I know, we have no plan to get unstuck from the
above.  I note Nico's suggestion of

> I mean, here's how you do this:
> 
> a) you get a couple of ADs with I18N experience,
> b) also someone on the IAB with I18N experience,
> c) add a mandate for an I18N Considerations section,
> d) add an I18N directorate

And note that we've had ADs and IAB members with I18N experience
and it hasn't appeared to make any long-term difference, that
there have been efforts to explain the importance of these
issues to several recent Nomcoms and either they could find no
candidates with significant levels of the right skills or they
didn't consider the issues very important.   We've also had a
mandate for what was then called "multilingual Considerations"
for over 20 years (see Section 8.1(I) of RFC 2130).   That IAB
Program wasn't a directorate as we usually define the term, but
it had a lot of the same properties.

I'm not interested in casting blame, but comparing that list
with things we have tried already certainly reinforces my sense
that we have gotten stuck.  Or, if you prefer, that we are at
the bottom of a hole and all we can think to do is to stand
there and do nothing or keep digging.

Relative to your proposal I also note that we made decisions
(also more or less referred to in RFC 2130) that actual protocol
elements should stay in ASCII unless there was a compelling reason
to "internationalize" them.  We've learned a lot since then and
the world has changed, but, at least IMO, we have never really
reexamined that principle.   As to an "identify which names can
cause problems" service, that has been tried too.  A number of
script-specific efforts, starting with the JET work reflected in
RFC 3743 and extending forward to include at least ICANN's LGR
effort, have focused on identifying characters (or Unicode code
points) that might be problematic in various ways and
combinations.  Others have looked at potentially confusing
relationships, both due to accidental confusion (hard problem
for the general case) and malice (nearly impossible case, IMO).
It just simply isn't as easy as you think, especially if the
IETF does not abandon the principle that we will not go through
Unicode one code point at a time looking for problems, possibly
in comparison with every other code point.   It may also be
worth pointing out that one of the "stuck" (and now-expired)
drafts was devoted to the identification of troublesome code
points.   This simply isn't as easy as whipping up a few
programs ... and getting consensus that those programs do the
right thing (even more or less) would be very difficult because
counterexamples keep coming out of the proverbial woodwork or
crawling out from under proverbial rocks.
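
To make that limitation concrete, here is a minimal sketch of the
kind of quick program in question. Everything in it is an assumption
made for illustration: the helper name naive_label_checks, the choice
of three checks, and the crude trick of guessing a character's script
from the first word of its Unicode name. It is not taken from the
expired draft, the JET work, or the LGR effort.

```python
import unicodedata


def naive_label_checks(label: str) -> list:
    """Return warnings for a proposed label (hypothetical helper)."""
    warnings = []

    # Check 1: the label is not already in Normalization Form C.
    if unicodedata.normalize("NFC", label) != label:
        warnings.append("not NFC-normalized")

    # Check 2: crude mixed-script test. The script is guessed from the
    # first word of the character name (e.g. "LATIN", "CYRILLIC"),
    # which is an approximation, not the real Unicode Script property.
    scripts = set()
    for ch in label:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    if len(scripts) > 1:
        warnings.append("mixed scripts: " + ", ".join(sorted(scripts)))

    # Check 3: unassigned, private-use, control, or format characters.
    for ch in label:
        if unicodedata.category(ch) in {"Cn", "Co", "Cc", "Cf"}:
            warnings.append("disallowed category at U+%04X" % ord(ch))

    return warnings


if __name__ == "__main__":
    # Latin "p" + Cyrillic "a" + Latin "ypal": flagged as mixed scripts.
    print(naive_label_checks("p\u0430ypal"))
    # All-Cyrillic string that can render much like Latin "pay".
    print(naive_label_checks("\u0440\u0430\u0443"))
```

The second call returns no warnings at all, even though the string
can be visually confused with an ASCII one. That is one small example
of the counterexamples mentioned above: each added rule catches some
cases and misses others.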

Personally, I think that "troublesome character" approach would
be helpful as long as we can be very clear about its
limitations, including remembering that it isn't the Final
Ultimate Solution to anything or a replacement for skilled and
knowledgeable human judgment, and we can figure out a reliable
and sustainable way to maintain the table.  We don't have an
obvious way forward with either of those requirements and that
is, again, a reason I thought this discussion was worth starting.

best,
    john



--On Friday, June 1, 2018 04:51 +0000 Larry Masinter
<masinter@adobe.com> wrote:

> A modest proposal (I'm sure this is controversial so flame
> away...)
> 
> A big part of the problems in i18n in IETF protocols have to
> do with extending protocol elements from ASCII to Unicode, and
> how to avoid difficulties when that happens. 
> 
> Protocol elements include domain names, URLs, email addresses,
> file names
> 
> But where do these Unicode names come from? They're not
> arbitrarily generated by automated processes, they're
> constructed from strings that are selected, typed in,
> registered. So focus on encouraging people to choose strings
> that won't give problems. 
> 
> A large specification of all of the use cases to avoid is very
> difficult to write and hard to review. There are very many
> special cases (final sigma, umlauts, private name characters,
> non-normalization of combined forms) with expertise widely
> distributed. I'm not sure the solution is "more specs"; in
> fact, there are many obscure special cases, and the specs are
> very difficult to write and review.
> 
> I wonder if there's any interest in building an open-source
> service that would, when given a proposed domain name or URL
> or email address, tell you what problems various subsets of
> users would have when trying to deploy that name (e.g., names
> that don't display properly on popular platforms, names that
> can't be reliably typed in correctly even if they can be
> viewed, those that are likely to get confused with other
> similar but different names).
> 
> Perhaps get started at a Hackathon? 
> 
> I did reserve the domain name "caniuse.name" that I will offer
> to any sincere effort.
> 
> 
> 
>