Re: [I18nrp] Last Call: <draft-faltstrom-unicode11-05.txt> (IDNA2008 and Unicode 11.0.0) to Informational RFC

Larry Masinter <LMM@acm.org> Sat, 08 December 2018 08:00 UTC

From: Larry Masinter <LMM@acm.org>
To: "'Asmus Freytag (c)'" <asmusf@ix.netcom.com>, 'John C Klensin' <john-ietf@jck.com>, 'Patrik Fältström' <paf=40frobbit.se@dmarc.ietf.org>
Cc: i18nrp@ietf.org, 'Paul Hoffman' <paul.hoffman@vpnc.org>
Date: Sat, 08 Dec 2018 00:00:33 -0800
Message-ID: <009f01d48ecc$18d853d0$4a88fb70$@acm.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18nrp/zR2nw3ZxZRaYL9vRes05lOa8_Vw>

We were discussing Patrik’s document, what advice or rules to give to Registrars about being conservative, and some questions about the motivation or intelligence of Registrars and their Clients.

 

It would help if the Registrars could pass off responsibility to their Clients; the only reason not to do so would be that Clients shouldn’t be expected to know the complex rules.

 

The transcription problem is relatively easy to explain and understand, and covers most other ways in which a name could be “bad”. 

These names don’t fall from the sky. Someone chooses them. Give them a clear motivation.

 

From: i18nRP <i18nrp-bounces@ietf.org> On Behalf Of Asmus Freytag (c)
Sent: Friday, December 7, 2018 12:36 PM
To: John C Klensin <john-ietf@jck.com>; Larry Masinter <LMM@acm.org>; 'Patrik Fältström' <paf=40frobbit.se@dmarc.ietf.org>
Cc: i18nrp@ietf.org; 'Paul Hoffman' <paul.hoffman@vpnc.org>
Subject: Re: [I18nrp] Last Call: <draft-faltstrom-unicode11-05.txt> (IDNA2008 and Unicode 11.0.0) to Informational RFC

 

On 12/7/2018 12:02 PM, John C Klensin wrote:

 
 
--On Friday, December 7, 2018 11:28 -0800 Larry Masinter
 <LMM@acm.org> wrote:
 

...
The reason for emphasizing transcription is not that there
aren't other operations that are possibly more frequent
(copying and pasting a URL from one context to another,
remembering it on a bookmark list) but rather that
transcription is the most stringent requirement: if a user
can transcribe a name and get the same sequence of Unicode
code points, then they can display the name and distinguish
it from other (transcribable) names.

 
Larry, I hope Asmus will respond further to this because he is
far more expert on relevant issues across a very wide range of
scripts and writing systems than I am, but I don't believe what
you write above is true. 

Globalization is often misunderstood: it's neither the magic bullet that
makes everything equally accessible to everyone, nor is it exclusively
about presenting everything to every user in their native context.

Instead, it's an interesting mixture of both.

For IDNs the goal is to allow full native support of mnemonic identifiers
in the context of the language/script native to the user, for all such
combinations of languages and scripts, so that, in principle, all users
can be supported.

But it's also about enabling access to resources outside your
native bubble. Unicode is a big step in that direction, because before it
you were often locked into a native character set baked into your system.

However, a label in a foreign script will not be "mnemonic" for you,
and there's nothing we can do about that in this context. We should,
however, expect that labels in your script (and, where relevant, your
language) can indeed be mnemonic.

Otherwise, why not use digits like telephone numbers?

I think that is ultimately what Larry has in mind and what I cover by
"typable". You should be able to read a label, understand it, and 
reproduce it by typing, or be able to tell someone about it.

For the Latin script (because of the features that you are attempting
to outline below), even arbitrary sequences of code points do not
generally compromise a label's ability to be mnemonic or to be handled
by means other than clicking/pasting.
(The above is strictly true only for ASCII; once you add the full range,
including combining marks, a bit more care must be taken, as will
be evident when the draft for the Latin Root Zone LGR becomes
available. Until then, I'll spare myself the digression into details.)
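A minimal Python sketch of one reason combining marks call for more care (an illustration chosen here, not necessarily the issue the Latin LGR draft addresses): the same visible label can be typed as two different code point sequences. IDNA2008 defines U-labels as being in NFC, which unifies canonically equivalent spellings like these, so input has to be normalized before comparison. The label "café" is an arbitrary example.

```python
import unicodedata

# Two spellings of the illustrative label "café" that render identically.
precomposed = "caf\u00E9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

# Compared raw, they are different code point sequences.
print(precomposed == decomposed)  # False

# NFC maps the decomposed spelling to the precomposed one, so both
# spellings become the same label once input is normalized.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(len(precomposed), len(decomposed))  # 4 5
```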

For other scripts, there's a wide variety of issues.

One example from Arabic: quite a few letters in Arabic share positional
forms; that is, they are distinct only in some positions, for example at
the end of a word.

Also, some languages using the Arabic script have keyboards that
include only one of these letters, not the "standard" Arabic-language
one. As a result, while users can "read" an Arabic-language name
(a geographical name, for example), they could not type that name.
If they tried, the result might even look right but be a different label.
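A minimal Python sketch of the "looks right but is different" case, assuming ARABIC LETTER YEH (U+064A) versus ARABIC LETTER FARSI YEH (U+06CC) as the example pair: their medial forms look the same, but they are different code points. The three-letter word used here is illustrative only, and `code_points` is a hypothetical helper.

```python
import unicodedata

# Illustrative label: Arabic "bayt" (house) typed on an Arabic keyboard,
# with U+064A ARABIC LETTER YEH in medial position.
arabic_typed = "\u0628\u064A\u062A"
# The same visible word typed on a Persian keyboard, which offers
# U+06CC ARABIC LETTER FARSI YEH instead; its medial form looks the same.
persian_typed = "\u0628\u06CC\u062A"


def code_points(label: str) -> list[str]:
    """Return the label as a list of U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in label]


for label in (arabic_typed, persian_typed):
    print(code_points(label), [unicodedata.name(ch) for ch in label])

# Normalization does not unify the two letters, so the labels stay distinct.
same = (unicodedata.normalize("NFKC", arabic_typed)
        == unicodedata.normalize("NFKC", persian_typed))
print("same label after NFKC:", same)  # False
```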

Therefore, a registry that attempts to support both languages in
the Arabic script, but does not support variants, will have a fraction
of its labels that are de facto unusable for some users.
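A toy Python sketch of what "supporting variants" could mean for such a registry, assuming a hypothetical two-entry variant table (FARSI YEH/YEH and KEHEH/KAF). Real registries express variants in label generation rules (for example in the XML format of RFC 7940) with richer policies than the single "block the second variant" rule used here; `variant_key` and `Registry` are illustrative names, not an existing API.

```python
from typing import Dict, Optional

# Hypothetical variant table: each code point maps to the representative
# of its variant set. Only two illustrative pairs are listed.
VARIANT_REPRESENTATIVE = {
    "\u06CC": "\u064A",  # ARABIC LETTER FARSI YEH -> ARABIC LETTER YEH
    "\u06A9": "\u0643",  # ARABIC LETTER KEHEH -> ARABIC LETTER KAF
}


def variant_key(label: str) -> str:
    """Collapse a label to one representative code point per variant set."""
    return "".join(VARIANT_REPRESENTATIVE.get(ch, ch) for ch in label)


class Registry:
    """Toy registry that treats all variants of a label as one bundle."""

    def __init__(self) -> None:
        self._bundles: Dict[str, str] = {}  # variant key -> registered label

    def register(self, label: str) -> bool:
        """Register a label unless a variant of it is already registered."""
        key = variant_key(label)
        if key in self._bundles:
            return False
        self._bundles[key] = label
        return True

    def lookup(self, label: str) -> Optional[str]:
        """Find the registered label reachable from any variant spelling."""
        return self._bundles.get(variant_key(label))


registry = Registry()
arabic_label = "\u0628\u064A\u062A"   # typed with ARABIC LETTER YEH
persian_label = "\u0628\u06CC\u062A"  # same word typed with FARSI YEH
print(registry.register(arabic_label))                 # True
print(registry.register(persian_label))                # False: same bundle
print(registry.lookup(persian_label) == arabic_label)  # True
```

Without the variant table, the Persian-keyboard spelling would simply be a different, unregistered label; with it, a user typing the word on either keyboard reaches the same registration.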

Understood in that sense, "typable for a native user" is a valid goal
that informs necessary features of label generation rules (registry
policies), features that let labels serve as mnemonics in the same
way we take for granted for non-IDN labels.

A./

 Some scripts (of which the Roman
Script, Basic Latin of some millennia ago, is a good example)
simply have characters that are more easily distinguishable from
each other by people who cannot actually read the script or
associated language(s) than those of other scripts.  Put a
hypothetical person who has never seen either before in front of
a short string of characters in that script and in front of a
script written with connected characters, complex use of
ligatures, and subtle distinctions among characters (look at the
Arabic digits for two and three for an example of what I mean by
"subtle" ... or look at "O" and, in many contemporary type
designs, "Q").  Then supply that person with character pickers
for the two scripts.  You would almost certainly get accurate
transcription of the Roman characters and probably would not get
it with the other script.  In neither case would the ability to
transcribe imply the ability to render and display (I'd encourage
observing five- and six-year-olds learning to write), nor would
it imply the ability to accurately copy and paste.
 
So the ability to transcribe may be a useful goal, but it isn't,
IMO, close to Patrik's global requirements list.
 
best,
   john