Re: [I18n-discuss] Comments on "troublesome-characters" from Arabic script

"Abdulaziz H. Al-Zoman" <azoman@citc.gov.sa> Sun, 30 July 2017 08:38 UTC

From: "Abdulaziz H. Al-Zoman" <azoman@citc.gov.sa>
To: 'Andrew Sullivan' <ajs@anvilwalrusden.com>, Raed Al-Fayez <rfayez@citc.gov.sa>
CC: "'i18n-discuss@iab.org'" <i18n-discuss@iab.org>, "Abdulaziz H. Al-Zoman" <azoman@citc.gov.sa>
Date: Sun, 30 Jul 2017 08:37:52 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18n-discuss/IGDuBtsW52C-mDWg0oZhEH4t-qg>

Good day Andrew,

Thanks for your follow-up email to my feedback.


I do understand the goal of the internet draft as you've stated:
     "to create the conditions for guidance
      for operators and applications, so that
      it is possible for (for instance) my
      user agent to work globally, even when
      some parts of the global linguistic
      environment is foreign to me and when
      there are no in-protocol clues about
      the language I'm facing."

but I'm worried by the inclusion of "basic" code points (22 of the 28 letters of the Arabic alphabet) in the repository, instead of only the problematic code points (non-spacing marks). This would make the Arabic language unusable (or troublesome) for operators and developers, whereas I would like to see them encouraged to support Arabic IDNs rather than driven away from them.

Therefore, I would suggest that the registry include only the problematic sequences of code points that incorporate non-spacing marks, and not the basic (and essential) characters.

So I would suggest that the repository table look like the following (for example), where Column 1 lists the problematic sequence of code points and Column 2 contains the reasons and comments. Column 1 should NOT contain a basic code point alone, without the non-spacing mark that causes the problem.

Column 1             | Column 2
---------------------+------------------
062F, 065C           | Identical in appearance to ...
---------------------+------------------
062F, 06EC           | Identical in appearance to ...
---------------------+------------------
0631, 06EC           | Identical in appearance to ...
---------------------+------------------
0633, 06DB           | Identical in appearance to ...


Yours,
Abdulaziz Al-Zoman






> -----Original Message-----
> From: Andrew Sullivan [mailto:ajs@anvilwalrusden.com]
> Sent: 28/Jul/2017 12:13 AM
> To: Abdulaziz H. Al-Zoman; Raed Al-Fayez
> Cc: 'i18n-discuss@iab.org'
> Subject: Re: [I18n-discuss] Comments on "troublesome-characters" from
> Arabic script
> 
> Greetings, and thank you both for the comments on the draft.
> 
> I am keen to understand something in what you are both saying, because it
> suggests to me that our intent in this draft is not coming across clearly.
> 
> On Mon, Jul 24, 2017 at 04:41:34AM +0000, Abdulaziz Al-Zoman wrote:
> 
> > For example, the registry includes some essential characters
> > (letters), which may in the end result in useless identifiers if these
> > characters are restricted or blocked (because they are part of the
> > repository).   For instance, with respect to the Arabic language,
> > the registry includes a large portion of the basic Arabic alphabet,
> > which may result in a limited character set for creating identifiers
> 
> On Tue, Jul 25, 2017 at 08:20:28AM +0000, Raed Al-Fayez wrote:
> 
> > I regret to inform you that I am completely opposed to including
> > essential code points of the Arabic language (which are also used by
> > many other languages in the Arabic script) in the repository tables of
> > this standards track internet-draft, as a reader of this
> > (later-to-be) RFC may consider them "Troublesome Characters"
> > while they are not!
> >
> > The majority of the problematic cases listed in the internet-draft's
> > table (part of the Arabic code points) are due to the misuse of
> > non-spacing marks. Also, most of the cases are not used by any
> > language in the Arabic script, and some of the others do not make any
> > sense. Therefore, it is neither wise nor practical to risk essential
> > code points because of cases that are not solid.
> >
> 
> Do you both agree that the characters to which you are referring can
> properly be used for identifiers only by people who actually understand how
> they relate to each other, and not without substantial care and
> understanding on the part of whoever is permitting the use or registering
> the identifier?  If not, do you think instead that the characters can be
> used without any policies at all?
> 
> The aim of the draft is to create the conditions for guidance for
> operators and applications, so that it is possible for (for instance) my
> user agent to work globally, even when some parts of the global linguistic
> environment is foreign to me and when there are no in-protocol clues about
> the language I'm facing.  (Examples of this sort of identifier are things
> like DNS names, mail names, XMPP chatroom names, and so on.  Web pages and
> the like have the ability to negotiate language, which makes the problem
> somewhat less acute.)  One thing we can do to make those conditions a
> little bit safer, I think, is to have evidence that whoever is in charge
> of the registration permissions actually has some policy or set of rules;
> and so my user agent could check to see that, even if I can't read the
> identifier reliably, the person responsible for its creation has followed
> some set of rules that prevents serious confusion from arising.  In other
> words, the registry is not intended to "block" all of the included code
> points, but rather to indicate that these are code points for which it is
> even more important than usual that extra care has been taken.
> 
> I am a little worried that the above does not seem to be the conclusion
> you drew from the draft, which suggests to me that we have not made
> ourselves clear enough.  Is that a fair assessment, or are you opposed to
> the inclusion of these code points instead because you do not think they
> warrant special attention?
> 
> Thanks and best regards,
> 
> A
> 
> --
> Andrew Sullivan
> ajs@anvilwalrusden.com