Re: [I18n-discuss] Comments on "troublesome-characters" from Arabic script

Andrew Sullivan <ajs@anvilwalrusden.com> Thu, 27 July 2017 21:13 UTC

Date: Thu, 27 Jul 2017 17:13:20 -0400
From: Andrew Sullivan <ajs@anvilwalrusden.com>
To: Abdulaziz Al-Zoman <azoman@citc.gov.sa>, Raed Al-Fayez <rfayez@citc.gov.sa>
Cc: "'i18n-discuss@iab.org'" <i18n-discuss@iab.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18n-discuss/0t6tpegmmaZACWHHk7bCyYpchrQ>

Greetings, and thank you both for the comments on the draft.

I am keen to understand something in what you are both saying, because
it suggests to me that our intent in this draft is not coming across clearly.

On Mon, Jul 24, 2017 at 04:41:34AM +0000, Abdulaziz Al-Zoman wrote:

> For example, the registry includes some essential characters
> (letters) that may result at the end useless identifiers if these
> characters are restricted or blocked (because they are part of the
> repository).   For instance, with respect to the Arabic language,
> the registry consists of a large portion of the Arabic basic
> alphabet that may result to a limited character set for creating
> identifiers

On Tue, Jul 25, 2017 at 08:20:28AM +0000, Raed Al-Fayez wrote:
 
> I regret to inform you that I completely opposing of including
> essential code points of the Arabic language (which are also used by
> many other languages in the Arabic script) to the repository tables
> of this standards track internet-draft. As a reader of this
> (later-to-be) RFC may consider them as "Troublesome Characters"
> while they are not!
> 
> The majority of the problematic cases listed in the internet-draft's
> table (part of the Arabic code points) were due to the misuses of
> non-spacing marks. Also, most of the cases are not used by any
> language in the Arabic script and some of the others do not make any
> sense. Therefore, it is not wise and not practical to risk essential
> code points because of not solid cases.
> 

Do you both agree that the characters to which you are referring can
properly be used for identifiers only by people who actually understand
how they relate to each other, and not without substantial care and
understanding on the part of whoever is permitting the use or
registering the identifier?  If not, do you think instead that the
characters can be used without any policies at all?

The aim of the draft is to create the conditions for guidance for
operators and applications, so that it is possible for (for instance)
my user agent to work globally, even when some parts of the global
linguistic environment are foreign to me and when there are no
in-protocol clues about the language I'm facing.  (Examples of this
sort of identifier are things like DNS names, mail names, XMPP
chatroom names, and so on.  Web pages and the like have the ability to
negotiate language, which makes the problem somewhat less acute.)  One
thing we can do to make those conditions a little bit safer, I think,
is to have evidence that whoever is in charge of the registration
permissions actually has some policy or set of rules; and so my user
agent could check to see that, even if I can't read the identifier
reliably, the person responsible for its creation has followed some
set of rules that prevents serious confusion from arising.  In other
words, the registry is not intended to "block" all of the included
code points, but rather to indicate that these are code points for
which it is even more important than usual that extra care has been
taken.
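
To make that check concrete, here is a rough sketch of what a user
agent might do.  It is only an illustration under assumptions of my
own: the set of code points, the set of authorities with a declared
policy, and the names flagged_code_points and assess are all
hypothetical, and none of them is taken from the draft or from any
existing registry.

```python
# Hypothetical stand-in for the registry: code points that call for
# extra care.  These two entries are placeholders for illustration,
# not a list taken from the draft.
CAREFUL_CODE_POINTS = {
    0x0654,  # ARABIC HAMZA ABOVE (non-spacing mark)
    0x0655,  # ARABIC HAMZA BELOW (non-spacing mark)
}

# Hypothetical set of registration authorities known to have published
# some policy covering the code points above.
AUTHORITIES_WITH_POLICY = {"example"}


def flagged_code_points(identifier):
    """Return the sorted, de-duplicated flagged code points in identifier."""
    return sorted({ord(ch) for ch in identifier if ord(ch) in CAREFUL_CODE_POINTS})


def assess(identifier, authority):
    """Classify an identifier without needing to read its script.

    "ok"             : no flagged code points are present.
    "ok-with-policy" : flagged code points are present, but the
                       authority has declared a policy for them.
    "caution"        : flagged code points are present and no policy
                       is known for the authority.
    """
    if not flagged_code_points(identifier):
        return "ok"
    if authority in AUTHORITIES_WITH_POLICY:
        return "ok-with-policy"
    return "caution"
```

Note that in this sketch the presence of a flagged code point never
causes rejection by itself; it only changes the answer when nothing
is known about the policy of whoever permitted the registration.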

I am a little worried that the above does not seem to be the
conclusion you drew from the draft, which suggests to me that we have
not made ourselves clear enough.  Is that a fair assessment, or are
you opposed to the inclusion of these code points instead because you
do not think they warrant special attention?

Thanks and best regards,

A

-- 
Andrew Sullivan
ajs@anvilwalrusden.com