Re: [I18ndir] Writing direction

John C Klensin <john-ietf@jck.com> Tue, 17 May 2022 14:07 UTC

Date: Tue, 17 May 2022 10:06:54 -0400
From: John C Klensin <john-ietf@jck.com>
To: Patrik Fältström <patrik@frobbit.se>
cc: i18ndir@ietf.org, art-ads@ietf.org
Message-ID: <39F2CBAA1F19DB765CC59369@PSB>
In-Reply-To: <D59F50F7-A266-48F3-AA78-DA46023033BD@frobbit.se>
References: <4C4A249559BA1E86B17E53FE@PSB> <D59F50F7-A266-48F3-AA78-DA46023033BD@frobbit.se>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/Oc6eIUJcwSkSUW8Y3dpnRbe9nCQ>
Subject: Re: [I18ndir] Writing direction

Patrik,

Yes.  The problem(s) are complicated any way one cuts them.  But
this month's problems are a bit different (in some ways better,
in others worse) from the discussions you cite.  First, today's
issues are associated with free-text strings and not just with
identifiers (which were the focus of your examples).  Second,
there is (finally?) fairly general recognition at this point
(except, perhaps, in the IETF -- a failure of this directorate
effort, IMO) that even identifying a script (much less a
language) as inherently RtoL (or Bidi) can be problematic.
Major scripts (by any reasonable definition) are a fairly
minimal problem: "everyone" knows what they are.  But, as
scripts appear in Unicode that are used by a tiny fraction of
the world's population, it gets harder... and, at least in
principle, can be made harder yet by a few scripts that can be
written either way (or vertically, or serpentine, or...).  So it
has become clear that there are many situations in which
explicit identification of directionality, as well as language,
is needed.
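
To make the "cannot just guess" point concrete, here is a minimal
Python sketch of a first-strong heuristic.  The function name
first_strong_direction is made up for this note, it uses only the
standard unicodedata module, and it is far simpler than the full
UAX #9 rules.  It looks only at the first character with a strong
bidi class, so a Hebrew sentence that happens to begin with a Latin
acronym is guessed as left-to-right.

```python
import unicodedata


def first_strong_direction(text):
    """Guess base direction from the first strongly directional character.

    Returns "ltr", "rtl", or None when no strong character is found.
    Hypothetical helper for illustration; not part of any specification.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":            # strong left-to-right
            return "ltr"
        if bidi_class in ("R", "AL"):    # strong right-to-left (incl. Arabic letters)
            return "rtl"
    return None


# A purely Hebrew string is guessed correctly.
print(first_strong_direction("שלום עולם"))           # rtl
# A Hebrew sentence starting with a Latin acronym is guessed as ltr.
print(first_strong_direction("HTML היא שפת סימון"))  # ltr
# Digits and punctuation carry no strong direction at all.
print(first_strong_direction("12345 ..."))           # None
```

The second and third cases are the ones where only explicit
direction information from the sender would give the receiver the
right answer.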

The situation that prompted my note was an IETF protocol that is
under development.  It uses free text strings in one context,
strings that may need to be shown to users, but the protocol
has no provision for markup or its relatives.  The WG has
concluded it needs to identify language but has questions about
how to handle identifying directionality, ideally using a
{language, string} or {language,direction,string} tuple rather
than sliding toward markup.  And this was the third or fourth
case like this to come to my attention in the last couple of
years so "figure out something ad hoc" is no better an answer
than trying to do IDNA without Unicode Bidi or even some
variations on it.
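
For illustration only, a {language,direction,string} tuple might be
sketched in Python as below.  The class name LocalizedString, the
field names, and the "ltr"/"rtl" values are my assumptions, not
anything the WG has agreed to.  The isolate characters are the ones
defined in UAX #9.  Having the receiver wrap the string in them at
display time is one possible approach, and it keeps the controls
out of the protocol data itself.

```python
from dataclasses import dataclass
from typing import Optional

# Unicode directional isolate controls (defined in UAX #9).
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


@dataclass(frozen=True)
class LocalizedString:
    """Hypothetical {language, direction, string} tuple."""
    language: str             # BCP 47 language tag, e.g. "he" or "en-US"
    direction: Optional[str]  # "ltr", "rtl", or None if the sender did not say
    text: str

    def __post_init__(self):
        # Reject anything other than the three values this sketch allows.
        if self.direction not in ("ltr", "rtl", None):
            raise ValueError(f"unsupported direction: {self.direction!r}")


def isolate_for_display(s: LocalizedString) -> str:
    """Wrap the text in isolates so surrounding plain text cannot reorder it."""
    # With no declared direction, fall back to first-strong detection (FSI).
    opener = {"ltr": LRI, "rtl": RLI, None: FSI}[s.direction]
    return opener + s.text + PDI


msg = LocalizedString(language="he", direction="rtl", text="HTML היא שפת סימון")
shown = isolate_for_display(msg)
print(shown.startswith(RLI), shown.endswith(PDI))
```

The point of carrying direction as its own field is that the
receiver does not have to infer it from the language tag or from
the first character of the text.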

best,
   john




--On Tuesday, May 17, 2022 08:08 +0200 Patrik Fältström
<patrik@frobbit.se> wrote:

> Let me link to a few blog posts I did a few years ago (2008):
> 
> <https://www.paftech.se/node/681/>
> <https://www.paftech.se/node/682/>
> <https://www.paftech.se/node/683/>
> 
>    Patrik
> 
> On 16 May 2022, at 17:35, John C Klensin wrote:
> 
>> Hi.
>> 
>> I have recently been asked (not for the first time even this
>> year) about whether it is really, really, necessary to
>> specify directionality with a language specification for
>> non-identifier strings that might be displayed for non-expert
>> human use.   And, also not for the first time, the person
>> asking has expressed a preference that getting to an answer
>> not involve a flame-fest on the IETF, Last Call, or other
>> list, especially among people with more opinions than
>> knowledge.
>> 
>> I have a few questions and, because progress on alternatives
>> seems to be moving forward at a pace that would make "glacial"
>> seem speedy, want to at least pretend that this directorate
>> list is a place where strategy can be discussed, not just
>> reviews performed.
>> 
>> This is just an impression but, from the questions I've been
>> asked, it appears that we have at least three problems:
>> 
>> (1) The oral tradition and rumors that seem to be
>> substituting for the checklist (draft presumably still in
>> queue somewhere) or general I18N advice from "us" have gotten
>> to the point of including advice that language information
>> (sometimes specifically in the form of BCP 47 tags [RFC
>> 5646]) should be specified.  However, directionality is not
>> mentioned.   So we leave folks in the position of trying to
>> put language tags in and then, during the review process, hit
>> them with a requirement for directionality.  By then, the
>> potential for adding that information means changing whatever
>> was planned for the protocol, sometimes in a drastic way.  In
>> addition, that process often feels to those doing the design
>> or document-writing as if we are playing "gotcha" with them.
>> 
>> (2) PRECIS (RFC 8264) discusses the need for special cases
>> for languages that are unusual in one way or another, and
>> discusses directionality as a profile issue in Section 5.2.5.
>> However, for the FreeformClass, it does not move past Unicode
>> bidi (UAX9).  More generally, it does not even provide for
>> language tagging, much less directionality indications, in the
>> strings it specifies; instead it relies on the IDNA-derived
>> model of "this stuff is dangerous, don't use it".   That may
>> (still?) be reasonable for identifiers, especially for
>> identifiers that we hope will not be seen by users, but it is
>> not for arbitrary strings that are likely to be displayed to
>> end users (whether that is explicitly intended or not).
>> 
>> In addition, PRECIS and other work (including IDNA,
>> charmod-norm [1], and even the rules for matching language
>> tags [2]) have tended to focus on string matching/comparison,
>> i.e., determining whether two strings are equivalent, rather
>> than on display and presentation.  Sometimes, probably more
>> often than we might wish, the latter turns out to be
>> important.
>> 
>> (3) The assumption that directionality can be deduced from
>> knowing the language, or even the language and script, is
>> definitely not sufficient.  For those not familiar with the
>> issues, a W3C document [3] makes good reading.
>> 
>> 
>> Suggestion: It is probably time to extend BCP 47 to include a
>> directionality extension.   That would allow us to tell
>> people, not just that they should provide language
>> information (probably via BCP 47 tags) and then figure out
>> how to express directionality, but to specify that the
>> directionality SHOULD (and in selected cases, MUST) be
>> included along with the primary language subtag. While the
>> W3C document cited above [3] recommends against adding such
>> an extension (except as private use), I suggest that advice
>> is more applicable to upgrades of existing protocols,
>> especially markup-based ones, than to new, or nearly new,
>> work being developed in the IETF and that, without a
>> (public) extension or equivalent, things are getting (or have
>> long-ago gotten) out of hand.
>> 
>> Thoughts from others about this would be appreciated.  If it
>> seems like a reasonable idea, input from both the directorate
>> and the ADs about how to proceed efficiently would be welcome.
>> I'm willing to do some writing (some of the text above might
>> appear in an I-D), but would need others to carefully check
>> what I produce.  Equally important, we'd need a plan about
>> how to move things along: I doubt that the WGs and authors
>> who have been asking, and who have deadlines of their own,
>> would consider waiting until July, bringing this up in
>> DISPATCH, and then going around in circles for a few
>> additional months (or longer) to be helpful or constructive.
>> 
>> best,
>>    john
>> 
>> p.s. Some of what appears above is the result of several
>> conversations, including one with Addison Phillips, but also
>> with people who seem to be uncomfortable being identified at
>> this point.   Any bad ideas or misunderstandings are probably
>> mine; others will get more explicit credit for the good ones
>> as (and if) things move along.
>> 
>> 
>> [1] https://www.w3.org/TR/charmod-norm/
>> 
>> [2] RFC 4647
>> 
>> [3] https://www.w3.org/International/questions/qa-direction-from-language
>> 