Re: [I18ndir] Writing direction

Asmus Freytag <asmusf@ix.netcom.com> Tue, 17 May 2022 16:50 UTC

Message-ID: <6e8cd035-230f-0603-f45e-1328c673aa5e@ix.netcom.com>
Date: Tue, 17 May 2022 09:49:56 -0700
To: i18ndir@ietf.org
References: <4C4A249559BA1E86B17E53FE@PSB> <D59F50F7-A266-48F3-AA78-DA46023033BD@frobbit.se> <39F2CBAA1F19DB765CC59369@PSB>
From: Asmus Freytag <asmusf@ix.netcom.com>
In-Reply-To: <39F2CBAA1F19DB765CC59369@PSB>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/1V0Bv6CYtDE8dHKwZV85WGM_PNY>
Subject: Re: [I18ndir] Writing direction

John,

I'm not sure the directionality extension should cover "vertical" or 
"serpentine". In both of these, there's no ambiguity where any code 
point fits in the display order relative to the backing store order.

The same is not true of bidi text. Changing the "paragraph direction" 
affects the bidi algorithm in ways that are not a simple order reversal 
of the entire text. Therefore, the bidi direction can affect the 
reader's understanding of the text, which is what this extension should 
focus on, not general layout instructions (which more properly belong to 
a rich-text environment).
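
To illustrate why the bidi direction has to travel with the text, here is a
minimal Python sketch of the "first strong character" fallback that a
renderer might use when no direction is supplied. It is a simplification of
UAX #9 rules P2/P3 (it ignores isolates and embeddings), and the function
name and return values are illustrative assumptions, not part of any
proposed extension.

```python
import unicodedata

# Bidi classes that count as "strong" in UAX #9.
STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}


def first_strong_direction(text, default="ltr"):
    """Guess a paragraph direction from the first strong character.

    Simplified: real UAX #9 P2 also skips characters inside isolates.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in STRONG_LTR:
            return "ltr"
        if bidi_class in STRONG_RTL:
            return "rtl"
    # No strong character found (e.g. digits or punctuation only).
    return default


# A Hebrew sentence that happens to start with a Latin acronym is
# guessed as left-to-right, which may not be what the author intended.
guess = first_strong_direction("IETF \u05e9\u05dc\u05d5\u05dd")
```

The guess depends only on which character happens to come first, so a
string whose intended paragraph direction differs from its first strong
character is laid out with the wrong base direction unless that direction
is stated explicitly.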

A./


On 5/17/2022 7:06 AM, John C Klensin wrote:
> Patrik,
>
> Yes.  The problem(s) are complicated any way one cuts them.  But
> this month's problems are a bit different (some ways better,
> some worse) from the discussions you cite.  First, today's
> issues are associated with free-text strings and not just with
> identifiers (which were the focus of your examples).  Second,
> there is (finally?) fairly general recognition at this point
> (except, perhaps, in the IETF -- a failure of this directorate
> effort, IMO) that even identifying a script (much less a
> language) as inherently RtoL (or Bidi) can be problematic.
> Major scripts (by any reasonable definition) are a fairly
> minimal problem: "everyone" knows what they are.  But, as
> scripts appear in Unicode that are used by a tiny fraction of
> the world's population, it gets harder and, at least in
> principle, can be made harder yet by a few scripts that can be
> written either way (or vertically, or serpentine, or...).  So it
> has become clear that there are many situations in which
> explicit identification of directionality, as well as language,
> is needed.
>
> The situation that prompted my note was an IETF protocol that is
> under development.  It uses free text strings in one context,
> strings that may need to be shown to users, but the protocol
> has no provision for markup or its relatives.  The WG has
> concluded it needs to identify language but has questions about
> how to handle identifying directionality, ideally using a
> {language, string} or {language,direction,string} tuple rather
> than sliding toward markup.  And this was the third or fourth
> case like this to come to my attention in the last couple of
> years so "figure out something ad hoc" is no better an answer
> than trying to do IDNA without Unicode Bidi or even some
> variations on it.
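
A {language, direction, string} tuple of the kind described above could be
sketched as follows. This is only an illustration under stated assumptions:
the class name, field names, and the "ltr"/"rtl"/"auto" values are
hypothetical and are not taken from any protocol or draft. The one fixed
point is the set of Unicode directional isolate characters (U+2066 to
U+2069), which a receiver might use to apply the direction when showing the
string in a plain-text context.

```python
from dataclasses import dataclass

# Unicode directional isolate controls.
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


@dataclass(frozen=True)
class DirectedString:
    """Hypothetical {language, direction, string} tuple."""

    language: str   # a BCP 47 language tag, e.g. "he"
    direction: str  # assumed values: "ltr", "rtl", or "auto"
    text: str       # the free-text string itself, with no markup

    def isolated(self) -> str:
        """Wrap the text in the isolate pair matching its direction."""
        initiator = {"ltr": LRI, "rtl": RLI, "auto": FSI}[self.direction]
        return f"{initiator}{self.text}{PDI}"


label = DirectedString(language="he", direction="rtl", text="IETF \u05e9\u05dc\u05d5\u05dd")
display_text = label.isolated()
```

Keeping the direction as a separate field leaves the stored string free of
control characters, so comparison and matching are unaffected; the isolates
are added only at display time.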
>
> best,
>     john
>
>
>
>
> --On Tuesday, May 17, 2022 08:08 +0200 Patrik Fältström
> <patrik@frobbit.se> wrote:
>
>> Let me link to a few blog posts I did a few years ago (2008):
>>
>> <https://www.paftech.se/node/681/>
>> <https://www.paftech.se/node/682/>
>> <https://www.paftech.se/node/683/>
>>
>>     Patrik
>>
>> On 16 May 2022, at 17:35, John C Klensin wrote:
>>
>>> Hi.
>>>
>>> I have recently been asked (not for the first time even this
>>> year) about whether it is really, really, necessary to
>>> specify directionality with a language specification for
>>> non-identifier strings that might be displayed for non-expert
>>> human use.   And, also not for the first time, the person
>>> asking has expressed a preference that getting to an answer
>>> not involve a flame-fest on the IETF, Last Call, or other
>>> list, especially among people with more opinions than
>>> knowledge.
>>>
>>> I have a few questions and, because progress on alternatives
>>> seems to be moving forward at a pace that would make "glacial"
>>> seem speedy, want to at least pretend that this directorate
>>> list is a place where strategy can be discussed, not just
>>> reviews performed.
>>>
>>> This is just an impression but, from the questions I've been
>>> asked, it appears that we have at least three problems:
>>>
>>> (1) The oral tradition and rumors that seem to be
>>> substituting for the checklist (draft presumably still in
>>> queue somewhere) or general I18N advice from "us" have gotten
>>> to the point of including advice that language information
>>> (sometimes specifically in the form of BCP 47 tags [RFC
>>> 5646]) should be specified.  However, directionality is not
>>> mentioned.   So we leave folks in the position of trying to
>>> put language tags in and then, during the review process, hit
>>> them with a requirement for directionality.  By then, the
>>> potential for adding that information means changing whatever
>>> was planned for the protocol, sometimes in a drastic way.  In
>>> addition, that process often feels to those doing the design
>>> or document-writing as if we are playing "gotcha" with them.
>>>
>>> (2) PRECIS (RFC 8264) discusses the need for special cases
>>> for languages that are unusual in one way or another, and
>>> discusses directionality as a profile issue in Section 5.2.5.
>>> However, for the FreeformClass, it does not move past Unicode
>>> bidi (UAX9).  More generally, it does not even provide for
>>> language tagging, much less directionality indications in the
>>> strings it specifies rather than relying on the IDNA-derived
>>> model of "this stuff is dangerous, don't use it".   That may
>>> (still?) be reasonable for identifiers, especially for
>>> identifiers that we hope will not be seen by users, but it is
>>> not for arbitrary strings that are likely to be displayed to
>>> end users (whether that is explicitly intended or not).
>>>
>>> In addition, PRECIS and other work (including IDNA,
>>> charmod-norm [1], and even the rules for matching language
>>> tags [2]) have tended to focus on string matching/comparison,
>>> i.e., determining whether two strings are equivalent, rather
>>> than on display and presentation.  Sometimes, probably more
>>> often than we might wish, the latter turns out to be
>>> important.
>>>
>>> (3) The assumption that directionality can be deduced from
>>> knowing the language, or even the language and script, is
>>> definitely not sufficient.  For those not familiar with the
>>> issues, a W3C document [3] makes good reading.
>>>
>>>
>>> Suggestion: It is probably time to extend BCP 47 to include a
>>> directionality extension.   That would allow us to tell
>>> people, not just that they should provide language
>>> information (probably via BCP 47 tags) and then figure out
>>> how to express directionality, but to specify that the
>>> directionality SHOULD (and in selected cases, MUST) be
>>> included along with the primary language subtag. While the
>>> W3C document cited above [3] recommends against adding such
>>> an extension (except as private use), I suggest that advice
>>> is more applicable to upgrades of existing protocols,
>>> especially markup-based ones, than to new, or nearly new,
>>> work being developed in the IETF and that, without a
>>> (public) extension or equivalent, things are getting (or have
>>> long-ago gotten) out of hand.
>>>
>>> Thoughts from others about this would be appreciated.  If it
>>> seems like a reasonable idea, input from both the directorate
>>> and the ADs about how to proceed efficiently would be welcome.
>>> I'm willing to do some writing (some of the text above might
>>> appear in an I-D), but would need others to carefully check
>>> what I produce.  Equally important, we'd need a plan about
>>> how to move things along: I doubt that the WGs and authors
>>> who have been asking, and who have deadlines of their own,
>>> would consider waiting until July, bringing this up in
>>> DISPATCH, and then going around in circles for a few
>>> additional months (or longer) to be helpful or constructive.
>>>
>>> best,
>>>     john
>>>
>>> p.s. Some of what appears above is the result of several
>>> conversations, including one with Addison Phillips, but also
>>> with people who seem to be uncomfortable being identified at
>>> this point.   Any bad ideas or misunderstandings are probably
>>> mine; others will get more explicit credit for the good ones
>>> as (and if) things move along.
>>>
>>>
>>> [1] https://www.w3.org/TR/charmod-norm/
>>>
>>> [2] RFC 4647
>>>
>>> [3]
>>> https://www.w3.org/International/questions/qa-direction-from-language
>>>
>