Re: [I18ndir] Writing direction

John C Klensin <john@jck.com> Tue, 31 May 2022 12:33 UTC

Date: Tue, 31 May 2022 08:33:34 -0400
From: John C Klensin <john@jck.com>
To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, i18ndir@ietf.org
Message-ID: <173B8D62F1466BAC0E0D9BC6@PSB>
In-Reply-To: <62afc020-eeb8-f880-531a-e0d4e2a81a05@it.aoyama.ac.jp>
References: <4C4A249559BA1E86B17E53FE@PSB> <D59F50F7-A266-48F3-AA78-DA46023033BD@frobbit.se> <39F2CBAA1F19DB765CC59369@PSB> <F6E64852-5CA0-432C-90D3-9DA7D3CCCE69@frobbit.se> <F3072E6B0F1EF9E2951E4D3D@PSB> <CA6F6D68-D83F-46CC-B949-218915ACD116@frobbit.se> <d0a966fd-b947-8d40-29dc-eed88a8a64c9@ix.netcom.com> <62afc020-eeb8-f880-531a-e0d4e2a81a05@it.aoyama.ac.jp>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/RR5ED0HAx1cqmwJ3_QVW8b-HG8I>

Martin,

I'll need more time to absorb the details of your comments and
respond... and this topic may be dead for procedural reasons,
but two very quick comments:

(1) This is not about "protocol elements" as the IETF typically
defines that term at all, nor about single tokens.  It is about
defining/marking the basic/primary/initial directionality of
blocks, typically multi-"word" blocks of essentially free text.

(2) My guess --but, without further thought it is only a guess--
is that referring to this as "bidi" is likely to add to the
confusion of those with little knowledge and/or the unwary.

Again, unless something changes dramatically in some positive
direction for i18n work in the IETF, the proposal is dead
regardless of its merits.   AFAICT, the trend is in the other
direction and this review team (misnamed "directorate") is not
part of the solution and may be part of the problem.  So this
discussion is for our mutual education, not about any possible
piece of protocol work.

best,
   john


--On Tuesday, May 31, 2022 18:05 +0900 "Martin J. Dürst"
<duerst@it.aoyama.ac.jp> wrote:

> Hello everybody,
> 
> Sorry I'm quite late with my reply. My conclusion is that
> it's most probably not worth creating a bidi extension for
> language tags. See below for the reasons.
> 
> On 2022-05-18 14:32, Asmus Freytag wrote:
>> On 5/17/2022 9:57 PM, Patrik Fältström wrote:
> 
>>> Ok, obviously without knowing the complete context here I
>>> would say that first of all the big problem is mixing
>>> protocol parameters with display. I call this "leakage". We
>>> see this in DNS where a domain name is visible to the user.
>>> We see it in other parts of a URI, an email address etc.
>>> Oh, email address is a perfect example. It does have a
>>> "free text" name, and then an address. But many people want
>>> to use their name as an email address, which leads to
>>> collisions and other things. Meanwhile, applications that
>>> show only the name carry security risks similar to those of
>>> link text that people click on, where the destination might
>>> not be what the end user guesses or believes.
>>> 
>>> To the "free text".
>>> 
>>> To me there are two issues here:
>>> 
>>> 1. Display is very important to the end user. We have the
>>> context of the sender of the text, and the context of the
>>> receiver of the text. If a text is to be displayed, then
>>> even without talking about general directionality (which
>>> does impact rendering) we have the issue of mixing two
>>> contexts. Even if I have some clue about i18n, I have very
>>> little to no knowledge about whether the same text, in the
>>> same script and the same language, can be displayed with
>>> different directionality. I believe some Asian scripts can
>>> do this, and for example Hebrew. So the first question is
>>> what problem is to be solved. I guess it is "to have the
>>> receiver understand what general directionality the sender
>>> of the text decides". The receiver can then display in
>>> whatever directionality context the sender wants.
>>> 
>>> 2. Second question is whether general directionality is a
>>> degree of freedom that is really needed in this protocol. I
>>> think it is really really really important to agree this
>>> *is* important. And I mean that it is much more important
>>> than deciding that "the free text in this protocol has a
>>> directionality context that is R2L", or L2R for that
>>> matter. I.e. that this protocol element (because it is a
>>> protocol element after all, even if the element contains
>>> "free text"). If the string is short, I claim one can, with
>>> the help of directional controls, create a string that is
>>> displayed as if the general directionality was the opposite
>>> of what the general directionality actually is.
>>> 
>>> 3. If the answer to the second question is that one can
>>> absolutely not have a given directionality, I still think
>>> one should not give up. One can still say that "the
>>> directionality of the free text element is R2L", with the
>>> addition "if the free text element is to be a L2R context,
>>> then the first character of the element MUST be U+2066
>>> (Left-To-Right Isolate)".
> 
> A side remark: Because LTR is much more widely used, I think
> the default should be LTR, and RTL should be marked with
> RIGHT-TO-LEFT ISOLATE (U+2067). Also, if you put that in, then
> please also put in a POP DIRECTIONAL ISOLATE
> U+2069 at the end.
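
A minimal sketch of the marking suggested above, assuming Python and a
hypothetical helper name `mark_rtl`. Only the two code points (U+2067 and
U+2069) come from the message; everything else is illustrative.

```python
# Sketch only: mark_rtl is a hypothetical helper, not part of any protocol.
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def mark_rtl(text: str) -> str:
    """Wrap text so that it is treated as an isolated right-to-left run.

    Unmarked text is left to the default (left-to-right) assumption,
    so only right-to-left values need to pass through this function.
    """
    return f"{RLI}{text}{PDI}"


# Example: a Hebrew word, written with escapes to keep this file ASCII-only.
marked = mark_rtl("\u05e9\u05dc\u05d5\u05dd")
```

The closing U+2069 keeps the isolate from extending into whatever text
follows the value once it is placed in a larger context.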
> 
> But in very many cases, simply requiring the use of
> first-strong (auto) isolation when including the protocol
> element in a bigger context may be okay. The use of
> first-strong isolation guarantees that all the text is kept
> together in one chunk (possibly flowed across multiple lines
> depending on length).
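
To illustrate first-strong (auto) isolation, here is a rough Python sketch.
The function names are hypothetical. The direction check is a simplification
of the Unicode Bidirectional Algorithm's paragraph-level rules: it looks only
for the first character of bidi class L, R, or AL and does not skip over
nested isolates, so treat it as an approximation.

```python
import unicodedata

FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate_auto(text: str) -> str:
    """Wrap text in a first-strong isolate so a renderer picks its direction."""
    return f"{FSI}{text}{PDI}"


def first_strong_direction(text: str, default: str = "ltr") -> str:
    """Approximate the direction that first-strong detection would choose.

    Simplification (assumption): nested isolates are not skipped.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    # No strong character found (e.g. digits or punctuation only).
    return default
```

With this approach the sender does not have to state a direction at all; the
receiver wraps the value with `isolate_auto` and the display engine resolves
the direction from the content.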
> 
> <snip>
> 
>> embedding bidi controls into protocol text data is ugly,
>> because they end up, sooner or later, embedded in the
>> plain-text backbone of an HTML page. (I'm sure that's a law
>> that's already named by someone out there).
> 
> Yes, fully agreed, it's ugly. But it's not uglier than a
> language tag which includes directionality information ending
> up in an HTML page. In addition to that, having one kind of
> ugliness is clearly better in my eyes than having two separate
> kinds of ugliness. Also, the W3C Internationalization activity
> already has an article discussing this case:
> https://www.w3.org/International/questions/qa-bidi-unicode-controls.
> 
> Also, please note the following: If we introduce some kind of
> bidi extension for language tags, whenever the protocol
> elements in question are put in some context, we have to
> remove that extension. Either the protocol element gets added
> to plain text, in which case we have to add bidi controls, or
> it gets added to (HTML) markup, in which case, we have to add
> markup (and remove the extension). It doesn't necessarily look
> like good protocol design to create something that has to be
> removed as soon as it's actually used :-(.
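
The two cases in the paragraph above can be sketched as follows, assuming
Python. The function names and the `direction` parameter are hypothetical;
the parameter stands in for whatever out-of-band direction information a
protocol might carry, and no particular language-tag extension syntax is
assumed.

```python
import html
from typing import Optional

LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def embed_in_plain_text(value: str, direction: Optional[str] = None) -> str:
    """Plain-text context: express direction with bidi control characters."""
    # Unknown or missing direction falls back to first-strong isolation.
    opener = {"ltr": LRI, "rtl": RLI}.get(direction, FSI)
    return f"{opener}{value}{PDI}"


def embed_in_html(value: str, direction: Optional[str] = None) -> str:
    """HTML context: express direction with markup instead of controls."""
    escaped = html.escape(value)
    if direction in ("ltr", "rtl"):
        return f'<bdi dir="{direction}">{escaped}</bdi>'
    # Without a dir attribute, bdi lets the browser detect the direction.
    return f"<bdi>{escaped}</bdi>"
```

In both functions the out-of-band direction is consumed and re-expressed in
the target context's own mechanism, which is the "has to be removed as soon
as it's used" step described above.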
> 
> <snip>
> 
>> Unlike all the other presentational markup that exists to
>> affect text-layout, the bidi direction is special in that it
>> affects things like the order of first and last name or any
>> other elements where "order in the sentence" affects the
>> meaning (and not just the appearance) of the text.
> 
> I agree on this point, of course.
> 
> <snip>
> 
>> What are the types of texts that show up in IETF protocols?
> 
> That's indeed the most important question in this discussion.
> Things may range from protocol elements that occasionally have
> to be checked by a human to stuff that's essentially intended
> for humans only. They may also range from very short
> word-like stuff to very long texts.
> 
> Solutions range from restricting the protocol element to using
> a single script (similar to the restrictions we have on DNS
> labels) to requiring that the protocol element be included in
> <bdi></bdi> when added to markup (see the example in
> https://html.spec.whatwg.org/#the-bdi-element) to the advice
> "you'd better allow markup here anyway".
> 
> This means that the range of 'protocol elements' where a bidi
> extension to language tags might make sense is actually not
> very wide. Combining this with the fact that it has to be
> removed as soon as it's used somewhere, the idea of a bidi
> extension to language tags looks pretty much like a nonstarter
> to me. Sorry it took me so long to get to that conclusion.
> 
> Please also note that depending on the nature of the protocol
> element, language information may not be needed. As an
> example, there were discussions about adding language
> information to domain name labels when domain name
> internationalization was undertaken. But it was easy to
> convince people that it doesn't make sense. There is no
> difference between an English version of the label 'chat' and
> a French version, although their pronunciation and meaning
> differ depending on the language. What will happen is that
> each user will interpret and pronounce the label depending on
> their language abilities and the context it is in, and that's
> it. That may apply to cases where bidi information (or at the
> least, bidi isolation) is still highly desirable. Another
> reason for not adding a bidi extension to language tags.
> 
> 
> Regards,    Martin.