Re: [I18ndir] Writing direction

"Martin J. Dürst" <duerst@it.aoyama.ac.jp> Tue, 31 May 2022 09:05 UTC

Message-ID: <62afc020-eeb8-f880-531a-e0d4e2a81a05@it.aoyama.ac.jp>
Date: Tue, 31 May 2022 18:05:39 +0900
To: i18ndir@ietf.org
References: <4C4A249559BA1E86B17E53FE@PSB> <D59F50F7-A266-48F3-AA78-DA46023033BD@frobbit.se> <39F2CBAA1F19DB765CC59369@PSB> <F6E64852-5CA0-432C-90D3-9DA7D3CCCE69@frobbit.se> <F3072E6B0F1EF9E2951E4D3D@PSB> <CA6F6D68-D83F-46CC-B949-218915ACD116@frobbit.se> <d0a966fd-b947-8d40-29dc-eed88a8a64c9@ix.netcom.com>
From: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
Organization: Aoyama Gakuin University
In-Reply-To: <d0a966fd-b947-8d40-29dc-eed88a8a64c9@ix.netcom.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/aamjt23xGvmWgMll_E25SvfX9rg>
Subject: Re: [I18ndir] Writing direction

Hello everybody,

Sorry I'm quite late with my reply. My conclusion is that it's most 
probably not worth creating a bidi extension for language tags. See 
below for the reasons.

On 2022-05-18 14:32, Asmus Freytag wrote:
> On 5/17/2022 9:57 PM, Patrik Fältström wrote:

>> Ok, obviously without knowing the complete context here I would say 
>> that first of all the big problem is mixing protocol parameters with 
>> display. I call this "leakage". We see this in DNS where a domain name 
>> is visible to the user. We see it in other parts of a URI, an email 
>> address etc. Oh, email address is a perfect example. It does have a 
>> "free text" name, and then an address. But many people want to use 
>> their name as an email address which leads to collisions and other 
>> things. This while applications that only show the name have similar 
>> security risks like text that is a link that people click on might 
>> have a destination that is not what the end user guesses or believes.
>>
>> To the "free text".
>>
>> To me there are two issues here:
>>
>> 1. Display is very important to the end user. We have the context 
>> within which the sender of the text has, and the context of the 
>> receiver of the text. If a text is to be displayed we even without 
>> talking about general directionality (that do impact rendering) we 
>> have the issue of mixing two contexts. Even if I have some clue about 
>> i18n I have very to no knowledge about the same text, same script, 
>> same language is possible to display with different directionality. I 
>> believe some asian scripts can do this, and for example hebrew. So the 
>> first question is what problem is to be solved. I guess it is "to have 
>> the receiver understand what general directionality the sender of the 
>> text decides". The receiver can then display in whatever 
>> directionality context the sender wants.
>>
>> 2. Second question is whether general directionality is a degree of 
>> freedom that is really needed in this protocol. I think it is really 
>> really really important to agree this *is* important. And I mean that 
>> it is much more important than deciding that "the free text in this 
>> protocol has a directionality context that is R2L", or L2R for that 
>> matter. I.e. that this protocol element (because it is a protocol 
>> element after all, even if the element contains "free text"). If the 
>> string is short, I claim one can create a string with the help of 
>> directionality is like if the general directionality was the opposite 
>> of what the general directionality is.
>>
>> 3. If the answer to the second question is that one can absolutely not 
>> have a given directionality, I still think one should not give up. One 
>> can still say that "the directionality of the free text element is 
>> R2L", with the addition "if the free text element is to be a L2R 
>> context, then the first character of the element MUST be U+2066 
>> "Left-To-Right Isolate".

A side remark: Because LTR is much more widely used, I think the default 
should be LTR, and RTL should be marked with RIGHT-TO-LEFT ISOLATE 
(U+2067). Also, if you put that in, then please also put in a POP 
DIRECTIONAL ISOLATE (U+2069) at the end.
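
As a rough illustration of the marking scheme in the side remark above 
(LTR as the unmarked default, RTL wrapped in RLI ... PDI), here is a 
minimal Python sketch. The function names and the is_rtl flag are 
hypothetical; how a protocol would actually know the direction is left 
open here.

```python
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def mark_rtl(text: str) -> str:
    """Wrap text in RLI ... PDI so it is laid out as an RTL isolate."""
    return f"{RLI}{text}{PDI}"


def encode_element(text: str, is_rtl: bool) -> str:
    """Hypothetical encoder: LTR is the unmarked default, RTL is marked.

    The is_rtl flag is an assumption of this sketch; the caller has to
    know the intended direction from somewhere else.
    """
    return mark_rtl(text) if is_rtl else text


if __name__ == "__main__":
    # repr() makes the otherwise invisible control characters visible.
    print(repr(encode_element("\u05e9\u05dc\u05d5\u05dd", is_rtl=True)))
    print(repr(encode_element("hello", is_rtl=False)))
```

The closing PDI matters because an unterminated isolate would extend to 
the end of the paragraph the element is later placed in.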

But in very many cases, simply requiring the use of first-strong (auto) 
isolation when including the protocol element in a bigger context may be 
okay. The use of first-strong isolation guarantees that all the text is 
kept together in one chunk (possibly flowed across multiple lines 
depending on length).
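
A small Python sketch of what first-strong isolation could look like in 
plain text, using FIRST STRONG ISOLATE (U+2068) and POP DIRECTIONAL 
ISOLATE (U+2069). The helper names and the message template are 
illustrative assumptions. The first_strong_direction function is only a 
simplified approximation of what a renderer does: it ignores the rule 
that text inside nested isolates is skipped.

```python
import unicodedata
from typing import Optional

FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate(element: str) -> str:
    """Wrap a protocol element so it stays together as one isolated run."""
    return f"{FSI}{element}{PDI}"


def first_strong_direction(text: str) -> Optional[str]:
    """Simplified first-strong heuristic (nested isolates are not skipped)."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return None  # no strong character found


def build_message(template: str, element: str) -> str:
    """Insert the isolated element into a larger plain-text context."""
    return template.format(isolate(element))


if __name__ == "__main__":
    print(repr(build_message("User {} logged in.", "\u05d3\u05e0\u05d9 123")))
    print(first_strong_direction("123 \u05d3\u05e0\u05d9"))
```

With FSI the sender does not have to declare a direction at all; the 
renderer picks it from the first strong character inside the isolate.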

<snip>

> embedding bidi controls into protocol text data is ugly, because they 
> end up, sooner or later, embedded in the plain-text backbone of an HTML 
> page. (I'm sure that's a law that's already named by someone out there).

Yes, fully agreed, it's ugly. But it's not uglier than a language tag 
which includes directionality information ending up in an HTML page. In 
addition to that, having one kind of ugliness is clearly better in my 
eyes than having two separate kinds of ugliness. Also, the W3C 
Internationalization activity already has an article discussing this 
case: https://www.w3.org/International/questions/qa-bidi-unicode-controls.

Also, please note the following: If we introduce some kind of bidi 
extension for language tags, whenever the protocol elements in question 
are put in some context, we have to remove that extension. Either the 
protocol element gets added to plain text, in which case we have to add 
bidi controls, or it gets added to (HTML) markup, in which case, we have 
to add markup (and remove the extension). It doesn't necessarily look 
like good protocol design to create something that has to be removed as 
soon as it's actually used :-(.
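
To make the two cases above concrete, here is a hedged Python sketch of 
what a consumer would end up doing with direction information, wherever 
it came from: bidi controls for plain text, markup for HTML. The 
direction parameter and both function names are hypothetical, and the 
choice of a span element with a dir attribute is just one possible 
markup.

```python
import html
from typing import Optional

LRI, RLI, FSI, PDI = "\u2066", "\u2067", "\u2068", "\u2069"


def to_plain_text(text: str, direction: Optional[str]) -> str:
    """Plain-text context: express the direction with isolate controls."""
    initiator = {"ltr": LRI, "rtl": RLI}.get(direction, FSI)
    return f"{initiator}{text}{PDI}"


def to_html(text: str, direction: Optional[str]) -> str:
    """Markup context: express the direction with the dir attribute."""
    dir_value = direction if direction in ("ltr", "rtl") else "auto"
    return f'<span dir="{dir_value}">{html.escape(text)}</span>'


if __name__ == "__main__":
    print(repr(to_plain_text("\u05e9\u05dc\u05d5\u05dd", "rtl")))
    print(to_html("a < b", None))
```

In both branches the out-of-band direction value is consumed and 
replaced by something else, which is the point made in the paragraph 
above.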

<snip>

> Unlike all the other presentational markup that exists to affect 
> text-layout, the bidi direction is special in that it affects things 
> like the order of first and last name or any other elements where "order 
> in the sentence" affects the meaning (and not just the appearance) of 
> the text.

I agree on this point, of course.

<snip>

> What are the types of texts that show up in IETF protocols?

That's indeed the most important question in this discussion. Things may 
range from protocol elements that occasionally have to be checked by a 
human to stuff that's essentially intended for humans only. They may 
also range from very short word-like stuff to very long texts.

Solutions range from restricting the protocol element to using a single 
script (similar to the restrictions we have on DNS labels) to requiring 
that the protocol element be included in <bdi></bdi> when added to 
markup (see the example in 
https://html.spec.whatwg.org/#the-bdi-element) to the advice "you'd 
better allow markup here anyway".
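
For the <bdi></bdi> option, a minimal Python sketch might look as 
follows. The function name and the list-item template are assumptions 
made for illustration, loosely modeled on the kind of user-name listing 
shown in the HTML specification.

```python
import html


def wrap_bdi(element: str) -> str:
    """Escape a protocol element and isolate it with <bdi> for HTML output."""
    return f"<bdi>{html.escape(element)}</bdi>"


def render_user_line(name: str, posts: int) -> str:
    """Hypothetical template: the name is isolated from the text around it."""
    return f"<li>User {wrap_bdi(name)}: {posts} posts.</li>"


if __name__ == "__main__":
    print(render_user_line("jcranmer", 12))
    print(render_user_line("\u0625\u064a\u0627\u0646", 3))
```

Without the isolation, an RTL name followed by a colon and a number can 
pull the number to the wrong side of the name when rendered.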

This means that the range of 'protocol elements' where a bidi extension 
to language tags might make sense is actually not very wide. Combining 
this with the fact that it has to be removed as soon as it's used 
somewhere, the idea of a bidi extension to language tags looks pretty 
much like a nonstarter to me. Sorry it took me so long to get to that 
conclusion.

Please also note that depending on the nature of the protocol element, 
language information may not be needed. As an example, there were 
discussions about adding language information to domain name labels when 
domain name internationalization was undertaken. But it was easy to 
convince people that it doesn't make sense. There is no difference 
between an English version of the label 'chat' and a French version, 
although their pronunciation and meaning differ depending on the 
language. What will happen is that each user will interpret and 
pronounce the label depending on their language abilities and the 
context it is in, and that's it. Language information may be unnecessary 
in this way even in cases where bidi information (or at least bidi 
isolation) is still highly desirable. That is another reason for not 
adding a bidi extension to language tags.


Regards,    Martin.