Re: [I18ndir] Writing direction

Asmus Freytag <asmusf@ix.netcom.com> Tue, 17 May 2022 02:05 UTC

To: John C Klensin <john-ietf@jck.com>, i18ndir@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/SCFBDDwLIwjnJvBF_SZQeS9eQts>

On 5/16/2022 4:40 PM, John C Klensin wrote:
>
> --On Monday, May 16, 2022 14:17 -0700 Asmus Freytag
> <asmusf@ix.netcom.com>  wrote:
>
>> On 5/16/2022 12:25 PM, John C Klensin wrote:
>>> --On Monday, May 16, 2022 11:33 -0700 Asmus Freytag
>>> <asmusf@ix.netcom.com>   wrote:
>>>
>>>> John,
>>>>
>>>> you are correct that any operation that's limited to the
>>>> backing store does not need to know the directionality.
>>>>
>>>> Anything you want to display, on the other hand, benefits
>>>> from knowing the display direction.
>>>>
>>>> A directionality tag on BCP47 would have the advantage that
>>>> you don't need an extra field that's needed only some times.
>>>> However, it opens the issue of what happens when a protocol
>>>> uses the existing BCP47 *and* has a separate field for
>>>> specifying directionality.
>>> I actually think that is fairly straightforward although I
>>> agree it should be spelled out.  Cases:
>>>
>>> (i) Protocol exists, specifies directionality via some
>>> separate field or markup.
>>>       --> keeps doing what it has been doing since it will not
>>> recognize the extension.
>> OK
>   
>>> (ii) New protocol comes along and specifies directionality by
>>> use of the extension.
>>>       -->  Obviously (I hope) uses the extension field.
>> OK -- that is: SHOULD use the extension field if data is
>> displayed as part of the protocol or by some process that uses
>> the protocol fields to figure out parameters for display.
> If one were designing a new protocol and decided to use
> directionality as an extension part of the language tag, why
> would one want to encourage poking around through [other]
> protocol fields or parameters to try to guess?  Or are you
> saying that there might be circumstances in which a new protocol
> might want to use other information just like the "existing"
> protocols in (i)?  I'm not sure whether or why that should be
> encouraged but don't see a good reason to try to ban it?

The question is whether it's better to get the info from a BCP47 tag or
from some other field (in a new protocol). Thinking about it, a dedicated
field (if mandatory) ensures that the intended directionality is always
clear. With a tag extension, it's more normal to have a way of handling a
missing extension, so it becomes unclear whether omitting the extension
was intentional.

But it really comes down to whether the information is optional or
mandatory, not where it is carried.

Also, not all protocols deliver single strings, for example when you
serve up an HTML file over HTTP. Then you have to handle the case where
the language and script may be global to the data but directionality may
not be (or, theoretically, vice versa).

A new protocol has to spell out those issues.

>
>>> (iii) Older protocol, with directionality specified as in (i),
>>> is revised.   Now it seems to me that those doing the revision
>>> have three choices:
>> Unless data is never displayed,...
>>> (a) Stick with the status quo/old way to do
>>> things,
>> OK
>>>    (b) deprecate the old method and specify use of the
>>> tag extension, or
>> OK (with your caveat)
>>> (c) allow either and specify what happens if
>>> both are provided and not consistent.  Absent plans about a
>>> flag day or sufficient changes to really create two separate
>>> protocols (the "old" and "new" ones) in the same general
>>> space, I don't know of a practical way to do (b), but it is
>>> logically possible.
>> OK: specifically, may designate one as "default" and the other
>> as "override".
> From the standpoint of good protocol design, it seems to me a
> bad idea to tell something to look in both sets of places and
> then compare.  So this should be "look for X.  If a value is
> found/ can be determined that way, believe it without even
> figuring out whether other information might be available.  Iff
> X is not present, look for Y."  Are we saying the same thing in
> different words or do I not understand your suggestion?

Almost. For a single string you'd say an explicit direction beats the
information in the tag extension, and you'd be fine.
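A minimal Python sketch of that single-string rule follows. It assumes a hypothetical direction extension: BCP47 has no registered one, so the singleton "d" followed by "ltr" or "rtl" (e.g. "ar-d-rtl") is purely illustrative, as are the function names. An explicit direction field wins, and the tag extension is consulted only when that field is absent.

```python
from typing import Optional

# Hypothetical: BCP47 has no registered direction extension. This sketch
# assumes an unassigned singleton "d" followed by "ltr" or "rtl".
DIRECTIONS = {"ltr", "rtl"}


def direction_from_tag(tag: str) -> Optional[str]:
    """Return the direction in the hypothetical "-d-" extension, or None."""
    subtags = tag.lower().split("-")  # BCP47 tags are case-insensitive
    for i, sub in enumerate(subtags):
        if sub == "x":  # private-use section: nothing after this is an extension
            return None
        if sub == "d" and i > 0 and i + 1 < len(subtags):
            value = subtags[i + 1]
            return value if value in DIRECTIONS else None
    return None


def resolve_direction(explicit: Optional[str], tag: Optional[str]) -> Optional[str]:
    """Single-string precedence: explicit direction field, then tag extension.

    None means neither source specified a direction.
    """
    if explicit is not None:
        return explicit.lower()
    if tag is not None:
        return direction_from_tag(tag)
    return None


print(resolve_direction("ltr", "ar-d-rtl"))  # ltr: explicit field wins
print(resolve_direction(None, "ar-d-rtl"))   # rtl: taken from the extension
print(resolve_direction(None, "ar"))         # None: not specified
```

Returning None rather than a default keeps "not specified" distinct from "LTR", which is the optional vs. mandatory distinction; what a receiver does next (for example, the first-strong heuristic of the Unicode Bidirectional Algorithm) is left to the protocol.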

But what about complex or compound data?

Suppose each piece in the collection can have a direction attribute and a
language attribute, but the collection as a whole can also have these
attributes.

If you override the language locally with a tag that lacks the direction
extension and don't provide an explicit direction, I posit that retaining
the direction from a global BCP47 tag for a different language is a
mistake.

Think of document vs. paragraph language in HTML. If an HTML document is
tagged as Hebrew + RTL, I wouldn't want every English paragraph in it to
show as RTL when I leave out the direction extension on the language
override.
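The compound case can be sketched the same way. This is a hypothetical Python illustration, again assuming an unregistered "-d-" direction extension; the Member type and the rule of comparing only the primary language and script subtags are assumptions made for the example. A member's own explicit direction or extension wins; a local language override without the extension inherits the collection's tag-derived direction only if it names the same language.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical: assumes an unregistered BCP47 singleton "d" carrying "ltr"
# or "rtl" (e.g. "he-d-rtl"); BCP47 defines no such extension today.


def direction_from_tag(tag: str) -> Optional[str]:
    subtags = tag.lower().split("-")
    for i, sub in enumerate(subtags):
        if sub == "x":  # private use: stop looking
            return None
        if sub == "d" and i > 0 and i + 1 < len(subtags):
            return subtags[i + 1] if subtags[i + 1] in ("ltr", "rtl") else None
    return None


def language_and_script(tag: str) -> Tuple[str, Optional[str]]:
    """Primary language and script subtags; region, variants, extensions ignored."""
    subtags = tag.lower().split("-")
    for sub in subtags[1:]:
        if len(sub) == 1:  # a singleton starts the extensions
            break
        if len(sub) == 4 and sub.isalpha():  # only script subtags look like this
            return subtags[0], sub
    return subtags[0], None


@dataclass
class Member:
    text: str
    lang: Optional[str] = None       # local language override, if any
    direction: Optional[str] = None  # explicit local direction, if any


def effective_direction(member: Member, collection_lang: str) -> Optional[str]:
    if member.direction is not None:  # 1. explicit local direction wins
        return member.direction
    if member.lang is not None:
        local = direction_from_tag(member.lang)
        if local is not None:  # 2. local tag carries the extension
            return local
        # 3. Different language without the extension: do not inherit.
        if language_and_script(member.lang) != language_and_script(collection_lang):
            return None
    # 4. No override, or the same language: use the collection's tag.
    return direction_from_tag(collection_lang)


doc_lang = "he-d-rtl"
print(effective_direction(Member("Shalom"), doc_lang))                 # rtl
print(effective_direction(Member("Hello", lang="en"), doc_lang))       # None
print(effective_direction(Member("Hello", lang="en-d-ltr"), doc_lang))  # ltr
```

Comparing script as well as language matters for cases such as "az-Arab" vs. "az-Latn", where the same language is written in scripts of opposite direction.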


>
>> Consider HTML. An obvious choice would be to allow such an
>> extension to set a default direction, so that directional
>> markup becomes redundant (only required to override the
>> default).
> Ok.  I think that is what I said above (with "X" as the
> extension) but I, personally and so far, believe the argument in
> the last part of
> https://www.w3.org/International/questions/qa-direction-from-language
> against use of an extension with HTML (and CSS, SVG, XML,
> etc.)... more or less because it would introduce confusion and
> cause a _very_ long transition.  I see the extension model as
> being far more interesting for new protocols that do not have
> separate direction metadata.
For HTML as such, the "win" from allowing extensions would be limited. But
what about new protocols that are similar "collections", where each member
can be tagged and displayed independently (at least as far as direction
is concerned)?
>
>> That would seem to increase the chances that display is
>> correct.
> It actually doesn't.   If the two ways to specify things are
> consistent with each other in a given document, then it doesn't
> increase anything, merely wastes space and/or time.    If they
> are not consistent, either one is almost always better than the
> other (little or no improvement either), or the application has
> to make an arbitrary choice between two specifications, one of
> which is wrong.
>
> That would turn into even more bad news if the markup metadata
> allowed only "LTR" or "RTL" but the extension also allowed some
> coding for "top to bottom" or "serpentine", perhaps in
> combination with one of the first two and some fallback rules.

I think you could reason your way to a definition with reasonably clear
precedence rules. But yes, users would need to run those rules in their
heads to be sure they get what they intend. That may be a good reason to
argue against them.

Look, I'm not actually proposing one over the other, I'm more interested 
in pointing out the need to address such issues in discussing best 
practice for protocol design.

Let me suggest one more issue: tags that are imported from some other 
part of the system.

If your model is that of an author, where directionality and language
tags can be curated by someone who can look at the text and decide they
are correct, that's different from when you've merely been told that some
data is in language x (or that it was created by a user working in a UI
set to language x).

If the user profile has room for a language tag (with no restriction on
the use of extensions), then that's all you've got. You could never set an
explicit direction field based on that alone. If the language tag happens
to cover direction, that's great.

You could always look for an extension, strip it off, and then supply its
value to the protocol in a separate field.
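Stripping the extension can be done mechanically. Below is a hypothetical Python sketch, assuming the same unregistered "-d-" singleton carrying exactly one subtag, "ltr" or "rtl"; the function name is illustrative. The rest of the tag is returned unchanged (original case preserved) so it can go into the protocol's language field.

```python
from typing import Optional, Tuple

# Hypothetical: assumes an unregistered BCP47 singleton "d" followed by
# exactly one subtag, "ltr" or "rtl" (e.g. "ar-EG-d-rtl").


def split_direction(tag: str) -> Tuple[str, Optional[str]]:
    """Strip the hypothetical direction extension; return (tag, direction)."""
    subtags = tag.split("-")
    for i, sub in enumerate(subtags):
        if sub.lower() == "x":  # private-use section: leave untouched
            break
        if sub.lower() == "d" and i > 0 and i + 1 < len(subtags):
            value = subtags[i + 1].lower()
            if value in ("ltr", "rtl"):
                remaining = subtags[:i] + subtags[i + 2:]
                return "-".join(remaining), value
    return tag, None


# The two parts could then travel in separate (hypothetical) protocol fields.
lang, direction = split_direction("ar-EG-d-rtl")
print(lang, direction)           # ar-EG rtl
print(split_direction("en-US"))  # ('en-US', None)
```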

Anyway, not sure where I'm going with this, but I can see that usage 
scenarios will depend on your assumptions about the nature of the 
(strings in the) protocol and the source of the data used for language 
and direction tags.


>
>> Would have to address http vs. html specification of language
>> ID, does a language ID w/o extension override a later ID that
>> lacks an extension? Only if it's the same language would be my
>> guess. Otherwise you'd end up defaulting to RTL when English
>> is embedded in Arabic, but the English language tag doesn't
>> carry direction.
> Yes, I think so.  But, if anyone asked me for advice about HTTP
> or HTML, it would be to pick (iii)(a), ignore the extension if
> it appeared and the application noticed, and go merrily on one's
> way.  The existing mechanisms work well and, at least as
> important, are well understood and the size of the installed
> base is such that trying to retrofit handling of a language tag
> extension --especially if one needed to consider cases like the
> above-- would almost certainly not be worth it.

It's best to understand these examples as intended for "protocols that are
similar to" HTML or HTTP rather than as suggestions to change those
protocols.

But even HTML would have to state what will happen if the extension is 
used. Will there be a guarantee that it will be ignored going forward?

All good questions.

Some of them could be answered in a draft.

A./

>
> best,
>      john
>