Re: [I18ndir] Writing direction

John C Klensin <john-ietf@jck.com> Tue, 17 May 2022 13:09 UTC

Date: Tue, 17 May 2022 09:08:46 -0400
From: John C Klensin <john-ietf@jck.com>
To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, Asmus Freytag <asmusf@ix.netcom.com>, i18ndir@ietf.org
Message-ID: <E3B619B15C3757059BF78085@PSB>
In-Reply-To: <7fd52123-7ef4-baee-c3a4-0cd8b9bc6f88@it.aoyama.ac.jp>
References: <4C4A249559BA1E86B17E53FE@PSB> <26ca6aba-eb4a-6bc6-af96-8c7db9b3631d@ix.netcom.com> <EDBC11DA94E825A663D89119@PSB> <d49001bf-057d-8eb5-a92c-fc37d96ab864@ix.netcom.com> <432996D42894CF2EA0A1B17C@PSB> <7fd52123-7ef4-baee-c3a4-0cd8b9bc6f88@it.aoyama.ac.jp>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/MeazcSGGIJpnHpSy8zgtKk0SL-0>
Subject: Re: [I18ndir] Writing direction

Martin,

(top post)
Thanks.

I completely agree with your reasoning and believe it is
consistent with (and obviously more detailed than) the reasoning
in the W3C "Can we derive base direction from language?"
document that has been cited several times.

Rather than continue to tune what a specification of such an
extension should say, can we back up to what I think are the two
core questions?  To state them more explicitly (somewhat
informed by your comments and the notes back and forth with
Asmus - thanks again):

(1) If we were doing this from scratch, with no history of using
markup separate from BCP47 tags, we would probably put in the
extension and just go for it.  That is obviously not the
situation in which we find ourselves.  If the only relevant
applications were HTML, CSS, SVG, and XML (contexts in which
markup arrangements were already being used to indicate
direction) or, probably, protocols in which markup was already
being used, we would be better off sticking with the markup for
directionality or, in some protocol-specific cases, using a
private BCP 47 extension for that purpose.  That is, at least as
I understand it, almost exactly what the "Can we derive base
direction..." document says.  However, we have evolving IETF
protocols that use free text but where markup is not present.
For them, there seem to be three choices:

	(i) To force all of them into a markup-like architecture.
	(ii) To treat each one as a separate, special case, most
	likely using a private-use BCP47 extension for at least
	some of them.
	(iii) To standardize a BCP47 extension to specify
	directionality.
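To make options (ii) and (iii) concrete, here is a minimal Python sketch of how one application might read a direction hint carried inside a language tag. It assumes, purely for illustration, private-use subtags "rtl" and "ltr" placed after the "x" singleton (e.g. "ar-x-rtl"); those values and the function name are hypothetical and are not registered anywhere. A standardized extension under (iii) would instead use a singleton registered for that purpose.

```python
from typing import List, Optional, Tuple

# Hypothetical private-use values for a base-direction hint (not registered).
DIRECTION_VALUES = {"ltr", "rtl"}


def split_direction(tag: str) -> Tuple[str, Optional[str]]:
    """Split a hypothetical direction hint out of a BCP 47 tag.

    Returns (tag without the hint, "ltr"/"rtl" or None). Only the
    private-use section (everything after the "x" singleton) is searched,
    and matching is case-insensitive because BCP 47 tags are.
    """
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]
    if "x" not in lowered:
        return tag, None  # no private-use section, so no hint

    x_index = lowered.index("x")
    head = subtags[:x_index]
    direction: Optional[str] = None
    kept: List[str] = []
    for sub in subtags[x_index + 1:]:
        if direction is None and sub.lower() in DIRECTION_VALUES:
            direction = sub.lower()  # first matching subtag wins
        else:
            kept.append(sub)

    # Drop the "x" singleton entirely if no other private-use subtag remains.
    rest = head + ([subtags[x_index]] + kept if kept else [])
    return "-".join(rest), direction


for t in ["ar-x-rtl", "en-GB", "he-X-RTL-foo"]:
    print(t, "->", split_direction(t))
```

This prints ('ar', 'rtl'), ('en-GB', None), and ('he-X-foo', 'rtl') for the three sample tags. If a second application independently picks the same "rtl"/"ltr" subtags and parses them this way, the "private" convention has effectively become public.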

None of those options is ideal.  The first almost certainly
won't happen -- some of the protocols involved have space
constraints that militate against markup.  Others may have other
considerations such as being an update to a deployed protocol
for which switching to a markup architecture would be nearly
impossible.  For either the second or third, collateral damage
is likely.  If two or three applications end up using the same
"private" extension and, to save time, define it the same way,
that extension will rapidly become de facto public no matter
what assorted documents say (we have ample, and painful,
experience with that problem: see RFC 6648, which deprecates
the use of "X-" constructions).  With a public extension,
we get reusability and probably better definitions than the ones
we might get for private ones, but we end up with the issues
that you and Asmus have described.

So, from my point of view, at least so far, we have a "pick the
least harmful option" situation, not an "ideal choice" one.

And that brings me to the other question:

(2) After the messages of the last few days, I am confident
that, with continued collaboration by Asmus and yourself (and,
ideally, that of others with additional perspectives), it would
be possible to write a reasonable specification for a BCP47
directionality extension, including a good discussion about
difficulties when both that extension and systems based on
markup (or separate commands or parameters, etc.) are out there
in the wild.  But, if we can't process that specification and
get it onto the standards track in a timely fashion, there is no
point putting in the effort.  The advice that should then be
given to the WGs who are asking (and whose protocols don't provide
markup mechanisms) is to invent local mechanisms (or kludges),
possibly involving private-use extensions and the problem
mentioned in conjunction with (1)(ii) above.

So (in no particular order) Francesca?  Murray? Pete?  Barry?
We await your advice, but I'm sure that one thing the ADs don't
need is a Last Call objection, when the relevant protocols get to
that stage, pointing out the problems with their choice and
noting that better solutions were suggested but never formally
documented or proposed because the ART Area declined to engage
on the topic.

best,
   john

p.s. I'll respond to Patrik's note soon, I hope later today.


--On Tuesday, May 17, 2022 18:21 +0900 "Martin J. Dürst"
<duerst@it.aoyama.ac.jp> wrote:

> Hello John, others,
> 
> On 2022-05-17 08:40, John C Klensin wrote:
>> 
>> 
>> --On Monday, May 16, 2022 14:17 -0700 Asmus Freytag
>> <asmusf@ix.netcom.com> wrote:
>> 
>>> On 5/16/2022 12:25 PM, John C Klensin wrote:
>>>> 
>>>> --On Monday, May 16, 2022 11:33 -0700 Asmus Freytag
>>>> <asmusf@ix.netcom.com>  wrote:
> 
>> 
>>>> (iii) Older protocol, with directionality specified as in
>>>> (i), is revised.   Now it seems to me that those doing the
>>>> revision have three choices:
>>> Unless data is never displayed,...
>>>> (a) Stick with the status quo/old way to do
>>>> things,
> 
>>> Would have to address HTTP vs. HTML specification of language
>>> ID: does a language ID with an extension override a later ID
>>> that lacks an extension? Only if it's the same language would
>>> be my guess. Otherwise you'd end up defaulting to RTL when
>>> English is embedded in Arabic but the English language tag
>>> doesn't carry direction.
>> 
>> Yes, I think so.  But, if anyone asked me for advice about
>> HTTP or HTML, it would be to pick (iii)(a), ignore the
>> extension if it appeared and the application noticed, and go
>> merrily on one's way.  The existing mechanisms work well and,
>> at least as important, are well understood and the size of
>> the installed base is such that trying to retrofit handling
>> of a language tag extension --especially if one needed to
>> consider cases like the above-- would almost certainly not be
>> worth it.
> 
> I think this would indeed be the right answer. And it might
> well work out, at least for some time. But the Web and
> browsers have some weird dynamics.
> 
> First, we'd see some of these directionality extensions show
> up in HTML, because data from places where there's only language
> information and no separate directionality attribute gets
> copied over. Ideally, on the copy-over, language information
> proper and directionality would be separated and noted in the
> relevant HTML attributes, but we know that some programmers
> may be a bit sloppy (and already having them copy over the
> language information may be considered a success).
> 
> Next, we may have some 'smart' users complain to browser
> makers that they don't respect the directionality information
> even if it's there (as an extension of the language tag) and
> clearly needed and correct.
> 
> Next, or even sooner, we may have some proactive developer at
> a browser vendor who isn't aware of the whole story but thinks
> it's quite obviously a good idea that such directionality
> extensions get honored by the browser display logic. Once one
> browser does it that way, there will be pressure on other
> browsers to do it the same way, because some pages render
> better in the browser with the change. As we know, it's
> virtually impossible for browsers to blame page authors (or
> programmers of software sticking together pages from some
> data) or to point users to a specification and tell them that
> they are more correct than the other browser.
> 
> So in summary, if you can guarantee that the directionality
> extension never leaves the protocols/formats where there's no
> separate directionality information field, it may be a good
> idea. But we know very well that no protocol/format is an
> island.
> 
> Regards,   Martin.
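Asmus's guess in the exchange quoted above, that a direction carried on an enclosing tag should apply to a nested tag without one only when both name the same language, can be written down as a small rule. What follows is a minimal sketch, not part of any specification: the function names are hypothetical, it assumes direction hints have already been separated from the tags as "ltr", "rtl", or None, and comparing only the primary language subtag is a simplifying assumption (it treats, for example, az-Arab and az-Latn as the same language).

```python
from typing import Optional


def primary_language(tag: str) -> str:
    """Primary language subtag of a BCP 47 tag, lowercased."""
    return tag.split("-", 1)[0].lower()


def resolve_direction(inner_tag: str, inner_dir: Optional[str],
                      outer_tag: str, outer_dir: Optional[str]) -> Optional[str]:
    """Effective direction for text tagged inner_tag nested inside outer_tag.

    inner_dir and outer_dir are "ltr", "rtl", or None when the tag carried
    no direction hint. Returns None when the direction stays unresolved.
    """
    if inner_dir is not None:
        return inner_dir  # an explicit hint on the nested tag always wins
    same_language = primary_language(inner_tag) == primary_language(outer_tag)
    if outer_dir is not None and same_language:
        return outer_dir  # same language: inheriting the outer hint is plausible
    # Different language (e.g. English inside Arabic): do not inherit "rtl";
    # leave the decision to other mechanisms such as markup.
    return None


print(resolve_direction("en", None, "ar", "rtl"))     # English inside Arabic
print(resolve_direction("ar-EG", None, "ar", "rtl"))  # same language
```

The two calls print None and rtl. Returning None instead of a default is deliberate in this sketch: it avoids the failure Asmus describes, where English embedded in Arabic ends up defaulting to right-to-left.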