Re: Last Call: 'Tags for Identifying Languages' to BCP

"JFC (Jefsey) Morfin" <jefsey@jefsey.com> Mon, 29 August 2005 16:26 UTC

Message-Id: <6.2.3.4.2.20050829151116.04deec00@mail.jefsey.com>
X-Mailer: QUALCOMM Windows Eudora Version 6.2.3.4
Date: Mon, 29 Aug 2005 18:11:00 +0200
To: Peter Constable <petercon@microsoft.com>, ietf@ietf.org, iesg@iesg.org
From: "JFC (Jefsey) Morfin" <jefsey@jefsey.com>
In-Reply-To: <F8ACB1B494D9734783AAB114D0CE68FE06EDF1AB@RED-MSG-52.redmond.corp.microsoft.com>
References: <F8ACB1B494D9734783AAB114D0CE68FE06EDF1AB@RED-MSG-52.redmond.corp.microsoft.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format="flowed"
Subject: Re: Last Call: 'Tags for Identifying Languages' to BCP

Dear all,
at this stage I think it is clear that the langtags issue represents 
a strong opposition between two visions of the Multilingual Internet. 
These visions, for better or worse, are embodied by Peter
Constable's friends and by me.


There is an affinity group gathered by circumstances or by talent to 
support Peter's approach. Its kernel happens to be formed by English 
mother-tongue people employed by large corporations or interests 
(from history it seems it formed in the course of international 
meetings). A few Members are included by personal dedication or as
consultants. There are no academic researchers, no publicly funded
contributing projects, no sponsoring cultural organisations. The
Members of this affinity group share a common culture. It is based
upon different levels of technical involvement of the structures and
individuals involved. There is no R&D involved in the network area
which is not sponsored by commercial interests, with the pros and
cons discussed in RFC 3869. In that sense it can be said that it is a
US industry-led group. This is at least the way the non-US interests,
organisations and Government officials I have talked with identify
them, without exception. True or not, this is the perception. It is to
be related to the definition of an IETF affinity group by RFC 3774.

This group proposes a tagging of all the languages of the world,
which it perceives as a commodity (a well known trait of the English
mother-tongue people who share their own language with other people
round the world). This way certainly suits e-commerce, basic
interoperability and the library classification of foreign books. The
idea is that a standard and a central registry will constrain the
world to follow a common useful rule, if it cannot continue using
ASCII English. This is named "internationalisation". This unilateral
standardisation is seen as the only guarantee of the stability and
uniqueness of the network. Being unique for the entire world, this
tagging must be simple and based upon simple information. This
information is made of three elements that commerce needs for
practical reasons: the written language, the script being used, the
applicable law.

This vision A addresses specific urgent needs of the printing and
library industries to reduce costs in the face of competition from
other media and from the printing capacity of every user (a problem
less documented but as important as the music industry's problem), with
a larger financial turnover. World concentrations and specialisations
can be expected from a unique normative system (with all the
reluctance one can expect and the strategies one may imagine).


There is a web of relations I have woven among people engaged in
network research, operations management, cultural life, government
administration, international entities, lingual-oriented interests
and activities, and local industry, from various parts of the world,
in particular through an Internet test-bed named dot-root (responding
to the ICANN ICP-3 call), a long involvement in @large and ccTLDs,
and a national Internet community and government think tank I
started one year ago and which is developing unexpectedly. The strength
of this relational group is that no money is engaged, which guarantees
its independence. But this is also its weakness, as it leaves it no
other alternative than to rely on volunteers to represent it - often
only one when the task is as demanding as this one - or to call on the
personal involvement of concerned people, with the risk of
overwhelming the Internet standard process with scores of irritated
newcomers. The common culture of this group is common sense support
for a user-centric multilingual architecture and strong
sustainable innovation.

This group sees no need to tag the languages but the need to document
relations, which - among other things - use languages, but also many
other parameters. It thinks that every human being, machine and
service is specific and different from the others, and that surety,
security, stability and innovation capacity are based upon the best
seamless support of these differences for a strong unity of the
network. Its experience is that generalised computing and
pervasive networking support a realistic, commercially rewarding and
humanly exciting set of possibilities. This concerns the relational,
cultural, economic, social and political development that everyone,
every economy, every country may share in, on an equal opportunity
basis. It also sees a global convergence of the R&D, civil society,
economic and political spheres in that direction (for example at WSIS,
but also at the IETF) expressed in various directions, one being the
conceptual networking of information (ISO 11179 R&D) and another a
fluid referencing system (URI tags), which give new possibilities,
especially when added to physical and services networking.

This vision B calls for an open description system/language of 
languages, and of many other relational parameters. Obviously it is
still in its infancy, as everything started in the early 80s has been
delayed by the further OSI and then Internet visions, and by hardware
and bandwidth limitations and costs. It is only resuming now.

Vision A has difficulty (and a lack of competence, which Peter helped
document yesterday) in understanding vision B. And as usual in such
cases it fights the messenger. No big deal: the messenger is used to it.

Vision B has no problem in accepting vision A as a "default" for 
those wanting it. However vision A is centralised and vision B is 
distributed. So, vision A thinks it needs to be unique to exist and
fulfil its purpose. This is why Vision B proposed several things:

- to define an exclusive area of application for Vision A. This was done
from the second Last Call by proposing that the authors add wording
stating that the area of application was the areas already covered
by RFC 3066 and documented further on.
- to protect Vision A from confusion. This was done by pushing the
authors towards a very strict ABNF avoiding tag creep.
- there may be other propositions to study. This is however not easy
to uncover as Vision A has difficulty with the architectural
evolutions (network, content, relational elements) all this
technically implies.

As I explained, there are three scenarios:

1. Vision A is denied by the IESG. Progressively vision B imposes 
itself through new RFCs or from a grassroots (international) process. 
The current basic needs are not properly addressed. Credibility of 
the IETF is engaged like in spam, IDNA, etc. This is delaying.

2. Vision B is denied by the IESG. But vision B is already accepted 
through the URI-tags RFC. It will develop in opposition to Vision A. 
This will cost money and delays to everyone; the Multilingual Internet
will switch outside of the IETF or balkanise.

3. Vision B is included in Vision A as a community private use. This 
scheme is simple to understand and to include in the RFC 3066 Bis 
document in two lines. It does not break any of its principles.

     - the document is unchanged and addresses the general need, 
whatever it may be.
     - "x-" is unchanged. Its role is to support private use schemes, 
within private spaces.
     - "0-" is added from the reserved singleton pool. Its role is to
support community private use schemes, that is, cases where a user
community wants to document languages its own way. The need is to
support, in a non-conflicting way, two pieces of information:
        - the community scheme identification
        - the identification within that scheme.

I think this respects all the requirements of Vision A and permits a
full development of Vision B. There are two possibilities to support
the "0-" space: either to develop a new system or to use an existing system.
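
As a rough illustration of the "0-" idea above, the following Python sketch separates a tag into its ordinary prefix and its private part. The layout 0-<scheme>-<identifier>, the function name split_private_use and the sample tag fr-0-myscheme-dialect1 are assumptions made for this example only; neither the Draft nor the proposal above fixes them.

```python
def split_private_use(tag):
    """Split a tag at its first single-character subtag (singleton).

    Assumed layout (hypothetical, for illustration):
      <prefix>-x-<private subtags>           private use, as in the Draft
      <prefix>-0-<scheme>-<identifier...>    community private use (proposal)
    """
    parts = tag.split("-")
    for i, part in enumerate(parts):
        if len(part) != 1:
            continue
        prefix = "-".join(parts[:i])
        tail = parts[i + 1:]
        singleton = part.lower()
        if singleton == "x":
            return {"prefix": prefix, "kind": "private", "subtags": tail}
        if singleton == "0":
            if len(tail) < 2:
                raise ValueError("expected a scheme and an identifier after '0-'")
            return {
                "prefix": prefix,
                "kind": "community",
                "scheme": tail[0],                  # which community scheme
                "identifier": "-".join(tail[1:]),   # identification within it
            }
        # Any other singleton is treated as some other extension.
        return {"prefix": prefix, "kind": "extension", "subtags": tail}
    return {"prefix": tag, "kind": "none"}


print(split_private_use("fr-0-myscheme-dialect1"))
print(split_private_use("fr-x-local"))
```

The point of the sketch is only that the two pieces of information (scheme, identifier within the scheme) can be carried without touching the part of the tag that precedes the singleton.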

I have no particular opinion except that the solution MUST be
decentralised (community centralised). I started by thinking we had to
develop a new one, waiting for the review of the WG-ltru charter both
to make sure the proposition would fully respect Vision A and to
learn the Vision B points we would have overlooked (there probably are
many). This created problems for the WG, which only wanted to block a
Vision B it still does not understand or opposes.

Then we found the not yet numbered URI-tag RFC. It seems to address
all the needs, and more than the needs, except for
multilingualisation. My intent is therefore to document an IRI-tag
along the URI-tag lines when this debate has stabilised and the
URI-tag RFC has been published. I have no problem working on it
within the WG-ltru.

What next? Vision A alone is harmful to all. If it were accepted,
it would be appealed: to the IETF Chair for architectural common
sense; to the IESG for lack of compatibility with the Charter and other
RFCs; to the IAB if necessary to obtain guidance on the implementation
of the Multilingual Internet. Then appeals would continue in the outside
world. The target is not to oppose Vision A. It is, on the
contrary, to make sure it is viable. As the only solution permitted,
it will NOT survive because it is not able to withstand all that one can
expect people will do with it out of control. We had a very similar
case with IDNA. The only response to homograph phishing was "we
discussed it"....

I will document a few of these points in responding to Peter's last mail.

At 14:11 29/08/2005, Peter Constable wrote:
> > From: Bruce Lilly <blilly@erols.com>
> > > This
> > > is all what this proposition is about. This proposition is to give
> > > _one_shot_ in a _standardised_ way the language, the script and the
> > > country.
> >
> > This was discussed during Last Call of the previous non-IETF
> > (individual submission) attempt.  IIRC David Singer brought up
> > several examples of other pieces of information (e.g. legal/copyright
> > variations) that could also be negotiated and which might affect the
> > presentation of content (or choice among alternative content).
> > Lumping all of these separate items into one tag is a poor design as
> > it impedes negotiation and tends toward lengthy tags which are
> > incompatible with fixed-length mechanisms such as MIME encoded-words.
>
>I agree that it would be poor design to incorporate other pieces of
>information such as legal/copyright variations into language tags, but
>as such pieces of information are not supported by the draft, this
>appears to be irrelevant.

This is inexact. There is no problem in having the Draft compliant tag:

fr-Latn-fr-gayssot

to indicate a French language text fully respecting the "Loi
Gayssot", the anti-racist law used against Yahoo. There is no
guarantee that an ISP or the French law will not filter out pages from
suspected sites not bearing that tag, transferring the Host's legal
responsibilities to the Author.
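
The shape of such a tag can be checked mechanically. The Python sketch below classifies subtags by length and character class only, in the order language, script, region, variant. It is a simplification: it ignores extensions, private use and extended language subtags, and it performs no registry lookup, so it says nothing about whether "gayssot" would actually be registered. The function name split_tag is an assumption of this example.

```python
import re


def split_tag(tag):
    """Classify the subtags of a language tag by shape only (no registry check)."""
    subtags = tag.split("-")
    if not re.fullmatch(r"[A-Za-z]{2,3}", subtags[0]):
        raise ValueError("unsupported primary language subtag: " + subtags[0])
    result = {"language": subtags[0].lower(), "script": None,
              "region": None, "variants": []}
    rest = subtags[1:]
    # Script: exactly four letters.
    if rest and re.fullmatch(r"[A-Za-z]{4}", rest[0]):
        result["script"] = rest.pop(0).title()
    # Region: two letters or three digits.
    if rest and re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", rest[0]):
        result["region"] = rest.pop(0).upper()
    # Variants: 5 to 8 alphanumerics, or a digit followed by 3 alphanumerics.
    for subtag in rest:
        if not re.fullmatch(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}", subtag):
            raise ValueError("not shaped like a variant: " + subtag)
        result["variants"].append(subtag.lower())
    return result


print(split_tag("fr-Latn-fr-gayssot"))
```

Under these shape rules the example tag splits into language "fr", script "Latn", region "FR" and one variant, "gayssot".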

The problem in believing that one can rule the world is that the 
world may not accept to be ruled.

>We should rather focus on whether it is good design to incorporate
>information related to linguistic and written-form attributes, as
>supported in the draft, into a single tag. The consensus of the LTRU
>working group is that it is.

Let me phrase it in a more exact way: the affinity group which formed
the WG has been gathered around that idea.

1. basic written-mode attributes should not be specified in the
description of a language ... while, in addition, most languages are oral
2. in what manner is the country code related to a specific piece of
information? Nowhere in the Draft is this attribute documented: is it
the location where the text has been written, the location of the
lingual community of the author, or of the lingual community of the
reader? Where is that location definition documented so that both sides
of the relation can understand each other when negotiating?

>  For instance, the use of separate tags for
>language and script were considered and rejected

This has not been considered and rejected. This was a predefined
article of faith and every question on this has been defeated.

The problem is that it is meaningless and conflicting with the charset!!!
Until you associate a "script" with a charset, a script has no meaning ....

I asked the simple question: "does fr-Latn-FR mean that Latn permits
me to properly write French?" To know that, I need to know which
characters are associated with "Latn". No response. Same question on
the Unicode list. Non-French mother-tongue members said "yes" (but no
one was able to demonstrate it). French mother-tongue experts said
"no" and explained that Unicode lacks a particular space needed to
properly type typical French sentences and one accented character.
This was then disputed. My problem is that, as a user and as a network
standardiser, I should not have to be concerned with these details. I
need certainties and guarantees the Draft does not provide.

>on the basis that the two are not entirely orthogonal. Clear 
>examples of this was considered:
>while the intent of
>
>Accept-Language: ar, az-Cyrl, ru
>
>is clear, the intent of
>
>Accept-Language: ar, az, ru
>Accept-Script: Cyrl
>
>or of
>
>Accept-Language: ar, az, ru
>Accept-Script: Arab, Cyrl
>
>is not clear, nor is it obvious how rules could be specified that would
>make the intent clear, or that would permit expressing the preferences
>reflected in the first instance.

This kind of example is absurd. There is no more information, and there
is more confusion, with the proposed system if a page or a part of a
document is also assigned different conflicting langtags ...
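
For readers following the quoted Accept-Language example, the Python sketch below shows the prefix matching of language ranges described in RFC 3066 (a range matches a tag if it equals the tag, or equals a prefix of it that is followed by "-"). The function names, the list of available tags and the simple first-match selection (no q-values) are assumptions made for illustration; this is not a full HTTP negotiation algorithm.

```python
def range_matches(language_range, tag):
    """RFC 3066 style match: exact, or a prefix followed by '-'."""
    r, t = language_range.lower(), tag.lower()
    return t == r or t.startswith(r + "-")


def select(accept_language, available):
    """Return the first available tag matched by the ranges, in range order."""
    for language_range in accept_language:
        for tag in available:
            if range_matches(language_range, tag):
                return tag
    return None


# Hypothetical set of variants a server could offer.
available = ["az-Latn-AZ", "az-Cyrl-AZ", "ru-RU"]

print(select(["ar", "az-Cyrl", "ru"], available))
print(select(["ar", "az", "ru"], available))
```

With the single combined header, the range az-Cyrl selects the Cyrillic variant directly. With the plain range az, the first Azerbaijani variant listed is returned, and any script preference has to be carried and combined by some other rule.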

> > Tagging identifies characteristics of a particular piece of content.
> > For that purpose alone, it makes little difference (other than
> > regarding the aforementioned compatibility issues with existing IETF
> > mechanisms) whether the characteristics are lumped or separate.
>
>On the contrary, it makes little difference only if the characteristics
>in question are completely orthogonal. As pointed out above, the
>characteristics of linguistic variety and written form are not
>orthogonal, particularly when it comes to expressing user preferences,
>and that it *does* make a difference if they are split into separate
>metadata attributes or they are lumped together into a single metadata
>attribute.

Explain.

I will go your way; however, you have not defined what a script is. The
author is a Russian, sitting in NY and writing a page in Ukrainian and
wanting the texts to be repeated in Latn and Cyrl scripts, so
everyone there is able to read it. A very common proposition.

Please precisely document the langtags. And show what is not 
orthogonal in them.

> > While that may be used to infer something about the content
> > provider, such inferences may be unreliable...
>
>Quite so. This point was discussed in the WG.

The question is to know if the solution is acceptable. This LC is the
LC of the document, not that of the WG or mine.

> > Negotiation of separate characteristics is much
> > simpler than that of a combined conflation of characteristics; each
> > characteristic can be assigned separate preference values, and
> > irrelevant characteristics (e.g. script w.r.t. spoken language) can
> > be easily ignored.
>
>Negotiation of separate attributes involving inter-related
>characteristics is *not* simpler, as pointed out above. The draft fully
>allows for irrelevant characteristics (e.g. script wrt audio content) to
>be ignored. Again, what has been provided in the draft is in accordance
>with the charter of the WG.

Charter speaks of languages. You made clear the Draft was language 
and not written language oriented. I am glad to learn that the mode 
is an irrelevant characteristic.

Most of the languages are oral. Their rendering in a written form is
therefore an important piece of information ...

> > As negotiation and related issues represent a critical technical issue
> > for the design of language tags (viz. keeping separate characteristics
> > out of *language* tags), it is essential that such negotiation issues
> > be considered carefully before specifying the format of tags.
> > Unfortunately, that has not been done, and considering the published
> > WG milestones it appears that that issue has not been taken into
> > consideration...  However, it appears that the WG has not considered
> > the issues, with the effect that the WG product lacks the "particular
> > care" expected of BCP documents (RFC 2026).
>
>It is unclear on what basis it is asserted that these issues have not
>been considered by the WG. I believe most of the WG members would feel
>that they have been reasonably taken into consideration.

I agree with that. But the question is where the related
decisions were taken. I would then tend to fully agree with Bruce.

> > Note that it is not the registration procedural issues that are
> > typical of BCP documents that are problematic; rather it is the
> > conflation of separate characteristics into a single tag syntax,
> > specified in the same document, which raises problems related to
> > content negotiation.
>
>Bruce asserts (a) that there is conflation of separate characteristics,
>and that (b) this creates problems in content negotiation. The WG
>determined that the characteristics conflated into a single tag are not
>independent, and that it would be *separation* into separate attributes
>that would result in problems in content negotiation, not their
>combination into a single attribute.

Governmental authority over content is not orthogonal information
to language in some parts of the world. The question is to know if this
is to be addressed as a general or a specific issue.

> > Another large part of
> > the problem is WG management; in addition to the issues raised by John
> > Klensin the last time that LTRU participation was discussed on the
> > IETF discussion list -- and with which I wholeheartedly agree -- it
> > appears that management of WG participant conduct has been rather lax;
> > proponents of the individual submission effort who are participating
> > in the WG tend to resort to ad-hominem attacks when a problem is
> > identified or when an alternative approach is raised, with no visible
> > intervention by the WG co-chairs.  That has also (i.e. in addition to
> > the factors which John identified) had the effect of limiting WG
> > participation by individuals.
>
>It's unclear what bearing this has on what improvements can be made to
>the drafts in fulfillment of the WG charter. I believe several WG
>participants felt that management of conduct was lax, particularly in
>relation to a very small number of participants with a penchant for
>certain behaviours that would have challenged the best of moderators.

I suffered most of that: various innuendos about my age, my need for
English teachers, contempt for my colleagues as "end users" vs.
"IETF members" and "developers", "physical allusions to my possible
broken nose", anonymous phone calls, loss of clients due to abusive
mails they read under partners' corporate names, accusations of
ignorance by ... the demonstrably ignorant, rumours, etc.

I agree that one of the moderators actively engaged in that process.
But these are the risks of opposing big interests. When it went too
far, I appealed to the AD. The problem was corrected in minutes. The
AD decided to pursue the appeal and ruled in a good way for the
stability of the WG. It is true that from then on, insults against me
no longer resulted in banning, warning or insulting me.

We are all grown boys. I have been in that kind of business for nearly
30 years. I have seen worse :-) (but usually more competent). I invited
without problem all my opponents to have a drink in Paris (but none
came to the IETF meeting, or told me). It would have been nice.

>As for the accusation that proponents of an earlier individual
>submission engaged in ad-hominem attacks that went without intervention
>by the WG co-chairs, resulting in the limitation of participation in the
>WG by other individuals, in the absence of specific evidence,

Please refer to the mailing list. However, this is not a
Last Call of the WG management, but a Last Call of the Document. The
reasons why the document is incomplete should not be discussed so
much, just what is missing or needs correcting.

But it is true that several have been put off by the attitude of the
authors. I would say that this was evaluated very early. And that the
debate is better served when people overcome this. One judges a tree
by its fruits. The deliverable is not perfect: this is what matters today.

>  this
>appears itself to be no more than an ad-hominem attack on those
>individuals and on the WG co-chairs. To my knowledge, there was only one
>individual in relation to whom other members of the WG acted in any way
>that might discourage or hinder his participation,

Two disclosed. Two implied. This is mostly because I accepted to 
represent others. But what would have been the use of making the WG a 
battle field? This is what the author wanted so the "best" would 
"win". This is not my vision of the IETF.

>and such actions
>arose only in response to repeated provocation from that individual

The archives are there.

> > Specification of "language" tag syntax which conflates other content
> > characteristics prior to open and professional discussion of
> > negotiation issues and alternative approaches would be a premature
> > lock-in of a design choice.  As the document under discussion
> > specifies a conflation of such characteristics without open discussion
>
>It is asserted that there has been no open discussion of the matter of
>conflation. This is untrue. It is asserted that there has been no open
>discussion of alternatives; the only concrete alternative presented for
>discussion was to have separate language and script tags, which
>alternative was considered and rejected due to problems that arise in
>content negotiation. The drafts submitted for review are in accordance
>with the charter, and I believe I can say that in the opinion of WG
>members matters of conflation and of negotiation issues were taken into
>consideration, and were discussed in an open and professional manner.

Total disagreement on the outcome so far. But I hope we can overcome
that with the help of the IETF/IESG.

A lot of things have already changed in what some say ....
jfc

