Re: Last Call: 'Tags for Identifying Languages' to BCP

"JFC (Jefsey) Morfin" <jefsey@jefsey.com> Mon, 29 August 2005 00:33 UTC

Message-Id: <6.2.3.4.2.20050829000620.0595d0b0@mail.jefsey.com>
Date: Mon, 29 Aug 2005 02:33:30 +0200
To: ietf@ietf.org, ietf@ietf.org
From: "JFC (Jefsey) Morfin" <jefsey@jefsey.com>
In-Reply-To: <200508281015.01096@mail.blilly.com>
References: <4amb37$ak3l4l@mx02.mrf.mail.rcn.net> <200508281015.01096@mail.blilly.com>
Cc: iesg@ietf.org
Subject: Re: Last Call: 'Tags for Identifying Languages' to BCP

Dear Bruce,
I will try to quickly comment on, respond to, or make suggestions about some of your well-made points.

At 16:15 28/08/2005, Bruce Lilly wrote:
> >  Date: 2005-08-25 20:55
> >  From: "JFC (Jefsey) Morfin" <jefsey@jefsey.com>
> > the privacy problem is the "what you read, who you are" intelligence
> > leak.
>
>That is to some extent true of any negotiation mechanism and negotiated
>value.

True. The problems are:
- the unnecessary accumulation of orthogonal information;
- the easily identified characteristic format: there is an enormous difference 
between "xx-xx-xxx-xx" (Draft) and "xxxx" (ISO 639-6);
- the lack of alternatives (are we sure there is no other 
architectural way to address the same need without an information leak?);
- the lack of encryption;
- the "spam" aspect: I am forced to receive the langtag.

> > Today langtags are not yet much used (according to the W3C people in the
> > WG-ltru) compared with what they should be in XML, HTML, etc.
>
>XML, HTML, etc. are not IETF protocols and should not be the main
>consideration in IETF work on IETF documents,

They are specifically cited in the Charter. Also, CLDR is a private 
proposal to unify "locale" files; it is of interest, but it also has competitors.

>especially as language tags
>are heavily used by IETF protocols, notably MIME (RFCs 2045, 2047, 2231,
>3282) and widely-deployed core IETF application protocols which use MIME
>(e.g. the Internet Message Format and its applications (email, news, voice
>messaging, EDI, etc.) and HTTP and applications using HTTP as a substrate.

RFC 2231 is among the references cited. I am more interested in R&D. My 
concern is that OPES has been disregarded.

> > This
> > is what this proposal is all about: giving, in _one_shot_ and in a
> > _standardised_ way, the language, the script and the
> > country.
>
>This was discussed during Last Call of the previous non-IETF (individual
>submission) attempt.  IIRC David Singer brought up several examples of
>other pieces of information (e.g. legal/copyright variations) that could
>also be negotiated and which might affect the presentation of content (or
>choice among alternative content).  Lumping all of these separate items into
>one tag is a poor design as it impedes negotiation and tends toward lengthy
>tags which are incompatible with fixed-length mechanisms such as MIME
>encoded-words.  While there is some mention of this issue in the document
>under discussion, its treatment and resolving the underlying issue in a
>manner that would minimize the problems are lacking.
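To make the fixed-length concern in the quoted paragraph concrete: RFC 2047 limits an encoded-word to 75 characters including its delimiters, and RFC 2231 places the language tag inside the encoded-word (=?charset*language?encoding?encoded-text?=), so every character of the tag is taken from the room left for text. The Python sketch below only counts characters; the function names and sample tags are illustrative assumptions, and it ignores header folding and the expansion that Q or B encoding causes.

```python
# Character-count sketch only: ignores header folding and the expansion caused by
# Q or B encoding of the text itself. Function names are hypothetical.

MAX_ENCODED_WORD = 75  # RFC 2047: maximum length of one encoded-word, delimiters included


def overhead(charset: str, language: str = "", encoding: str = "Q") -> int:
    """Characters used by '=?', charset, optional '*language', '?', encoding, '?' and '?='."""
    lang_part = "*" + language if language else ""
    return len("=?" + charset + lang_part + "?" + encoding + "?" + "?=")


def room_for_text(charset: str, language: str = "", encoding: str = "Q") -> int:
    """Encoded-text characters still available inside a single encoded-word."""
    return MAX_ENCODED_WORD - overhead(charset, language, encoding)


if __name__ == "__main__":
    for tag in ["", "en", "fr-CA", "zh-Hant-TW"]:
        print(f"{tag or '(no tag)':>10}: {room_for_text('ISO-8859-1', tag)} characters left")
```

With charset ISO-8859-1, an encoded-word without a language tag leaves 58 characters for encoded text, "en" leaves 55, and a three-subtag tag such as "zh-Hant-TW" leaves 47.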

The work we carried out on language in a common reference center (where 
the common parameters of a relation are stored) has shown us that two 
additional classes of information must be included in negotiation: the 
parameters of the community (what we call the "referent", e.g. dictionary, 
etc.) and the context of the exchange (style, personal meanings, 
circumstances, etc.). These elements are necessary for OPES call-out 
servers supervising a relation. They are used by default by 
... Word (language, script, country, dictionary, style).

The Draft proposes a system that makes it possible to determine the locale a 
computer should support for end-to-end interoperability purposes. It 
does not necessarily make it possible to establish, maintain and serve 
brain-to-brain interintelligibility.

>Let's separate three issues:
>1. privacy
>2. tagging
>3. negotiation
>
>The privacy issue exists whenever any information is conveyed; the user
>needs to balance privacy concerns with facilitation of communication.
>Mechanisms such as TLS can be used to limit the visibility of the information
>to the end points of communication; ultimately it boils down to a matter of
>trust in the end-point partner in the communication exchange.  I believe
>that the issue is dealt with adequately in the security considerations
>section of the document under discussion (some mention of transport-level
>protection of privacy would be welcome).

Not really; see above. The concept facilitates privacy violations:
- more secure alternatives should be investigated and proposed;
- the danger is not worth the result, and necessary information is missing.

>Tagging identifies characteristics of a particular piece of content.  For
>that purpose alone, it makes little difference (other than regarding the
>aforementioned compatibility issues with existing IETF mechanisms) whether
>the characteristics are lumped or separate.  There are existing IETF
>mechanisms which permit handling of either lumped or individual characteristics
>(e.g. the extensible header field mechanism of RFC 2045 and the "feature/filter"
>mechanism of RFC 2533/2738/2912).  Tagging per se identifies characteristics
>of content.  While that may be used to infer something about the content
>provider, such inferences may be unreliable, particularly for providers that
>support a wide variety of characteristics for the content in question.
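A rough illustration of the point that, for tagging alone, lumped and separate characteristics carry the same information: when each subtag's role can be recognised from its position and shape (2 or 3 letters for an ISO 639 language code, 4 letters for an ISO 15924 script code, 2 letters for an ISO 3166 country code or 3 digits for a UN M.49 region), a combined tag can be split mechanically. This is a simplified Python sketch under those assumptions, not a validating parser for the Draft's ABNF; the variant shape (5 to 8 alphanumerics, or a digit followed by 3 alphanumerics) is also an assumption, the class and function names are hypothetical, and anything after the first unrecognised subtag (for example a singleton-introduced extension) is kept opaque.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed subtag shapes (simplified: no extlang, grandfathered or irregular tags).
SCRIPT = re.compile(r"[A-Za-z]{4}")                            # ISO 15924, e.g. Latn
REGION = re.compile(r"[A-Za-z]{2}|[0-9]{3}")                   # ISO 3166 or UN M.49
VARIANT = re.compile(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}")  # e.g. 1901


@dataclass
class Characteristics:
    language: str
    script: Optional[str] = None
    region: Optional[str] = None
    variants: List[str] = field(default_factory=list)
    rest: List[str] = field(default_factory=list)  # e.g. singleton extensions, kept opaque


def split_tag(tag: str) -> Characteristics:
    """Split a combined tag into separate characteristics by subtag position and shape."""
    subtags = tag.split("-")
    result = Characteristics(language=subtags[0].lower())
    i = 1
    if i < len(subtags) and SCRIPT.fullmatch(subtags[i]):
        result.script = subtags[i].title()
        i += 1
    if i < len(subtags) and REGION.fullmatch(subtags[i]):
        result.region = subtags[i].upper()
        i += 1
    while i < len(subtags) and VARIANT.fullmatch(subtags[i]):
        result.variants.append(subtags[i].lower())
        i += 1
    result.rest = subtags[i:]  # whatever could not be classified
    return result


if __name__ == "__main__":
    print(split_tag("sr-Latn-CS"))
    print(split_tag("de-CH-1901"))
```

Here split_tag("sr-Latn-CS") yields language "sr", script "Latn" and region "CS", while split_tag("de-CH-1901") yields no script, region "CH" and the variant "1901".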

This confusion will become an increasing problem. More and more, the 
"architext" we use (the data from which we infer the text we read) is 
becoming intelligent and multilingual. I currently use a multilingual site 
generator: it uses multilingual texts to generate unilingual versions of a 
web site. It uses a default langtag scheme (:xxx) to indicate the language 
of the language-specific parts.

>Negotiation of characteristics is where several issues arise.  One such
>issue, as discussed here in December 2004/January 2005 relates to an
>algorithm for matching content characteristics (e.g. between a particular
>piece of content and a specified range of acceptance (as in an RFC 3282
>Accept-Language field).  RFC 3066 skirted that issue as it stopped short of
>specification of an algorithm, and as it specified a mere two particular
>characteristics (language per se, and country) which could be combined in
>a tag.  That was not true of the individual submission, which combined at
>least 5 characteristics and specified an algorithm.  As a result of issues
>with that approach, the LTRU WG was established with a charter to produce a
>BCP (for registration procedures) and a separate Standards Track document
>for topics such as algorithms which are unsuitable for BCP.  A related issue
>is the interaction of the established negotiation mechanism (viz. the RFC
>3282 Accept-Language field) and potential use of the other (feature/filter)
>mechanism for negotiation.  The Accept-Language field provides for
>specification of language ranges and for associating a preference value
>with specific languages (as defined in RFC 3066) or ranges.  The proposed
>mechanism in the individual submission of late last year (essentially
>unchanged in the LTRU product (see discussion below)) does not address the
>language range issue, and that issue is greatly complicated by conflating
>separate characteristics into a single tag.  Addressing the language range
>issue is not a WG work item and, unfortunately, the algorithm issue is
>scheduled to be a later work item than the registry issue.

The language negotiation issue is independent of any language 
identifier format. But obviously some langtag formats may serve language 
negotiation better than others.
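The matching problem described in the quoted paragraph above can be sketched as follows, assuming the RFC 3066 prefix rule (a range matches a tag if it equals the tag, or is a prefix of it followed by "-", with "*" matching any tag) and the HTTP/1.1 convention (RFC 2616, section 14.4) that a tag receives the quality value of the longest matching range, or 0 if no range matches. The Python function names are hypothetical, and only the q parameter is parsed.

```python
from typing import List, Optional, Sequence, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Parse e.g. 'fr-CA, fr;q=0.8, *;q=0.1' into [(range, q), ...]; only q is read."""
    ranges = []
    for item in header.split(","):
        if not item.strip():
            continue
        lang_range, *params = item.split(";")
        q = 1.0  # default quality when no q parameter is given
        for param in params:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        ranges.append((lang_range.strip().lower(), q))
    return ranges


def range_matches(lang_range: str, tag: str) -> bool:
    """RFC 3066 rule: exact match, or prefix followed by '-'; '*' matches any tag."""
    lang_range, tag = lang_range.lower(), tag.lower()
    return lang_range == "*" or tag == lang_range or tag.startswith(lang_range + "-")


def quality(tag: str, ranges: Sequence[Tuple[str, float]]) -> float:
    """Quality of the longest matching range, or 0 if no range matches."""
    best_len, best_q = -1, 0.0
    for lang_range, q in ranges:
        if range_matches(lang_range, tag):
            length = 0 if lang_range == "*" else len(lang_range)  # '*' is least specific
            if length > best_len:
                best_len, best_q = length, q
    return best_q


def choose(available: Sequence[str], header: str) -> Optional[str]:
    """First available tag with the highest positive quality, or None."""
    ranges = parse_accept_language(header)
    chosen, chosen_q = None, 0.0
    for tag in available:
        q = quality(tag, ranges)
        if q > chosen_q:
            chosen, chosen_q = tag, q
    return chosen


if __name__ == "__main__":
    header = "fr-CA, fr;q=0.8, en;q=0.5, *;q=0.1"
    print(choose(["de", "en-GB", "fr-FR"], header))
    print(range_matches("sr-CS", "sr-Latn-CS"))
```

With the header "fr-CA, fr;q=0.8, en;q=0.5, *;q=0.1", choose(["de", "en-GB", "fr-FR"], ...) selects "fr-FR" (quality 0.8). The range "sr-CS" does not match "sr-Latn-CS": once a script subtag sits between language and country, a simple prefix range can no longer say "Serbian as used in that country, in any script", which is one concrete form of the complication raised above.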

>Added to that is the fact that the specification of the tag format is mixed with
>registration procedures.  Negotiation of separate characteristics is much
>simpler than that of a combined conflation of characteristics; each
>characteristic can be assigned separate preference values, and irrelevant
>characteristics (e.g. script w.r.t. spoken language) can be easily ignored.
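A minimal sketch of the quoted argument that separate characteristics can each carry their own preference values, with an irrelevant characteristic simply left out. Everything here is an assumption for illustration: the dictionary layout, the rule of multiplying per-characteristic preferences, and the function names are hypothetical and are not defined by RFC 3282, RFC 2533 or the Draft.

```python
from typing import Iterable, Mapping, Optional

# Hypothetical layout: characteristic name -> {value: preference in [0, 1]}.
# A "*" entry, if present, is the preference for any value not listed.
Preferences = Mapping[str, Mapping[str, float]]
Variant = Mapping[str, str]


def score(variant: Variant, prefs: Preferences, relevant: Iterable[str]) -> float:
    """Combine per-characteristic preferences by multiplication (an assumed rule)."""
    total = 1.0
    for characteristic in relevant:
        wanted = prefs.get(characteristic)
        if not wanted:
            continue  # no stated preference, so this characteristic does not constrain the choice
        value = variant.get(characteristic)
        fallback = wanted.get("*", 0.0)
        total *= wanted.get(value, fallback) if value is not None else fallback
    return total


def best(variants: Iterable[Variant], prefs: Preferences,
         relevant: Iterable[str]) -> Optional[Variant]:
    """Return the first variant with the highest positive score, or None."""
    relevant_list = list(relevant)  # iterated once per variant
    chosen, chosen_score = None, 0.0
    for variant in variants:
        s = score(variant, prefs, relevant_list)
        if s > chosen_score:
            chosen, chosen_score = variant, s
    return chosen


if __name__ == "__main__":
    prefs = {"language": {"sr": 1.0, "hr": 0.5}, "script": {"Latn": 1.0, "Cyrl": 0.3}}
    written = [{"language": "sr", "script": "Cyrl"}, {"language": "hr", "script": "Latn"}]
    spoken = [{"language": "sr"}, {"language": "hr"}]
    print(best(written, prefs, ["language", "script"]))  # script matters for text
    print(best(spoken, prefs, ["language"]))             # script ignored for speech
```

With a preference for Serbian (1.0) over Croatian (0.5) and for Latin script (1.0) over Cyrillic (0.3), written content picks the Croatian Latin-script variant (0.5 against 0.3), while for spoken content, where script is irrelevant and left out, the Serbian variant wins (1.0 against 0.5). A single combined tag carrying one preference value could not express that shift.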

At this stage many negotiation elements are missing, in particular the 
elements related to the referent and to the context. For example, a 
traveler will more readily accept a foreign language when it concerns 
the place he is touring (context), and a professional when it concerns 
a technical discussion (referent). All the more so since terminology OPES 
services or on-the-fly translation assistance can be provided.

>As negotiation and related issues represent a critical technical issue for
>the design of language tags (viz. keeping separate characteristics out of
>*language* tags), it is essential that such negotiation issues be considered
>carefully before specifying the format of tags.  Unfortunately, that has not
>been done, and considering the published WG milestones it appears that that
>issue has not been taken into consideration.  It should be pointed out that
>such issues have been raised, both in the discussion during Last Call of the
>individual submission and as a result of subsequent work.  However, it
>appears that the WG has not considered the issues, with the effect that the
>WG product lacks the "particular care" expected of BCP documents (RFC 2026).

It should be noted that the ISO 639-4 work is about discussing guidelines in 
that area. This work is under way and was not considered.

>Note that it is not the registration procedural issues that are typical of
>BCP documents that are problematic; rather it is the conflation of separate
>characteristics into a single tag syntax, specified in the same document,
>which raises problems related to content negotiation.
>
>Part of the problem is the scheduling of WG work items as noted above
>(viz. negotiation issues are critical to design of tag syntax, and should not
>have been deferred until after syntax specification).  Another large part of
>the problem is WG management; in addition to the issues raised by John
>Klensin the last time that LTRU participation was discussed on the IETF
>discussion list -- and with which I wholeheartedly agree -- it appears that
>management of WG participant conduct has been rather lax; proponents of the
>individual submission effort who are participating in the WG tend to resort
>to ad-hominem attacks when a problem is identified or when an alternative
>approach is raised, with no visible intervention by the WG co-chairs.  That
>has also (i.e. in addition to the factors which John identified) had the
>effect of limiting WG participation by individuals.

I will not object to that remark. The one advantage was that proposing an 
alternative approach led to improvements of the ABNF intended to 
prevent it. The result is a relatively clean default ABNF, which now 
makes it possible to avoid confusion with specific solutions introduced by a 
reserved singleton. This makes it possible to support:
- my Draft as a default proposal;
- the easy specification of other formats and conceptions (such as those 
based upon ISO 639-6, or ISO 11179 conformant ones, etc.) without risking conflicts.

>Specification of "language" tag syntax which conflates other content
>characteristics prior to open and professional discussion of negotiation
>issues and alternative approaches would be a premature lock-in of a design
>choice.  As the document under discussion specifies a conflation of such
>characteristics without open discussion -- indeed hampered by unchecked
>unprofessional conduct -- it should not be approved as BCP in its current
>form.  Separation of syntax specification to a separate document,

Yes!!!

>to be specified after due consideration of negotiation issues, leaving purely
>procedural issues of registration,

Yes!!! supporting multimodal competences (not only script, but also 
signs, voice, icons, moods, style, etc.)

>  would be one approach to enable making
>a decision on BCP registration procedures independently of and in advance of
>a concrete specification of negotiation issues and tag syntax.  However,
>as it stands, the document cannot be evaluated for soundness of the tag
>syntax design in the absence of a specification that addresses negotiation
>issues (in a backwards-compatible manner with the existing negotiation
>mechanisms (viz. MIME Content- and Accept- fields and feature/filter
>negotiation)).
>
>Therefore, at minimum, I recommend that the IESG defer a decision on the
>subject document until such time as the full impact of the early design
>choice to conflate multiple characteristics into a single tag can be fully
>evaluated w.r.t. proposed matching algorithms and impact on existing
>IETF-approved negotiation mechanisms.

By then we should have running services. The ISO 639-6 authors have just 
announced that a sample table will be available in November, and the ISO 
639-3 author expects it to be published by the end of the year. Then 
we can start experimentation. Locking the multilingual Internet core 
system into a final ABNF seems premature.

>  Revision to move the syntax
>specification to a separate document, as mentioned above, would permit
>evaluation of the registration procedures per se independently of such
>concerns, and would be one way to move forward on those registration
>procedures quickly (i.e. independently of analysis of the syntax design)
>if that is deemed desirable.
>
>Aside from that, the IESG (via the cognizant ADs) should address the issues
>of WG charter work items and milestones as they relate to consideration of
>negotiation issues prior to locking down a tag syntax specification, should
>emphasize the importance of backwards compatibility with established,
>approved, and widely deployed IETF protocols and mechanisms,

and with documented efforts such as OPES, and should ensure that the way 
langtags will be used, and their applications, are documented.
jfc


_______________________________________________
Ietf mailing list
Ietf@ietf.org
https://www1.ietf.org/mailman/listinfo/ietf