Re: Last Call: 'Tags for Identifying Languages' to BCP

Bruce Lilly <blilly@erols.com> Wed, 31 August 2005 02:00 UTC

From: Bruce Lilly <blilly@erols.com>
Organization: Bruce Lilly
To: ietf@ietf.org
Date: Tue, 30 Aug 2005 21:59:59 -0400
User-Agent: KMail/1.8.2
References: <4amb37$aq7och@mx02.mrf.mail.rcn.net>
In-Reply-To: <4amb37$aq7och@mx02.mrf.mail.rcn.net>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline
Message-Id: <200508302159.59785@mail.blilly.com>

>  Date: 2005-08-28 16:25
>  From: Frank Ellermann <nobody@xyzzy.claranet.de>

> That's a last call, if you have better ideas than those in the
> draft speak up.  Your Content-Script idea is good, but won't
> help e.g. in encoded words (2047+2231).

Encoded-words have several characteristics, one of which is limited
length (in octets).  That has two implications w.r.t. script:
1. specifying script explicitly is unnecessary; it can be determined
   from the charset (always specified in an encoded-word) and the
   specific octets of the encoded text (ISO-8859-1 is Latin script,
   KOI8 is Cyrillic, etc.).
2. An encoded-word has limited space available.  Of a maximum of 76
   octets in an encoded-word specifying language, there are 8 for
   overhead, at least one (currently exactly one) for specification
   of encoding method, a charset specification (registered charsets
   have names up to 45 octets in length), the language tag, and some
   encoded text.  The encoded text must be at least one octet for Q
   encoding and a simple (unshifted) charset; for B encoding (and an
   unshifted charset) it has to be a multiple of 4 octets, and a typical
   charset with shift sequences will require on the order of 6 octets
   minimum (for Q encoding; 8-12 minimum for B encoding). Specifying
   (unnecessarily; see above) script reduces the available space for
   actual (encoded) text; possibly to the point of impossibility in
   pathological cases.
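
The space arithmetic in point 2 can be sketched roughly as follows.  This is
an illustration only: encoded_text_budget is a hypothetical helper, not part
of any library.  It assumes the RFC 2231 form =?charset*language?encoding?text?=
and uses the RFC 2047 limit of 75 characters on the encoded-word itself, with
seven delimiter characters when a language tag is present; the figures of 76
and 8 used above leave the same remainder.

```python
# Sketch: octets left for encoded text in an encoded-word of the form
#   =?charset*language?encoding?text?=
# encoded_text_budget is a hypothetical helper written for illustration.

ENCODED_WORD_LIMIT = 75  # RFC 2047 limit on the encoded-word itself


def encoded_text_budget(charset, language="", limit=ENCODED_WORD_LIMIT):
    """Return how many octets remain for the encoded text."""
    delimiters = len("=?") + len("?") + len("?") + len("?=")  # 6 octets
    if language:
        delimiters += len("*")  # separator between charset and language
    encoding = 1  # a single letter, "Q" or "B"
    return limit - delimiters - encoding - len(charset) - len(language)
```

For example, encoded_text_budget("iso-8859-1", "en") leaves 55 octets, while a
tag carrying a script subtag such as "sr-Latn" leaves 50; with a 45-octet
charset name and "en" only 20 remain.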

Specification of script is only a performance enhancement for long texts
(not the case for encoded-words) where a multi-script charset is in use.
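
As a rough illustration of determining script from the charset and the
octets, the sketch below decodes the octets and inspects the characters.
The function scripts_in is a hypothetical name, and taking the first word of
the Unicode character name is an assumed heuristic, not a real script-property
lookup (the Python standard library does not expose one).

```python
import unicodedata


def scripts_in(octets, charset):
    """Return the set of script-like labels found in the decoded text.

    Heuristic (an assumption for illustration): the first word of each
    letter's Unicode character name, e.g. "LATIN" or "CYRILLIC".
    """
    text = octets.decode(charset)
    found = set()
    for ch in text:
        if ch.isalpha():  # ignore digits, punctuation, and whitespace
            found.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return found
```

For short text such as an encoded-word the scan is cheap.  A result with more
than one label indicates mixed-script text, for which no single script label
would be accurate.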

While the Content-Script (or similar feature/filter mechanism) would not
be applicable to encoded-words, specification of script is unnecessary
for encoded-words (and undesirable due to impact on the available text
space).

Specification of script is only possible where a given text uses a single
script, and that limitation applies to any of the methods of indication
mentioned above, including the addition to language tags proposed by the
draft under discussion.

Script is a characteristic of written text; it is not applicable to (e.g.)
audio media types.  It really should be a text media type parameter (or
feature).

> This is a ready-for-Bruce's-review draft as far as I can judge
> this, but for obvious reasons only you can really judge it. ;-)

As I mentioned in an earlier message, without a concrete specification
for negotiation, it is not possible to fully assess the proposed syntax
changes.
 
> > Addressing the language range issue is not a WG work item
> > and, unfortunately, the algorithm issue is scheduled to be a
> > later work item than the registry issue.
> 
> Only my personal view of course, but the matching draft offers
> a syntactical form for ranges,

There is no such draft in Last Call at this time, as far as I know.

> if ISO 3166-1 pulls another CS 3066bis will handle it
> better than 3066 (no potential worldwide retagging confusion).

I am unaware of any "worldwide retagging confusion" w.r.t. language
tags and "CS".
 
> > it appears that management of WG participant conduct has been
> > rather lax
> 
> IBTD, the WG Chairs and the responsible AD did a very good job.

As an affected party, I disagree.

> > Revision to move the syntax specification to a separate
> > document, as mentioned above, would permit evaluation of the
> > registration procedures per se
> 
> You can also read chapter 3 per se, the mentioned 14 pages
> plus 3.1 as introduction (5 pages, format of the registry).

But a single section isn't being Last Called; it is the entire document,
and lacking specification of negotiation mechanisms it is not possible
to fully assess the document as it stands.
