[rfc-i] For v3: language tagging, but only where useful

duerst at it.aoyama.ac.jp ( "Martin J. Dürst" ) Tue, 14 January 2014 09:43 UTC

From: "duerst at it.aoyama.ac.jp"
Date: Tue, 14 Jan 2014 18:43:40 +0900
Subject: [rfc-i] For v3: language tagging, but only where useful
In-Reply-To: <52D40EEF.30903@gmx.de>
References: <589696F8-FB69-4126-98BC-32F2A083504E@vpnc.org> <CAK3OfOjZXh5VJsDWyTc-ov2PN_jT=gUYeWmk84GKFNRTx6oh9w@mail.gmail.com> <52D40E92.5010402@gmx.de> <52D40EEF.30903@gmx.de>
Message-ID: <52D506CC.5060909@it.aoyama.ac.jp>

On 2014/01/14 1:06, Julian Reschke wrote:
> On 2014-01-13 17:04, Julian Reschke wrote:
>> On 2014-01-13 16:53, Nico Williams wrote:

>>> Unicode does have language tag codepoints. Often their use is not
>>> appropriate, but here I think they would be (especially for names or
>>> addresses involving multiple languages, which is not something I'd
>>> expect frequently, but also not something I'd want to preclude out of
>>> hand).
>>
>> We already have xml:lang; we just need to allow it in more places. Isn't
>> that sufficient?
>>
>> Best regards, Julian
>
> Also,
> <http://en.wikipedia.org/wiki/Unicode_control_characters#Language_tags>
> says:
>
> "The tag characters have become deprecated in Unicode 5.1 (2008)."

Yes. And there's even an RFC for that:
     http://tools.ietf.org/html/rfc6082

To give a bit more history, these now-deprecated tag characters were 
proposed by the Unicode Consortium as an alternative to MLSF, a proposal 
related to ACAP which tried to squeeze language information into byte 
sequences not used in UTF-8
(see http://tools.ietf.org/html/draft-ietf-acap-mlsf-01).

It was quickly realized that this was a very bad idea (see also 
http://tools.ietf.org/html/draft-ietf-acap-langtag-00).

The language tag characters were deliberately exiled into plane 14, where 
each character takes 4 bytes in UTF-8, and a full language tag could easily 
take 20 or more bytes, as a clear hint saying "we don't really recommend 
these, but we prefer these over even worse stuff (such as ACAP MLSF)". 
If the Unicode Consortium had really thought that this was something to 
be used widely, they would have worked out a scheme needing fewer bytes.
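
To make the byte count concrete, here is a rough sketch in Python. It
assumes the RFC 2482 scheme as I remember it: U+E0001 (LANGUAGE TAG)
followed by one plane-14 tag character per ASCII character of the tag,
at U+E0000 plus the ASCII value. The function names are made up for
illustration and are not from any library.

```python
# Sketch only: builds a plane-14 language tag sequence and measures it.
LANGUAGE_TAG = 0xE0001   # U+E0001 LANGUAGE TAG introduces the tag
TAG_BASE = 0xE0000       # tag characters mirror ASCII at U+E0000 + code


def encode_language_tag(tag: str) -> str:
    """Return the plane-14 tag character sequence for an ASCII language tag."""
    for ch in tag:
        # Only printable ASCII (0x20..0x7E) has a tag character counterpart.
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError(f"not a printable ASCII character: {ch!r}")
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_BASE + ord(ch)) for ch in tag)


def utf8_size(tag: str) -> int:
    """Number of UTF-8 bytes needed to tag text with the given language."""
    return len(encode_language_tag(tag).encode("utf-8"))


if __name__ == "__main__":
    for tag in ("ja", "en-US"):
        print(tag, utf8_size(tag))
```

With this encoding, "ja" costs 12 bytes and "en-US" costs 24 bytes,
before a single character of actual text has been written.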

As time went by, it turned out that ACAP itself didn't go very far (I 
still think this is a pity; I'd really have liked to use e.g. portable 
keyboard layouts), and no other IETF protocol that we knew about was 
picking up on these tags, so the Unicode Consortium decided to deprecate 
them, and we issued RFC 6082 to bury and obsolete RFC 2482.

The fact that PDF (at least in some versions) is stuck with them isn't
our problem.

Regards,    Martin.