[rfc-i] For v3: language tagging, but only where useful
duerst at it.aoyama.ac.jp ( "Martin J. Dürst" ) Tue, 14 January 2014 09:43 UTC
From: "duerst at it.aoyama.ac.jp"
Date: Tue, 14 Jan 2014 18:43:40 +0900
Subject: [rfc-i] For v3: language tagging, but only where useful
In-Reply-To: <52D40EEF.30903@gmx.de>
References: <589696F8-FB69-4126-98BC-32F2A083504E@vpnc.org> <CAK3OfOjZXh5VJsDWyTc-ov2PN_jT=gUYeWmk84GKFNRTx6oh9w@mail.gmail.com> <52D40E92.5010402@gmx.de> <52D40EEF.30903@gmx.de>
Message-ID: <52D506CC.5060909@it.aoyama.ac.jp>
On 2014/01/14 1:06, Julian Reschke wrote:
> On 2014-01-13 17:04, Julian Reschke wrote:
>> On 2014-01-13 16:53, Nico Williams wrote:
>>> Unicode does have language tag codepoints. Often their use is not
>>> appropriate, but here I think they would be (especially for names or
>>> addresses involving multiple languages, which is not something I'd
>>> expect frequently, but also not something I'd want to preclude out of
>>> hand).
>>
>> We already have xml:lang; we just need to allow it in more places. Isn't
>> that sufficient?
>>
>> Best regards, Julian
>
> Also,
> <http://en.wikipedia.org/wiki/Unicode_control_characters#Language_tags>
> says:
>
> "The tag characters have become deprecated in Unicode 5.1 (2008)."

Yes. And there's even an RFC for that:
http://tools.ietf.org/html/rfc6082

To give a bit more history, these now-deprecated tag characters were
proposed by the Unicode Consortium as an alternative to MLSF, a proposal
related to ACAP which tried to squeeze language information into byte
sequences not used in UTF-8 (see
http://tools.ietf.org/html/draft-ietf-acap-mlsf-01). It was quickly
realized that this was a very bad idea (see also
http://tools.ietf.org/html/draft-ietf-acap-langtag-00).

The language tag characters were deliberately exiled into plane 14, where
each character takes 4 bytes in UTF-8, so a full language tag can easily
take 20 or more bytes. This was a clear hint saying "we don't really
recommend these, but we prefer these over even worse stuff (such as ACAP
MLSF)". If the Unicode Consortium had really thought that this was
something to be used widely, they would have worked out a scheme needing
fewer bytes.

As time went by, it turned out that ACAP itself didn't go very far (I
still think this is a pity; I'd really have liked to use e.g. portable
keyboard layouts), and no other IETF protocol that we knew about was
picking up on these tags, so the Unicode Consortium decided to deprecate
them, and we issued RFC 6082 to bury and obsolete RFC 2482.
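
To make the byte count concrete, here is a minimal sketch in Python. It
is an illustration only, not taken from RFC 2482 or from the thread; the
function names to_tag_characters and utf8_cost are made up for this
example. It assumes the usual layout of the tag block: U+E0001 LANGUAGE
TAG introduces the tag, and the tag characters mirror printable ASCII at
U+E0020..U+E007E.

```python
# Illustrative sketch only: spell a language tag with the deprecated
# plane-14 tag characters and count the UTF-8 bytes that takes.
# Function names are hypothetical, chosen for this example.

LANGUAGE_TAG = 0xE0001  # U+E0001 LANGUAGE TAG, introduces the tag
TAG_OFFSET = 0xE0000    # tag characters mirror ASCII at U+E0020..U+E007E


def to_tag_characters(lang: str) -> str:
    """Map an ASCII language tag such as 'ja' to plane-14 tag characters."""
    if not all(0x20 <= ord(ch) <= 0x7E for ch in lang):
        raise ValueError("language tag must be printable ASCII")
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_OFFSET + ord(ch)) for ch in lang)


def utf8_cost(lang: str) -> int:
    """Number of UTF-8 bytes used by the tag-character form of lang."""
    return len(to_tag_characters(lang).encode("utf-8"))


if __name__ == "__main__":
    for tag in ("ja", "en-US", "zh-Hant-TW"):
        print(tag, utf8_cost(tag))
```

Every code point involved lies above U+FFFF, so each one costs 4 bytes
in UTF-8. Even a two-letter tag such as "ja" needs 12 bytes (the
introducer plus two tag characters), and "en-US" needs 24.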

The fact that PDF (at least in some versions) is stuck with them isn't
our problem.

Regards,   Martin.